dk-liang / FIDTM

[IEEE TMM] Focal Inverse Distance Transform Maps for Crowd Localization
MIT License

Real Time tracking / counting? #43

Closed by mariosconsta 1 year ago

mariosconsta commented 1 year ago

Has anyone tried this with a live camera feed for real-time tracking? What FPS does it reach? Is this implementation viable for a real-time scenario, or is the FPS too low?

dk-liang commented 1 year ago

About 7 FPS on an RTX 3090.

mariosconsta commented 1 year ago

I wonder whether that could be improved by running inference only on every 10th to 20th frame instead of on every frame, given that a crowd of people will most likely still be there after 1 or 2 seconds.
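The frame-skipping idea above can be sketched roughly as follows. This is a minimal illustration, not code from the FIDTM repository: `run_every_n_frames`, `infer`, and `stride` are hypothetical names, and `infer` stands in for whatever function runs the model on a single frame. The last result is simply reused for the skipped frames.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def run_every_n_frames(
    frames: Iterable[Frame],
    infer: Callable[[Frame], Result],
    stride: int = 10,
) -> Iterator[Tuple[Frame, Optional[Result]]]:
    """Yield (frame, result) pairs, calling `infer` only on every `stride`-th frame.

    Frames in between reuse the most recent result. `infer` is a
    placeholder for the actual model call (an assumption of this sketch).
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")

    last_result: Optional[Result] = None
    for index, frame in enumerate(frames):
        # Run the expensive model only on frames 0, stride, 2 * stride, ...
        if index % stride == 0:
            last_result = infer(frame)
        yield frame, last_result


if __name__ == "__main__":
    # Stand-in "frames" and "model" so the sketch runs without a camera or GPU.
    fake_frames = range(25)
    for frame, count in run_every_n_frames(fake_frames, lambda f: f * 10, stride=10):
        print(frame, count)
```

Assuming a 30 FPS camera (an assumption, not a figure from this thread), a stride of 10 needs only 3 inferences per second, which is below the roughly 7 FPS reported above. Each inference would still block this loop while it runs, so a live display would likely need the model call moved to a separate thread or process.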

rydenisbak commented 1 year ago

I tried several architectures and found that the timm-regnetx_064 encoder + PAN decoder from https://github.com/qubvel/segmentation_models.pytorch takes 9.76 GFLOPs vs 26.46 GFLOPs for the current HRNet (roughly 2.7x fewer), but the results are a little worse: 57.14 vs 59.29 on the NWPU (2048) dataset.

mariosconsta commented 1 year ago

This is interesting, I will give it a good read. Thanks!

rydenisbak commented 1 year ago

@mariosconsta I published my implementation; you can check it out here: https://github.com/rydenisbak/FIDTM/tree/visionlabs_implement

mariosconsta commented 1 year ago

Thanks man, I appreciate it!