Closed mariosconsta closed 1 year ago
About 7 FPS on a 3090.
I am wondering whether that can be improved by running inference only every 10-20 frames instead of on every frame, given that a crowd of people will most likely still be there after 1 or 2 seconds.
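For what it's worth, the frame-skipping idea could be sketched roughly like this. This is only a minimal sketch, not code from the FIDTM repo: `run_every_n_frames`, `infer`, and `every_n` are hypothetical names, and `infer` stands in for whatever function runs the model on a single frame.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def run_every_n_frames(
    frames: Iterable[Frame],
    infer: Callable[[Frame], Result],  # hypothetical: runs the model on one frame
    every_n: int = 10,
) -> Iterator[Tuple[Frame, Optional[Result]]]:
    """Yield (frame, result) pairs, running `infer` only on every n-th frame.

    Frames in between reuse the most recent result, on the assumption that
    the crowd does not change much within a few frames.
    """
    if every_n < 1:
        raise ValueError("every_n must be >= 1")

    last_result: Optional[Result] = None
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            # Only these frames pay the full inference cost.
            last_result = infer(frame)
        yield frame, last_result


if __name__ == "__main__":
    # Toy stand-ins: integers as "frames" and a fake model that counts calls.
    calls = []

    def fake_infer(frame: int) -> str:
        calls.append(frame)
        return f"count-for-frame-{frame}"

    for frame, result in run_every_n_frames(range(25), fake_infer, every_n=10):
        pass
    print(calls)  # frames on which the model actually ran
```

If the model really takes about 1/7 s per frame, running it on every 10th frame would bring the average inference cost down to roughly 14 ms per frame, but the frames where inference runs would still stall for the full ~143 ms unless the model runs in a separate thread or process.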
I tried several architectures and saw that the timm-regnetx_064 encoder + PAN decoder from https://github.com/qubvel/segmentation_models.pytorch takes 9.76 GFLOPs vs 26.46 GFLOPs for the current HRNet, but the results are a little worse: 57.14 vs 59.29 on the NWPU (2048) dataset.
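To put the two compute figures side by side, here is a quick back-of-the-envelope calculation using only the GFLOPs numbers quoted above. The variable names are just for illustration, and a FLOPs reduction does not necessarily translate into the same FPS gain on a real GPU.

```python
# GFLOPs figures quoted in the comment above.
regnet_pan_gflops = 9.76   # timm-regnetx_064 encoder + PAN decoder
hrnet_gflops = 26.46       # current HRNet

ratio = hrnet_gflops / regnet_pan_gflops          # how many times fewer FLOPs
reduction = 1 - regnet_pan_gflops / hrnet_gflops  # fraction of FLOPs saved

print(f"{ratio:.2f}x fewer FLOPs, {reduction:.0%} reduction")
# prints "2.71x fewer FLOPs, 63% reduction"
```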
This is interesting, I will give it a good read. Thanks!
@mariosconsta I published my implementation; you can check it out now: https://github.com/rydenisbak/FIDTM/tree/visionlabs_implement
Thanks man, I appreciate it!
Has anyone tried this with a camera for real-time tracking? What's the FPS like? Is this implementation viable for a real-time scenario, or is the FPS really low?