Arthur151 / ROMP

Monocular, One-stage, Regression of Multiple 3D People and their 3D positions & trajectories in camera & global coordinates. ROMP[ICCV21], BEV[CVPR22], TRACE[CVPR2023]
https://www.yusun.work/
Apache License 2.0
1.33k stars, 229 forks

Strange Result on custom video #44

Open abhaygargab opened 3 years ago

abhaygargab commented 3 years ago

Hello @Arthur151, congratulations on the great work, and thanks for making the code available.

I tried running the inference script on a custom video via: CUDA_VISIBLE_DEVICES=0 python core/test.py --gpu=0 --configs_yml=configs/video.yml

But the result is strange. Is this because of the camera view? If so, is there any way to tune the model for camera views like this one?

[Screenshot from 2021-05-12 17-35-47]

Arthur151 commented 3 years ago

In this case, detection fails. I am working on this problem by fine-tuning ROMP on detection datasets with small subjects. For now, you can crop the image into several smaller pieces, run ROMP on each piece, and merge the results.
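A minimal sketch of the cropping workaround, assuming each frame is split into a regular grid of overlapping tiles. The 2×2 grid and 20% overlap are arbitrary illustrative choices, not ROMP settings, and `tile_boxes` is a hypothetical helper; it only computes crop boxes, and each crop would then be fed to ROMP as a separate image.

```python
def tile_boxes(width, height, rows=2, cols=2, overlap=0.2):
    """Return (x0, y0, x1, y1) crop boxes covering a width x height image.

    Neighbouring tiles overlap by `overlap` (a fraction of the tile size) so
    that a person standing on a tile border can still appear whole in one crop.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be >= 1")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")

    # Pick the tile size so that `cols` overlapping tiles exactly span the width:
    # tile_w * (cols - (cols - 1) * overlap) == width (same for the height).
    tile_w = width / (cols - (cols - 1) * overlap)
    tile_h = height / (rows - (rows - 1) * overlap)
    # Distance between the left/top edges of neighbouring tiles.
    step_x = tile_w * (1 - overlap)
    step_y = tile_h * (1 - overlap)

    boxes = []
    for r in range(rows):
        for c in range(cols):
            x0 = round(c * step_x)
            y0 = round(r * step_y)
            # Clamp to the image in case of floating-point rounding.
            x1 = min(width, round(c * step_x + tile_w))
            y1 = min(height, round(r * step_y + tile_h))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

For a 1920x1080 frame, `tile_boxes(1920, 1080)` gives four crops of roughly 1067x600 pixels; with a NumPy/OpenCV frame, each crop is `frame[y0:y1, x0:x1]`. Smaller tiles make distant people larger relative to the input, at the cost of more model runs per frame.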

abhaygargab commented 3 years ago

Thank you so much for the response. Do you mean I should crop the image into 4 sub-images and run the model on each of them separately? If so, how should I combine the results?

Also, is there any way I can feed object detections or 2D pose estimates computed by external tools into your model?

Arthur151 commented 3 years ago

Yes, it all comes down to the scale of the people in the image. With cropping, you can get 3D mesh results for the people who appear large enough. Combining the rendering results is complicated. Currently, ROMP doesn't support detections from external sources.
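One way to combine 2D per-crop results is sketched below, assuming each tile's output has already been reduced to person bounding boxes with a confidence score (ROMP's actual output format is not used here; `merge_tile_detections` and the `(box, score)` layout are hypothetical). Boxes are shifted back by the tile's top-left offset, and duplicates from overlapping tiles are removed with greedy IoU-based non-maximum suppression. This handles 2D boxes only; 3D camera or translation parameters estimated on a crop would also need adjusting for the crop, which is not attempted here.

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_tile_detections(tile_results, iou_thresh=0.5):
    """Merge per-tile person boxes into one list in full-image coordinates.

    tile_results: list of (tile_box, detections), where tile_box is the
    (x0, y0, x1, y1) crop used for that tile and detections is a list of
    (box, score) pairs with boxes in the tile's own pixel coordinates.
    """
    shifted = []
    for (tx0, ty0, _, _), detections in tile_results:
        for (x0, y0, x1, y1), score in detections:
            # Undo the crop by adding the tile's top-left offset.
            shifted.append(((x0 + tx0, y0 + ty0, x1 + tx0, y1 + ty0), score))

    # Greedy NMS: visit boxes from highest to lowest score and keep a box
    # only if it does not overlap an already-kept box by more than iou_thresh.
    shifted.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in shifted:
        if all(iou(box, k) <= iou_thresh for k, _ in kept):
            kept.append((box, score))
    return kept
```

A person cut by a tile border can still produce a truncated box that survives suppression; a larger tile overlap makes this less likely.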

Arthur151 commented 3 years ago

@ayedaemon Sorry for accidentally closing this issue. I am still working on this problem. I can't promise when the related code will be released, but the problem has been alleviated to some extent.