Arthur151 / ROMP

Monocular, One-stage, Regression of Multiple 3D People and their 3D positions & trajectories in camera & global coordinates. ROMP [ICCV21], BEV [CVPR22], TRACE [CVPR23]
https://www.yusun.work/
Apache License 2.0

issue with multi-person pose tracking #302

Open dhhjx880713 opened 2 years ago

dhhjx880713 commented 2 years ago

Hi Yu,

we used your work to extract 3D motion from 2D videos. In general, it works quite well for single-person scenarios. Nice work! However, when I tried it on multi-person cases, I ran into two issues:

  1. Regarding exporting motion to fbx/bvh: it seems to me that convert2fbx.py only exports the motion of one character. I looked into the .npz file, and the 3D joint position data for all persons are there, so this is not really critical, since I can still extract the 3D motion data myself (see the sketch at the end of this comment).
  2. The other issue is a bit more annoying. I noticed that the tracking result seems to be inconsistent for multiple people. You can see an example in the two consecutive frames below. The estimated poses look fine to me, but the colors are swapped. I assume the color is the character's identity, and the extracted 3D motion corresponds to the colored skeleton, which is flipped here. This might not be a big problem for 3D pose estimation in the video, since every person in the scene still has an estimated pose attached, but the extracted 3D motion then becomes a mix of the two people and is not consistent over time, so you see a sudden jump at some point.

(Attached frames 00000012 and 00000013.) It would be great if you could help us with the mentioned issues. Thank you very much in advance.
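For reference, this is a minimal sketch of how one might inspect the saved .npz and pull out per-person 3D joints per frame. The key names (`results`, `j3d`) are assumptions, not the documented ROMP output layout, so check the actual keys of your own file first:

```python
import numpy as np

# Load the exported results; allow_pickle is needed if per-frame dicts were stored.
data = np.load("video_results.npz", allow_pickle=True)
print(data.files)  # list what is actually stored in this .npz

# Hypothetical layout: a dict mapping frame name -> list of per-person result dicts.
results = data["results"].item()
for frame_name, people in sorted(results.items()):
    for pid, person in enumerate(people):
        j3d = person["j3d"]  # hypothetical key: (num_joints, 3) array of 3D joints
        print(frame_name, pid, j3d.shape)
```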

Arthur151 commented 2 years ago

Hi @dhhjx880713, thanks for your kind words and comments. About the questions:

  1. Yes, the fbx export currently only supports the motion of one person. You need to select the subject ID during export.
  2. Sorry about that. The tracking function is not used by default; if you want to use it, please set -t during inference. When tracking is not used, the mesh color represents the depth relation rather than identity.
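If re-running inference with tracking enabled is not an option, a simple post-processing pass can often remove such ID swaps: match each frame's detections to the previous frame by root-joint distance. This is not ROMP's tracker, just a nearest-neighbour/Hungarian sketch, and it assumes every frame contains the same number of people, with `frames[t]` being an `(n_people, n_joints, 3)` array of 3D joints gathered as in the earlier snippet:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def reorder_by_previous(frames):
    """Reorder people in each frame so index i refers to the same person
    throughout the sequence (greedy frame-to-frame matching)."""
    out = [frames[0].copy()]
    for cur in frames[1:]:
        prev_roots = out[-1][:, 0]  # root joint of each already-tracked person
        cur_roots = cur[:, 0]
        # cost[i, j] = distance between previous person i and current detection j
        cost = np.linalg.norm(prev_roots[:, None] - cur_roots[None, :], axis=-1)
        rows, cols = linear_sum_assignment(cost)
        reordered = cur.copy()
        reordered[rows] = cur[cols]  # put each detection into its matched slot
        out.append(reordered)
    return out
```

Greedy frame-to-frame matching like this can still drift over long occlusions, but for brief identity swaps between otherwise well-separated people it is usually enough to keep each exported motion consistent in time.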