Vegetebird / MHFormer

[CVPR 2022] MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
MIT License

In vis.py, the output and selection problem of three-dimensional space coordinates (X,Y,Z) #98

Closed Buerrrrr closed 1 year ago

Buerrrrr commented 1 year ago

Hello, author. This is an impressive work, no doubt. I am doing research on human motion recognition and hope to use MHFormer to get the 3D coordinates of subjects. However, I have run into some questions and hope you can answer them.

  1. Should we use the 3D coordinates in the camera coordinate system or in the world coordinate system? Or are both acceptable?
  2. 2.1 Are the 3D coordinates in the camera coordinate system output by this line of code? If not, which line of code do they come from? [camera coordinate system]
     2.2 Are the 3D coordinates in the world coordinate system output by this line of code? If not, which line of code do they come from? [world coordinate system]
  3. I understand what this line of code does, but how does it affect the final 3D output? [purpose]

Finally, thank you for taking time out of your busy schedule to read these questions!

Vegetebird commented 1 year ago

  1. I think using the 3D coordinates in the camera coordinate system is more appropriate. The world coordinates are used only to make the visualization friendlier.
  2. 2.1 Yes. That line of code outputs the 3D coordinates in the camera coordinate system.
     2.2 Yes. That line of code outputs the 3D coordinates in the world coordinate system.
  3. We select the result from the center frame for evaluation, which is slightly more accurate. You can omit this line of code to get faster seq2seq inference.
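
The two steps mentioned in this answer can be illustrated with a minimal NumPy sketch. This is not the code from vis.py: the function names (`select_center_frame`, `quat_rotate`, `camera_to_world`), the `(frames, joints, 3)` array shape, the random data, and the quaternion value are all assumptions made for illustration.

```python
import numpy as np


def select_center_frame(seq_out):
    """Return the middle frame of a (frames, joints, 3) prediction."""
    return seq_out[seq_out.shape[0] // 2]


def quat_rotate(q, points):
    """Rotate points of shape (N, 3) by a unit quaternion q = (w, x, y, z)."""
    w, xyz = q[0], q[1:]
    t = 2.0 * np.cross(xyz, points)
    return points + w * t + np.cross(xyz, t)


def camera_to_world(pose_cam, q, t=0.0):
    """Map camera-space joints to world space: rotate, then translate."""
    return quat_rotate(q, pose_cam) + t


# Hypothetical model output: 5 frames, 17 joints, camera coordinates.
rng = np.random.default_rng(0)
seq_out = rng.normal(size=(5, 17, 3))

# Keep only the center frame, as done for evaluation.
pose_cam = select_center_frame(seq_out)  # shape (17, 3)

# Placeholder rotation (NOT the repository's value):
# 90 degrees about the z-axis, in (w, x, y, z) order.
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
pose_world = camera_to_world(pose_cam, q)  # shape (17, 3)
```

Because a rotation preserves distances between joints, the pose has the same shape in both coordinate systems. Only its orientation (and position, if `t` is nonzero) changes, which is consistent with world coordinates being used only for visualization.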

Archives-RZ commented 3 months ago

I would like to ask: what is the origin of the world coordinate system, and what are the directions of its three coordinate axes?