Walter0807 / MotionBERT

[ICCV 2023] PyTorch Implementation of "MotionBERT: A Unified Perspective on Learning Human Motion Representations"
Apache License 2.0

Human Coordinates Relative to the Camera #52

Closed: WingkitChou closed this issue 1 year ago

WingkitChou commented 1 year ago

This work is truly impressive. I find myself wondering, what would happen if both the camera and the human subject were in motion? How could we determine the relationship between the camera and the human in this case? Let's assume we can determine the position of the camera in relation to a 'world coordinate system.' Then how can we determine the position of the human within this 'world coordinate system'?

I'm interested in understanding how to ascertain the coordinates of a human subject relative to the camera in a given scene. Are there specific techniques or methodologies available to determine this? Any guidance or resources on this topic would be highly appreciated.
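For the case where the camera pose in the world is known, the change of frame itself is a standard rigid transform: a point in camera coordinates maps to world coordinates as X_world = R X_cam + t. The sketch below illustrates only that step. The function name `camera_to_world`, the pose values, and the sample joints are hypothetical, and it assumes the joints are already available in camera coordinates with their absolute translation, which a root-relative pose alone does not provide.

```python
import math

def camera_to_world(joints_cam, R_wc, t_wc):
    """Map 3D joints from camera coordinates to world coordinates.

    joints_cam: list of (x, y, z) points in the camera frame.
    R_wc: 3x3 camera-to-world rotation as nested lists (row-major).
    t_wc: camera position in the world frame, as (x, y, z).
    """
    joints_world = []
    for p in joints_cam:
        # X_world = R_wc @ X_cam + t_wc, written out per component.
        joints_world.append(tuple(
            sum(R_wc[i][k] * p[k] for k in range(3)) + t_wc[i]
            for i in range(3)
        ))
    return joints_world

# Hypothetical camera pose for one frame: rotated 90 degrees about the
# world z-axis and located at (1, 2, 0) in the world.
theta = math.pi / 2
R_wc = [
    [math.cos(theta), -math.sin(theta), 0.0],
    [math.sin(theta),  math.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
]
t_wc = (1.0, 2.0, 0.0)

# Two hypothetical joints expressed in camera coordinates.
joints_cam = [(1.0, 0.0, 0.0), (0.0, 0.0, 3.0)]
joints_world = camera_to_world(joints_cam, R_wc, t_wc)
```

With a moving camera, R_wc and t_wc change every frame, so the same function would be applied frame by frame with that frame's pose.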

Walter0807 commented 1 year ago

Thanks for your interest. This work has not considered moving cameras yet. For that case, you could check:

BiomechatronicsRookie commented 1 year ago

Hi, I have a question closely related to this topic, so I think posting it here is better than opening another issue.

First of all, this is indeed amazing work. I understand that the output positions of the lifting model are expressed relative to the root joint. However, I was wondering in which frame the orientation of the lifted pose is expressed. Based on a quick exploration of the output, it seems to me that it might already be expressed in the "camera frame".
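To make "relative to the root joint" concrete, here is a minimal sketch of root-centering a single-frame pose. It is not taken from the repository: the helper `to_root_relative`, the sample coordinates, and the choice of index 0 as the root joint are assumptions for illustration.

```python
def to_root_relative(joints, root_index=0):
    """Subtract the root joint from every joint of a single-frame pose.

    joints: list of (x, y, z) tuples.
    root_index: index of the root joint (assumed to be 0 here).
    """
    root = joints[root_index]
    return [tuple(c - r for c, r in zip(joint, root)) for joint in joints]

# Hypothetical three-joint pose in some camera-centered frame.
pose = [(2.0, 1.0, 5.0), (2.5, 1.0, 5.2), (2.0, 2.0, 4.9)]
relative = to_root_relative(pose)
```

After centering, the root joint sits at the origin and every other joint is an offset from it. The transform removes translation only, so the orientation of the pose stays tied to the frame it was expressed in.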

Walter0807 commented 1 year ago

The results are in pixel coordinates (if this is your question).

WingkitChou commented 1 year ago

When visualizing the human position in XYZ coordinates, I noticed that the depth value is consistently close to zero. This also indicates that the results are in pixel coordinates (x, y).