Walter0807 / MotionBERT

[ICCV 2023] PyTorch Implementation of "MotionBERT: A Unified Perspective on Learning Human Motion Representations"
Apache License 2.0

About the hips coordinates to world #28

Closed: lucasjinreal closed this issue 1 year ago

lucasjinreal commented 1 year ago

Hi, I got some real-time 3D pose results and visualized them in Open3D. They look good:

[Animated GIF: real-time 3D pose result rendered in Open3D]

However, I am wondering how to map the hip coordinates to the real world. I am currently adding +0.65 to the z axis, but it is not aligned well. It looks like it should be some value for the normalized hip height. Do you know exactly what this value is?

Walter0807 commented 1 year ago

Looks very cool!

Do you mean the vertical height of the person? I think it depends on the 2D detection results. You can also check the --pixel flag, which maps the 3D results to pixel coordinates.
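A minimal sketch of what a normalized-to-pixel mapping can look like. It assumes (an assumption, not a description of how --pixel is implemented) that the 2D inputs were normalized so the image center is 0 and the shorter image side spans [-1, 1], and that depth shares the same scale. The function name normalized_to_pixel is hypothetical.

```python
import numpy as np


def normalized_to_pixel(pose3d: np.ndarray, width: int, height: int) -> np.ndarray:
    """Map normalized joints of shape (..., J, 3) back to pixel units.

    Assumption: x and y were normalized so the image center is 0 and the
    shorter image side spans [-1, 1]; z uses the same scale so the skeleton
    keeps its proportions.
    """
    scale = min(width, height) / 2.0
    out = pose3d * scale          # new array; undoes the [-1, 1] scaling on all axes
    out[..., 0] += width / 2.0    # move the x origin to the left image edge
    out[..., 1] += height / 2.0   # move the y origin to the top image edge
    return out
```

Because x, y and z share one scale factor, the result stays proportionally consistent, but z is still only relative depth in pixel units, not metric distance from the camera.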

lucasjinreal commented 1 year ago

@Walter0807 Yes, the real 3D pose. The model's raw output should be relative to the pelvis as the coordinate center, but in the real world, the center should be the ground.

Walter0807 commented 1 year ago

The model's raw output is tied to the 2D positions in the images; it does not have a notion of the ground.

In your case, you can estimate the ground position (e.g., place the lowest joint on the ground plane).
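The per-frame version of this suggestion can be sketched as follows. It assumes poses of shape (T, J, 3) and lets you choose which axis is vertical. The defaults assume image-style axes with y pointing down, which you should verify against your own data and viewer (for example, a z-up Open3D scene). ground_each_frame is a hypothetical helper name.

```python
import numpy as np


def ground_each_frame(pose3d: np.ndarray, up_axis: int = 1, up_sign: float = -1.0) -> np.ndarray:
    """Shift each frame vertically so its lowest joint sits at height 0.

    pose3d: array of shape (T, J, 3).
    up_axis: index of the vertical coordinate.
    up_sign: +1.0 if "up" is the positive direction of that axis, -1.0 if the
             axis points down (assumed default: image-style y-down axes).
    """
    out = pose3d.astype(float)                    # astype copies, so the input is untouched
    height = up_sign * out[..., up_axis]          # (T, J) signed height of every joint
    lowest = height.min(axis=1, keepdims=True)    # (T, 1) lowest joint in each frame
    out[..., up_axis] -= up_sign * lowest         # move that joint to height 0
    return out
```

Grounding every frame independently also pins jumps and crouches to the floor, so it only suits motions where some joint really touches the ground in every frame.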

lucasjinreal commented 1 year ago

@Walter0807 Thanks, but this confuses me in two main respects:

  1. In real-world applications, you cannot assume the lowest joint is always on the ground. This is a paradox: if I already knew it was on the ground, why would I need to estimate the 3D keypoints?
  2. If the 3D keypoints depend heavily on the 2D keypoints in the image, what happens when the person is in the lower half of one image and in the top half of another? How would both be recovered to correct 3D coordinates?

Walter0807 commented 1 year ago

The main problem here is that you do not have any other information from the images. To get an accurate global position, you need the camera parameters, which are hard to obtain for in-the-wild monocular input.
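To illustrate why camera parameters matter, here is a pinhole-camera sketch. The focal lengths fx, fy and the principal point cx, cy are assumed known (they usually are not for in-the-wild video), and the root depth comes from a rough similar-triangles estimate that assumes a known real body height. Both helper names are hypothetical.

```python
import numpy as np


def estimate_root_depth(pixel_height: float, real_height: float, fy: float) -> float:
    """Rough depth via similar triangles: Z ~= fy * H_real / h_pixels (hypothetical helper)."""
    return fy * real_height / pixel_height


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Pinhole back-projection of pixel (u, v) at a known depth into camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])


# Example with made-up intrinsics for a 1920x1080 camera.
depth = estimate_root_depth(pixel_height=340.0, real_height=1.7, fy=1000.0)
upper = backproject(960.0, 300.0, depth, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
lower = backproject(960.0, 780.0, depth, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
```

Under these assumptions, a person in the upper or lower half of the image ends up at different heights in camera space. That only works because fy, cy and the depth are known; without them, the vertical placement remains ambiguous.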

lucasjinreal commented 1 year ago

In real applications, especially games, we don't need true real-world coordinates: since we create the world, we can change it at will. The question is whether the relative global translation is correct. In my case, if a person stands still on the ground, the hips should be at the height of the lower half of the body. What will MotionBERT output in this case?

Walter0807 commented 1 year ago

I do not quite understand your question. From your animation, the root trajectory is correct; all you need is a global offset.
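A global offset is one constant translation added to every frame, so the relative root trajectory is unchanged. A minimal sketch, assuming H36M-style joint ordering with the pelvis (root) at index 0, that picks the offset so the first frame's root lands at a world position of your choice. The function name anchor_sequence is hypothetical.

```python
import numpy as np


def anchor_sequence(pose3d: np.ndarray, target_root, root_idx: int = 0) -> np.ndarray:
    """Translate a (T, J, 3) pose sequence so that frame 0's root sits at target_root.

    The same offset is added to every frame, so the root trajectory relative
    to the first frame is preserved exactly.
    """
    offset = np.asarray(target_root, dtype=float) - pose3d[0, root_idx]
    return pose3d + offset  # the (3,) offset broadcasts over (T, J, 3)
```

Only the starting position changes; how far the root moves from frame to frame is whatever the model predicted.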

lucasjinreal commented 1 year ago

Yes, I am asking what this value is. Is it the H36M skeleton pelvis height from the training data or not? But you told me to assume the foot is on the ground, which confuses me. And you said the predicted coordinates are coupled with the 2D image keypoints, which makes me wonder how to get this value correctly.

Walter0807 commented 1 year ago

No, it depends on the view angle of the 2D images. A simple suggestion is to take the lowest position (over the entire sequence) as the ground plane, which should work in most cases.
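A minimal sketch of the sequence-level suggestion. Taking a single minimum over all frames and joints, rather than one per frame, preserves vertical motion such as a jump. It assumes poses of shape (T, J, 3), the pelvis at joint index 0, and image-style y-down axes by default; adapt up_axis and up_sign to your viewer. Both function names are hypothetical.

```python
import numpy as np


def ground_sequence(pose3d: np.ndarray, up_axis: int = 1, up_sign: float = -1.0) -> np.ndarray:
    """Shift the whole (T, J, 3) sequence so its lowest joint over all frames is at height 0."""
    out = pose3d.astype(float)                 # copy, leaves the input untouched
    height = up_sign * out[..., up_axis]       # (T, J) signed height of every joint
    lowest = height.min()                      # one scalar over the entire sequence
    out[..., up_axis] -= up_sign * lowest      # same vertical shift for every frame
    return out


def root_height(grounded: np.ndarray, root_idx: int = 0,
                up_axis: int = 1, up_sign: float = -1.0) -> np.ndarray:
    """Per-frame height of the root (pelvis) above the estimated ground plane."""
    return up_sign * grounded[:, root_idx, up_axis]
```

With this approach, the pelvis height above the ground is read off the grounded sequence itself instead of being a fixed constant.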

lucasjinreal commented 1 year ago

@Walter0807 Thanks. By the way, I want to try stacking the mesh head together with the 3D keypoint head and training them simultaneously. Do you think it would have a large impact on 3D keypoint accuracy?
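One generic way to train a mesh head and a 3D keypoint head at the same time on a shared backbone is a weighted sum of the two objectives. This is a textbook multi-task formulation, not MotionBERT's own training loss, and the weight λ is a hypothetical hyperparameter:

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{3D}} + \lambda \, \mathcal{L}_{\text{mesh}}, \qquad \lambda \ge 0
```

Setting λ = 0 recovers keypoint-only training, so sweeping λ upward from small values is one way to check whether the extra head changes keypoint accuracy.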

Walter0807 commented 1 year ago

👍 I guess it would not harm the performance.

lucasjinreal commented 1 year ago

I will try it.