geopavlakos / hamer

HaMeR: Reconstructing Hands in 3D with Transformers
https://geopavlakos.github.io/hamer/
MIT License

Direction and Origin of the 3D coordinate frame at Mesh.obj saving time; given camera (K, R, t), how to translate to a world frame? #18

Closed Zi-ang-Cao closed 6 months ago

Zi-ang-Cao commented 7 months ago

Hey Authors, thanks for the great work! I would like to understand the origin of the 3D coordinate frame at the time Mesh.obj is saved. I am also aware that two different focal lengths (model_cfg.EXTRA.FOCAL_LENGTH vs. scaled_focal_length = model_cfg.EXTRA.FOCAL_LENGTH / model_cfg.MODEL.IMAGE_SIZE * img_size.max()) are used for the scaled (256, 256) crop and the original input image (say, (2208, 1242)), and that they later affect "pred_cam_t" and "pred_cam" respectively.
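For reference, here is a minimal sketch of how the two focal lengths relate, following the formula above; the config values (5000 and 256) and the reference to cam_crop_to_full are assumptions based on the default demo setup and may differ in your checkout:

```python
import numpy as np

# Hypothetical defaults standing in for model_cfg.EXTRA.FOCAL_LENGTH and
# model_cfg.MODEL.IMAGE_SIZE; the exact numbers are assumptions.
crop_focal_length = 5000.0          # focal length defined for the 256x256 network crop
crop_size = 256                     # side length of the square crop fed to the model

# Resolution of the original input image, e.g. (2208, 1242)
img_size = np.array([2208.0, 1242.0])

# Rescale the crop focal length to the full image, as in the formula above:
scaled_focal_length = crop_focal_length / crop_size * img_size.max()

# In the demo script, the weak-perspective camera predicted for the crop (pred_cam)
# is then converted to a full-image translation using this scaled focal length,
# roughly: pred_cam_t_full = cam_crop_to_full(pred_cam, box_center, box_size,
#                                             img_size, scaled_focal_length)
```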

Given that the model can estimate the 3D pose of the hand and align it with the input image, if we feed in the focal length and the extrinsic and intrinsic matrices of a real camera, can we transform the predicted 3D pose into a known world frame?

These are some relevant questions:

The above questions all depend on the definition of the 3D frame's axis directions and origin. I hope you can help me with that!

Thanks,

(Screenshot attached: 2024-01-22, 11:01:20 PM)
geopavlakos commented 6 months ago

If you want to get the mesh in the camera frame, where the convention is X - right, Y - down, Z - forward, you might want to skip the transformation here. In that case, you should have the mesh in the camera frame. I would use scaled_focal_length = 1055, as you suggest. Then, if you want to convert this to world coordinates, you can just apply the extrinsics transformation and go from camera to world.
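To make that concrete, below is a minimal sketch of the camera-to-world step. It assumes the mesh vertices are already expressed in the camera frame (X right, Y down, Z forward) -- in the demo, that would typically be the predicted vertices translated by pred_cam_t_full -- and that your extrinsics (R, t) map world coordinates to camera coordinates, i.e. X_cam = R @ X_world + t. The function and variable names are placeholders, not HaMeR API:

```python
import numpy as np

def camera_to_world(verts_cam: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map (N, 3) camera-frame vertices to the world frame.

    Assumes the extrinsics satisfy X_cam = R @ X_world + t, so the inverse
    transform is X_world = R^T @ (X_cam - t).
    """
    return (verts_cam - t.reshape(1, 3)) @ R   # row-vector form of R^T @ (x - t)

# Example with hypothetical extrinsics: identity rotation and a 1 m offset
# along the camera's Z axis.
verts_cam = np.random.rand(778, 3)             # e.g. the 778 MANO vertices in the camera frame
R = np.eye(3)
t = np.array([0.0, 0.0, 1.0])
verts_world = camera_to_world(verts_cam, R, t)
```

If your extrinsics are stored the other way around (camera pose in the world, i.e. X_world = R @ X_cam + t), apply them directly instead of inverting.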

Zi-ang-Cao commented 6 months ago

Thanks!