geopavlakos / hamer

HaMeR: Reconstructing Hands in 3D with Transformers
https://geopavlakos.github.io/hamer/
MIT License

Direction and Origin of the 3D coordinate frame at Mesh.obj saving time; given camera (K, R, t), how to translate to a world frame? #18

Closed Zi-ang-Cao closed 6 months ago

Zi-ang-Cao commented 7 months ago

Hey Authors, thanks for the great work! I would like to understand the origin of the 3D coordinate frame at the time Mesh.obj is saved. I am also aware that two different focal lengths (model_cfg.EXTRA.FOCAL_LENGTH vs. scaled_focal_length = model_cfg.EXTRA.FOCAL_LENGTH / model_cfg.MODEL.IMAGE_SIZE * img_size.max()) are used for the scaled (256, 256) crop and the original input image (say, (2208, 1242)), and that they later affect "pred_cam_t" and "pred_cam" respectively.
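For reference, here is a minimal sketch of how the two focal lengths relate, following the formula above; the config values (5000 and 256) and the reference to cam_crop_to_full are assumptions based on the default demo setup and may differ in your checkout:

```python
import numpy as np

# Hypothetical defaults standing in for model_cfg.EXTRA.FOCAL_LENGTH and
# model_cfg.MODEL.IMAGE_SIZE; the exact numbers are assumptions.
crop_focal_length = 5000.0          # focal length defined for the 256x256 network crop
crop_size = 256                     # side length of the square crop fed to the model

# Resolution of the original input image, e.g. (2208, 1242)
img_size = np.array([2208.0, 1242.0])

# Rescale the crop focal length to the full image, as in the formula above:
scaled_focal_length = crop_focal_length / crop_size * img_size.max()

# In the demo script, the weak-perspective camera predicted for the crop (pred_cam)
# is then converted to a full-image translation using this scaled focal length,
# roughly: pred_cam_t_full = cam_crop_to_full(pred_cam, box_center, box_size,
#                                             img_size, scaled_focal_length)
```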

Given that the model can estimate the 3D pose of the hand and align it with the input image, if we feed in the focal length and the extrinsic and intrinsic matrices of a real camera, can we transform the predicted 3D pose into a known world frame?

These are some relevant questions:

The above questions all depend on the definition of the 3D frame's axis directions and origin. I hope you can help me with that!

Thanks,

(Screenshot attached: 2024-01-22, 11:01:20 PM)
geopavlakos commented 6 months ago

If you want to get the mesh in the camera frame, where the convention is X - right, Y - down, Z - forward, you might want to skip the transformation here. In that case, you should have the mesh in the camera frame. I would use scaled_focal_length = 1055, as you suggest. Then, if you want to convert this to world coordinates, you can just apply the extrinsics transformation and go from camera to world.
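To make that concrete, below is a minimal sketch of the camera-to-world step. It assumes the mesh vertices are already expressed in the camera frame (X right, Y down, Z forward) -- in the demo, that would typically be the predicted vertices translated by pred_cam_t_full -- and that your extrinsics (R, t) map world coordinates to camera coordinates, i.e. X_cam = R @ X_world + t. The function and variable names are placeholders, not HaMeR API:

```python
import numpy as np

def camera_to_world(verts_cam: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map (N, 3) camera-frame vertices to the world frame.

    Assumes the extrinsics satisfy X_cam = R @ X_world + t, so the inverse
    transform is X_world = R^T @ (X_cam - t).
    """
    return (verts_cam - t.reshape(1, 3)) @ R   # row-vector form of R^T @ (x - t)

# Example with hypothetical extrinsics: identity rotation and a 1 m offset
# along the camera's Z axis.
verts_cam = np.random.rand(778, 3)             # e.g. the 778 MANO vertices in the camera frame
R = np.eye(3)
t = np.array([0.0, 0.0, 1.0])
verts_world = camera_to_world(verts_cam, R, t)
```

If your extrinsics are stored the other way around (camera pose in the world, i.e. X_world = R @ X_cam + t), apply them directly instead of inverting.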

Zi-ang-Cao commented 6 months ago

Thanks!