the form of hand pose annotations

YorkWang-Go commented 2 years ago

Thanks for your great work!

I noticed that in the label files, pose_m[:, 0:48] stores the MANO pose coefficients in PCA representation and pose_m[0, 48:51] stores the translation. How can I transform them into axis-angle or quaternion representation? Are the poses in the label files represented in a root-relative system? And is it right that I can transform the pose to the camera system by using the hand translation?

Thanks for your help!

ychao-nvidia commented 2 years ago

How can I transform pose_m into axis-angle representation or quaternion representation? We have an example for this in a different repo (handover-sim) we authored. The relevant part is in this block of code.
- This block extracts the global translation (as t) and wrist rotation (as q) and applies an SE(3) transform to the root joint according to rotation tag_R_inv (3x3 matrix) and translation tag_t_inv (3-dim vector). Note that an additional subtraction and addition of root_trans is required here for any SE(3) transformation, since the wrist rotation is not applied directly at the root of the kinematic tree in MANO.
- In the _transform() function, you can see how we read the wrist rotation in axis-angle representation (here), apply the SE(3) transformation (here), and convert the wrist rotation to quaternion representation (here).
- Finally, this block extracts the finger articulated pose (in PCA representation) and converts it to axis-angle representation. You can further convert it to quaternion after this step.
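A minimal, dependency-free sketch of the two conversions described above (finger PCA to axis-angle, then axis-angle to quaternion). It is not the handover-sim code. It assumes the MANO-style decoding `pose = coeffs @ components[:n] + hands_mean`, where `components` (rows of the PCA basis) and `hands_mean` (45 values) would come from the MANO model; these argument names are hypothetical. Whether the mean is added depends on how the MANO layer was configured (e.g. a "flat hand mean" option), so check that against the layer you use.

```python
import math

def pca_to_axis_angle(coeffs, components, hands_mean):
    """Decode PCA coefficients into a flat axis-angle finger pose.

    Assumed decoding: pose = sum_k coeffs[k] * components[k] + hands_mean.
    """
    if len(coeffs) > len(components):
        raise ValueError("more coefficients than PCA components")
    pose = list(hands_mean)
    for k, ck in enumerate(coeffs):
        row = components[k]
        for d in range(len(pose)):
            pose[d] += ck * row[d]
    return pose

def axis_angle_to_quaternion(aa):
    """Convert one axis-angle 3-vector to a unit quaternion (w, x, y, z)."""
    x, y, z = aa
    angle = math.sqrt(x * x + y * y + z * z)
    if angle < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)  # identity rotation
    s = math.sin(angle / 2.0) / angle
    return (math.cos(angle / 2.0), x * s, y * s, z * s)

def finger_pose_to_quaternions(pose45):
    """Split a 45-value axis-angle finger pose into 15 per-joint quaternions."""
    return [axis_angle_to_quaternion(pose45[3 * j:3 * j + 3]) for j in range(15)]
```

The wrist rotation pose_m[:, 0:3] is already axis-angle, so only axis_angle_to_quaternion applies to it.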
Are the poses represented in a root-relative system in the label files? Hand poses are all represented in MANO format. pose_m[:, 48:51] and pose_m[:, 0:3] store the global translation and rotation, respectively. pose_m[:, 3:48] stores the finger articulated pose in the local frame.
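The layout above can be made concrete with a small helper that splits one 51-value pose_m row. split_pose_m is a hypothetical name. With NumPy you would typically pass a row such as np.load(label_path)["pose_m"][0], assuming the label file is an .npz archive whose pose_m entry has shape (1, 51).

```python
def split_pose_m(pose_m_row):
    """Split one 51-value pose_m row into its MANO parts."""
    if len(pose_m_row) != 51:
        raise ValueError(f"expected 51 values, got {len(pose_m_row)}")
    wrist_rot_aa = list(pose_m_row[0:3])  # wrist (global) rotation, axis-angle
    finger_pca = list(pose_m_row[3:48])   # 45 finger PCA coefficients, local frame
    trans = list(pose_m_row[48:51])       # MANO translation, camera frame
    return wrist_rot_aa, finger_pca, trans
```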
Can I transform the pose to the camera system by using the hand translation? You can transform the global pose (translation and rotation) following the example above. Again, the only caveat is the subtraction and addition of root_trans before and after the transform, as shown in the example (here).
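A minimal sketch of applying a rigid transform x -> R x + T to a MANO hand with the root_trans adjustment, in plain Python. It assumes root_trans is the rest-pose root joint position for the hand's shape (e.g. MANO joint 0 computed with zero pose and zero translation); under that assumption the trick amounts to adding root_trans to the translation before the rigid transform and subtracting it afterwards. All function names are hypothetical, and this is not the handover-sim implementation.

```python
import math

def aa_to_matrix(aa):
    """Rodrigues' formula: axis-angle 3-vector -> 3x3 rotation matrix."""
    x, y, z = aa
    theta = math.sqrt(x * x + y * y + z * z)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = x / theta, y / theta, z / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def matrix_to_aa(R):
    """Inverse of aa_to_matrix (rotation log map), including the near-pi case."""
    cos_t = max(-1.0, min(1.0, (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0))
    theta = math.acos(cos_t)
    if theta < 1e-8:
        return [0.0, 0.0, 0.0]
    skew = [R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1]]
    if math.pi - theta < 1e-6:
        # Near pi the skew part vanishes; recover the axis from (R + I) / 2 ~ k k^T.
        i = max(range(3), key=lambda d: R[d][d])
        k = [0.0, 0.0, 0.0]
        k[i] = math.sqrt(max(0.0, (R[i][i] + 1.0) / 2.0))
        for j in range(3):
            if j != i:
                k[j] = (R[i][j] + R[j][i]) / (4.0 * k[i])
        if skew[i] < 0.0:  # pick the sign consistent with the skew part
            k = [-val for val in k]
        return [theta * val for val in k]
    s = 2.0 * math.sin(theta)
    return [theta * val / s for val in skew]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def transform_mano_global(rot_aa, trans, root_trans, R, T):
    """Apply x -> R x + T to a MANO hand given by (rot_aa, trans).

    rot_aa:     wrist rotation, axis-angle (pose_m[:, 0:3])
    trans:      MANO translation (pose_m[:, 48:51])
    root_trans: assumed rest-pose root joint position for this hand's shape
    """
    new_rot = matrix_to_aa(matmul(R, aa_to_matrix(rot_aa)))
    p = [trans[d] + root_trans[d] for d in range(3)]  # add root_trans
    p = [sum(R[r][c] * p[c] for c in range(3)) + T[r] for r in range(3)]
    new_trans = [p[d] - root_trans[d] for d in range(3)]  # subtract it again
    return new_rot, new_trans
```

The finger pose pose_m[:, 3:48] is expressed in the local frame, so it passes through unchanged.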

YorkWang-Go commented 2 years ago

Okay! Thanks for your quick reply! This does help!

YorkWang-Go commented 2 years ago

I'm sorry, but I have a further question.

What is this block used for? You said that an additional subtraction and addition of root_trans is required because the wrist rotation is not applied directly at the root of the kinematic tree in MANO. I can now visualize the hand pose from the raw parameters in each DexYCB label file using the PCA mode of ManoLayer, but the hand translation seems to be wrong (the hand and object are not aligned). Based on your reply above, I think that may be the reason. However, I still can't get correct results either by using pose_m and pose_y here with the axis-angle mode of ManoLayer (wrong hand pose and wrong translation), or by using the translation here with the raw hand pose parameters in PCA mode (correct hand pose but wrong translation). Since I can already get correct hand poses from the raw parameters, I think I only need the right hand_tsl parameter that can be applied directly in the camera system. So I am confused about why this translation still doesn't work. How can I get the hand_tsl that I need? And is it right that hand_tsl has no relationship with MANO?

Sorry to disturb you again, and thanks for your kindness!

ychao-nvidia commented 2 years ago

What is this block used for?
- This block converts the MANO pose representation from DexYCB to a URDF model we use. This builds on the URDF model from this repo.
- This line resamples the mocap trajectories to a different frame rate.
- Since the URDF representation of rotation uses Euler angles, we need to "unwrap" the values to prevent discontinuities. This is what is done in this block.
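The resampling and unwrapping steps above can be sketched generically in plain Python. This is not the handover-sim code: it uses simple linear interpolation for resampling and a numpy.unwrap-style rule (wrap each consecutive step into [-pi, pi) and accumulate), and both function names are hypothetical.

```python
import math

def resample_linear(times, values, new_times):
    """Linearly interpolate samples (times, values) at new_times.

    times and new_times must be sorted ascending; queries outside the
    recorded range are clamped to the first/last value.
    """
    out = []
    i = 0
    for t in new_times:
        if t <= times[0]:
            out.append(values[0])
            continue
        if t >= times[-1]:
            out.append(values[-1])
            continue
        while times[i + 1] < t:  # advance to the segment containing t
            i += 1
        a = (t - times[i]) / (times[i + 1] - times[i])
        out.append(values[i] + a * (values[i + 1] - values[i]))
    return out

def unwrap(angles):
    """Remove 2*pi jumps between consecutive angles (similar to numpy.unwrap)."""
    if not angles:
        return []
    out = [angles[0]]
    for prev, cur in zip(angles, angles[1:]):
        # Wrap the raw step into [-pi, pi) and accumulate it.
        step = (cur - prev + math.pi) % (2.0 * math.pi) - math.pi
        out.append(out[-1] + step)
    return out
```

For angle signals, one reasonable order is to unwrap first and then interpolate, so that linear interpolation never runs through a spurious jump of nearly 2*pi.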
How can I get the hand_tsl that I need? And is it right that hand_tsl has no relationship with mano? This is outside the scope of this repo, and I don't recall a variable with that name. If you need to transform the MANO pose from one camera to another camera, you need to apply the root_trans processing mentioned above to the MANO translation.

A good exercise could be to transform MANO pose from one camera to another in DexYCB using the provided extrinsics. You can render the transformed pose and overlay it with the image recorded from the second camera as a sanity check.
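For that exercise, the first step is the relative rigid transform between the two cameras. A minimal sketch, assuming each camera's extrinsics are a rotation R and translation t mapping camera coordinates into a shared frame (x_w = R x_cam + t); if the extrinsics you load follow the opposite convention, invert them first. The helper names are hypothetical.

```python
def transpose(M):
    return [[M[c][r] for c in range(3)] for r in range(3)]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def matvec(M, v):
    return [sum(M[r][k] * v[k] for k in range(3)) for r in range(3)]

def relative_transform(R_a, t_a, R_b, t_b):
    """Return (R, T) such that x_b = R x_a + T.

    Assumes (R_a, t_a) and (R_b, t_b) map camera-a / camera-b points into
    a shared world (or master-camera) frame: x_w = R x_cam + t.
    """
    R_bT = transpose(R_b)  # the inverse of a rotation is its transpose
    R = matmul(R_bT, R_a)
    T = matvec(R_bT, [t_a[i] - t_b[i] for i in range(3)])
    return R, T
```

The resulting (R, T) is the rigid transform to push through the root_trans step described earlier in the thread; the transformed hand can then be rendered with the second camera's intrinsics for the overlay check.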

YorkWang-Go commented 2 years ago

Okay! Thanks for your reply!

YorkWang-Go commented 2 years ago

I'm sorry, but I'm still confused about how the hand pose is represented in the label files. You said that hand poses are all represented in MANO format; does that mean root-relative? If so, and pose_m[:, 0:48] stores the hand pose in standard MANO format, is the global translation (pose_m[:, 48:51]) represented in the camera coordinate system? Is pose_y also represented in the camera coordinate system? Can I use pose_y and pose_m to transform the hand pose to the object's canonical coordinate system?

ychao-nvidia commented 1 year ago

Does it mean root-relative? It depends on what you mean by that. Again, the MANO model is somewhat unusual in that the global translation (pose_m[:, 48:51]) is applied to an ad-hoc root joint that does not belong to any of the hand joints, and the representation does not allow any rotation of this root joint. The first joint that allows rotation is the wrist joint (pose_m[:, 0:3]). Because of this formulation, if you apply any SE(3) transformation to a MANO representation by treating the wrist joint as the "root", you need to perform the root_trans trick mentioned above. The notion of a global translation plus global rotation, as for typical rigid bodies, does not exist in the MANO representation.
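The root_trans trick can be written out as a short derivation. This is a sketch under a simplifying view: MANO rotates the articulated hand by the wrist rotation $R_w$ (from pose_m[:, 0:3]) about the rest-pose root joint position $j$ (which depends on the shape parameters and is assumed here to be what root_trans denotes), then adds the translation $t$ (pose_m[:, 48:51]). $\tilde v$ is a vertex after finger articulation but before the wrist rotation, and $(R, T)$ is the rigid transform to apply.

```latex
\begin{aligned}
v &= R_w(\tilde v - j) + j + t \\
R v + T &= (R R_w)(\tilde v - j) + R(j + t) + T \\
        &= R'(\tilde v - j) + j + t',
\qquad R' = R R_w, \quad t' = R(t + j) + T - j
\end{aligned}
```

So the new translation comes from adding $j$ to $t$, applying the rigid transform, and subtracting $j$ again. The naive update $t' = R t + T$ differs from this by $(R - I)\,j$, which is nonzero in general.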
pose_m[:, 0:48] stores the hand pose in standard MANO format? Correct.
Is the global translation (pose_m[:, 48:51]) represented in the camera coordinate system? Yes, it is in the camera coordinate system.
Is pose_y also represented in the camera coordinate system? Yes.
Can I use pose_y and pose_m to transform the hand pose to the object's canonical coordinate system? Yes, you can. Again, you will need to do the root_trans trick rather than applying a standard global translation + global rotation operation.
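A minimal sketch of the first half of that: inverting an object pose to get the camera-to-object transform. It assumes each pose_y[k] is a 3x4 [R | t] matrix mapping object-frame points into the camera frame (the thread only states that pose_y is in the camera coordinate system); invert_pose_3x4 is a hypothetical helper.

```python
def invert_pose_3x4(P):
    """Invert a rigid 3x4 pose [R | t] given as three rows.

    Assumes P maps object-frame points to camera-frame points:
    x_cam = R x_obj + t. The returned (R_inv, t_inv) maps camera to object.
    """
    R = [row[0:3] for row in P]
    t = [row[3] for row in P]
    R_inv = [[R[c][r] for c in range(3)] for r in range(3)]  # transpose
    t_inv = [-sum(R_inv[r][k] * t[k] for k in range(3)) for r in range(3)]
    return R_inv, t_inv
```

The returned (R_inv, t_inv) plays a role analogous to tag_R_inv and tag_t_inv in the handover-sim example: it is the rigid transform to apply to the hand, together with the root_trans adjustment on the MANO translation.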

YorkWang-Go commented 1 year ago

Okay! Thanks for your reply and patience!

NVlabs / dex-ycb-toolkit, issue #23