facebookresearch / InterHand2.6M

Official PyTorch implementation of "InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image", ECCV 2020

How do you fuse MANO parameters from different views? #61

Open Deng-Y opened 3 years ago

Deng-Y commented 3 years ago

Hello! I read the NeuralAnnot paper and have some questions. Can you help me?

  1. NeuralAnnot takes a single-view image as input and outputs a set of MANO parameters. Thus for a single hand pose in InterHand2.6M, you will get multiple sets of MANO parameters from multiple views. How do you fuse MANO parameters from different views?

  2. How do you estimate that the fitting error is about 5 mm?

  3. NeuralAnnot is only supervised with the 3D pose (i.e., 3D keypoints) without shape information. Can it really learn to predict shape parameters in MANO or SMPL?

Thank you! Look forward to your reply!

mks0601 commented 3 years ago

  1. As written in Section 6.4, we ran NeuralAnnot on only one view and transformed the result to the world coordinate system.
  2. Please read Section 6.2, Direct 3D annotation error.
  3. It can learn some bone-length-related information (e.g., how tall or short the body is and how big or small the hand is). Also, the distance between the hip joints can indicate how fat or thin the body is, as shown in Pose2Mesh, my previous work, which recovers a 3D mesh from a 2D pose (see Section 9.1 of https://arxiv.org/pdf/2008.09047.pdf). Please note that almost all 3D human body/hand shape estimation methods rely mainly on keypoint supervision, and their authors (including me) are mainly interested in recovering 3D pose.

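For readers wondering what "transformed it to the world coordinate system" in point 1 can look like, here is a minimal sketch. It is not code from the InterHand2.6M repository. It assumes the extrinsics follow the convention x_cam = R @ (x_world - t), with t the camera position in world coordinates; the function names `cam_to_world` and `root_rot_to_world` are hypothetical. Under that assumption only the global placement changes between views: the finger articulation and shape parameters are expressed relative to the hand itself and need no conversion.

```python
import numpy as np

def cam_to_world(points_cam, R, t):
    """Map (N, 3) camera-space points (e.g. mesh vertices or joints) to world space.

    Assumes x_cam = R @ (x_world - t), so x_world = R.T @ x_cam + t.
    Check this against the convention of the camera files you actually use.
    """
    points_cam = np.asarray(points_cam, dtype=np.float64)
    return points_cam @ R + t  # row-vector form of R.T @ x_cam + t

def root_rot_to_world(root_rot_cam, R):
    """Re-express a 3x3 global (root) rotation given in camera space in world space."""
    return R.T @ root_rot_cam

# Toy extrinsics: a 90-degree rotation about z and a camera placed at x = 10.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([10.0, 0.0, 0.0])

world = np.array([[11.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
cam = (R @ (world - t).T).T          # forward projection into camera space
recovered = cam_to_world(cam, R, t)  # should match `world` again
```

Applying the transform to the output vertices or joints, as done here, sidesteps the question of how a particular MANO layer composes its root rotation and translation.
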
Deng-Y commented 3 years ago

Thank you for your quick response! Yes, I have read Pose2Mesh, and I understand why you cite it here.

From what you say, I guess the 3D mesh is currently more of a by-product of the 3D pose, and most papers aim to maximize the accuracy of the 3D pose rather than the 3D mesh. Am I right? After all, ground-truth 3D meshes are not easy to obtain.

mks0601 commented 3 years ago

Correct. Maybe I can incorporate more supervision signals, such as depth maps or silhouettes, for more accurate 3D shapes.
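One common way to use silhouettes as weak supervision is an overlap loss between a mask rendered from the predicted mesh and a segmentation mask of the hand. The sketch below shows a soft IoU loss in NumPy purely to illustrate the arithmetic. The thread does not say which loss would be used, the name `soft_iou_loss` is hypothetical, and a real training setup would need a differentiable renderer and an autodiff framework.

```python
import numpy as np

def soft_iou_loss(pred_mask, gt_mask, eps=1e-6):
    """Return 1 - soft IoU between two masks with values in [0, 1].

    pred_mask: (H, W) soft silhouette, e.g. from a differentiable renderer (assumed).
    gt_mask:   (H, W) target silhouette, e.g. from a segmentation model (assumed).
    """
    pred = np.asarray(pred_mask, dtype=np.float64)
    gt = np.asarray(gt_mask, dtype=np.float64)
    intersection = (pred * gt).sum()
    union = (pred + gt - pred * gt).sum()
    return 1.0 - intersection / (union + eps)

# Identical masks give a loss near 0; disjoint masks give a loss of 1.
mask_a = np.array([[1.0, 0.0], [0.0, 1.0]])
mask_b = np.array([[0.0, 1.0], [1.0, 0.0]])
loss_same = soft_iou_loss(mask_a, mask_a)
loss_disjoint = soft_iou_loss(mask_a, mask_b)
```
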

Deng-Y commented 3 years ago

Do you have depth maps of the InterHand2.6M dataset? The silhouette is a more readily available weak supervision.

mks0601 commented 3 years ago

No, I don't :(