bardiadoosti / HOPE

Source code of CVPR 2020 paper, "HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation"

About labels #15

Closed. www516717402 closed this issue 4 years ago.

www516717402 commented 4 years ago

Hello, I have some questions about your project.

2D initial labels: the input image is resized to 224*224, but the 2D labels still range over the original image resolution (1920*1280), and I don't see the labels being rescaled to match the resized image anywhere in the code.

3D labels: you use the 3D labels in camera space directly, which means predicting absolute depth from a 2D image, and absolute depth is inherently ambiguous. Most 3D pose papers predict root-relative depth, or use two networks to predict root-relative and absolute depth respectively.

Hoping for your reply. Thank you.
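
For reference, this is the kind of rescaling I would expect when the image is resized (a minimal sketch, assuming a 1920*1280 original and a 224*224 network input; the function name and array layout are my own, not taken from the HOPE code):

```python
import numpy as np

def rescale_keypoints(keypoints_2d, orig_size=(1920, 1280), new_size=(224, 224)):
    """Scale 2D keypoint labels (x, y) from the original image resolution
    to the resized network-input resolution."""
    kp = np.asarray(keypoints_2d, dtype=np.float32).copy()
    kp[:, 0] *= new_size[0] / orig_size[0]  # scale x by the width ratio
    kp[:, 1] *= new_size[1] / orig_size[1]  # scale y by the height ratio
    return kp
```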

www516717402 commented 4 years ago

I extracted the hand keypoint detection module from HOPE and trained it on the RHD hand dataset: PCK@30mm = 73.13%, average error = 23.77 mm. A network using 3D Gaussian heatmaps achieves PCK@30mm = 93.91%, average error = 10.05 mm.
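
The PCK@30mm and average-error numbers above can be computed with something like the following (a generic sketch, not code from this repository; the shapes and names are my own):

```python
import numpy as np

def pck_and_mean_error(pred, gt, threshold=30.0):
    """pred, gt: (N, K, 3) predicted / ground-truth 3D joints in millimetres.
    Returns (PCK at the given threshold, mean per-joint error in mm)."""
    errors = np.linalg.norm(pred - gt, axis=-1)   # (N, K) per-joint Euclidean error
    pck = float((errors <= threshold).mean())     # fraction of joints within threshold
    return pck, float(errors.mean())
```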

bardiadoosti commented 4 years ago

Hi,

Thanks for your comment. If we had used a detection-based method for our keypoint detector, we would have had to scale the outputs. In a detection-based method the network predicts a smaller heatmap (say 224*224) for each keypoint, and various methods such as soft-argmax can be used to convert these back to image coordinates. But for the reasons explained in the paper we had to use a regression-based method, which learns to handle this on its own, of course with some error.

The 2D-to-3D lifting is the reverse of the operation we use to convert 3D to 2D: given the camera's intrinsic and extrinsic parameters, we can easily project 3D points to 2D. The Adaptive Graph U-Net is essentially learning the inverse of this transformation for a very specific camera and viewing angle, which is why it will not work on a different camera with a different angle.
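
To make the two operations mentioned above concrete, here is a minimal sketch of (a) soft-argmax decoding of detection heatmaps and (b) projecting 3D points to 2D with the camera's intrinsic and extrinsic parameters. The function names, the K/R/t notation, and the tensor layouts are my own assumptions, not taken from the HOPE code:

```python
import torch
import torch.nn.functional as F

def soft_argmax_2d(heatmaps):
    """heatmaps: (B, K, H, W) per-keypoint heatmaps.
    Returns (B, K, 2) expected (x, y) coordinates in heatmap pixels."""
    b, k, h, w = heatmaps.shape
    probs = F.softmax(heatmaps.reshape(b, k, -1), dim=-1).reshape(b, k, h, w)
    ys = torch.arange(h, dtype=probs.dtype, device=probs.device)
    xs = torch.arange(w, dtype=probs.dtype, device=probs.device)
    exp_y = (probs.sum(dim=3) * ys).sum(dim=2)    # expectation over the y marginal
    exp_x = (probs.sum(dim=2) * xs).sum(dim=2)    # expectation over the x marginal
    return torch.stack([exp_x, exp_y], dim=-1)

def project_3d_to_2d(points_3d, K, R, t):
    """points_3d: (N, 3) points in world space; K: (3, 3) intrinsics;
    R: (3, 3) rotation and t: (3,) translation (extrinsics, world -> camera).
    Returns (N, 2) pixel coordinates."""
    cam = points_3d @ R.T + t           # world -> camera coordinates
    proj = cam @ K.T                    # apply the intrinsic matrix
    return proj[:, :2] / proj[:, 2:3]   # perspective divide
```

The Adaptive Graph U-Net effectively learns the inverse of the second mapping, which is why it is tied to the camera setup it was trained on.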

www516717402 commented 4 years ago

Oh, I see. Thank you. In my view, graph CNNs handle problems over mesh vertices better than traditional CNNs.

bardiadoosti commented 4 years ago

Yeah, you are right. One of the most common use cases of graph CNNs is mesh operations, such as mesh classification.

hedjm commented 4 years ago

@bardiadoosti Thank you for this great work. You said in the comment above, "Here the Adaptive Graph U-Net is exactly learning this transformation for a very specific camera and angle condition." But in the paper, in the last paragraph of the introduction, you mention that you pretrained the 2D-to-3D GraphUNET on synthetic data (ObMan), which has totally different intrinsic/extrinsic parameters. Would you please clarify this?

Thank you again for your work.