Question about training the hand pose.

zhengsipeng commented 1 year ago

Hi, thanks a lot for your great work.

Recently I have been working on a self-supervised model in which I plan to use frankmocap to extract hand poses as pseudo labels instead of using GT. According to Eq. (5) in your paper, the hand module loss is L = L_{theta} + L_{3D} + L_{2D} + L_{reg}. If I want to use the same criterion as Eq. (5) in my work, I assume I need to use the predictions of your hand module as supervision accordingly, like this:

- 48-dim hand pose as the label for L_{theta} (pred_hand_pose)
- 10-dim hand betas as the label for L_{reg} (pred_hand_betas)
- 21x2-dim joints as the label for L_{2D} (pred_joints_img[:, :2])

But which output can I use to supervise L_{3D}: pred_joints_smpl, or something else? I notice that your hand module goes from 3D joints in SMPL-X space -> 2D bbox -> 2D image, so no 3D joints in image space are predicted.

penincillin commented 1 year ago

@zhengsipeng You can use pred_joints_smpl to calculate L_{3D}.

zhengsipeng commented 1 year ago

> @zhengsipeng You can use pred_joints_smpl to calculate L_{3D}.

Thanks for your reply. So I guess I can also use pred_hand_pose for L_{pose} and pred_joints_img for L_{2D}, am I right?

penincillin commented 1 year ago

> > @zhengsipeng You can use pred_joints_smpl to calculate L_{3D}.
>
> Thanks for your reply. So I guess I can also use pred_hand_pose for L_{pose} and pred_joints_img for L_{2D}, am I right?

Yes.
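
For readers who want to try the mapping discussed in this thread, here is a minimal NumPy sketch of a pseudo-label loss with the four terms of Eq. (5). It is an illustration under assumptions, not the repository's training code: the function names `mse` and `hand_pseudo_label_loss`, the dictionary layout, the array shapes (48, 10, 21x3, 21x3), the use of plain mean squared error for every term, and the equal default weights are all hypothetical. Treating pred_hand_betas as a label for L_{reg} follows the question above; the replies only confirm the pose, 2D, and 3D mappings.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error over all elements (assumed form for every term)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))


def hand_pseudo_label_loss(student, teacher, weights=(1.0, 1.0, 1.0, 1.0)):
    """Combine the four terms, using `teacher` outputs as pseudo labels.

    Both arguments are dicts keyed by the output names mentioned in the
    thread. Assumed shapes: pred_hand_pose (48,), pred_hand_betas (10,),
    pred_joints_img (21, 3), pred_joints_smpl (21, 3).
    `weights` is (w_theta, w_3d, w_2d, w_reg); equal weights are an assumption.
    """
    w_theta, w_3d, w_2d, w_reg = weights
    # Only the first two columns of pred_joints_img are used for the 2D term.
    student_2d = np.asarray(student["pred_joints_img"])[..., :2]
    teacher_2d = np.asarray(teacher["pred_joints_img"])[..., :2]
    terms = {
        "theta": mse(student["pred_hand_pose"], teacher["pred_hand_pose"]),
        "3d": mse(student["pred_joints_smpl"], teacher["pred_joints_smpl"]),
        "2d": mse(student_2d, teacher_2d),
        # Assumption from the question: betas are matched like a label.
        "reg": mse(student["pred_hand_betas"], teacher["pred_hand_betas"]),
    }
    total = (w_theta * terms["theta"] + w_3d * terms["3d"]
             + w_2d * terms["2d"] + w_reg * terms["reg"])
    return total, terms


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def fake_output():
        # Random stand-ins for real hand module outputs.
        return {
            "pred_hand_pose": rng.normal(size=48),
            "pred_hand_betas": rng.normal(size=10),
            "pred_joints_img": rng.normal(size=(21, 3)),
            "pred_joints_smpl": rng.normal(size=(21, 3)),
        }

    total, terms = hand_pseudo_label_loss(fake_output(), fake_output())
    print(total, terms)
```

In an actual training loop the same structure would be written with differentiable tensors, and the teacher outputs would be detached so that no gradient flows into the pseudo labels.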

facebookresearch / frankmocap

Question about training the hand pose. #207