CalciferZh / minimal-hand

A minimal solution to hand motion capture from a single color camera at over 100fps. Easy to use, plug to run.
MIT License

Keypoint representation as input to IKNet #98

Open jdambre opened 1 year ago

jdambre commented 1 year ago

I am trying to use IKNet separately, starting from hand keypoints extracted with MediaPipe. For this to work, I need to preprocess the MediaPipe hand coordinates so that they match the input format IKNet expects (origin, scale, and possibly rotation as well?).

I ran into two questions here: 1) I can see from your code that the keypoints have to be shifted to make 'M1' the origin. But what is the assumed scale? In the code you use IK_UNIT_LENGTH when rescaling the MANO reference keypoints, but it is not clear what this relates to or where it comes from. Also, is there an assumption about the rotation of the hand (e.g. palm orientation)?

2) I was assuming that the 'mpii_ref' keypoint set you pass as input to IKNet would be some kind of "relaxed" reference hand (converted from the MANO code base). When I plot it, the projection onto the xz plane matches this assumption, but the y coordinates look very strange, so I assume I am misinterpreting something. Or maybe it incorporates some assumptions about the IKNet model input that I also need to apply to my own xyz keypoints, since it seems to be passed as a reference hand? Could you clarify?

Examples: (1) mpii_ref hand in front view (looking fine)

mpii_ref_hand_xz

(2) mpii_ref hand in rotated xyz view, showing unnaturally curved fingers and very long wrist-to-thumb connection

mpii_ref_hand_xyz

(3) For comparison: MediaPipe hand in front view

Mediapipe_hand_xz

(4) For comparison: MediaPipe hand in the same xyz view as above

Mediapipe_hand_xyz

CalciferZh commented 1 year ago

  1. The unit length is the bone length from M1 to wrist.
  2. I think it's because you are using different scales on the x, y, and z axes in your visualization. The offset along the y axis is 0.2 units, which is roughly 1.8 cm. This is reasonable for a human hand.
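
The scale issue in point 2 can be checked by forcing all three plot axes to span the same range. Taken at face value, "0.2 units is roughly 1.8 cm" also implies a unit length of about 9 cm. Below is a minimal sketch of a helper that computes equal-range axis limits from a set of 3D points. The function name `equal_axis_limits` and the toy data are hypothetical and not part of minimal-hand.

```python
def equal_axis_limits(points):
    """Return one (low, high) pair per axis, all spanning the same range.

    `points` is a sequence of (x, y, z) tuples. Hypothetical helper,
    not part of minimal-hand or matplotlib.
    """
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    # Half of the largest extent; every axis gets this half-width.
    half = max(hi - lo for lo, hi in zip(mins, maxs)) / 2.0
    centers = [(lo + hi) / 2.0 for lo, hi in zip(mins, maxs)]
    return [(c - half, c + half) for c in centers]


# Toy data only: wide in x and z, shallow in y.
toy_points = [(0.0, 0.0, 0.0), (1.0, 0.2, 2.0)]
x_lim, y_lim, z_lim = equal_axis_limits(toy_points)
```

With a matplotlib 3D axes object `ax`, these limits could be applied with `ax.set_xlim(*x_lim)`, `ax.set_ylim(*y_lim)`, and `ax.set_zlim(*z_lim)`. Recent matplotlib versions also offer `ax.set_box_aspect((1, 1, 1))` to make the drawn box cubic, so that a small y offset is not visually stretched.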

To use IKNet, I think the safest approach is to replace only xyz and delta and keep the other parameters unchanged. Also make sure you have converted the keypoints into the MPII format and scale.
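
The preprocessing described in this thread (origin at 'M1', unit length equal to the M1-to-wrist bone) might be sketched as follows for MediaPipe keypoints. Several things here are assumptions rather than facts from the repository: that 'M1' corresponds to the middle-finger MCP joint (MediaPipe landmark 9, with the wrist at landmark 0), that the MediaPipe landmark order already matches the MPII ordering, that the `PARENTS` table reflects the kinematic tree, and that delta means a per-bone direction. All names are hypothetical, and hand rotation is not handled. Plain Python is used to avoid dependencies.

```python
import math

WRIST = 0       # MediaPipe landmark index of the wrist
MIDDLE_MCP = 9  # MediaPipe middle-finger MCP; ASSUMED to be 'M1' in the thread

# ASSUMED parent of each of the 21 joints (wrist, then 4 joints per finger).
PARENTS = [None, 0, 1, 2, 3, 0, 5, 6, 7, 0, 9, 10, 11,
           0, 13, 14, 15, 0, 17, 18, 19]


def normalize_keypoints(xyz, root=MIDDLE_MCP, wrist=WRIST):
    """Shift the root joint to the origin and scale root-to-wrist to 1."""
    if len(xyz) != 21:
        raise ValueError("expected 21 keypoints")
    unit = math.dist(xyz[root], xyz[wrist])
    if unit == 0.0:
        raise ValueError("root and wrist coincide; cannot derive a scale")
    rx, ry, rz = xyz[root]
    return [((x - rx) / unit, (y - ry) / unit, (z - rz) / unit)
            for x, y, z in xyz]


def bone_deltas(xyz, parents=PARENTS):
    """Return (unit direction, length) for each bone, parent to child.

    The root entry is a zero vector with length 0 as a placeholder; how
    the repository treats the root is not stated in this thread.
    """
    out = []
    for joint, parent in enumerate(parents):
        if parent is None:
            out.append(((0.0, 0.0, 0.0), 0.0))
            continue
        diff = [c - p for c, p in zip(xyz[joint], xyz[parent])]
        length = math.hypot(*diff)
        direction = tuple(d / length for d in diff) if length else (0.0, 0.0, 0.0)
        out.append((direction, length))
    return out
```

Whether IKNet expects unit directions or directions scaled by bone length is not stated in this thread, so the output of `bone_deltas` should be compared against the repository's own conversion before it is fed to the network.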

jiangfeng999 commented 1 year ago

Sorry to bother you, but did you use mpii_ref_xyz to draw the 3D gesture, and if so, how did you draw the gesture in the coordinate system?

jdambre commented 1 year ago

Hi @jiangfeng999, if you're asking me: I think so, but I gave up on this approach months ago, so I don't remember the details ...