facebookresearch / InterHand2.6M

Official PyTorch implementation of "InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image", ECCV 2020

Missing Gestures #67

Open popserv opened 3 years ago

popserv commented 3 years ago

Hi, I've downloaded both the 5 fps and 30 fps datasets and mainly checked the InterHand ROM sequences. The gestures "evil thinker" and "single relaxed finger" (shown in Fig. 12 and the supplementary material of your paper) do not seem to be available. Can you confirm whether these gestures are missing from the current dataset and, if so, explain why?

mks0601 commented 3 years ago

Hi,

"evil thinker" and "single relaxed finger" are included in ROM sequences (see the caption of Fig. 12). The ROM sequences are captured from subjects doing several hand sequences. Therefore, there would be no folder named "evil thinker" and "single relaxed finger"; however, several ROM folders would contain those sequences. Fig. 12 shows some instructions that we want the subjects to follow during the ROM capture.

popserv commented 3 years ago

Thanks for your prompt reply.

I see. I still cannot find "evil thinker", but in the "0390_dh_touchROM" folders I could see some gestures that partly match the "single relaxed finger" description:

"(1) touch each fingertip to the center of the palm for the same hand, do this for both hands, (2) interlock fingers and press palms out, (3) with the opposite hand, hold wrist, (4) with the opposite hand, bend wrist down and back, (5) point at watch on both wrists, (6) circle wrists, (7) look at nails, and (8) point at yourself with thumbs then with index fingers."

About these nontouch/touchROM image folders: only Capture0-9 are 30 fps video, while the others (Capture10-26) contain only some subsampled frames. Is that correct? I've verified the checksums with verify_download.sh, so I believe the download and unzip succeeded.

mks0601 commented 3 years ago

Oh, it seems Capture10-26 in the train folder are somehow subsampled. There were many issues when we released the dataset. For example, we had to remove faces from images, or even remove whole images where there could be a privacy leak. Let me check this with the other authors and let you know. I think the other sequences in test and val are 30 fps.

mks0601 commented 3 years ago

I have confirmed that train/Capture10-26 and test/Capture2-7 contain frames at less than 5 fps because of the multiple rounds of the image dumping procedure :( Sorry about this. I will note this in the README.
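One way to check locally which sequences are affected is to count the image files per sequence folder and compare the same sequence name across captures. The sketch below is only an illustration: it assumes the images are unpacked as `<split>/<capture>/<sequence>/<camera>/<frame>.jpg`, and the helper names `count_frames` and `sparse_sequences` are hypothetical, not part of the InterHand2.6M code.

```python
from collections import defaultdict
from pathlib import Path


def count_frames(split_dir):
    """Return {(capture, sequence): largest per-camera .jpg count}."""
    counts = defaultdict(int)
    # Assumed layout: <split_dir>/<capture>/<sequence>/<camera>/<frame>.jpg
    for cam_dir in Path(split_dir).glob("*/*/*"):
        if not cam_dir.is_dir():
            continue
        n_frames = sum(1 for _ in cam_dir.glob("*.jpg"))
        key = (cam_dir.parent.parent.name, cam_dir.parent.name)
        counts[key] = max(counts[key], n_frames)
    return dict(counts)


def sparse_sequences(counts, min_frames):
    """List the (capture, sequence) pairs with fewer than min_frames frames."""
    return sorted(key for key, n in counts.items() if n < min_frames)
```

For example, `sparse_sequences(count_frames("images/train"), min_frames=100)` would list candidate folders; the path and the threshold of 100 are placeholders to adjust. A low count can also mean a short recording rather than subsampling, so the numbers are only a hint.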

popserv commented 3 years ago

Thanks, I see. Also, regarding the folders below: all of them (Capture0-26) come with a very small number of subsampled frames. Note that I've only checked the 0200-0300s folders (assuming the 0000-0100s are single-hand sequences).

mks0601 commented 3 years ago

Yes. I think that is also due to the above issue (multiple rounds of non-overlapped image dumps).

popserv commented 3 years ago

Thanks. Will there be a fix or update for that in the future (like InterHand v1.x)?

mks0601 commented 3 years ago

I guess it would not be easy. Exporting data from companies like FB requires so many things :( There are rules we have to follow to prevent privacy leaks (e.g., faces and fingerprints). But we'll try!

ZhengdiYu commented 1 year ago

@mks0601 Hi, I have a question: I think the PP sequences are easy to find by their corresponding sequence name (e.g. right clasp left -> 0259_dh_rightclaspleft), but I couldn't find some of the ROM sequences in the training set mentioned in your paper, such as 'finger walk' and 'five countdown'. However, in the validation set, 'ROM03_LT_No_Occlusion' seems to include 'five count'. Could I assume that ROM sequences have no specific correspondences like the PP sequences? Some of these ROM sequences can only be found in folders like 'ROM03_LT_No_Occlusion', where the folder has no meaningful action label since it's a mix of several motions.

And the only way to get a 'five countdown' sequence is to manually check and select the frames from a folder like 'ROM03_LT_No_Occlusion'. Is that correct?
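The name-based lookup described above (e.g. right clasp left -> 0259_dh_rightclaspleft) can be sketched as follows. This is a rough illustration under assumptions: it assumes sequence folders sit at `<split>/<capture>/<sequence>/`, and `normalize` and `find_sequence_folders` are hypothetical helpers, not functions from the repository.

```python
from pathlib import Path


def normalize(name):
    """Lower-case the name and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def find_sequence_folders(split_dir, query):
    """Return sequence folders whose normalized name contains the query."""
    needle = normalize(query)
    # Assumed layout: <split_dir>/<capture>/<sequence>/
    return sorted(
        p for p in Path(split_dir).glob("*/*")
        if p.is_dir() and needle in normalize(p.name)
    )
```

A query such as "right clasp left" would match a folder named 0259_dh_rightclaspleft, while "five countdown" would return nothing if the gesture only appears inside a mixed folder like ROM03_LT_No_Occlusion. Searching for "ROM" instead lists the mixed folders that would then need manual inspection.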

Also, I couldn't find the PP sequence 'palm up' in the dataset. There is only 0035_palmdown. However, this sequence actually seems to show palm up rather than palm down.

Thank you very much!