bobwan1995 / PMFNet

Implementation of "Pose-aware Multi-level Feature Network for Human Object Interaction Detection" (ICCV 2019 Oral)
MIT License

Question about Zoom-In Module #7

Closed DongSky closed 4 years ago

DongSky commented 4 years ago

Hi, thanks for your great work. I have a question about the human body parts in the Zoom-In Module. In the HICO-DET dataset, some people in the training images have only a few body parts visible (for example, HICO_train2015_00037377.jpg), so the pose estimator cannot extract a human pose from the image. However, as you mention in Section 4.2, all human parts should be cropped based on the human pose output. How do you handle this problem?

bobwan1995 commented 4 years ago

Good question! In this work we do not handle body occlusion cases; we simply zoom into the human part regions given by the estimated human keypoints. However, the Zoom-In Module includes an attention mechanism, which can hopefully suppress irrelevant human part features in HOI prediction.
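As a rough illustration of the two ideas in this reply (cropping part regions around estimated keypoints, then letting attention weights down-weight unhelpful parts), here is a minimal Python sketch. It is not taken from the PMFNet code: the function names `part_boxes` and `attend`, the `scale` value, and the use of a softmax over per-part scores are all assumptions made for the example.

```python
import math


def part_boxes(keypoints, human_box, scale=0.1):
    """Return one square box per keypoint, sized relative to the human box height.

    keypoints: list of (x, y) tuples from a pose estimator.
    human_box: (x1, y1, x2, y2) of the detected person.
    scale: illustrative fraction of the box height (an assumption, not a
           value confirmed in this thread).
    """
    x1, y1, x2, y2 = human_box
    half = scale * (y2 - y1) / 2.0
    return [(x - half, y - half, x + half, y + half) for x, y in keypoints]


def attend(part_features, scores):
    """Turn per-part scores into softmax weights and rescale each part feature.

    part_features: list of feature vectors (lists of floats), one per part.
    scores: one raw attention score per part (hypothetical; in a real model
            these would be predicted by a learned layer).
    """
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    weighted = [[w * v for v in feat] for w, feat in zip(weights, part_features)]
    return weights, weighted
```

In this sketch, a part with a low score still contributes a feature, only with a small weight, which is the sense in which attention can "suppress" a part rather than remove it.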

DongSky commented 4 years ago

Sincere thanks for your help! I now mostly understand how the Zoom-In Module works, but one small question remains. As I mentioned above, the HICO-DET dataset has some hard cases where only a few body parts are visible. Some are traditional occlusion cases (for example, image case 1), and you have already answered those (thanks again for your support). Others can be summarized as "only a few parts are in the image; the other parts are outside the frame" (for example, image case 2). Do you mean that for case 2 we can also use the pose detector's output (for example, I got all (0, 0) keypoints using OpenPose) to extract pose features, and train the Zoom-In Module using these features together with the semantic information from the attention mechanism?

Case 1, occlusion: HICO_train2015_00022501

Case 2, out of sight: HICO_train2015_00037377

DongSky commented 4 years ago

Sorry, I forgot to caption the IDs of these two figures: case 1 is HICO_train2015_00022501, and case 2 is HICO_train2015_00037377.

bobwan1995 commented 4 years ago

The short answer is yes: we trust the results from the pose estimator. The predicted poses obviously make no sense in your second case, but we do not deal with this issue in this work. I hope it can be solved in future work. Thanks for your interest in this work!
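For readers who want to find such images in their own data, a minimal Python sketch follows. It is only a diagnostic and not something PMFNet does: it assumes, as reported in this thread, that undetected keypoints come back as (0, 0), and the helper names `visible_mask` and `visible_ratio` are hypothetical.

```python
def visible_mask(keypoints):
    """Return True for each keypoint that is not the (0, 0) placeholder.

    keypoints: list of (x, y) tuples for one person.
    Assumption: the pose estimator marks undetected keypoints as (0, 0).
    """
    return [not (x == 0 and y == 0) for x, y in keypoints]


def visible_ratio(keypoints):
    """Fraction of keypoints that were actually detected (0.0 if none given)."""
    mask = visible_mask(keypoints)
    return sum(mask) / len(mask) if mask else 0.0


# Example: a person whose pose could not be estimated at all.
all_missing = [(0, 0)] * 17
print(visible_ratio(all_missing))
```

A ratio near zero would flag a person like the one in case 2, where any part regions cropped around the keypoints carry no real pose information.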

DongSky commented 4 years ago

Oh, I see. Thanks for your support!