Closed DongSky closed 4 years ago
Good question! In this work we do not handle body occlusion; we simply zoom into the human part regions given by the estimated human keypoints. However, the Zoom-in module uses an attention mechanism, which can hopefully suppress irrelevant human part features when predicting HOIs.
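To make the reply above concrete, here is a minimal NumPy sketch of the two ideas it mentions: building one part box per estimated keypoint, and reweighting part features with attention. It is an illustration under assumptions, not the repository's code: the function names, the `scale` parameter, the rule that box size is proportional to the human box, and the softmax form of the attention are all hypothetical.

```python
import numpy as np

def part_boxes_from_keypoints(keypoints, human_box, scale=0.1):
    """Build one square box per keypoint (hypothetical helper).

    keypoints: (K, 2) array of (x, y) positions from a pose estimator.
    human_box: (x1, y1, x2, y2) of the detected person.
    scale:     assumed ratio between part-box side and the longer
               side of the human box.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    x1, y1, x2, y2 = human_box
    half = scale * max(x2 - x1, y2 - y1) / 2.0
    # Each row is (x_min, y_min, x_max, y_max), centered on the keypoint.
    return np.concatenate([keypoints - half, keypoints + half], axis=1)

def attend_to_parts(part_feats, logits):
    """Reweight per-part features with softmax attention (assumed form).

    part_feats: (K, D) features, one row per body part.
    logits:     (K,) unnormalized attention scores.
    """
    logits = np.asarray(logits, dtype=float)
    weights = np.exp(logits - logits.max())
    weights = weights / weights.sum()
    return weights[:, None] * np.asarray(part_feats, dtype=float)

# A keypoint at (0, 0) still yields a box, which is why undetected
# joints produce meaningless part regions unless attention downweights them.
boxes = part_boxes_from_keypoints([[50, 50], [0, 0]], (0, 0, 100, 200))
```

Note that nothing in this sketch checks whether a keypoint was actually detected; the attention weights are the only mechanism that can reduce the influence of a bad part region.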
Thank you very much for your help! I now mostly understand how the Zoom-in module works, but one small problem remains. As I mentioned above, the HICO-DET dataset has some hard cases in which only a few body parts are visible. Some of them are traditional occlusion cases, such as image case 1, and you have already answered those (thanks again for your support). Others can be summarized as "only a few parts are in the image; the other parts are outside the frame", such as image case 2. Do you mean that for case 2 we can still use the detector's output (for example, I got all (0, 0) keypoints from OpenPose) to extract pose features, and train the Zoom-in module using these features together with the semantic information from the attention mechanism?
Case 1: occlusion
Case 2: out of sight
Sorry, I forgot to give the image IDs for these two figures: case 1 is HICO_train2015_00022501 and case 2 is HICO_train2015_00037377.
The short answer is yes: we trust the results from the pose estimator. The predicted poses are obviously meaningless in your second case, but we do not deal with this issue in this work. I hope it can be solved in future work. Thanks for your interest in this work!
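For readers who hit the same all-(0, 0) output, the following sketch shows one possible way to flag undetected keypoints and exclude them from the part attention. This is not part of the work discussed here, which uses the pose estimator's output as-is; the function names and the `min_conf` threshold are hypothetical, and the only convention assumed is the one reported in this thread, namely that undetected keypoints come back as (0, 0).

```python
import numpy as np

def keypoint_validity(keypoints, confidences=None, min_conf=0.05):
    """Return a boolean mask of keypoints that look detected.

    keypoints:   (K, 2) array of (x, y); (0, 0) is treated as "missing".
    confidences: optional (K,) scores from the pose estimator.
    min_conf:    assumed threshold, only used if confidences are given.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    valid = ~np.all(keypoints == 0, axis=1)
    if confidences is not None:
        valid &= np.asarray(confidences, dtype=float) >= min_conf
    return valid

def masked_part_weights(logits, valid):
    """Softmax over valid parts only; invalid parts get weight 0."""
    logits = np.asarray(logits, dtype=float)
    valid = np.asarray(valid, dtype=bool)
    if not valid.any():
        # No usable part at all, e.g. the person is mostly out of frame.
        return np.zeros_like(logits)
    masked = np.where(valid, logits, -np.inf)
    exp = np.exp(masked - masked[valid].max())
    return exp / exp.sum()

# Second keypoint was not detected, so it receives zero attention weight.
valid = keypoint_validity([[120, 80], [0, 0]])
weights = masked_part_weights([0.3, 1.2], valid)
```

When every keypoint is missing, the sketch returns all-zero weights, so a model using it would have to rely on its other feature branches for that person.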
Oh I see. Thanks for your support!
Hi, thanks for your great work. I have a question about the human body parts in the Zoom-in module. In the HICO-DET dataset, some people in the training images have only a few visible body parts, for example HICO_train2015_00037377.jpg, so the pose estimator cannot extract a human pose from the image. However, as you mention in Section 4.2, all human parts should be cropped based on the human pose output. How should this problem be handled?