Handling missing values in 2D due to occlusions and other factors

garyzhao / SemGCN

The PyTorch implementation for "Semantic Graph Convolutional Networks for 3D Human Pose Regression" (CVPR 2019).

https://arxiv.org/abs/1904.03345

Apache License 2.0


Handling missing values in 2D due to occlusions and other factors #23

Open saichanda opened 4 years ago

saichanda commented 4 years ago

Hi @garyzhao , Thank you for the repo. If keypoints are missing from the 2D predictions, either because another object occludes them or because only part of the body is visible, is there any provision for handling this in SemGCN? Suppose the missing keypoints are marked as '-1' or '0' in the 2D keypoints list. How will your model handle that? We find that the predictions are bad when there are missing values caused by occlusions. Thank you.

garyzhao commented 4 years ago

Hi @saichanda ,

The current version of SemGCN cannot handle occlusions.

One potential solution is to use masks to simulate 2D occlusions during network training.

Best, Long

saichanda commented 4 years ago

@garyzhao , Thank you for the response and the suggested solution. I'm curious, though, because the paper mentions that occlusions are handled:

we improve previous methods by a large margin for the action of directions, taking photo, posing, sitting down, walking dog and walking together. We hypothesize that this is due to the severe self-occlusions in these actions, while they can be effectively encoded by our SemGCN using relations within graphs.

If the current version of SemGCN cannot handle occlusions, can you elaborate on which severe occlusions SemGCN is effectively encoding? Thank you.

saichanda commented 4 years ago

Sorry, I closed the issue by mistake.

garyzhao commented 4 years ago

Hi @saichanda ,

Never mind.

That's a good question.

The "occlusions" you mentioned here ('-1' or '0' in the 2D keypoints) are extreme cases in which one or more 2D joints have completely "vanished" from the 2D output. Our method cannot handle these cases.

In our paper, we assumed that the 2D detector can still make reasonable guesses when there are occlusions, meaning the 2D output might not be accurate but is still close to the ground truth. In this case, our method can refine the 3D prediction.

Therefore, to handle your occlusions, I suggest adding masks during training that randomly drop some 2D outputs, just as in your case. This might improve the performance.
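A minimal sketch of such a training-time mask, written in NumPy for illustration and not taken from the SemGCN repository. The function name `drop_random_joints`, the parameters `drop_prob` and `fill_value`, and the `(num_joints, 2)` input shape are all assumptions. The idea is only to show how some 2D joints could be replaced by the same placeholder ('-1' or '0') that appears at inference time:

```python
import numpy as np

def drop_random_joints(keypoints_2d, drop_prob=0.1, fill_value=0.0, rng=None):
    """Randomly replace 2D joints with a placeholder to mimic missing detections.

    keypoints_2d: array-like of shape (num_joints, 2) (assumed layout).
    Returns (masked_keypoints, visible), where visible is a boolean array
    of shape (num_joints,) that is False for every dropped joint.
    """
    rng = np.random.default_rng() if rng is None else rng
    keypoints_2d = np.asarray(keypoints_2d, dtype=np.float32)

    # Each joint is kept independently with probability (1 - drop_prob).
    visible = rng.random(keypoints_2d.shape[0]) >= drop_prob

    # Dropped joints get both coordinates overwritten with fill_value.
    masked = np.where(visible[:, None], keypoints_2d, np.float32(fill_value))
    return masked, visible

# Hypothetical usage on one 16-joint pose inside a data-loading step.
pose = np.random.default_rng(0).uniform(-1.0, 1.0, size=(16, 2))
masked_pose, visible = drop_random_joints(pose, drop_prob=0.2, fill_value=-1.0)
```

The placeholder used during training should match whatever value marks missing joints in the real 2D detections, so that the network sees the same pattern at training and test time. Whether this actually improves results would need to be checked experimentally.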

Best, Long

saichanda commented 4 years ago

Sure @garyzhao , Thank you for the time and support.

Best regards, Sai