why not use segmentation information?

Hi, I tried this as well. It is a good idea, and it has been tried in several other papers:

Hsiao-Yu Fish Tung, Hsiao-Wei Tung, Ersin Yumer, Katerina Fragkiadaki. Self-supervised Learning of Motion Capture. NIPS'17.
Jun Kai Vince Tan, Ignas Budvytis, Roberto Cipolla. Indirect deep structured learning for 3D human body shape and pose prediction. BMVC'17.
Georgios Pavlakos, Luyang Zhu, Xiaowei Zhou, Kostas Daniilidis. Learning to Estimate 3D Human Pose and Shape from a Single Color Image. CVPR'18.
Also: Christoph Lassner, Javier Romero, Martin Kiefel, Federica Bogo, Michael J. Black, Peter V. Gehler. Unite the People: Closing the Loop Between 3D and 2D Human Representations. CVPR'17. (This one is an optimization-based approach, though.)

However, the problem with a segmentation loss is that it makes a big assumption: that the person is not occluded. In cluttered natural settings (like COCO), people are often occluded, and the segmentation masks are missing for the occluded parts. In that case you can't rely on the segmentation loss, and it becomes a noisy training signal. It works fine for datasets like LSP, which consists of sports images where you mostly see the entire body, but not for datasets like COCO. Dealing with incomplete segmentation (due to occlusion) as a training signal is an interesting research problem.
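To make the occlusion problem concrete, here is a toy sketch. It is not code from HMR or from any of the papers above; the `silhouette_loss` helper (a simple 1 - IoU between binary masks) and the tiny hand-made masks are hypothetical. When the lower half of the person is hidden, the annotation simply omits those pixels, so a correct full-body prediction gets penalized:

```python
def silhouette_loss(pred, gt):
    """Hypothetical silhouette loss: 1 - IoU between two binary masks (lists of 0/1 rows)."""
    inter = sum(p & g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    union = sum(p | g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    return 1.0 - inter / union if union else 0.0

# Toy 6x4 image: the true body silhouette covers columns 1-2 in every row.
full_body = [[0, 1, 1, 0] for _ in range(6)]

# Annotated mask when the bottom half of the person is behind an occluder:
# the hidden rows are just missing from the ground-truth segmentation.
occluded_gt = [[0, 1, 1, 0] if r < 3 else [0, 0, 0, 0] for r in range(6)]

# A prediction that wrongly "cuts off" the body where the visible region ends.
truncated_pred = [row[:] for row in occluded_gt]

print(silhouette_loss(full_body, full_body))          # 0.0: correct body, complete mask
print(silhouette_loss(full_body, occluded_gt))        # 0.5: correct body, occluded mask
print(silhouette_loss(truncated_pred, occluded_gt))   # 0.0: wrong body, occluded mask
```

On the occluded annotation, the wrong (truncated) body scores better than the correct one, so the loss actively pushes the model in the wrong direction on occlusion-heavy data.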

Best,

akanazawa/hmr, issue #16