Can you elaborate the intuition behind "parse_poses" and "get_root_relative_poses" functions in parse_poses.py?

Daniil-Osokin / lightweight-human-pose-estimation-3d-demo.pytorch

Real-time 3D multi-person pose estimation demo in PyTorch. OpenVINO backend can be used for fast inference on CPU.

Apache License 2.0


Can you elaborate the intuition behind "parse_poses" and "get_root_relative_poses" functions in parse_poses.py? #79

Closed nessessence closed 2 years ago

nessessence commented 2 years ago

Can you elaborate on the intuition behind the "parse_poses" and "get_root_relative_poses" functions in parse_poses.py? For example, why do we need to "read all pose coordinates at neck location" and "refine keypoints coordinates at corresponding limbs locations"?

Also, why does "features" (inference_results[0]) sometimes have a different shape? For example, feature.shape can be (57, 32, 12), (57, 32, 9), or (57, 32, X), although most of the time it is (57, 32, 12).

Thanks in advance.

Daniil-Osokin commented 2 years ago

Hi! You can find the details in the paper. In short, this adds robustness to pose prediction: the neck is usually visible, so it is OK to encode the coordinates of the other keypoints at the neck location. If other keypoints are also visible, their coordinates can be replaced by the coordinates read at their own locations. Regarding the second question, I believe the shapes should always be the same, so possibly there is a bug somewhere.
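To make the idea above concrete, here is a minimal NumPy sketch of the two steps (read everything at the neck, then overwrite visible keypoints from their own locations). It is not the code from parse_poses.py: the function name `read_pose_3d`, the `(K, 3, H, W)` map layout, the `NECK_ID` index, and the use of `-1` for missing keypoints are all assumptions made for illustration, and the refinement here samples at each keypoint's own pixel rather than along limbs.

```python
import numpy as np

NECK_ID = 0  # hypothetical neck index, chosen only for this sketch


def read_pose_3d(coord_maps, keypoints_2d, neck_id=NECK_ID):
    """Read one person's 3D pose from dense coordinate maps.

    coord_maps:   float array of shape (K, 3, H, W); at every pixel it stores
                  a predicted (x, y, z) for each of the K keypoints
                  (assumed layout, not the repository's).
    keypoints_2d: int array of shape (K, 2) with (x, y) pixel positions on the
                  map grid, or (-1, -1) where a keypoint was not detected.
    Returns an array of shape (K, 3), or None if the neck was not detected.
    """
    neck_x, neck_y = keypoints_2d[neck_id]
    if neck_x < 0 or neck_y < 0:
        return None  # no anchor location to read from

    # Step 1: read coordinates of ALL keypoints at the neck location.
    # This gives an estimate even for occluded keypoints.
    pose_3d = coord_maps[:, :, neck_y, neck_x].copy()

    # Step 2: for every keypoint that was detected in 2D, replace the
    # neck-based estimate with the value stored at its own location.
    for k, (x, y) in enumerate(keypoints_2d):
        if x >= 0 and y >= 0:
            pose_3d[k] = coord_maps[k, :, y, x]

    return pose_3d
```

With this layout, an occluded keypoint keeps the estimate read at the neck, while a detected one uses the value predicted at its own pixel, which is presumably the more accurate of the two.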

Daniil-Osokin commented 2 years ago

Hope it helped.