Interpretation of the output - Githubissues

Daniil-Osokin / lightweight-human-pose-estimation-3d-demo.pytorch

Real-time 3D multi-person pose estimation demo in PyTorch. OpenVINO backend can be used for fast inference on CPU.

Apache License 2.0


Interpretation of the output #15

Closed yushuinanrong closed 4 years ago

yushuinanrong commented 4 years ago

Hi,

First of all, great work! I've managed to run the demo code with OpenVINO, and it runs at about 20 FPS on a single CPU, which is fantastic! I have a question about the output format. For example, one 2D pose estimate is a 1x58 vector; how should I interpret it? I think the COCO format has 18 joints, so why isn't the 2D pose estimate 1x54?

Daniil-Osokin commented 4 years ago

Hi! Here is the pose parsing code. As you can see, 18 keypoints are actually found for the 2D pose (17 from COCO + neck). They are remapped to the Panoptic pose format, which has 19 keypoints (it also has the pelvis, which remains -1 in the current implementation). So 19 * 3 = 57 values, and the last (58th) value is the pose confidence.
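The layout described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's code: the helper name `split_pose_2d` is hypothetical, and it assumes each keypoint occupies three consecutive values (x, y, score), which the thread implies but does not spell out.

```python
import numpy as np

NUM_KEYPOINTS = 19  # Panoptic-style keypoint count mentioned in the thread


def split_pose_2d(pose_row):
    """Split one 58-value 2D pose row into (keypoints, pose_confidence).

    Assumption: the first 57 values are 19 consecutive (x, y, score)
    triples and the last value is the overall pose confidence.
    """
    row = np.asarray(pose_row, dtype=np.float32)
    expected = NUM_KEYPOINTS * 3 + 1
    if row.shape != (expected,):
        raise ValueError(f"expected {expected} values, got shape {row.shape}")
    keypoints = row[:-1].reshape(NUM_KEYPOINTS, 3)
    return keypoints, float(row[-1])


# Synthetic row: every keypoint missing (-1) except the first one.
example = -np.ones(NUM_KEYPOINTS * 3 + 1, dtype=np.float32)
example[0:3] = [120.0, 80.0, 0.9]
example[-1] = 0.75

keypoints, confidence = split_pose_2d(example)
print(keypoints.shape, confidence)
```

Rows that stay at -1 after the split (such as the pelvis) can be treated as missing keypoints.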

yushuinanrong commented 4 years ago

Hi @Daniil-Osokin, Thanks for your quick response and I appreciate your explanation.

Regards, Melo

yushuinanrong commented 4 years ago

@Daniil-Osokin Could you kindly point me to where the correspondence between keypoint index and joint name is stored? For OpenPose results, I believe the correspondences are:

Joint index: {0, "Nose"}, {1, "Neck"}, {2, "RShoulder"}, {3, "RElbow"}, {4, "RWrist"}, {5, "LShoulder"}, {6, "LElbow"}, {7, "LWrist"}, {8, "RHip"}, {9, "RKnee"}, {10, "RAnkle"}, {11, "LHip"}, {12, "LKnee"}, {13, "LAnkle"}, {14, "REye"}, {15, "LEye"}, {16, "REar"}, {17, "LEar"}

Daniil-Osokin commented 4 years ago

This one corresponds to the raw 2D order (before remapping). Here you are: 2D output keypoints order, 3D keypoints order.

yushuinanrong commented 4 years ago

@Daniil-Osokin Thanks for your reply. I'm a bit confused about the output of the function 'parse_poses()', which produces 'poses_3d' and 'poses_2d'. Both the 2D and 3D poses have 19 keypoints, and the 2D pose always has (-1,-1) for the third keypoint (index 2), which supposedly corresponds to the 'body center'. It seems to me that the 2D and 3D poses from 'parse_poses()' follow the same index-joint correspondences, which are:

0: Neck
1: Nose
2: BodyCenter (center of hips)
3: lShoulder
4: lElbow
5: lWrist
6: lHip
7: lKnee
8: lAnkle
9: rShoulder
10: rElbow
11: rWrist
12: rHip
13: rKnee
14: rAnkle
15: rEye
16: lEye
17: rEar
18: lEar

Am I correct?

Daniil-Osokin commented 4 years ago

Yes, you are correct. I mentioned two different keypoint orders because the first part of the pipeline, which detects 2D keypoints, uses the 2D keypoint order. So the raw 2D keypoints, which are returned by extract_poses, have one order, and after coordinate remapping, both the 2D and 3D poses have the 3D keypoint order.
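The relationship between the two orders can be sketched as a lookup table built by matching joint names. This is only an illustration under stated assumptions: the names `RAW_2D_ORDER`, `REMAPPED_ORDER`, `build_remap`, and `remap_keypoints` are hypothetical, the raw order is taken from the OpenPose-style list quoted in this thread, and the remapped order is the 19-entry list confirmed above. The repository's own remapping code may be organized differently.

```python
# Raw 2D order (before remapping), as listed earlier in this thread.
RAW_2D_ORDER = [
    "Nose", "Neck", "RShoulder", "RElbow", "RWrist",
    "LShoulder", "LElbow", "LWrist", "RHip", "RKnee", "RAnkle",
    "LHip", "LKnee", "LAnkle", "REye", "LEye", "REar", "LEar",
]

# Order shared by the remapped 2D poses and the 3D poses.
REMAPPED_ORDER = [
    "Neck", "Nose", "BodyCenter",
    "LShoulder", "LElbow", "LWrist", "LHip", "LKnee", "LAnkle",
    "RShoulder", "RElbow", "RWrist", "RHip", "RKnee", "RAnkle",
    "REye", "LEye", "REar", "LEar",
]


def build_remap(raw_names, target_names):
    """Return a list where entry i is the target index of raw keypoint i."""
    target_index = {name.lower(): i for i, name in enumerate(target_names)}
    return [target_index[name.lower()] for name in raw_names]


RAW_TO_REMAPPED = build_remap(RAW_2D_ORDER, REMAPPED_ORDER)


def remap_keypoints(raw_keypoints, missing=(-1.0, -1.0)):
    """Place 18 raw (x, y) keypoints into the 19-slot remapped order.

    Slots with no raw counterpart (BodyCenter) keep the `missing` value.
    """
    remapped = [missing] * len(REMAPPED_ORDER)
    for raw_id, target_id in enumerate(RAW_TO_REMAPPED):
        remapped[target_id] = raw_keypoints[raw_id]
    return remapped


# Synthetic input: raw keypoint i sits at coordinates (i, i).
raw = [(float(i), float(i)) for i in range(len(RAW_2D_ORDER))]
remapped = remap_keypoints(raw)
print(RAW_TO_REMAPPED)
print(remapped[REMAPPED_ORDER.index("BodyCenter")])
```

Because no raw keypoint maps to BodyCenter, that slot stays at (-1, -1), which matches the behaviour described in the question above.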

yushuinanrong commented 4 years ago

@Daniil-Osokin Great! Thanks for the clarification.