Map between body keypoints of persons from annotations and predictions of body keypoints from code, when there are more than two persons

Hi Sana,

Thank you very much for the issue. Unfortunately I won't have time to get back to the system for a few weeks, but I hope I can help you anyway.

Are you using the Python COCO API for evaluation? https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py Did you check the COCO paper? https://arxiv.org/pdf/1405.0312.pdf

In section 4.3, it says: "For the purpose of evaluation, areas marked as crowds will be ignored and not affect a detector’s score".

They talk about segmentation, but my guess, without having looked at the code in much detail, is that the keypoint metrics also ignore the crowds, and that anything above ~15 people is simply masked out for practical reasons. This means it is not a problem that your system finds the extra people: only the annotated ones will count towards the evaluation. If you aren't familiar with the masks, I would encourage you to go through the COCO webpage or the papers to see how they work.
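To make the guess concrete, here is a minimal sketch of how you could check which ground-truth people in an image would count at all. It works on plain annotation dicts with the `image_id`, `iscrowd` and `num_keypoints` fields of the COCO keypoint annotation format. The function name `split_ground_truth` and the toy data are made up for illustration, and the rule itself (crowd regions and people without labeled keypoints are ignored) is the assumption discussed above, not something I have verified against the evaluation code.

```python
# Sketch: split one image's ground-truth annotations into those that would
# count towards keypoint evaluation and those assumed to be ignored.


def split_ground_truth(annotations, image_id):
    """Return (counted, ignored) annotation lists for one image."""
    counted, ignored = [], []
    for ann in annotations:
        if ann["image_id"] != image_id:
            continue
        is_crowd = bool(ann.get("iscrowd", 0))
        has_keypoints = ann.get("num_keypoints", 0) > 0
        # Assumption: crowd regions and people without any labeled
        # keypoints do not affect the score.
        if is_crowd or not has_keypoints:
            ignored.append(ann)
        else:
            counted.append(ann)
    return counted, ignored


# Hypothetical toy annotations (not real COCO data).
toy_annotations = [
    {"id": 1, "image_id": 7, "iscrowd": 0, "num_keypoints": 12},
    {"id": 2, "image_id": 7, "iscrowd": 0, "num_keypoints": 0},
    {"id": 3, "image_id": 7, "iscrowd": 1, "num_keypoints": 0},
    {"id": 4, "image_id": 8, "iscrowd": 0, "num_keypoints": 17},
]

counted, ignored = split_ground_truth(toy_annotations, image_id=7)
print([a["id"] for a in counted])  # [1]
print([a["id"] for a in ignored])  # [2, 3]
```

With a real annotation file, the list under the top-level `"annotations"` key of the JSON would take the place of `toy_annotations`. If the number of `counted` people is much smaller than the number of people your system detects, that would be consistent with the extra detections falling into ignored regions.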

Could you send a screenshot of the corresponding mask, something like the first image here: https://aferro.dynu.net/work/human_pose_estimation/ ? That way we could confirm this guess. Alternative explanations would require a more careful look at the code and the COCO paper.

andres-fr / realtime-pose-estimation
