JunkyByte / easy_ViTPose

Easy and fast 2D human and animal multi-pose estimation using SOTA ViTPose [Y. Xu et al., 2022]. Real-time performance and multiple skeletons supported.
Apache License 2.0
134 stars, 20 forks

Evaluation on coco dataset #33

omkaar718 closed 2 months ago

omkaar718 commented 3 months ago

The results I get with this implementation on the COCO val dataset seem to be considerably lower than those reported in the paper.

JunkyByte commented 3 months ago

Hello! Thank you for testing. This started as a fork of https://github.com/jaehyunnn/ViTPose_pytorch just to improve the inference pipeline. Can you check whether you obtain similar results with that implementation?

Also, if you don't mind sharing the code you use for eval, I won't have time in the next couple of weeks, but I could run some tests.

Also, can you check the mAP you get with the detector, or try running with ground-truth bboxes? They report "Using detection results from a detector that obtains 56 mAP on person"
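A minimal sketch of the ground-truth bbox check, assuming a COCO keypoint annotation file that has already been loaded into a dict with `json.load`. The function name `ground_truth_person_boxes` and the `min_keypoints` parameter are hypothetical and not part of easy_ViTPose. Skipping crowd annotations and people without labeled keypoints is an assumption about what a fair comparison needs.

```python
def ground_truth_person_boxes(coco, min_keypoints=1):
    """Collect ground-truth person boxes per image from a COCO-style dict.

    Returns {image_id: [[x1, y1, x2, y2], ...]}.
    """
    boxes = {}
    for ann in coco["annotations"]:
        # Category id 1 is "person" in the COCO annotations.
        if ann.get("category_id") != 1:
            continue
        # Assumption: skip crowd regions, which do not describe a single person.
        if ann.get("iscrowd", 0):
            continue
        # Assumption: skip people that have no labeled keypoints.
        if ann.get("num_keypoints", 0) < min_keypoints:
            continue
        # COCO stores boxes as [x, y, width, height]; convert to corner format.
        x, y, w, h = ann["bbox"]
        boxes.setdefault(ann["image_id"], []).append([x, y, x + w, y + h])
    return boxes
```

Feeding these boxes to the pose model instead of the detector output would separate pose errors from detection errors.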

Thanks

JunkyByte commented 3 months ago

Hi, I did some checks but I cannot give you an answer yet. I found that yolov8 had problems on MPS, in case you are running the evaluation on a Mac. Updating the Ultralytics package solves the problem (I updated the requirements).

omkaar718 commented 3 months ago

@JunkyByte Thank you for your response! I have opened a PR (https://github.com/JunkyByte/easy_ViTPose/pull/34) with COCO evaluation code. The README has been updated with instructions for using it.

omkaar718 commented 3 months ago

@JunkyByte I found person detection results provided by the official implementation, linked from its README: https://github.com/ViTAE-Transformer/ViTPose/blob/main/docs/en/tasks/2d_body_keypoint.md#:~:text=Please%20download%20from%20OneDrive%20or%20GoogleDrive. I'm not sure these are the exact ones they used, but with them the results have improved drastically and are now close to those obtained using the official implementation.

mAP@0.5:0.95, detector threshold = 0.5 to filter out low confidence detection bboxes:

Therefore, the bbox detections produced by yolov8 could be the main reason behind the low scores in this pipeline.
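The threshold filtering mentioned above can be sketched as follows. This assumes the detection results are in the standard COCO results format (a list of dicts with `image_id`, `category_id`, `bbox` and `score`). The function name `filter_person_detections` is hypothetical, and keeping scores equal to the threshold is an assumption.

```python
def filter_person_detections(detections, score_thr=0.5, person_id=1):
    """Keep person detections with score >= score_thr, grouped by image id."""
    kept = {}
    for det in detections:
        # Ignore anything that is not a person box.
        if det["category_id"] != person_id:
            continue
        # Drop low-confidence boxes (assumption: the threshold is inclusive).
        if det["score"] < score_thr:
            continue
        kept.setdefault(det["image_id"], []).append(det)
    # Sort each image's boxes from most to least confident.
    for dets in kept.values():
        dets.sort(key=lambda d: d["score"], reverse=True)
    return kept
```

Running the same filter on both the yolov8 boxes and the downloaded boxes, then comparing how many survive per image, is one way to see where the two detectors differ.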

JunkyByte commented 3 months ago

@omkaar718 thank you very much for looking into this. I'm busy these days, but I checked your PR and I will merge it in the next few days, so thanks again.

Applying the models to videos, I see qualitatively good results, so it might indeed be that yolo does not work well on the COCO val images.

I will get back to you :) have a nice day!