Open gsrujana opened 2 years ago
I think the model itself is top-down: they use a detector for multi-person pose estimation, no?
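For anyone unfamiliar with the term, here is a minimal sketch of what a generic top-down pipeline looks like: a detector supplies person boxes, each crop goes through a pose network, and the heatmap peaks are mapped back to image coordinates. This is not the ViTPose repo's API; `pose_model`, `decode_heatmaps`, and `top_down_pose` are hypothetical names, and the `(x0, y0, w, h)` box format and simple argmax decoding are assumptions made for illustration.

```python
import numpy as np

def decode_heatmaps(heatmaps, box):
    """Turn (K, H, W) heatmaps for one person crop into (K, 3) rows of x, y, score.

    box is (x0, y0, w, h) in image pixels (assumed format).
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    idx = flat.argmax(axis=1)            # peak location per keypoint
    scores = flat.max(axis=1)            # peak value used as confidence
    ys, xs = np.unravel_index(idx, (h, w))
    x0, y0, bw, bh = box
    # Map heatmap cell centres back to image coordinates.
    xs_img = x0 + (xs + 0.5) * bw / w
    ys_img = y0 + (ys + 0.5) * bh / h
    return np.stack([xs_img, ys_img, scores], axis=1)

def top_down_pose(image, boxes, pose_model):
    """Run a single-person pose model on every detected box.

    pose_model is a hypothetical callable: crop -> (K, H, W) heatmaps.
    """
    results = []
    for box in boxes:
        x0, y0, bw, bh = box
        crop = image[y0:y0 + bh, x0:x0 + bw]
        results.append(decode_heatmaps(pose_model(crop), box))
    return results
```

For a single-person image, `boxes` could simply contain one box covering the person (or the whole frame), which skips the detector but still runs the pose network once per box.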
The ViTPose+ paper says the model can be used in either a top-down or a bottom-up manner, and it explains how the bottom-up method works: https://arxiv.org/pdf/2212.04246.pdf
Hi, can someone explain how the bottom-up ViTPose model works? Can you give an example with ViTPose-B? I am interested in the smallest, fastest single-person pose model that still preserves decent accuracy on COCO. Would that be ViTPose-B in a bottom-up or a top-down manner?