ViTAE-Transformer / ViTPose

The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
Apache License 2.0

Bottom up vs top down model #32

Open gsrujana opened 2 years ago

gsrujana commented 2 years ago

Hi, can someone explain how the bottom-up ViTPose model works? Could you give an example with ViTPose-B? I am interested in the smallest, fastest single-person pose model that still preserves decent accuracy on COCO. Would that be ViTPose-B used in a bottom-up or a top-down manner?

gpastal24 commented 2 years ago

I think the model itself is top-down; they use a person detector for multi-person pose estimation, no?
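To make the top-down idea concrete, here is a minimal sketch in plain Python. It assumes a hypothetical `detect_people(image)` that returns person boxes as `(x, y, w, h)` and a hypothetical `pose_model(image, box)` that returns one 2D heatmap per joint for that crop. These are stand-ins, not ViTPose's actual API. The decoding takes the per-joint argmax and maps the heatmap cell center back into image pixels, assuming the crop was resized directly to the model input (no aspect-ratio padding or sub-pixel refinement, which real implementations usually add).

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]    # (x, y, w, h) in image pixels
Heatmap = List[List[float]]                # rows x cols, one per joint
Keypoint = Tuple[float, float, float]      # (x, y, score) in image pixels


def argmax_2d(hm: Heatmap) -> Tuple[int, int, float]:
    """Return (col, row, score) of the highest value in a heatmap."""
    best = (0, 0, float("-inf"))
    for r, row in enumerate(hm):
        for c, v in enumerate(row):
            if v > best[2]:
                best = (c, r, v)
    return best


def decode_person(heatmaps: Sequence[Heatmap], box: Box) -> List[Keypoint]:
    """Map each joint's heatmap peak from crop-heatmap space back to the image.

    Assumption: the heatmap grid spans the whole box, so one cell covers
    (w / hm_w) x (h / hm_h) image pixels; we use the cell center.
    """
    x0, y0, w, h = box
    keypoints = []
    for hm in heatmaps:
        col, row, score = argmax_2d(hm)
        hm_h, hm_w = len(hm), len(hm[0])
        x = x0 + (col + 0.5) * w / hm_w
        y = y0 + (row + 0.5) * h / hm_h
        keypoints.append((x, y, score))
    return keypoints


def top_down(
    image,
    detect_people: Callable[[object], List[Box]],
    pose_model: Callable[[object, Box], Sequence[Heatmap]],
) -> List[List[Keypoint]]:
    """Detector first, then one pose-model pass per detected person."""
    return [decode_person(pose_model(image, box), box) for box in detect_people(image)]


if __name__ == "__main__":
    # Dummy detector and pose model, purely for illustration.
    def fake_detector(img):
        return [(10.0, 20.0, 48.0, 64.0)]

    def fake_pose_model(img, box):
        hm = [[0.0] * 3 for _ in range(4)]   # 4 rows x 3 cols, single joint
        hm[1][2] = 0.9                        # peak at row 1, col 2
        return [hm]

    print(top_down(None, fake_detector, fake_pose_model))  # [[(50.0, 44.0, 0.9)]]
```

The key cost property shows up in `top_down`: the pose model runs once per detected box, so runtime grows with the number of people. For the single-person case asked about above, that is just one detector call plus one pose-model pass.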

oliverdain commented 1 year ago

The ViTPose+ paper explains that it can be used either top-down or bottom-up, and describes how the bottom-up method works: https://arxiv.org/pdf/2212.04246.pdf
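For contrast with the per-person loop of a top-down pipeline, bottom-up methods predict keypoint candidates for everyone in one pass over the full image and then group them into people. The sketch below shows one generic grouping scheme, greedy associative-embedding-style tag matching. It is an illustration of the general bottom-up idea only and is not necessarily what the ViTPose+ paper does; check the paper for its actual method. The candidate format `(x, y, score, tag)` and the thresholds `score_thr` and `tag_thr` are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[float, float, float, float]  # (x, y, score, tag), assumed format
Person = Dict[int, Candidate]                  # joint index -> chosen candidate


def group_by_tags(
    candidates: List[List[Candidate]],
    score_thr: float = 0.1,
    tag_thr: float = 1.0,
) -> List[Person]:
    """Greedily assign per-joint candidates to people by embedding-tag similarity.

    candidates[j] holds all detections of joint type j in the image.
    A candidate joins the person whose mean tag is closest (and below tag_thr)
    and who does not already have that joint; otherwise it starts a new person.
    """
    people: List[Person] = []
    for joint, cands in enumerate(candidates):
        # Handle stronger detections first so they claim people before weak ones.
        for cand in sorted(cands, key=lambda c: -c[2]):
            if cand[2] < score_thr:
                continue
            best, best_dist = None, tag_thr
            for person in people:
                if joint in person:
                    continue
                mean_tag = sum(c[3] for c in person.values()) / len(person)
                dist = abs(cand[3] - mean_tag)
                if dist < best_dist:
                    best, best_dist = person, dist
            if best is None:
                people.append({joint: cand})
            else:
                best[joint] = cand
    return people


if __name__ == "__main__":
    # Two joint types, two people whose tags cluster near 0.0 and 5.0.
    cands = [
        [(10, 10, 0.9, 0.0), (50, 10, 0.8, 5.0)],   # joint 0
        [(12, 30, 0.7, 0.2), (52, 30, 0.6, 4.9)],   # joint 1
    ]
    for p in group_by_tags(cands):
        print(p)
```

The trade-off is the mirror image of top-down: the network runs once regardless of how many people are present, but an extra grouping step is needed, and for a single person that grouping buys little.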