The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
Hello, I was wondering which model is used in the video Web Demo on Hugging Face? I would like to test it using scripts. Are the weights for that particular model provided? Thanks
Also, does the model's accuracy on OCHuman come from the ViT architecture, or from the fact that it has been trained on a combination of the MPII + CrowdPose + COCO datasets?