vitpose and pretrained backbone

geopavlakos / hamer

HaMeR: Reconstructing Hands in 3D with Transformers

https://geopavlakos.github.io/hamer/

MIT License

vitpose and pretrained backbone #34

Closed: xungeer29 closed this issue 5 months ago

xungeer29 commented 5 months ago

Thanks for your great work. I want to train a smaller HaMeR, for example by changing the backbone to ViT-Small. Without a pretrained backbone, however, the small model performs very poorly. I noticed that the pretrained backbone and the vitpose-huge-wholebody checkpoint you provide differ slightly from those in the official ViTPose repo (https://github.com/ViTAE-Transformer/ViTPose). Could you share how you converted the pretrained backbone? In other words, how can we obtain pretrained backbones for vitpose-small, vitpose-base, and vitpose-large?

geopavlakos commented 5 months ago

The pretrained backbone is the same as the one provided by ViTPose (it should be available through this link). The only change we made was removing the `backbone.` prefix from each dictionary key; the weights should be identical. You can try other backbones the same way, although the code currently defines the network architecture only for ViT-H, so you will probably need to make further edits there as well.
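A minimal sketch of the key renaming described above, for anyone who wants to repeat it with a different ViTPose checkpoint. This is not the authors' script. The helper names `strip_prefix` and `convert_checkpoint`, the `drop_others` option, and the handling of a top-level `state_dict` entry are all assumptions made for illustration; inspect the keys of your own checkpoint before relying on it.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="backbone.", drop_others=False):
    """Return a copy of `state_dict` with `prefix` removed from matching keys.

    Keys that do not start with `prefix` are kept unchanged unless
    `drop_others` is True (hypothetical option, not from the thread).
    """
    out = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            out[key[len(prefix):]] = value
        elif not drop_others:
            out[key] = value
    return out


def convert_checkpoint(src_path, dst_path, prefix="backbone."):
    """Load a checkpoint, rename its keys, and save it to `dst_path`."""
    import torch  # imported here so strip_prefix works without PyTorch

    ckpt = torch.load(src_path, map_location="cpu")
    # Assumption: weights are either under "state_dict" or stored as a flat dict.
    if isinstance(ckpt, dict) and "state_dict" in ckpt:
        ckpt["state_dict"] = strip_prefix(ckpt["state_dict"], prefix)
    else:
        ckpt = strip_prefix(ckpt, prefix)
    torch.save(ckpt, dst_path)
```

Only the key names change; the tensors are passed through untouched, which matches the statement that the weights are identical.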

xungeer29 commented 5 months ago

Thanks for your reply, I got it. I would also like to know the official download location for this file (_DATA/vitpose_ckpts/vitpose+_huge/wholebody.pth). The model you provide seems to differ slightly from the vitpose-huge checkpoint downloaded via the OneDrive link in the ViTPose repo (https://github.com/ViTAE-Transformer/ViTPose).

geopavlakos commented 5 months ago

We follow the authors' instructions to split the model they provide into separate versions (e.g., for wholebody pose, for animal pose, etc.) and keep the wholebody component. If you follow the model splitting procedure, you should get a model identical to the one we provide.

Please note that we only use the wholebody model for the demo code, to get an estimate of the hand location in the image. We do not train/finetune that version.
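To check whether a checkpoint you produced yourself really matches a provided one, you can compare the two state dictionaries key by key. The sketch below is one hedged way to do that; `diff_state_dicts` is a hypothetical helper, not part of HaMeR or ViTPose.

```python
def diff_state_dicts(a, b, equal=lambda x, y: x == y):
    """Compare two state dicts.

    Returns three sorted lists: keys only in `a`, keys only in `b`,
    and shared keys whose values differ according to `equal`.
    """
    keys_a, keys_b = set(a), set(b)
    only_a = sorted(keys_a - keys_b)
    only_b = sorted(keys_b - keys_a)
    changed = sorted(k for k in keys_a & keys_b if not equal(a[k], b[k]))
    return only_a, only_b, changed


# With PyTorch tensors, pass equal=torch.equal, for example:
#   import torch
#   mine = torch.load("my_split.pth", map_location="cpu")    # placeholder path
#   theirs = torch.load("provided.pth", map_location="cpu")  # placeholder path
#   print(diff_state_dicts(mine, theirs, equal=torch.equal))
```

Three empty lists mean the key sets and values agree. If the weights are nested under an entry such as `state_dict`, compare that entry instead of the top-level dictionary.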

xungeer29 commented 5 months ago

Thank you for the prompt reply; it solves my problem. I wish you even greater progress in your future research.