FlyEgle / MAE-pytorch

Masked Autoencoders Are Scalable Vision Learners

load_state_dict, size mismatch #7

Closed ichiyasa0308 closed 2 years ago

ichiyasa0308 commented 2 years ago

Hi, I'm not very experienced with PyTorch modeling, but how can I download and use the pretrained weights for inference.py?

I downloaded the pretrained weights as you mentioned below, but I can't load them correctly.

ViT-Tiny/16 pretrained model is here
ViT-Base/16 pretrained model is here

The error message is:

load_state_dict, size mismatch for cls_token: copying a param with shape torch.Size([1, 1, 192]) from checkpoint, the shape in current model is torch.Size([1, 1, 768]).

It seems that the code around line 200 of /model/Transformers/VIT/mae.py is incorrect. Thank you for reading.

FlyEgle commented 2 years ago

Because those weights are for ViT-Tiny, and you are trying to load them into ViT-Base. See the model config in the README and follow it to modify the ViT model dims.
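In other words, cls_token has shape [1, 1, embed_dim], so the checkpoint's 192 means ViT-Tiny while the model was built with ViT-Base's 768. A minimal sketch for checking this before loading; the checkpoint path and the "model" key are assumptions about the checkpoint layout, not this repo's exact format:

```python
import torch

# Load the checkpoint on CPU and unwrap it if the weights are
# nested under a "model" key (an assumption; adjust to your file).
ckpt = torch.load("vit-tiny-mae-pretrain.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)

# cls_token has shape [1, 1, embed_dim], which tells you which
# config to build: 192 -> ViT-Tiny, 768 -> ViT-Base.
print(state_dict["cls_token"].shape)

# Then build the model with the matching dims from the README
# before calling load_state_dict, e.g. for ViT-Tiny:
# model = build_vit(embed_dim=192, depth=12, num_heads=3)  # hypothetical builder
# model.load_state_dict(state_dict)
```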