size mismatch for lm.lm_head.weight: copying a param with shape torch.Size([50400, 4096]) from checkpoint, the shape in current model is torch.Size([50258, 4096]).

Aleph-Alpha / magma

MAGMA - a GPT-style multimodal model that can understand any combination of images and language. NOTE: The freely available model from this repo is only a demo. For the latest multimodal and multilingual models from Aleph Alpha check out our website https://app.aleph-alpha.com

MIT License


size mismatch for lm.lm_head.weight: copying a param with shape torch.Size([50400, 4096]) from checkpoint, the shape in current model is torch.Size([50258, 4096]). #31

Closed: UCCME closed this issue 2 years ago

UCCME commented 2 years ago

size mismatch for lm.lm_head.weight: copying a param with shape torch.Size([50400, 4096]) from checkpoint, the shape in current model is torch.Size([50258, 4096]). This error is very strange. I didn't change any code, yet the model and the config appear to be mismatched. Has anyone else run into the same problem?

stefan-it commented 2 years ago

Hi @UCCME ,

please have a look at #27. You definitely need to install the forked/modified version of transformers from: https://github.com/finetuneanon/transformers
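
To see which parameters disagree before reinstalling, it can help to compare the shapes stored in the checkpoint with the shapes of the instantiated model. The snippet below is only a minimal sketch, not code from the MAGMA repo: `find_shape_mismatches` is a hypothetical helper, and it takes plain name-to-shape dicts so that it runs without loading anything. With PyTorch, such a dict could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`. The example shapes are the ones from the error message above.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters whose shapes differ.

    Hypothetical helper for illustration; parameters missing from the model
    are skipped, since only shape conflicts are of interest here.
    """
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the error message in this issue.
checkpoint_shapes = {"lm.lm_head.weight": (50400, 4096)}
model_shapes = {"lm.lm_head.weight": (50258, 4096)}

for name, (ckpt, current) in find_shape_mismatches(checkpoint_shapes, model_shapes).items():
    print(f"{name}: checkpoint {ckpt} vs. current model {current}")
```

Here only the first dimension differs (50400 vs. 50258). For an `lm_head` weight that dimension is the vocabulary size, which suggests the model was built with a different vocabulary size than the checkpoint expects.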