Aleph-Alpha / magma

MAGMA - a GPT-style multimodal model that can understand any combination of images and language. NOTE: The freely available model from this repo is only a demo. For the latest multimodal and multilingual models from Aleph Alpha check out our website https://app.aleph-alpha.com
MIT License

Mismatching LM head shape between 50400 (pre-trained .pt) and 50258 (GPT-2) #27

Closed: tsujuifu closed this issue 2 years ago

tsujuifu commented 2 years ago

Thanks for this wonderful work 😍
When I load mp_rank_00_model_states.pt, it reports that the shape of the LM head is different:

size mismatch for lm.lm_head.weight: copying a param with shape torch.Size([50400, 4096]) from checkpoint, the shape in current model is torch.Size([50258, 4096])
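
As a side note, the checkpoint side of the mismatch can be confirmed directly by loading the state dict and printing the stored shape. This is only a sketch and assumes the file loads to a flat dict keyed like lm.lm_head.weight, as in the snippet further down; a DeepSpeed checkpoint may nest the parameters under a "module" key first.

import torch

# Sketch: inspect the shapes stored in the pre-trained checkpoint (path assumed local).
sd = torch.load("mp_rank_00_model_states.pt", map_location="cpu")
print(sd["lm.lm_head.weight"].shape)  # expected: torch.Size([50400, 4096])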

I guess this is because of the resize_token_embeddings call here.
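
For illustration only (this sketch uses the small upstream gpt2 checkpoint, not MAGMA's GPT-J backbone), resize_token_embeddings changes the vocabulary dimension of both the token embeddings and the tied LM head, which is exactly the dimension where the checkpoint and the freshly built model disagree:

from transformers import GPT2LMHeadModel

# Illustration with upstream GPT-2; MAGMA itself wraps a GPT-J model.
model = GPT2LMHeadModel.from_pretrained("gpt2")
print(model.lm_head.weight.shape)   # torch.Size([50257, 768])

# Resizing pads/truncates the vocab dimension of the embeddings and the (tied) LM head.
model.resize_token_embeddings(50400)
print(model.lm_head.weight.shape)   # torch.Size([50400, 768])
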
I also tried truncating the extra vocabulary rows,

sd["lm.lm_head.weight"] = sd["lm.lm_head.weight"][:50258, :]
sd["lm.lm_head.bias"] = sd["lm.lm_head.bias"][:50258]

but the result of example_inference.py seems weird 😂

bondankeNM Drama fixtures Sergey
Fantasticheddar AUTHOR hob sealedunction


Super thanks for the help!

CoEich commented 2 years ago

Hi,

thanks for the kind words. Can you check which version of transformers (HF) you have installed? Make sure to use the one specified in the requirements. If you already had transformers installed in your environment, you might have the wrong version.

Best, Constantin

stefan-it commented 2 years ago

@tsujuifu I think this is due to this modification in the finetuneanon fork of the Transformers repo: https://github.com/finetuneanon/transformers#gpt-j-6b (they resized the vocabulary to 50,400). So pip3 show transformers should point to the finetuneanon Transformers fork, not the upstream repo, in your environment :)
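
As a quick in-Python sanity check (complementing pip3 show transformers), you can confirm which transformers install actually gets imported. The exact version string the fork reports is not shown here; the point is that the path should lead to the fork rather than to a leftover upstream install.

import transformers

# Show which transformers package is actually imported and where it lives;
# a leftover upstream install in the environment would show up here.
print(transformers.__version__)
print(transformers.__file__)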

tsujuifu commented 2 years ago

That was it! We should use finetuneanon/transformers, and it works well now.

Appreciate the kind reply 😍