🦦 Otter is a multi-modal model based on OpenFlamingo (the open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT, with improved instruction-following and in-context learning ability.
The input_embedding size of the MPT-7B model is 50432, while the vocabulary size of MPT's tokenizer (gpt-neox-20b) is 50277, so the two do not match. The mismatch is intentional: the embedding dimension is padded up to a GPU-friendly multiple for training efficiency, see https://twitter.com/karpathy/status/1621578354024677377?s=46.
Following this setting, we also use 50432 as the input_embedding dimension; only 50277 tokens (50281 if you add "\ \ \ \") are actually "valid".
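As a small sanity check of the padding arithmetic (a sketch, not code from this repo): 50432 is exactly the smallest multiple of 256 that is greater than or equal to the tokenizer's 50277 entries, so the extra embedding rows are padding that the tokenizer never emits. The helper name `pad_vocab_size` below is illustrative, not an actual API.

```python
def pad_vocab_size(vocab_size: int, multiple: int) -> int:
    # Round vocab_size up to the nearest multiple so the embedding
    # matrix has GPU-friendly dimensions.
    return -(-vocab_size // multiple) * multiple

# gpt-neox-20b tokenizer defines 50277 tokens; rounding up to a
# multiple of 256 yields MPT-7B's input_embedding size of 50432.
print(pad_vocab_size(50277, 256))  # -> 50432
```

Rows 50277..50431 of the embedding table are therefore trained but unused by any real token id.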