🦦 Otter is a multi-modal model based on OpenFlamingo (the open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT, with improved instruction-following and in-context learning ability.
The input_embedding size of the MPT-7B model is 50432, while the vocabulary size of MPT's tokenizer (gpt-neox-20b) is 50277, so the two do not match. The mismatch is intentional: the embedding dimension is padded up to a GPU-friendly multiple for training efficiency, see https://twitter.com/karpathy/status/1621578354024677377?s=46.
Following this setting, we also use 50432 as the input_embedding dimension; only 50277 tokens (50281 if you add "\ \ \ \") are actually "valid".
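As a small sanity check of the padding arithmetic (a sketch, not code from this repo): 50432 is exactly the smallest multiple of 256 that is greater than or equal to the tokenizer's 50277 entries, so the extra embedding rows are padding that the tokenizer never emits. The helper name `pad_vocab_size` below is illustrative, not an actual API.

```python
def pad_vocab_size(vocab_size: int, multiple: int) -> int:
    # Round vocab_size up to the nearest multiple so the embedding
    # matrix has GPU-friendly dimensions.
    return -(-vocab_size // multiple) * multiple

# gpt-neox-20b tokenizer defines 50277 tokens; rounding up to a
# multiple of 256 yields MPT-7B's input_embedding size of 50432.
print(pad_vocab_size(50277, 256))  # -> 50432
```

Rows 50277..50431 of the embedding table are therefore trained but unused by any real token id.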