Yi-6B and Yi-34B are new models that make a plausible claim to be the current best open-source models, beating Falcon 180B and LLaMA-2 70B on MMLU. It'd be great to support them! I'm particularly keen on the 6B one: there seem to be cool projects that become feasible on the best ~6B models around, though it may not add much beyond Mistral 7B.
I have not yet read the code, so I do not know what architectural quirks it has.
You could use the LLaMA and Mistral PRs as models for what this should look like.
Just noting here that Yi models (both 6B and 34B) use grouped-query attention (num_key_value_heads < num_attention_heads). Grouped-query attention is implemented in #443, so this integration should be straightforward once that PR is in.
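To make the grouped-query-attention point concrete, here is a minimal NumPy sketch of the head mapping it implies (illustrative only, not code from this repo or from #443): each group of query heads shares one key/value head, so the K/V tensors are expanded along the head axis before running standard multi-head attention. The head counts below are hypothetical placeholders, not taken from Yi's actual config.

```python
import numpy as np

# Hypothetical head counts for illustration; check the model's HF config
# (num_attention_heads / num_key_value_heads) for the real values.
num_attention_heads = 8
num_key_value_heads = 2
group_size = num_attention_heads // num_key_value_heads  # query heads per KV head

def repeat_kv(kv: np.ndarray, n_rep: int) -> np.ndarray:
    """Expand (batch, kv_heads, seq, head_dim) to (batch, kv_heads * n_rep, seq, head_dim).

    Each KV head is repeated n_rep times so every query head in a group
    attends against the same shared key/value head.
    """
    b, h, s, d = kv.shape
    expanded = np.broadcast_to(kv[:, :, None], (b, h, n_rep, s, d))
    return expanded.reshape(b, h * n_rep, s, d)

k = np.random.randn(1, num_key_value_heads, 16, 64)
k_expanded = repeat_kv(k, group_size)
assert k_expanded.shape == (1, num_attention_heads, 16, 64)
```

After this expansion, the attention math is identical to vanilla multi-head attention; the savings come from storing and moving only `num_key_value_heads` worth of KV cache.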
https://huggingface.co/01-ai/Yi-6B
https://huggingface.co/01-ai/Yi-34B