TransformerLensOrg / TransformerLens

A library for mechanistic interpretability of GPT-style language models
https://transformerlensorg.github.io/TransformerLens/
MIT License

[Proposal] Add Support for Yi-6B and Yi-34B #449

Open neelnanda-io opened 1 year ago

neelnanda-io commented 1 year ago

Proposal

Yi-6B and Yi-34B are new models with a plausible claim to being the current best open-source models, beating Falcon 180B and LLaMA-2 70B on MMLU. It'd be great to support them! I'm particularly keen on the 6B one; there seem to be cool projects that are easier to do on the best ~6B models around, though it may not add much beyond Mistral 7B.

I have not yet read the code, so I do not know what architectural quirks it has.

You could use the LLaMA and Mistral PRs as templates for what this should look like.

https://huggingface.co/01-ai/Yi-6B https://huggingface.co/01-ai/Yi-34B
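
For anyone picking this up, here's a rough sketch of the kind of config entry those PRs add in `convert_hf_model_config` (in `transformer_lens/loading_from_pretrained.py`), modeled on the existing LLaMA entries. All of the Yi-6B numbers below are read off the HF repo's `config.json` and should be treated as assumptions to verify, not authoritative values:

```python
# Hypothetical TransformerLens config dict for Yi-6B, modeled on the
# existing LLaMA/Mistral entries in loading_from_pretrained.py.
# Every number is read off https://huggingface.co/01-ai/Yi-6B config.json
# and should be double-checked before merging.
yi_6b_cfg = {
    "d_model": 4096,
    "d_head": 4096 // 32,
    "n_heads": 32,
    "n_key_value_heads": 4,  # grouped-query attention; depends on #443
    "d_mlp": 11008,
    "n_layers": 32,
    "n_ctx": 4096,
    "d_vocab": 64000,  # Yi uses its own ~64k-token tokenizer
    "eps": 1e-5,
    "act_fn": "silu",
    "normalization_type": "RMS",
    "positional_embedding_type": "rotary",
    "rotary_dim": 4096 // 32,
    "rotary_base": 5_000_000,  # rope_theta in the HF config, if I read it right
    "final_rms": True,
    "gated_mlp": True,
}
```

If the architecture really is LLaMA-shaped, the weight conversion should presumably mostly follow the existing LLaMA conversion, modulo the GQA-shaped K/V matrices.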

neelnanda-io commented 1 year ago

I hear claims that it's basically just the LLaMA architecture! That would make this super easy, woot. https://huggingface.co/01-ai/Yi-34B/discussions/11
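
One quick way to sanity-check that claim is to inspect the published HF config. The expected values in the comments below are what I'd guess from the 01-ai/Yi-6B repo and the linked discussion, so treat them as assumptions rather than confirmed facts:

```python
# Sanity-check the "it's basically LLaMA" claim via the published HF config.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("01-ai/Yi-6B")
print(cfg.architectures)        # expect something LLaMA-shaped, e.g. ["LlamaForCausalLM"]
print(cfg.hidden_size, cfg.num_hidden_layers)            # 4096, 32 for the 6B?
print(cfg.num_attention_heads, cfg.num_key_value_heads)  # 32, 4 -> grouped-query attention
```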

andyrdt commented 10 months ago

Just noting here that Yi models (both 6B and 34B) use grouped-query attention (num_key_value_heads < num_attention_heads). Grouped-query attention is implemented in #443, so this integration should be straightforward once that PR is in.
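
For anyone unfamiliar with grouped-query attention, here's a minimal, self-contained sketch (an illustration only, not TransformerLens's implementation) of how a small number of KV heads is shared across the full set of query heads:

```python
# Grouped-query attention in miniature: 32 query heads share 4 KV heads,
# so each group of 32 // 4 = 8 query heads attends with the same K and V.
import torch

n_heads, n_kv_heads, d_head, seq = 32, 4, 128, 10
q = torch.randn(1, n_heads, seq, d_head)
k = torch.randn(1, n_kv_heads, seq, d_head)
v = torch.randn(1, n_kv_heads, seq, d_head)

# Expand the 4 KV heads so each is reused by 8 query heads.
k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)
v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)

scores = (q @ k.transpose(-1, -2)) / d_head**0.5  # no causal mask, for brevity
out = scores.softmax(-1) @ v
print(out.shape)  # torch.Size([1, 32, 10, 128])
```

The upshot for weight loading is that W_K and W_V have `n_key_value_heads` heads rather than `n_heads`, which is the shape #443 adds support for.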