They use the same "MistralForCausalLM" architecture and seem to share some configuration parameters, such as intermediate_size, and I was wondering whether it would be possible to merge them.
The tokenizers and vocabulary sizes are radically different between the models (assuming Mistral v0.3 7B), as is the hidden size. I would be surprised if the result were coherent.
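As a quick sanity check before attempting any merge, you can compare the relevant config fields without downloading weights. This is a minimal sketch; the second model ID is a placeholder for whichever checkpoint you want to merge with:

```python
from transformers import AutoConfig

# Load only the configs (no weights) to compare architecture hyperparameters.
cfg_a = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.3")
cfg_b = AutoConfig.from_pretrained("other-org/other-mistral-model")  # placeholder ID

# Fields that must match for a naive weight merge to even be shape-compatible.
fields = (
    "vocab_size",
    "hidden_size",
    "intermediate_size",
    "num_hidden_layers",
    "num_attention_heads",
    "num_key_value_heads",
)
for field in fields:
    a, b = getattr(cfg_a, field, None), getattr(cfg_b, field, None)
    print(f"{field}: {a} vs {b} {'(match)' if a == b else '(MISMATCH)'}")
```

If vocab_size or hidden_size mismatch, the embedding and projection matrices have different shapes, so standard weight-averaging merges won't apply directly.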