arcee-ai / mergekit

Tools for merging pretrained large language models.

Adding an expert to a pre-trained Mixtral model #171

Open · maxime-dlabai opened this issue 8 months ago

maxime-dlabai commented 8 months ago

I would like to know if it is possible to add another expert (i.e. another model) to an existing pre-trained Mixtral MoE, without having to re-create the entire MoE architecture from scratch.

I have an existing pre-trained Mixtral model and I would like to expand it by adding a new expert focused on a particular domain or task.

Is there a way to attach the new expert to the existing router and save just the new expert and the updated router, while keeping the previously trained experts intact? Or does the full MoE architecture need to be recreated every time a new expert is added?

If there are any code examples or documentation on how to properly add an expert to an existing pre-trained Mixtral model, please point me to them. I haven't been able to find clear information on the right way to do this. Any guidance would be appreciated!
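To make it concrete, what I have in mind is roughly the sketch below (not using mergekit itself). I'm assuming the module layout of the Hugging Face `transformers` Mixtral implementation (`block_sparse_moe.experts`, `block_sparse_moe.gate`, `config.num_local_experts`); the donor-based initialization of the new expert and router row is just a guess on my part:

```python
# Rough, untested sketch: widen every MoE layer of a Mixtral checkpoint by one expert.
# Assumes the transformers MixtralSparseMoeBlock layout.
import copy
import torch
from transformers import MixtralForCausalLM

model = MixtralForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", torch_dtype=torch.bfloat16
)
donor_idx = 0  # existing expert whose weights seed the new one

for layer in model.model.layers:
    moe = layer.block_sparse_moe

    # 1) Append a new expert, here just a copy of an existing one; in practice
    #    you would load the MLP weights of a domain-tuned model instead.
    moe.experts.append(copy.deepcopy(moe.experts[donor_idx]))

    # 2) Grow the router (gate) by one output row so it can route to the new expert.
    old_gate = moe.gate
    new_gate = torch.nn.Linear(
        old_gate.in_features, old_gate.out_features + 1,
        bias=False, dtype=old_gate.weight.dtype,
    )
    with torch.no_grad():
        new_gate.weight[:-1] = old_gate.weight
        new_gate.weight[-1] = old_gate.weight[donor_idx]  # crude init for the new row
    moe.gate = new_gate
    moe.num_experts += 1

model.config.num_local_experts += 1
model.save_pretrained("mixtral-9x7b-widened")
```

The previously trained experts would stay untouched, but the new router row carries no useful signal yet, which I suspect is the catch.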

dronesflier commented 8 months ago

How would the router learn when and how to pick the new expert? I don't think you can avoid additional pretraining (or the hack). Please do tell me if I'm wrong though!
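If you do the surgery sketched above, I'd still expect a short continued-pretraining pass where only the routers and the new expert are trainable, so the gates learn when to pick it. Something like this (untested; the parameter-name matching just follows how `transformers` names the Mixtral modules, and the single step is a placeholder for a real training loop over domain data):

```python
# Freeze everything except the routers and the newly added expert, then
# continue pretraining briefly on domain data. Untested sketch.
import torch
from transformers import AutoTokenizer, MixtralForCausalLM

model = MixtralForCausalLM.from_pretrained("mixtral-9x7b-widened")
new_idx = model.config.num_local_experts - 1  # index of the added expert

for name, param in model.named_parameters():
    param.requires_grad = (
        ".block_sparse_moe.gate." in name
        or f".block_sparse_moe.experts.{new_idx}." in name
    )

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
batch = tokenizer(["<domain text goes here>"], return_tensors="pt")

# One illustrative step; a real run would iterate over a domain dataset.
out = model(**batch, labels=batch["input_ids"])
out.loss.backward()
optimizer.step()
optimizer.zero_grad()
```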