I would like to know if it is possible to add an additional expert (an additional model) to an existing pre-trained Mixtral MoE architecture, without having to re-create the entire MoE architecture from scratch.
I have an existing pre-trained Mixtral model and I would like to expand it by adding a new expert focused on a particular domain or task.
Is there a way to attach the new expert to the existing router and save only the new expert and the updated router, while keeping the previously trained experts intact? Or does the full MoE architecture need to be recreated every time a new expert is added?
If there are any code examples or documentation on how to properly add an expert to an existing pre-trained Mixtral model, please point me to them. I haven't been able to find clear information on the right way to do this. Any guidance would be appreciated!
How would the router learn when and how to pick the new expert? I don't think you can avoid additional training (or some kind of hack). Please do tell me if I'm wrong, though!
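For illustration, here is a minimal sketch of what the structural surgery might look like with Hugging Face transformers. The attribute names (`block_sparse_moe`, `gate`, `experts`, `num_experts`, `num_local_experts`) reflect my reading of the current Mixtral implementation and may differ between library versions, and the checkpoint name is just a placeholder. This only adds parameters; the router still won't know when to pick the new expert until it is trained further.

```python
import copy

import torch
import torch.nn as nn
from transformers import MixtralForCausalLM

# Placeholder checkpoint; substitute your own pre-trained Mixtral model.
model = MixtralForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", torch_dtype=torch.bfloat16
)

for layer in model.model.layers:
    moe = layer.block_sparse_moe  # sparse MoE block: router ("gate") + experts

    # 1) Append a new expert, here initialised as a copy of expert 0.
    #    You could instead load domain-specific weights into it afterwards.
    new_expert = copy.deepcopy(moe.experts[0])
    moe.experts.append(new_expert)

    # 2) Grow the router: the gate is a Linear(hidden_size, num_experts, bias=False),
    #    so it needs one extra output row for the new expert.
    old_gate = moe.gate
    new_gate = nn.Linear(
        old_gate.in_features,
        old_gate.out_features + 1,
        bias=False,
        dtype=old_gate.weight.dtype,
        device=old_gate.weight.device,
    )
    with torch.no_grad():
        new_gate.weight[:-1] = old_gate.weight  # keep the existing routing rows
        new_gate.weight[-1].zero_()             # new expert starts with neutral logits
    moe.gate = new_gate

    # 3) Bookkeeping so top-k routing iterates over the extra expert.
    moe.num_experts += 1

model.config.num_local_experts += 1
```

After this, you would typically freeze everything except the new experts and the gates, fine-tune on your domain data, and save only those tensors (e.g. a filtered `state_dict`) so the original expert weights stay untouched. Whether the router learns a useful routing for the new expert without broader training is exactly the open question above.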