GeneZC / MiniMA

Code for the paper "Towards the Law of Capacity Gap in Distilling Language Models"
Apache License 2.0

Code for Training MiniMoE #4

Open ojus1 opened 6 months ago

ojus1 commented 6 months ago

Can you please release the code for "upcycling" LLMs into MoEs? I have a use case involving multilingual LLMs where this would be incredibly helpful!
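The "upcycling" referred to here typically means initializing a Mixture-of-Experts model from a dense checkpoint by replicating the dense FFN weights into each expert and attaching a freshly initialized router. Below is a minimal NumPy sketch of that idea; the function name `upcycle_ffn` and its parameters are illustrative assumptions, not this repository's actual API.

```python
import numpy as np

def upcycle_ffn(ffn_weight, num_experts, noise_std=0.0, seed=0):
    """Sparse-upcycling sketch (illustrative, not the MiniMA API):
    replicate one dense FFN weight matrix into `num_experts` copies,
    optionally jittered with Gaussian noise to break symmetry, and
    create a zero-initialized router so the MoE initially behaves
    like the original dense model."""
    rng = np.random.default_rng(seed)
    experts = [
        ffn_weight.copy() + rng.normal(0.0, noise_std, ffn_weight.shape)
        for _ in range(num_experts)
    ]
    # Router maps hidden states (hidden_dim) to expert logits (num_experts).
    router = np.zeros((ffn_weight.shape[0], num_experts))
    return experts, router

# Example: upcycle a toy 8x32 FFN weight into 4 identical experts.
dense_w = np.random.default_rng(1).standard_normal((8, 32))
experts, router = upcycle_ffn(dense_w, num_experts=4)
```

With `noise_std=0.0` every expert starts as an exact copy of the dense FFN, so the upcycled model's initial outputs match the dense model's; a small nonzero `noise_std` lets experts diverge during further training.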

GeneZC commented 6 months ago

We may release the code later, once the intermediate checkpoints of our MoE model have been verified to be effective.