Hi! MegaBlocks only defines the MoE layers so you'll have to use another framework for the remainder of the model. We use Megatron-LM for our experiments but I am not sure they support Mixtral 8x7B yet. You could also try HuggingFace, which I suspect would be easier than Megatron-LM.
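If you do go the HuggingFace route, a minimal sketch of loading the model looks roughly like the following. This assumes a transformers version with Mixtral support and uses the public checkpoint id `mistralai/Mixtral-8x7B-v0.1`; from there, fine-tuning follows the standard HuggingFace training workflow.

```python
# Minimal sketch of loading Mixtral 8x7B via HuggingFace transformers.
# Assumes a recent transformers release with Mixtral support and that
# accelerate is installed (needed for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # load in bf16 to reduce memory
    device_map="auto",           # shard layers across available GPUs
)

# Quick sanity check that the weights loaded and the model runs.
inputs = tokenizer("MegaBlocks implements dropless MoE layers", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```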
Thanks a lot for the quick response. I prefer the Megatron framework and am familiar with it.
Would it be possible to share an example of multi-node training with Megatron-LM for a 7B MoE model? It's fine to omit some of the important hyperparameters.
The multi-node example is important for our group. We will definitely cite and acknowledge you in our research.
We have some training scripts under exp/dmoe that you should be able to adapt pretty easily! Just change the distributed arguments to set up for the number of nodes you want to run on.
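For reference, here is a minimal sketch (not one of the exp/dmoe scripts themselves) of what those distributed arguments ultimately control on the Python side. The launcher (e.g. torchrun) exports RANK, WORLD_SIZE, and LOCAL_RANK for every process, and WORLD_SIZE is just nnodes × processes per node.

```python
# Minimal sketch of multi-node process-group initialization, assuming the
# script is started with torchrun (or torch.distributed.launch), which sets
# the RANK / WORLD_SIZE / LOCAL_RANK / MASTER_ADDR / MASTER_PORT env vars.
import os
import torch
import torch.distributed as dist


def init_distributed():
    rank = int(os.environ["RANK"])              # global rank, 0 .. world_size - 1
    world_size = int(os.environ["WORLD_SIZE"])  # nnodes * nproc_per_node
    local_rank = int(os.environ["LOCAL_RANK"])  # rank within this node, picks the GPU

    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    return rank, world_size


if __name__ == "__main__":
    rank, world_size = init_distributed()
    if rank == 0:
        print(f"initialized {world_size} processes across all nodes")
```

Running on, say, 2 nodes with 8 GPUs each typically means passing `--nnodes=2 --nproc_per_node=8` on every node, a unique `--node_rank` per node, and the same `--master_addr`/`--master_port` everywhere; the remaining Megatron-LM arguments in the scripts can stay as they are.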
I will try! Thanks a lot!
Thanks for the excellent work.
I am trying to fine-tune the Mixtral 8x7B model based on this codebase. It would be really helpful if you could share a launch script for the multi-node case. By the way, how can I load the 8x7B weights? There does not seem to be a weight conversion script.