databricks / megablocks

About the Multi-node Script #59

Closed: XingyuXie closed this issue 11 months ago

XingyuXie commented 11 months ago

Thanks for the excellent work.

I'm trying to fine-tune the Mistral 8x7B model with this codebase. It would be very helpful if you could share a launch script for the multi-node case. Also, how can I load the 8x7B weights? There doesn't seem to be a weight conversion script.

tgale96 commented 11 months ago

Hi! MegaBlocks only defines the MoE layers so you'll have to use another framework for the remainder of the model. We use Megatron-LM for our experiments but I am not sure they support Mixtral 8x7B yet. You could also try HuggingFace, which I suspect would be easier than Megatron-LM.
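In case it helps to see how the pieces fit together, here is a minimal sketch of dropping a MegaBlocks MoE layer into an otherwise standard PyTorch model, with the rest of the network (attention, embeddings, etc.) coming from Megatron-LM or HuggingFace as described above. The module paths and argument names (`megablocks.layers.arguments.Arguments`, `megablocks.layers.dmoe.dMoE`, `hidden_size`, `ffn_hidden_size`, `moe_num_experts`, `moe_top_k`) are assumptions based on the MegaBlocks code at the time and may differ between versions, so treat this as a sketch rather than a reference implementation.

```python
# Hedged sketch: a MegaBlocks dropless-MoE layer as a drop-in FFN replacement.
# Class paths, argument fields, and the return convention are assumptions that
# should be checked against the installed megablocks version.
import torch

from megablocks.layers.arguments import Arguments  # assumed module path
from megablocks.layers.dmoe import dMoE             # assumed module path

# Mixtral-8x7B-like MoE settings (illustrative values only).
args = Arguments(
    hidden_size=4096,        # model dimension
    ffn_hidden_size=14336,   # per-expert FFN dimension
    moe_num_experts=8,       # 8 experts, as in "8x7B"
    moe_top_k=2,             # route each token to its top-2 experts
)

# The MoE kernels generally expect a GPU and half precision.
moe = dMoE(args).cuda().half()

# (sequence, batch, hidden) layout, as used by Megatron-LM.
x = torch.randn(16, 1, 4096, device="cuda", dtype=torch.half)
out = moe(x)  # some versions return (output, bias) to mirror Megatron's MLP
```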

XingyuXie commented 11 months ago

Thanks a lot for the quick response. I prefer the Megatron framework and am familiar with it.

Would it be possible to share an example of multi-node training with Megatron-LM for a 7B MoE model? It's fine to omit some of the important hyperparameters.

A multi-node example is important for our group. We will definitely cite and acknowledge you in our research.

tgale96 commented 11 months ago

We have some training scripts under exp/dmoe that you should be able to adapt pretty easily! Just change the distributed arguments to match the number of nodes you want to run on.
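For concreteness, here is a minimal sketch of what "change the distributed arguments" can look like: the exp/dmoe scripts ultimately invoke a distributed launcher, so a multi-node run mostly comes down to setting `--nnodes`, `--nproc_per_node`, `--node_rank`, `--master_addr`, and `--master_port` consistently on every machine. The entry-point name, addresses, and GPU counts below are placeholders, not files or values taken from the repo.

```python
# Hedged sketch of a multi-node launch built around torchrun. Run this once on
# every node (e.g. via SLURM or pdsh), with NODE_RANK set differently per machine.
import os
import subprocess

NNODES = 2                                         # total number of machines
GPUS_PER_NODE = 8                                  # GPUs on each machine
NODE_RANK = int(os.environ.get("NODE_RANK", 0))    # 0..NNODES-1, unique per node
MASTER_ADDR = "10.0.0.1"                           # rank-0 node address (placeholder)
MASTER_PORT = "6000"

cmd = [
    "torchrun",
    f"--nnodes={NNODES}",
    f"--nproc_per_node={GPUS_PER_NODE}",
    f"--node_rank={NODE_RANK}",
    f"--master_addr={MASTER_ADDR}",
    f"--master_port={MASTER_PORT}",
    "pretrain_gpt.py",  # Megatron-LM entry point; the actual path depends on your setup
    # ... model / data / MoE flags copied from the exp/dmoe script you are adapting ...
]

subprocess.run(cmd, check=True)
```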

XingyuXie commented 11 months ago

I will try! Thanks a lot!