OptimalScale / LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
https://optimalscale.github.io/LMFlow/
Apache License 2.0

[New Feature] Is Mixtral supported? #879

Open markusdr opened 2 months ago

markusdr commented 2 months ago

Can you confirm if Mixtral is currently supported, e.g., mistralai/Mixtral-8x7B-Instruct-v0.1? I saw in another issue that Mistral is supported, but I'm not sure about Mixtral-8x7B since it's a different architecture.

research4pan commented 2 months ago

Thanks for your interest in LMFlow! We have tested Mixtral-8x7B on an 8×A40 (48 GB) server, so dense training of Mixtral-8x7B is currently supported in LMFlow. Sparse training is still under implementation; we have added it to our roadmap and will schedule the work soon. Multi-node training (https://github.com/OptimalScale/LMFlow/blob/main/readme/multi_node.md) can be used for larger models such as Mixtral-8x22B, but we haven't tested models that large yet.
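
If you want a quick sanity check that the dense model fits on your GPUs before launching a full finetuning run, here is a minimal sketch using the plain Hugging Face transformers API (this is not LMFlow's own finetuning entry point; see the repo README for the actual scripts):

```python
# Minimal sketch: load the dense Mixtral-8x7B checkpoint sharded across
# the available GPUs and run a short generation as a smoke test.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # bf16 keeps the ~47B dense parameters within 8x48GB
    device_map="auto",           # shard layers across all visible GPUs
)

inputs = tokenizer("Hello, Mixtral!", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If this loads and generates without out-of-memory errors, the hardware should also be able to host the dense model for finetuning with LMFlow's scripts.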

Hope this information can be helpful 😄