OptimalScale / LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
https://optimalscale.github.io/LMFlow/
Apache License 2.0

[New Feature] Is Mixtral supported? #879

Open markusdr opened 2 months ago

markusdr commented 2 months ago

Can you confirm if Mixtral is currently supported, e.g., mistralai/Mixtral-8x7B-Instruct-v0.1? I saw in another issue that Mistral is supported, but I'm not sure about Mixtral-8x7B since it's a different architecture.

research4pan commented 2 months ago

Thanks for your interest in LMFlow! We have tested Mixtral-8x7B on an 8×A40 (48 GB) server, so dense training of Mixtral-8x7B is currently supported in LMFlow. Sparse training is still under implementation; we have added it to our roadmap and will schedule the work soon. Multi-node training (https://github.com/OptimalScale/LMFlow/blob/main/readme/multi_node.md) can be used for larger models such as Mixtral-8x22B, but we haven't tested models that large yet.
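
If you want a quick sanity check that the dense model fits on your GPUs before launching a full finetuning run, here is a minimal sketch using the plain Hugging Face transformers API (this is not LMFlow's own finetuning entry point; see the repo README for the actual scripts):

```python
# Minimal sketch: load the dense Mixtral-8x7B checkpoint sharded across
# the available GPUs and run a short generation as a smoke test.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # bf16 keeps the ~47B dense parameters within 8x48GB
    device_map="auto",           # shard layers across all visible GPUs
)

inputs = tokenizer("Hello, Mixtral!", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If this loads and generates without out-of-memory errors, the hardware should also be able to host the dense model for finetuning with LMFlow's scripts.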

Hope this information can be helpful 😄