alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Apache License 2.0

MegaBlocks training #232

Closed: zhanjiqing closed this issue 3 months ago

zhanjiqing commented 4 months ago

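@jerryli1981 Hello, and thank you for your work on Pai. I see you recently introduced MegaBlocks for MoE training. Have you compared its speed against MCore? I ran into a few problems while using it and would appreciate any suggestions you have: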

jerryli1981 commented 3 months ago

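Hello, and thanks for your interest. We have recently implemented the latest MoE models, such as DeepSeek-V2 and the Qwen2 MoE models, in Mcore. We suggest trying the Mcore implementation first.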