alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Apache License 2.0
674 stars 94 forks

The deepseek-v2 implementation has a problem: it cannot support tp>1 #255

Closed 154912369 closed 3 months ago

154912369 commented 3 months ago

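Thanks for the Megatron implementation of deepseek-v2! Here is the problem I ran into in my experiments: the linear projections use ColumnParallelLinear, and this class cannot support context parallel, which makes it impossible to run the 236B version of deepseek-v2-chat.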

jerryli1981 commented 3 months ago

Hello, sequence parallel is supported when TP>1. Or could you join the DingTalk group and add me so we can discuss this in detail?

jerryli1981 commented 3 months ago

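The deepseek-v2-lite model does support TP>1, and its loss converges the same way as with TP=1, so I don't quite understand the wording of your issue title. Could you join the group and @ me directly to talk it through?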
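For readers unfamiliar with the layer being discussed, here is a minimal pure-Python sketch of the general column-parallel idea: the weight matrix is split along its output dimension across tensor-parallel ranks, each rank computes a partial output, and the partial outputs are concatenated. This is not Megatron's `ColumnParallelLinear` and not the code in this repo; the names `matmul`, `column_parallel_linear`, and `tp_size` are hypothetical, the "ranks" are simulated in a single process, and context parallelism is not modeled at all.

```python
# Illustrative sketch only: NOT Megatron-LM's ColumnParallelLinear.
# All names here (matmul, column_parallel_linear, tp_size) are hypothetical.

def matmul(x, w):
    """Multiply x (rows x k) by w (k x cols) using plain Python lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def column_parallel_linear(x, w, tp_size):
    """Split w along its output (column) dimension into tp_size shards,
    compute each shard's partial output, then concatenate the results."""
    cols = len(w[0])
    assert cols % tp_size == 0, "output dimension must divide evenly across ranks"
    shard = cols // tp_size
    outputs = []
    for rank in range(tp_size):
        # Each simulated rank holds only its own slice of the columns.
        w_shard = [row[rank * shard:(rank + 1) * shard] for row in w]
        outputs.append(matmul(x, w_shard))
    # Concatenate per-rank outputs along the column dimension (a gather step).
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
full = matmul(x, w)                                  # unsharded reference (TP=1)
sharded = column_parallel_linear(x, w, tp_size=2)    # simulated TP=2
```

In this toy setting `sharded` equals `full`, which is the property one would expect a correct tensor-parallel linear layer to preserve in the forward pass; it says nothing about whether a given real implementation handles context parallel.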

jerryli1981 commented 3 months ago

Hi, have you joined the DingTalk group yet? Or add me directly on WeChat: jerryli1981, so we can align on the technical details. Thanks!