alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large-scale training, developed by Alibaba Cloud.
Apache License 2.0

Which version of Megatron-LM does the Mixtral finetune example depend on? #189

Closed cryoco closed 5 months ago

cryoco commented 5 months ago

Dependency problems show up with both the 0126 and the 0405 versions.

cryoco commented 5 months ago

For example:

  • the import style in the finetune script here matches the directory layout from before Megatron core_r0.6.0 (r0.6.0 heavily restructured the tree, and megatron/ now only contains core, training and inference);
  • while in the same file, the call to core_transformer_config_from_args here passes two arguments, which matches arguments.py from core_r0.6.0 onwards; earlier releases, such as arguments.py in core_r0.5.0, only accept a single argument.

There are similar conflicts elsewhere, so I'm a bit confused: is the Megatron-LM that the Mixtral finetuning example was tested against the public version or an internally maintained one?

jerryli1981 commented 5 months ago

Hi, the Mixtral Quick Start and README have been updated. Please pull the latest code: https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-Core-MoE%E6%A8%A1%E5%9E%8B%E8%AE%AD%E7%BB%83%E6%B5%81%E7%A8%8B

jerryli1981 commented 5 months ago

For example:

  • the import style in the finetune script here matches the directory layout from before Megatron core_r0.6.0 (r0.6.0 heavily restructured the tree, and megatron/ now only contains core, training and inference);
  • while in the same file, the call to core_transformer_config_from_args here passes two arguments, which matches arguments.py from core_r0.6.0 onwards; earlier releases, such as arguments.py in core_r0.5.0, only accept a single argument.

There are similar conflicts elsewhere, so I'm a bit confused: is the Megatron-LM that the Mixtral finetuning example was tested against the public version or an internally maintained one?

All of the Megatron-LM used here is the public version; only MegaBlocks is an internally maintained version.
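
For reference, a minimal sketch of the version mismatch described in the two bullets above. This is only an illustration, assuming the public Megatron-LM layouts: the module paths and the optional config-class parameter are taken from the core_r0.5.0 and core_r0.6.0 releases and may differ in a patched tree.

```python
# Version-sensitive imports: resolve the Megatron-LM layout at runtime instead of
# hard-coding one tree. Module paths follow the public releases (assumption).
try:
    # core_r0.6.0 and later: megatron/ only contains core/, training/ and inference/
    from megatron.training import get_args
    from megatron.training.arguments import core_transformer_config_from_args
    MEGATRON_R060_PLUS = True
except ImportError:
    # pre-r0.6.0 (e.g. core_r0.5.0): get_args and arguments.py sit directly under megatron/
    from megatron import get_args
    from megatron.arguments import core_transformer_config_from_args
    MEGATRON_R060_PLUS = False

from megatron.core.transformer.transformer_config import TransformerConfig


def build_transformer_config():
    args = get_args()
    if MEGATRON_R060_PLUS:
        # the r0.6.0+ signature accepts a second parameter (the config class)
        return core_transformer_config_from_args(args, TransformerConfig)
    # earlier releases expose a single-argument signature
    return core_transformer_config_from_args(args)
```

Written this way, a finetune script at least imports cleanly on whichever release is installed; mixing the two styles in one file, as described above, works on neither.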

cryoco commented 5 months ago

Hi, the Mixtral Quick Start and README have been updated. Please pull the latest code: https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-Core-MoE%E6%A8%A1%E5%9E%8B%E8%AE%AD%E7%BB%83%E6%B5%81%E7%A8%8B

Thanks for the reply, I'll give it a try.

cryoco commented 5 months ago

Hi, the Mixtral Quick Start and README have been updated. Please pull the latest code: https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-Core-MoE%E6%A8%A1%E5%9E%8B%E8%AE%AD%E7%BB%83%E6%B5%81%E7%A8%8B

Tried it and it runs, thanks a lot.

The SP setting here probably needs to be changed to true, since MoE requires SP to be enabled.
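
For context, a rough sketch of the kind of constraint behind that remark, assuming the upstream behaviour where MoE combined with tensor parallelism asserts that --sequence-parallel is enabled. The check and the SimpleNamespace stand-in below are illustrative, not the upstream code; the field names only mirror Megatron's argument names.

```python
from types import SimpleNamespace


def check_moe_requires_sequence_parallel(args):
    """Illustrative check: MoE layers (num_experts set) combined with tensor
    parallelism need sequence parallelism switched on."""
    if args.num_experts is not None and args.tensor_model_parallel_size > 1:
        assert args.sequence_parallel, (
            "MoE with tensor parallelism requires --sequence-parallel; "
            "set SP=true in the finetune script so the flag is passed through."
        )


# Passes with sequence_parallel=True; with False it raises AssertionError,
# which is the failure mode that setting SP=true in the script avoids.
args = SimpleNamespace(num_experts=8, tensor_model_parallel_size=2, sequence_parallel=True)
check_moe_requires_sequence_parallel(args)
```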