Two issues when running Qwen2.0 with mcore

alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Apache License 2.0


Two issues when running Qwen2.0 with mcore #256

Closed: 154912369 closed this issue 3 months ago

154912369 commented 3 months ago

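First, the first linear layer of the MLP uses TELayerNormColumnParallelLinear, which adds an RMSNorm, but Qwen2.0 has no norm at that point. Second, pre_mlp_layernorm needs to use TENorm rather than IdentityOp. These conclusions come from running Qwen2-7B-Instruct; please consider adding support for this. It looks like this may be the code from the earlier Qwen1.5-32B?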
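For reference, the ordering the report asks for (a standalone RMSNorm acting as pre_mlp_layernorm, followed by a first linear layer with no norm of its own) can be sketched in plain Python. This is a minimal, framework-free illustration under stated assumptions, not Megatron-Core or Transformer Engine code: the helper names `rms_norm`, `matvec`, `silu` and `mlp_block` and the toy weights are hypothetical, and the SwiGLU gate/up/down structure is assumed from the Hugging Face Qwen2 implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # each row produces one output feature


def rms_norm(x: Vector, weight: Vector, eps: float = 1e-6) -> Vector:
    # RMSNorm: divide by sqrt(mean(x^2) + eps), then scale by a per-feature weight
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def matvec(m: Matrix, x: Vector) -> Vector:
    # Plain linear layer without bias (stand-in for a column/row-parallel linear)
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def silu(v: float) -> float:
    return v / (1.0 + math.exp(-v))


def mlp_block(hidden: Vector, norm_weight: Vector,
              gate: Matrix, up: Matrix, down: Matrix) -> Vector:
    # 1) pre_mlp_layernorm: a standalone RMSNorm (the role TENorm would play here)
    normed = rms_norm(hidden, norm_weight)
    # 2) first linear layer(s) with NO fused norm (hypothetical stand-in for a
    #    plain column-parallel linear instead of TELayerNormColumnParallelLinear)
    g = matvec(gate, normed)
    u = matvec(up, normed)
    # 3) SwiGLU activation followed by the down projection
    act = [silu(a) * b for a, b in zip(g, u)]
    out = matvec(down, act)
    # 4) residual connection back onto the un-normalized input
    return [h + o for h, o in zip(hidden, out)]


if __name__ == "__main__":
    hidden = [1.0, -2.0]
    gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    up = [[0.5, 0.0], [0.0, 0.5], [1.0, -1.0]]
    down = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]
    print(mlp_block(hidden, [1.0, 1.0], gate, up, down))
```

Note that the two changes in the report go together: using a standalone TENorm as pre_mlp_layernorm while keeping TELayerNormColumnParallelLinear for the first linear layer would apply two norms in a row.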

jerryli1981 commented 3 months ago

Right. Integration of Qwen2 is in progress and is expected to be finished before this Sunday.

jerryli1981 commented 3 months ago

The Qwen2 dense model has been integrated; please evaluate it: https://github.com/alibaba/Pai-Megatron-Patch/pull/258