alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Apache License 2.0

qwen/hf2mcore_1.5_v2.py fails when converting an HF checkpoint to mcore format with tp > 1 #167

Closed: cdj0311 closed this issue 5 months ago

cdj0311 commented 5 months ago

As the title says: with the 7B model and tp=2, pp=1, the conversion fails with the following error:

  File "/ossfs/workspace/megatron-moe/Pai-Megatron-Patch-0408-train/toolkits/model_checkpoints_convertor/qwen/hf2mcore_1.5_v2.py", line 461, in save_mgmodel
    viewed = v.view(args.num_query_groups, -1, head_dim, args.hidden_size)
RuntimeError: shape '[32, -1, 128, 4096]' is invalid for input of size 12288
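One plausible reading of the traceback (not confirmed in this thread): 12288 = 3 × 32 × 128, which is exactly the element count of the fused QKV bias for a 7B model with hidden size 4096, 32 attention heads, head_dim 128 and no grouped-query attention (these dimensions are assumed here). That suggests the weight-style view `[num_query_groups, -1, head_dim, hidden_size]` is being applied to a 1-D bias tensor. The sketch below is a pure-Python stand-in for how `torch.Tensor.view` infers a single `-1` dimension; it does not call PyTorch, the helper names are hypothetical, and it is not the fix shipped in hf2mcore_1.5_v2.py. It also shows the per-rank shapes you would expect if the query-group dimension is split evenly across tensor-parallel ranks (an assumption about the mcore layout).

```python
import math


def resolve_view_shape(numel, shape):
    """Hypothetical helper: resolve a single -1 entry the way Tensor.view does."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = math.prod(d for d in shape if d != -1)
    if -1 in shape:
        # The inferred size must divide the element count exactly.
        if known == 0 or numel % known != 0:
            raise ValueError(f"shape {list(shape)} is invalid for input of size {numel}")
        return [numel // known if d == -1 else d for d in shape]
    if known != numel:
        raise ValueError(f"shape {list(shape)} is invalid for input of size {numel}")
    return list(shape)


def split_groups_for_tp(view_shape, tp):
    """Hypothetical helper: shape each TP rank holds when dim 0 (query groups) is split."""
    if view_shape[0] % tp != 0:
        raise ValueError("num_query_groups must be divisible by tp")
    return [view_shape[0] // tp] + view_shape[1:]


# Assumed Qwen1.5-7B-like dimensions.
num_query_groups, head_dim, hidden_size = 32, 128, 4096
qkv_weight_numel = 3 * num_query_groups * head_dim * hidden_size
qkv_bias_numel = 3 * num_query_groups * head_dim  # 12288, the size in the traceback

# The weight-style view works for the fused QKV weight ...
weight_view = resolve_view_shape(qkv_weight_numel, [num_query_groups, -1, head_dim, hidden_size])
print(weight_view)

# ... but reproduces the reported failure when applied to the bias.
try:
    resolve_view_shape(qkv_bias_numel, [num_query_groups, -1, head_dim, hidden_size])
except ValueError as err:
    print(err)

# The bias needs a view without the hidden_size dimension.
bias_view = resolve_view_shape(qkv_bias_numel, [num_query_groups, -1, head_dim])
print(bias_view)

# Per-rank shapes for tp=2.
print(split_groups_for_tp(weight_view, 2), split_groups_for_tp(bias_view, 2))
```

With these assumed dimensions the weight resolves to `[32, 3, 128, 4096]`, the bias reproduces the same "invalid for input of size 12288" message, and a bias-specific view `[32, 3, 128]` splits into `[16, 3, 128]` per rank for tp=2.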

jerryli1981 commented 5 months ago

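Hi, we will improve this part next Monday as part of the overall upgrade of the Qwen1.5 pipeline; right now we are working on improving downstream-task results for the Qwen1.5 continued-pretraining pipeline. If you need this conversion urgently, you can join the group and add me on DingTalk, and I will send you the internal version that already runs end to end.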

cdj0311 commented 5 months ago

OK, thanks!