alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Apache License 2.0

Should this parameter be removed for the Qwen2 0.5B and 1.5B models? #296

Closed: MrWaterZhou closed this issue 1 month ago

MrWaterZhou commented 2 months ago

As the title says: I noticed that this parameter is hard-coded, and when training the 1.5B model the results were clearly abnormal. https://github.com/alibaba/Pai-Megatron-Patch/blob/bfdc653ba1c8e8e2040a76b551cdcb6800c1e219/examples/qwen2/run_finetune_qwen.sh#L283C1-L284C1
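The change the issue asks for amounts to passing the hard-coded flag only for the model sizes that need it, rather than for every Qwen2 size. Below is a minimal Python sketch of that idea. It assumes a hypothetical helper `build_model_args` and a placeholder flag name `--hypothetical-flag`. Neither comes from run_finetune_qwen.sh, and the real parameter is whatever sits at the linked line of the script.

```python
# Hypothetical sketch: `build_model_args` and `--hypothetical-flag` are
# placeholders, not names taken from run_finetune_qwen.sh.
from typing import List

# Model sizes for which the issue suggests dropping the hard-coded flag.
SIZES_WITHOUT_FLAG = {"0.5B", "1.5B"}


def build_model_args(model_size: str, flag: str = "--hypothetical-flag") -> List[str]:
    """Return extra launcher arguments for a given Qwen2 model size.

    Instead of always passing `flag`, include it only when the size is
    not listed in SIZES_WITHOUT_FLAG.
    """
    args: List[str] = []
    if model_size not in SIZES_WITHOUT_FLAG:
        args.append(flag)
    return args


if __name__ == "__main__":
    # Print the arguments each size would receive.
    for size in ["0.5B", "1.5B", "7B", "72B"]:
        print(size, build_model_args(size))
```

Keying the decision on model size keeps one launch script for all Qwen2 variants while letting the small models opt out.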

jerryli1981 commented 1 month ago

OK, got it, thanks for the heads-up. We will fix this at the same time as the llama3.1 update.