alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Apache License 2.0

llama3 modeling uses qwen1.5 #221

Closed: cryoco closed this issue 5 months ago

cryoco commented 5 months ago

https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3/pretrain_llama.py#L28 Is this because the two architectures are the same? I also see the directory https://github.com/alibaba/Pai-Megatron-Patch/tree/main/megatron_patch/model/llama3. Is it unused?

jerryli1981 commented 5 months ago

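Hello. This part of the code can be shared between qwen1.5 and llama3, so to avoid duplicated and redundant code we import the qwen1.5 version. The only difference between the two is the bias, which can be toggled with a switch in the script.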
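To illustrate the idea of one shared model definition with a bias switch, here is a minimal sketch. It is not the code in this repo: `AttentionConfig`, `add_qkv_bias`, and `qkv_param_shapes` are hypothetical names, and it assumes the bias in question sits on the fused QKV projection (enabled for qwen1.5, disabled for llama3). Grouped-query attention is ignored to keep the shapes simple.

```python
from dataclasses import dataclass


@dataclass
class AttentionConfig:
    hidden_size: int
    # Hypothetical switch: the real flag name in the launch script may differ.
    add_qkv_bias: bool


def qkv_param_shapes(cfg: AttentionConfig) -> dict:
    """Return the parameter shapes of a fused QKV projection.

    The weight is always present; the bias is created only when the
    switch is on, so one definition serves both model families.
    """
    shapes = {"qkv.weight": (3 * cfg.hidden_size, cfg.hidden_size)}
    if cfg.add_qkv_bias:
        shapes["qkv.bias"] = (3 * cfg.hidden_size,)
    return shapes


# Assumed settings: bias on for a qwen1.5-style model, off for llama3-style.
qwen_style = qkv_param_shapes(AttentionConfig(hidden_size=8, add_qkv_bias=True))
llama_style = qkv_param_shapes(AttentionConfig(hidden_size=8, add_qkv_bias=False))
```

With this layout the weights are identical in shape across both configurations, and only the presence of the bias entry changes, which is why a single modeling file can be reused.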

cryoco commented 5 months ago

OK, thanks!