llama3 modeling uses the qwen1.5 implementation

alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Apache License 2.0


Closed: cryoco closed this issue 5 months ago

cryoco commented 5 months ago

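At https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3/pretrain_llama.py#L28 the llama3 pretraining script uses the qwen1.5 modeling code. Is that because the two architectures are the same? I also see a https://github.com/alibaba/Pai-Megatron-Patch/tree/main/megatron_patch/model/llama3 directory, though. Is it not used?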

jerryli1981 commented 5 months ago

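Hello. This part of the code can be shared between qwen1.5 and llama3, so to avoid duplicated, redundant code we reference the qwen1.5 implementation. The only difference between the two is the bias, which can be toggled with a switch in the script.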
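To illustrate how one module can serve both models: Qwen1.5 adds a bias to its query/key/value projections, while Llama 3 uses none. The following is a minimal sketch, assuming the shared attention code exposes a single boolean switch for that bias. `AttentionConfig`, `qkv_bias`, `qkv_param_shapes`, and `param_count` are hypothetical names for illustration, not Pai-Megatron-Patch's actual API, and the sizes are toy values rather than real model configs.

```python
from dataclasses import dataclass


@dataclass
class AttentionConfig:
    hidden_size: int
    num_heads: int
    num_kv_heads: int  # fewer KV heads than query heads = grouped-query attention
    qkv_bias: bool     # hypothetical switch: True ~ Qwen1.5-style, False ~ Llama 3-style


def qkv_param_shapes(cfg: AttentionConfig) -> dict:
    """Parameter shapes of a fused QKV projection; the bias is the only optional part."""
    head_dim = cfg.hidden_size // cfg.num_heads
    # Q uses num_heads heads; K and V each use num_kv_heads heads.
    out_dim = (cfg.num_heads + 2 * cfg.num_kv_heads) * head_dim
    shapes = {"qkv.weight": (out_dim, cfg.hidden_size)}
    if cfg.qkv_bias:
        shapes["qkv.bias"] = (out_dim,)
    return shapes


def param_count(shapes: dict) -> int:
    """Total number of scalar parameters across all shapes."""
    total = 0
    for shape in shapes.values():
        n = 1
        for dim in shape:
            n *= dim
        total += n
    return total


if __name__ == "__main__":
    llama_style = AttentionConfig(hidden_size=64, num_heads=8, num_kv_heads=2, qkv_bias=False)
    qwen_style = AttentionConfig(hidden_size=64, num_heads=8, num_kv_heads=2, qkv_bias=True)
    print(param_count(qkv_param_shapes(llama_style)))  # 6144 (weight only)
    print(param_count(qkv_param_shapes(qwen_style)))   # 6240 (weight + 96-element bias)
```

Because everything else in the layer is identical, a flag is enough to cover both models, which is why keeping a separate copy of the modeling code would only add duplication.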

cryoco commented 5 months ago

OK, thanks!