alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Apache License 2.0

Question about adapting the LLaMA 3.1 model #361

Open echo-valor opened 1 month ago

echo-valor commented 1 month ago

Regarding the current PAI adaptation approach, I have a question: why isn't adapting low_freq_factor / high_freq_factor considered?

  1. Take the llama 3.1-70b-base model as an example. Its config.json contains the following setting: `"rope_scaling": { "factor": 8.0, "low_freq_factor": 1.0, "high_freq_factor": 4.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }`. The low_freq_factor / high_freq_factor scaling here is applied when computing the positional encoding for attention, and high_freq_factor=4 will certainly affect the high-frequency dimensions of that encoding (see the sketch of this scaling rule below). How should these parameters be handled when actually training a Llama 3.1 model? Please advise.
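For context, the `rope_type="llama3"` rescaling from the published Llama 3.1 reference (mirrored in Hugging Face transformers) adjusts each RoPE inverse frequency according to its wavelength. Below is a minimal PyTorch sketch of that rule; the function name `apply_llama3_rope_scaling` is illustrative and not from this repo:

```python
import math
import torch

def apply_llama3_rope_scaling(
    inv_freq: torch.Tensor,  # base RoPE inverse frequencies, shape [head_dim // 2]
    factor: float = 8.0,
    low_freq_factor: float = 1.0,
    high_freq_factor: float = 4.0,
    original_max_position_embeddings: int = 8192,
) -> torch.Tensor:
    """Rescale inv_freq the way rope_type="llama3" does in the
    Llama 3.1 reference implementation (illustrative sketch)."""
    low_freq_wavelen = original_max_position_embeddings / low_freq_factor
    high_freq_wavelen = original_max_position_embeddings / high_freq_factor

    wavelen = 2 * math.pi / inv_freq
    # Long wavelengths (low frequencies): divide by the scaling factor.
    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)
    # Mid band: interpolate smoothly between scaled and unscaled frequencies.
    smooth = (original_max_position_embeddings / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    smoothed = (1 - smooth) / factor * inv_freq + smooth * inv_freq
    is_mid = (wavelen <= low_freq_wavelen) & (wavelen >= high_freq_wavelen)
    # Short wavelengths (wavelen < high_freq_wavelen) are left untouched.
    return torch.where(is_mid, smoothed, scaled)
```

Note that with high_freq_factor=4, dimensions whose wavelength is shorter than 8192 / 4 = 2048 positions are left unchanged; only the mid- and low-frequency bands are rescaled.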
lostkevin commented 1 month ago

We implemented this parameter internally but did not expose an external interface for it. If you need it, please modify the corresponding code yourself; see https://github.com/alibaba/Pai-Megatron-Patch/blob/9d3e557b4d5f386a456a49da23aa47af737baaf3/megatron_patch/model/llama3_1/model.py#L127
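As a hypothetical illustration of such a modification (not the repo's actual code path at the link above), the config.json values from the question could be applied to the base frequencies before they reach the rotary embedding. head_dim=128 and rope_theta=500000.0 are assumed here to match the 70B config:

```python
# Hypothetical wiring; head_dim and rope_theta are assumed values
# for llama 3.1-70b, not read from megatron_patch itself.
head_dim, rope_theta = 128, 500000.0
base = 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
scaled = apply_llama3_rope_scaling(
    base,
    factor=8.0,
    low_freq_factor=1.0,
    high_freq_factor=4.0,
    original_max_position_embeddings=8192,
)
print(f"{(scaled != base).sum().item()} of {head_dim // 2} dims rescaled")
```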