alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Apache License 2.0

Question about adapting the LLaMA 3.1 model #361

Open echo-valor opened 1 month ago

echo-valor commented 1 month ago

Regarding the current PAI adaptation approach, I have a question: why isn't adapting low_freq_factor / high_freq_factor considered?

  1. Take the llama 3.1-70b-base model as an example. Its config.json contains the following setting: `"rope_scaling": { "factor": 8.0, "low_freq_factor": 1.0, "high_freq_factor": 4.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }`. The low_freq_factor / high_freq_factor scaling here is applied when computing the positional encoding for attention, and high_freq_factor=4 will certainly affect the high-frequency dimensions of that encoding (see the sketch of this scaling rule below). How should these parameters be handled when actually training a Llama 3.1 model? Please advise.
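For context, the `rope_type="llama3"` rescaling from the published Llama 3.1 reference (mirrored in Hugging Face transformers) adjusts each RoPE inverse frequency according to its wavelength. Below is a minimal PyTorch sketch of that rule; the function name `apply_llama3_rope_scaling` is illustrative and not from this repo:

```python
import math
import torch

def apply_llama3_rope_scaling(
    inv_freq: torch.Tensor,  # base RoPE inverse frequencies, shape [head_dim // 2]
    factor: float = 8.0,
    low_freq_factor: float = 1.0,
    high_freq_factor: float = 4.0,
    original_max_position_embeddings: int = 8192,
) -> torch.Tensor:
    """Rescale inv_freq the way rope_type="llama3" does in the
    Llama 3.1 reference implementation (illustrative sketch)."""
    low_freq_wavelen = original_max_position_embeddings / low_freq_factor
    high_freq_wavelen = original_max_position_embeddings / high_freq_factor

    wavelen = 2 * math.pi / inv_freq
    # Long wavelengths (low frequencies): divide by the scaling factor.
    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)
    # Mid band: interpolate smoothly between scaled and unscaled frequencies.
    smooth = (original_max_position_embeddings / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    smoothed = (1 - smooth) / factor * inv_freq + smooth * inv_freq
    is_mid = (wavelen <= low_freq_wavelen) & (wavelen >= high_freq_wavelen)
    # Short wavelengths (wavelen < high_freq_wavelen) are left untouched.
    return torch.where(is_mid, smoothed, scaled)
```

Note that with high_freq_factor=4, dimensions whose wavelength is shorter than 8192 / 4 = 2048 positions are left unchanged; only the mid- and low-frequency bands are rescaled.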
lostkevin commented 1 month ago

We implemented this parameter internally but did not expose an external interface for it. If you need it, please modify the corresponding code yourself; see https://github.com/alibaba/Pai-Megatron-Patch/blob/9d3e557b4d5f386a456a49da23aa47af737baaf3/megatron_patch/model/llama3_1/model.py#L127
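As a hypothetical illustration of such a modification (not the repo's actual code path at the link above), the config.json values from the question could be applied to the base frequencies before they reach the rotary embedding. head_dim=128 and rope_theta=500000.0 are assumed here to match the 70B config:

```python
# Hypothetical wiring; head_dim and rope_theta are assumed values
# for llama 3.1-70b, not read from megatron_patch itself.
head_dim, rope_theta = 128, 500000.0
base = 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
scaled = apply_llama3_rope_scaling(
    base,
    factor=8.0,
    low_freq_factor=1.0,
    high_freq_factor=4.0,
    original_max_position_embeddings=8192,
)
print(f"{(scaled != base).sum().item()} of {head_dim // 2} dims rescaled")
```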