deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
MIT License

Exploring the Combined Effects of YaRN and Adjusted rope_base Values in DeepSeek-V2 #87

Open hannlp opened 4 weeks ago

hannlp commented 4 weeks ago

DeepSeek-V2 uses static YaRN with rope_base=10000, which yields excellent extrapolation results. Have the authors tried setting rope_base to 500000 while still using YaRN? If so, does the combination produce a synergistic effect, outperforming both YaRN alone (rope_base=10000) and NTK-aware scaling (rope_base=500000)? @luofuli
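To make the question concrete, here is a minimal sketch of how rope_base enters a YaRN-style frequency computation. It follows the blending scheme of the public YaRN reference code as I understand it, not necessarily the exact DeepSeek-V2 implementation. The function names and the values `dim=64`, `factor=40`, and `orig_max_pos=4096` are illustrative assumptions, not taken from this issue.

```python
import math


def yarn_correction_dim(num_rotations, dim, base, orig_max_pos):
    """Pair index whose frequency completes `num_rotations` full turns
    over the original context length `orig_max_pos`."""
    return dim * math.log(orig_max_pos / (num_rotations * 2 * math.pi)) / (2 * math.log(base))


def yarn_inv_freq(dim, base, factor, orig_max_pos, beta_fast=32.0, beta_slow=1.0):
    """Blend extrapolated (unscaled) and interpolated (divided by `factor`)
    RoPE inverse frequencies with a linear ramp, YaRN-style.
    Hypothetical helper for illustration only."""
    low = max(math.floor(yarn_correction_dim(beta_fast, dim, base, orig_max_pos)), 0)
    high = min(math.ceil(yarn_correction_dim(beta_slow, dim, base, orig_max_pos)), dim - 1)
    if low == high:
        high += 0.001  # avoid division by zero in the ramp

    inv_freq = []
    for i in range(dim // 2):
        extrapolated = base ** (-2 * i / dim)   # plain RoPE frequency
        interpolated = extrapolated / factor    # position-interpolated frequency
        # ramp = 0 keeps the high-frequency dims unscaled, ramp = 1 fully interpolates
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        inv_freq.append(extrapolated * (1 - ramp) + interpolated * ramp)
    return inv_freq


# Assumed, illustrative settings: only rope_base differs between the two runs.
freqs_10k = yarn_inv_freq(dim=64, base=10000.0, factor=40.0, orig_max_pos=4096)
freqs_500k = yarn_inv_freq(dim=64, base=500000.0, factor=40.0, orig_max_pos=4096)

for i in (0, 8, 16, 24, 31):
    print(i, freqs_10k[i], freqs_500k[i])
```

Changing rope_base shifts both the per-dimension frequencies and the ramp boundaries, so the two settings interpolate different subsets of dimensions. The sketch only shows where the two knobs interact; whether the combination actually extrapolates better is an empirical question that it does not answer.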