deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
MIT License

Error in Equation 16? #6

Closed zhongmz closed 4 months ago

zhongmz commented 4 months ago

It appears that the current formulation is incorrect. Should Equation 16 instead be the following?

$$ q_{t,i} = [q^{C}_{t,i}; q^{R}_{t}], $$

DeepSeekDDM commented 4 months ago

The formula is correct. q^R is multi-head, so it keeps the head index i; only k^R is shared across heads. You can refer to the illustration of DeepSeek-V2 for an intuitive understanding.
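
To make the distinction above concrete, here is a minimal sketch in plain Python of how the per-head queries and keys could be assembled. It assumes the keys are concatenated analogously to the queries, with the single shared k^R appended to every head's key. The names `build_queries_and_keys`, `rand_vec`, `N_HEADS`, `D_C`, and `D_R` are hypothetical, the sizes are illustrative only, and the RoPE rotation itself is omitted (random vectors stand in for the already-rotated parts).

```python
import random

random.seed(0)

# Illustrative sizes only (assumptions, not DeepSeek-V2's real dimensions).
N_HEADS, D_C, D_R = 4, 8, 2


def rand_vec(dim):
    """Stand-in for a projected (and, for the R parts, already rotated) vector."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


def build_queries_and_keys(q_c, q_r, k_c, k_r_shared):
    """Concatenate the content (C) and rotary (R) parts for each head.

    q_c, q_r, k_c: one vector per head.
    k_r_shared:    a single vector reused by every head.
    """
    # q_{t,i} = [q^C_{t,i}; q^R_{t,i}]  -> the rotary part differs per head
    q = [q_c[i] + q_r[i] for i in range(len(q_c))]
    # k_{t,i} = [k^C_{t,i}; k^R_t]      -> the rotary part is shared (assumed form)
    k = [k_c[i] + k_r_shared for i in range(len(k_c))]
    return q, k


q_c = [rand_vec(D_C) for _ in range(N_HEADS)]
q_r = [rand_vec(D_R) for _ in range(N_HEADS)]  # multi-head: one q^R per head
k_c = [rand_vec(D_C) for _ in range(N_HEADS)]
k_r_shared = rand_vec(D_R)                     # shared: one k^R for all heads

q, k = build_queries_and_keys(q_c, q_r, k_c, k_r_shared)

for i in range(N_HEADS):
    print(f"head {i}: len(q)={len(q[i])}, len(k)={len(k[i])}, "
          f"key rotary part shared={k[i][D_C:] == k_r_shared}")
```

In this sketch every head's key ends with the same rotary slice, while each head's query ends with its own, which is why the query side carries the index i and the shared key side does not.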