Closed zhongmz closed 4 months ago
It appears that the current formulation is ?
$$ q_{t,i} = [q^{C}_{t,i}; q_{t}^{R}], $$
The formula is correct. The q^R are multi-head (one per attention head), and only k^R is shared across heads. You can refer to the illustration in DeepSeek-V2 for an intuitive understanding.
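To make the per-head vs. shared distinction concrete, here is a minimal sketch of just the concatenation step. The sizes (`n_heads`, `d_c`, `d_r`) and the random vectors are illustrative assumptions standing in for the real projections, and RoPE itself is omitted, so this is not the model's actual code:

```python
import random

# Hypothetical sizes, chosen only for illustration.
n_heads, d_c, d_r = 4, 8, 2

rng = random.Random(0)

def rand_vec(n):
    """Random vector standing in for a projected (and rotated) vector."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Content parts: one vector per head for both queries and keys.
q_C = [rand_vec(d_c) for _ in range(n_heads)]
k_C = [rand_vec(d_c) for _ in range(n_heads)]

# Positional (RoPE) parts: queries get one vector PER HEAD,
# keys get a SINGLE vector shared by all heads.
q_R = [rand_vec(d_r) for _ in range(n_heads)]
k_R = rand_vec(d_r)

# Concatenate per head: q_i = [q_C_i; q_R_i], k_i = [k_C_i; k_R].
q = [q_C[i] + q_R[i] for i in range(n_heads)]
k = [k_C[i] + k_R for i in range(n_heads)]
```

Every `k[i]` ends with the same `k_R`, while each `q[i]` ends with its own `q_R[i]`, which is why the query's positional part carries the head index `i` and the key's does not.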