deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
MIT License
3.47k stars 143 forks

MLA vs MHA #19

Open jiangix-paper opened 4 months ago

jiangix-paper commented 4 months ago

Hello, great work. I would like to know why MLA performs better than MHA. As I understand it, MLA is an approximate low-rank decomposition of MHA.
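To make the low-rank view in this question concrete, here is a minimal sketch in plain Python. It does not explain the reported performance gap; it only shows that projecting the hidden state down to a small latent and back up to keys is the same as applying one key projection whose rank is at most the latent size. All sizes and names (`D_LATENT`, `w_down`, `w_up_k`) are hypothetical toy values, and the sketch leaves out the decoupled RoPE key used in DeepSeek-V2.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rand_matrix(rows, cols, rng):
    """Return a rows x cols matrix of Gaussian samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy sizes, chosen only for illustration.
D_MODEL = 8    # hidden size
N_HEADS = 2    # number of attention heads
D_HEAD = 4     # per-head key/value size
D_LATENT = 3   # compressed KV size, smaller than N_HEADS * D_HEAD

rng = random.Random(0)
h = rand_matrix(5, D_MODEL, rng)  # 5 tokens, one row per token

# MHA-style: an unconstrained key projection; keys and values are cached per token.
w_k = rand_matrix(D_MODEL, N_HEADS * D_HEAD, rng)
k_mha = matmul(h, w_k)

# MLA-style: down-project to a shared latent, then up-project to keys.
w_down = rand_matrix(D_MODEL, D_LATENT, rng)
w_up_k = rand_matrix(D_LATENT, N_HEADS * D_HEAD, rng)
c_kv = matmul(h, w_down)       # the latent is what would be cached
k_mla = matmul(c_kv, w_up_k)

# The two-step projection equals a single projection of rank <= D_LATENT.
w_k_eff = matmul(w_down, w_up_k)
k_from_eff = matmul(h, w_k_eff)

# Cached numbers per token per layer (RoPE part ignored in this sketch).
mha_cache_per_token = 2 * N_HEADS * D_HEAD  # keys + values
mla_cache_per_token = D_LATENT              # one shared latent

print(mha_cache_per_token, mla_cache_per_token)
```

In this toy setting the factored key projection is a rank-constrained special case of the MHA one, which is the premise behind the question above.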

luofuli commented 4 months ago

This is the same question as https://github.com/deepseek-ai/DeepSeek-V2/issues/26 @jiangix-paper