MLA vs MHA - Githubissues

deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

MIT License

Open jiangix-paper opened 4 months ago

jiangix-paper commented 4 months ago

Hello, great work. I would like to know why MLA performs better than MHA. As I understand it, MLA is an approximate low-rank decomposition of MHA.

luofuli commented 4 months ago