Open jiangix-paper opened 4 months ago
Hello, great work. Why does MLA perform better than MHA? As I understand it, MLA is an approximate low-rank decomposition of MHA.
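To make the low-rank premise concrete, here is a minimal NumPy sketch. It is not the DeepSeek-V2 implementation: the sizes and the names `W_down`, `W_up_k`, and `d_latent` are hypothetical, and RoPE handling is left out entirely. It only shows that a key projection factored through a narrow latent has rank at most the latent width, while a plain MHA projection is unconstrained. It does not address which variant performs better.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumptions, not taken from any released model).
d_model = 64    # hidden size
n_heads = 4     # number of attention heads
d_head = 16     # per-head dimension
d_latent = 8    # width of the compressed KV latent

# MHA: a single unconstrained key projection covering all heads.
W_k_mha = rng.standard_normal((d_model, n_heads * d_head))

# MLA-style: down-project to a shared latent, then up-project to per-head keys.
W_down = rng.standard_normal((d_model, d_latent))
W_up_k = rng.standard_normal((d_latent, n_heads * d_head))
W_k_mla = W_down @ W_up_k  # effective key projection, rank <= d_latent

rank_mha = np.linalg.matrix_rank(W_k_mha)
rank_mla = np.linalg.matrix_rank(W_k_mla)

# Per-token cache in this simplified picture: MHA stores keys and values
# for every head, the latent variant stores one latent vector.
cache_mha = 2 * n_heads * d_head
cache_mla = d_latent

print("rank:", rank_mha, "vs", rank_mla)
print("cached floats per token:", cache_mha, "vs", cache_mla)
```

In this toy setting the factored projection is strictly less expressive per layer than the full one, which is why the reported quality gap is surprising to me.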
Same issue: https://github.com/deepseek-ai/DeepSeek-V2/issues/26 @jiangix-paper