yzh119 opened 2 months ago
MLA (Multi-head Latent Attention) was proposed in DeepSeek-V2 for efficient inference.
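For context, the core idea of MLA is to cache a single low-rank latent vector per token instead of full per-head keys and values, and to up-project keys/values from that latent at attention time. A minimal numpy sketch of this shape structure is below; all dimensions are hypothetical illustration values (not DeepSeek-V2's actual sizes), and the decoupled RoPE path from the paper is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration, not DeepSeek-V2's real config.
d_model, n_heads, head_dim, d_latent, seq_len = 256, 8, 32, 64, 16

h = rng.standard_normal((seq_len, d_model))  # token hidden states

# Down-projection: compress K/V information into one small latent per token.
W_dkv = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
# Up-projections: recover per-head keys and values from the cached latent.
W_uk = rng.standard_normal((d_latent, n_heads * head_dim)) / np.sqrt(d_latent)
W_uv = rng.standard_normal((d_latent, n_heads * head_dim)) / np.sqrt(d_latent)
W_q = rng.standard_normal((d_model, n_heads * head_dim)) / np.sqrt(d_model)

# Only the latent is cached: seq_len x d_latent floats, versus
# seq_len x 2 * n_heads * head_dim for a standard MHA KV cache.
c_kv = h @ W_dkv

q = (h @ W_q).reshape(seq_len, n_heads, head_dim)
k = (c_kv @ W_uk).reshape(seq_len, n_heads, head_dim)
v = (c_kv @ W_uv).reshape(seq_len, n_heads, head_dim)

# Standard scaled dot-product attention per head (causal mask omitted).
scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
out = np.einsum("hqk,khd->qhd", weights, v).reshape(seq_len, -1)

print(c_kv.shape, out.shape)  # cache is (16, 64); output is (16, 256)
```

Here the per-token cache is `d_latent = 64` floats instead of `2 * n_heads * head_dim = 512`, which is the memory saving that makes MLA attractive for inference.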