flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

Support MLA (Multi-head Latent Attention) in DeepSeek-V2 #237

Open yzh119 opened 2 months ago

yzh119 commented 2 months ago

MLA (Multi-head Latent Attention) was proposed in DeepSeek-V2 for efficient inference.
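For context: MLA reduces KV-cache size by jointly compressing keys and values into a single low-rank latent vector per token, which is cached and up-projected back into per-head keys and values at attention time. Below is a minimal conceptual sketch of that compression idea in PyTorch. The dimensions, layer names, and the omission of the decoupled RoPE path are illustrative assumptions, not FlashInfer's API or the exact DeepSeek-V2 configuration.

```python
# Conceptual sketch of MLA-style KV compression (illustrative only, not FlashInfer code).
import torch
import torch.nn as nn

class LatentKVCompression(nn.Module):
    """Compress per-token hidden states into a small shared latent vector,
    then expand it into per-head keys/values at attention time.

    Only the latent (d_latent floats per token) needs to be cached, instead of
    the full K/V (2 * num_heads * d_head floats per token).
    Dimensions below are assumptions, not the DeepSeek-V2 values.
    """

    def __init__(self, d_model=4096, d_latent=512, num_heads=32, d_head=128):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)             # down-projection to latent
        self.up_k = nn.Linear(d_latent, num_heads * d_head, bias=False)  # up-projection to keys
        self.up_v = nn.Linear(d_latent, num_heads * d_head, bias=False)  # up-projection to values
        self.num_heads, self.d_head = num_heads, d_head

    def compress(self, hidden):          # hidden: [batch, seq, d_model]
        # This latent is what gets stored in the KV cache.
        return self.down(hidden)         # [batch, seq, d_latent]

    def expand(self, latent):            # latent: [batch, seq, d_latent]
        b, s, _ = latent.shape
        k = self.up_k(latent).view(b, s, self.num_heads, self.d_head)
        v = self.up_v(latent).view(b, s, self.num_heads, self.d_head)
        return k, v                      # per-head K/V for attention
```

The serving-side implication is that an attention kernel supporting MLA would read the cached latent and either materialize K/V on the fly or absorb the up-projections into the query/output projections; the sketch above only shows the caching/expansion structure, not the kernel itself.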