flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

perf: use 1x4 warp layout for small query length #322

Closed: yzh119 closed this 1 week ago

yzh119 commented 2 weeks ago

Duplicate of #304 and #185, just rebased on main.

This PR accelerates GQA; we will release v0.0.6 after it is merged (ETA: tonight).
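
To illustrate the idea behind the PR title, here is a minimal, hypothetical CUDA sketch (not FlashInfer's actual kernel code; names, tile sizes, and the template parameters are assumptions) showing how a block's warps can be mapped onto query-row vs. KV-column tiles. With a small query length, as in GQA decode/append where only a few query rows exist per KV head, a 1x4 layout keeps all warps busy by splitting the KV dimension instead of leaving warps idle on empty query rows.

```cuda
// Minimal sketch (not FlashInfer's actual kernel) of a warp-layout mapping.
// Tile sizes, names, and parameters are illustrative assumptions.
#include <cstdio>
#include <cuda_runtime.h>

// NUM_WARPS_Q x NUM_WARPS_KV warps per thread block.
// A 4x1 layout assigns each of the 4 warps a different query-row tile;
// a 1x4 layout assigns each warp a different slice of the KV dimension.
template <int NUM_WARPS_Q, int NUM_WARPS_KV>
__global__ void warp_layout_demo(int qo_tiles) {
  const int warp_id = threadIdx.x / 32;
  // Decompose the flat warp id into (query-row, KV-column) coordinates.
  const int warp_q = warp_id / NUM_WARPS_KV;   // query-row tile owned by this warp
  const int warp_kv = warp_id % NUM_WARPS_KV;  // KV slice iterated by this warp
  if (threadIdx.x % 32 == 0) {
    printf("warp %d -> q tile %d, kv slice %d (qo_tiles=%d)\n",
           warp_id, warp_q, warp_kv, qo_tiles);
  }
  // When qo_tiles == 1 (small query length, e.g. GQA decode/append),
  // a 4x1 layout leaves 3 of 4 warps with no query rows to process,
  // while a 1x4 layout splits the KV iteration across all 4 warps.
}

int main() {
  // 4 warps per block; compare both layouts on a single query tile.
  warp_layout_demo<4, 1><<<1, 128>>>(/*qo_tiles=*/1);
  warp_layout_demo<1, 4><<<1, 128>>>(/*qo_tiles=*/1);
  cudaDeviceSynchronize();
  return 0;
}
```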