flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

https://flashinfer.ai

Apache License 2.0

768 stars 64 forks

Closed: yzh119 closed this 3 weeks ago

yzh119 commented 3 weeks ago

#262 is out of sync with main; this PR rebases that code onto the main branch.

This PR also greatly reduces the binary size, because we no longer need to compile a separate prefill kernel for each GQA group size.
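As a rough illustration of why this shrinks the binary: when every combination of compile-time parameters gets its own kernel, the number of compiled variants is the product of the option counts, so dropping one parameter divides that number by its option count. The sketch below only does this counting. The parameter lists are hypothetical placeholders, not the values FlashInfer actually compiles.

```python
from itertools import product

# Hypothetical compile-time parameter choices, for illustration only.
# The real lists used by FlashInfer may differ.
HEAD_DIMS = [64, 128, 256]
CAUSAL = [False, True]
GQA_GROUP_SIZES = [1, 4, 6, 8]


def count_instantiations(*param_lists):
    """Count the kernel variants produced when every combination is compiled."""
    return sum(1 for _ in product(*param_lists))


# Group size as a compile-time parameter: one kernel per combination.
with_group_size = count_instantiations(HEAD_DIMS, CAUSAL, GQA_GROUP_SIZES)

# Group size no longer a compile-time parameter: that dimension disappears.
without_group_size = count_instantiations(HEAD_DIMS, CAUSAL)

print(with_group_size, without_group_size)  # 24 6
```

With these placeholder lists the prefill kernel count drops from 24 to 6. The actual saving depends on how many group sizes and other parameters were instantiated before this change.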