flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

refactor: move `gqa_group_size` from template parameter to input arguments #301

Closed yzh119 closed 3 weeks ago

yzh119 commented 3 weeks ago

#262 is out of sync with main; this PR rebases that code onto the main branch.

This PR also greatly reduces the binary size, because prefill kernels no longer need to be compiled separately for each GQA group size.
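To illustrate why this change shrinks the binary, here is a minimal host-side C++ sketch. It is not FlashInfer's actual kernel code. The function names (`kv_head_static`, `kv_head_dispatch`, `kv_head_runtime`) and the set of supported group sizes are hypothetical, chosen only to contrast the two shapes. The only assumption about GQA itself is the usual mapping in which `group_size` consecutive query heads share one KV head.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical "before" shape: the group size is a template parameter, so
// each supported value produces its own instantiation in the binary.
template <uint32_t GROUP_SIZE>
uint32_t kv_head_static(uint32_t qo_head) {
  return qo_head / GROUP_SIZE;
}

// The caller must dispatch a runtime value to one of the instantiations.
// The list of cases here is illustrative, not FlashInfer's real list.
uint32_t kv_head_dispatch(uint32_t qo_head, uint32_t group_size) {
  switch (group_size) {
    case 1: return kv_head_static<1>(qo_head);
    case 4: return kv_head_static<4>(qo_head);
    case 8: return kv_head_static<8>(qo_head);
    default: throw std::invalid_argument("unsupported group size");
  }
}

// Hypothetical "after" shape: a single function that takes the group size
// as an ordinary argument, so only one copy is compiled.
uint32_t kv_head_runtime(uint32_t qo_head, uint32_t group_size) {
  assert(group_size > 0);  // guard against division by zero
  return qo_head / group_size;
}
```

In the templated form, the amount of compiled code grows with the number of `case` labels, and any group size not listed is rejected at runtime. The argument form compiles once and accepts any positive group size, at the cost of the compiler no longer seeing the value as a compile-time constant.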