flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

perf: speedup jit compilation of prefill attention kernels #632

Closed · yzh119 closed this 7 hours ago

yzh119 commented 7 hours ago

Follow-up of https://github.com/flashinfer-ai/flashinfer/pull/628: this PR splits the prefill attention JIT templates so that different mask modes are compiled in separate files.

After this PR, the JIT compilation time for a prefill kernel of a given configuration (shape, dtype, etc.) can be reduced to about 10 seconds.
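For illustration, here is a minimal sketch of the splitting idea in Python. This is not FlashInfer's actual JIT generator; all names below (`MASK_MODES`, `KERNEL_INST_TEMPLATE`, `gen_prefill_sources`, `PrefillKernelDispatched`) are hypothetical. The point is that each generated `.cu` file holds the explicit instantiation for a single mask mode, so each translation unit stays small and the files can be compiled in parallel:

```python
# Sketch: emit one .cu file per mask mode instead of instantiating all
# mask modes in a single translation unit. Smaller units compile faster
# and can be built concurrently. Names here are hypothetical, not the
# real FlashInfer API.
from pathlib import Path

MASK_MODES = ["MaskMode::kNone", "MaskMode::kCausal", "MaskMode::kCustom"]

# Each generated file contains exactly one explicit template instantiation.
KERNEL_INST_TEMPLATE = """#include "prefill_kernel.cuh"

template cudaError_t PrefillKernelDispatched<
    /*HEAD_DIM=*/{head_dim}, /*MASK_MODE=*/{mask_mode}, {dtype}>(
    Params params, cudaStream_t stream);
"""

def gen_prefill_sources(out_dir: Path, head_dim: int, dtype: str) -> list[Path]:
    """Write one .cu file per mask mode and return the generated paths.

    A JIT build step can compile the returned files in parallel, and a
    change to one mask mode only dirties one small compilation unit.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, mask_mode in enumerate(MASK_MODES):
        src = KERNEL_INST_TEMPLATE.format(
            head_dim=head_dim, mask_mode=mask_mode, dtype=dtype
        )
        path = out_dir / f"prefill_hd{head_dim}_{dtype}_mask{i}.cu"
        path.write_text(src)
        paths.append(path)
    return paths
```

Under these assumptions, the per-configuration compile time drops because the compiler no longer has to instantiate every mask-mode variant in one pass, and a parallel build (e.g. several compiler processes) can work on the per-mode files simultaneously.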