flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

perf: speedup jit compilation of prefill attention kernels #632

Closed · yzh119 closed this 7 hours ago

yzh119 commented 7 hours ago

Follow-up of https://github.com/flashinfer-ai/flashinfer/pull/628: this PR splits the prefill attention JIT templates so that different mask modes are compiled in separate files.

After this PR, the JIT compilation time for a prefill kernel of a given configuration (shape, dtype, etc.) can be reduced to about 10 seconds.
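For illustration, here is a minimal sketch of the splitting idea in Python. This is not FlashInfer's actual JIT generator; all names below (`MASK_MODES`, `KERNEL_INST_TEMPLATE`, `gen_prefill_sources`, `PrefillKernelDispatched`) are hypothetical. The point is that each generated `.cu` file holds the explicit instantiation for a single mask mode, so each translation unit stays small and the files can be compiled in parallel:

```python
# Sketch: emit one .cu file per mask mode instead of instantiating all
# mask modes in a single translation unit. Smaller units compile faster
# and can be built concurrently. Names here are hypothetical, not the
# real FlashInfer API.
from pathlib import Path

MASK_MODES = ["MaskMode::kNone", "MaskMode::kCausal", "MaskMode::kCustom"]

# Each generated file contains exactly one explicit template instantiation.
KERNEL_INST_TEMPLATE = """#include "prefill_kernel.cuh"

template cudaError_t PrefillKernelDispatched<
    /*HEAD_DIM=*/{head_dim}, /*MASK_MODE=*/{mask_mode}, {dtype}>(
    Params params, cudaStream_t stream);
"""

def gen_prefill_sources(out_dir: Path, head_dim: int, dtype: str) -> list[Path]:
    """Write one .cu file per mask mode and return the generated paths.

    A JIT build step can compile the returned files in parallel, and a
    change to one mask mode only dirties one small compilation unit.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, mask_mode in enumerate(MASK_MODES):
        src = KERNEL_INST_TEMPLATE.format(
            head_dim=head_dim, mask_mode=mask_mode, dtype=dtype
        )
        path = out_dir / f"prefill_hd{head_dim}_{dtype}_mask{i}.cu"
        path.write_text(src)
        paths.append(path)
    return paths
```

Under these assumptions, the per-configuration compile time drops because the compiler no longer has to instantiate every mask-mode variant in one pass, and a parallel build (e.g. several compiler processes) can work on the per-mode files simultaneously.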