Closed yzh119 closed 7 hours ago
Follow-up to https://github.com/flashinfer-ai/flashinfer/pull/628: this PR splits the prefill attention JIT templates so that each mask mode is compiled in a separate file.
With this change, JIT compilation of a prefill kernel for a given configuration (shape, dtype, etc.) can be reduced to 10 seconds.
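To illustrate the idea of one source file per mask mode, here is a minimal sketch in Python. It is not the code from this PR: the template text, the `prefill_kernel.cuh` header, the `RunPrefill` symbol, the file naming scheme, and the `generate_sources` helper are all hypothetical, and the mask mode names are assumed for illustration. The point is only that each mask mode becomes its own translation unit, which a build system can then compile independently.

```python
from pathlib import Path

# Assumed mask mode names, for illustration only.
MASK_MODES = ["kNone", "kCausal", "kCustom"]

# Hypothetical template: each generated file instantiates the kernel
# for exactly one mask mode.
TEMPLATE = """// Auto-generated: one translation unit per mask mode.
#include "prefill_kernel.cuh"  // hypothetical header

template void RunPrefill<{dtype}, {head_dim}, MaskMode::{mask_mode}>(Params params);
"""


def generate_sources(out_dir, dtype, head_dim):
    """Write one .cu file per mask mode and return the list of paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for mode in MASK_MODES:
        path = out_dir / f"prefill_{dtype}_hd{head_dim}_{mode}.cu"
        path.write_text(
            TEMPLATE.format(dtype=dtype, head_dim=head_dim, mask_mode=mode)
        )
        paths.append(path)
    return paths
```

Calling `generate_sources("gen", "half", 128)` would write three files under `gen/`, one per mask mode, instead of a single file that instantiates every mode.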