flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

jit: further accelerate compilation by splitting files and multi-threading #628

Closed yzh119 closed 4 days ago

yzh119 commented 4 days ago

This PR accelerates JIT compilation by splitting the generated kernel sources into multiple files and compiling them with multiple threads.

The batch prefill attention template could be further split into multiple instances to accelerate compilation; we leave that for future work.
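The idea described above can be sketched as follows. This is a minimal, hypothetical illustration (not FlashInfer's actual implementation): once the generated sources are split into independent translation units, each unit can be handed to a worker thread, since the compiler invocations do not depend on each other. The file names and the `compile_unit` helper are invented for this sketch; a real version would invoke `nvcc` or the host compiler via `subprocess`.

```python
# Hypothetical sketch of compiling split translation units in parallel.
# Names here (compile_unit, kernel_part*.cu) are illustrative only.
from concurrent.futures import ThreadPoolExecutor


def compile_unit(path: str) -> str:
    # Stand-in for a real compiler call, e.g.
    # subprocess.check_call(["nvcc", "-c", path, ...])
    return f"compiled {path}"


# One big template split into several smaller translation units.
sources = ["kernel_part0.cu", "kernel_part1.cu", "kernel_part2.cu"]

# pool.map preserves input order while running compilations concurrently;
# compilation is dominated by external compiler processes, so threads
# (rather than processes) are sufficient to overlap the work.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(compile_unit, sources))
```

The speedup comes from two effects working together: smaller translation units compile faster individually, and independent units can occupy multiple CPU cores at once.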