jit: further accelerate compilation by splitting files and multi-threading

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

https://flashinfer.ai

Apache License 2.0

Closed yzh119 closed 4 days ago

yzh119 commented 4 days ago

This PR accelerates JIT compilation by:

Adding a parallel_load_modules function that loads the modules a model needs in parallel, using Python multi-threading.
Splitting sampling.cu into renorm.cu and sampling.cu, so the two files can be compiled independently.
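
As a rough illustration of the first item, the sketch below runs several module loaders concurrently in a thread pool. It is not the code from this PR: the name parallel_load_modules_sketch, the (function, args) task format, the max_workers default, and the fake_build stand-in are all assumptions made for the example. It also assumes each loader spends most of its time waiting on an external compiler process, which is the case where Python threads can overlap the work despite the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, List, Tuple

# Hypothetical task format: (loader function, positional arguments for it).
LoadTask = Tuple[Callable[..., Any], Tuple[Any, ...]]


def parallel_load_modules_sketch(tasks: List[LoadTask], max_workers: int = 4) -> List[Any]:
    """Run every loader in a thread pool; return results in input order."""
    results: List[Any] = [None] * len(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the index of the task that produced it.
        futures = {pool.submit(fn, *args): i for i, (fn, args) in enumerate(tasks)}
        for fut in as_completed(futures):
            # fut.result() re-raises any exception raised inside the loader.
            results[futures[fut]] = fut.result()
    return results


def fake_build(name: str) -> str:
    """Stand-in for a JIT build: sleeps briefly instead of invoking a compiler."""
    time.sleep(0.05)
    return f"{name}.so"


if __name__ == "__main__":
    loaded = parallel_load_modules_sketch(
        [(fake_build, ("sampling",)), (fake_build, ("renorm",))]
    )
    print(loaded)
```

Returning results in input order keeps the call site simple: the caller can unpack the list in the same order it listed the loaders, regardless of which build finished first.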

The batch prefill attention template could be split further into multiple instances to accelerate compilation; we leave that for future work.