Closed yzh119 closed 4 days ago
This PR accelerates JIT compilation by:
- Adding a `parallel_load_modules` function that loads the modules a model needs in parallel, using Python multi-threading.

The batch prefill attention template could be split further into multiple instances to accelerate compilation; we leave that for future work.
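
For readers unfamiliar with the pattern, here is a minimal sketch of loading several JIT modules in parallel with Python threads. It is not the implementation in this PR: `load_modules_in_parallel`, `fake_jit_load`, and the `(loader, args)` task format are hypothetical names chosen for illustration, and the sketch assumes each loader spends most of its time waiting on something that releases the GIL (for example an external compiler process), which is what would make threads effective here.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical task format: a loader callable paired with its positional arguments.
LoadTask = Tuple[Callable[..., Any], Sequence[Any]]


def load_modules_in_parallel(tasks: List[LoadTask], max_workers: int = 8) -> List[Any]:
    """Run every (loader, args) pair on a thread pool; return results in input order."""
    if not tasks:
        return []
    with ThreadPoolExecutor(max_workers=min(max_workers, len(tasks))) as pool:
        futures = [pool.submit(fn, *args) for fn, args in tasks]
        # result() blocks until the task finishes and re-raises any worker exception.
        return [f.result() for f in futures]


def fake_jit_load(name: str, seconds: float) -> str:
    """Stand-in for a JIT build: sleeps (releasing the GIL) and returns a label."""
    time.sleep(seconds)
    return f"module:{name}"


if __name__ == "__main__":
    demo_tasks = [
        (fake_jit_load, ("decode", 0.2)),
        (fake_jit_load, ("prefill", 0.2)),
        (fake_jit_load, ("norm", 0.2)),
    ]
    start = time.perf_counter()
    modules = load_modules_in_parallel(demo_tasks)
    print(modules, f"{time.perf_counter() - start:.2f}s")
```

With the three simulated loads above, wall-clock time should be close to that of the slowest single load rather than the sum of all three. Results come back in the order the tasks were submitted, so callers can match each module to its request.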