chengzeyi / stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
MIT License

Save compiled model and reuse? #111

Closed alecyan1993 closed 5 months ago

alecyan1993 commented 5 months ago

Hi,

Thanks for the amazing work. Is there any way to save the compiled model and reuse it for the acceleration? Thanks!

chengzeyi commented 5 months ago

> Hi,
>
> Thanks for the amazing work. Is there any way to save the compiled model and reuse it for the acceleration? Thanks!

That would be quite complex, since the compiled pipeline contains many dynamic parts. So my suggestion is to find ways to speed up the loading and compilation instead.
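One common way to amortize the compilation cost, given that the compiled pipeline cannot easily be serialized, is to keep it resident in a long-lived process and reuse it across requests instead of recompiling each time. The sketch below shows that pattern in plain Python; `load_and_compile` is a hypothetical placeholder for the real loading step (e.g. Diffusers' `from_pretrained` followed by stable-fast's compilation), not the library's actual API.

```python
import functools

def load_and_compile(model_id):
    """Hypothetical stand-in for the expensive load-and-compile step.

    In a real server this would load the Diffusers pipeline and run
    stable-fast's compilation, which can take tens of seconds.
    """
    return {"model_id": model_id, "compiled": True}

@functools.lru_cache(maxsize=1)
def get_pipeline(model_id):
    # Compile at most once per process; subsequent calls return the
    # cached, already-compiled pipeline object.
    return load_and_compile(model_id)
```

With this, a serving process pays the 30-45 s compilation cost once on the first request (or at startup, via an explicit warmup call) and every later call to `get_pipeline` is effectively free.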

alecyan1993 commented 5 months ago

Thanks for the reply!

TimPietrusky commented 4 months ago

@chengzeyi Do you have any suggestions on how to speed up the compilation? This step takes the most time, between 30 and 45 seconds. We tested with a 4090 and a 3080 on Windows (sadly Triton isn't available there, and we are forced to use Windows, not WSL); the compilation time is almost the same on both.