chengzeyi / stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
MIT License

CUDA OOM issues #117

Open alecyan1993 opened 5 months ago

alecyan1993 commented 5 months ago

Hi, we are currently using stable-fast in our AI generation service, and we host many SDXL models on the server. We run into CUDA OOM errors after switching between the compiled models for different tasks a few times. Before adopting stable-fast, our approach was to run inference with one model and then move it to the CPU after the run. After the stable-fast integration, however, this cleanup still leaves some extra memory allocated on the GPU. Can you explain the reason and how to solve this issue? Big thanks!
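For reference, the cleanup pattern described above (run inference, then park the model on the CPU) can be sketched roughly like this; `nn.Linear` is just a stand-in for a real SDXL pipeline component:

```python
import torch
from torch import nn


def release_model(model: nn.Module) -> None:
    """Move a finished model off the GPU and free cached CUDA blocks."""
    model.to("cpu")
    if torch.cuda.is_available():
        # empty_cache() returns cached, unused blocks to the driver;
        # it cannot free memory that live objects still reference,
        # which is why compiled graphs can keep memory pinned.
        torch.cuda.empty_cache()


# Usage with a stand-in module (a real service would pass its
# pipeline's UNet / text encoders here):
model = nn.Linear(8, 8)
release_model(model)
print(next(model.parameters()).device)  # cpu
```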

chengzeyi commented 5 months ago

> Hi, we are currently using stable-fast in our AI generation service, and we host many SDXL models on the server. We run into CUDA OOM errors after switching between the compiled models for different tasks a few times. Before adopting stable-fast, our approach was to run inference with one model and then move it to the CPU after the run. After the stable-fast integration, however, this cleanup still leaves some extra memory allocated on the GPU. Can you explain the reason and how to solve this issue? Big thanks!

I suggest switching model weights instead of recompiling if you need to switch models very frequently. Alternatively, you could use a multi-process design so that no resources stay occupied after cleanup. The leftover memory is probably held by the tracing or freezing process of TorchScript.
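The multi-process idea can be sketched as follows: each model's job runs in a fresh Python process, and when that process exits the CUDA driver reclaims all of its memory, so nothing left behind by tracing or freezing can accumulate. The inline script below is a placeholder for actually loading, compiling, and running one SDXL pipeline; `generate` is a hypothetical helper, not part of stable-fast:

```python
import subprocess
import sys


def generate(model_name: str, prompt: str) -> str:
    # Placeholder worker script; a real one would load the model,
    # compile it with stable-fast, run inference, and write the image
    # somewhere. Process exit then frees every byte it held on the GPU.
    script = (
        "import sys\n"
        "model, prompt = sys.argv[1], sys.argv[2]\n"
        "# ... load + compile the model, run inference ...\n"
        "print(f'{model}:{prompt}')\n"
    )
    out = subprocess.run(
        [sys.executable, "-c", script, model_name, prompt],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


print(generate("sdxl-a", "a cat"))  # sdxl-a:a cat
```

The trade-off is process startup and model-loading latency per request, which is why weight switching inside one long-lived process is usually preferable for frequent switches.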

alecyan1993 commented 5 months ago

Thanks so much for your quick reply! Do you have any examples of switching model weights instead of recompiling? Yes, we do need to switch models very frequently on our platform.

chengzeyi commented 5 months ago

> Thanks so much for your quick reply! Do you have any examples of switching model weights instead of recompiling? Yes, we do need to switch models very frequently on our platform.

You can check the LoRA switching example in the doc to see how to switch weights.
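The doc's LoRA example is not reproduced here, but the general weight-switching pattern looks roughly like this: compile one module once and copy each checkpoint's weights into it, instead of keeping several compiled copies alive on the GPU. In this sketch `nn.Linear` stands in for a stable-fast-compiled UNet, and the checkpoint dicts are made up:

```python
import torch
from torch import nn

# One module, compiled once (here just a plain module for illustration).
compiled = nn.Linear(4, 4)

# Hypothetical per-model checkpoints with matching parameter shapes.
checkpoints = {
    "model_a": {k: torch.randn_like(v) for k, v in compiled.state_dict().items()},
    "model_b": {k: torch.randn_like(v) for k, v in compiled.state_dict().items()},
}


def switch_to(name: str) -> None:
    # load_state_dict copies the weights in place, so the already-built
    # graph is reused and no recompilation (or second set of graph
    # buffers) is created on the GPU.
    compiled.load_state_dict(checkpoints[name])


switch_to("model_a")
switch_to("model_b")
```

This only works when the checkpoints share the same architecture as the compiled module; structurally different models still need their own compiled instance (or a separate process).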

alecyan1993 commented 5 months ago

Sure! Thanks again!