comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

Performance degradation when loading loras #5696

Open lackofdream opened 1 day ago

lackofdream commented 1 day ago

Expected Behavior

n/a

Actual Behavior

n/a

Steps to Reproduce

Debug Logs

n/a

Other

Recently I noticed an increase in CPU usage when loading LoRAs, especially when applying three or more. After some `git bisect` debugging I'm sure the cause is commit 67158994a4356d0ec54aaf3bbc5619c6c119f540.
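For anyone who wants to repeat this kind of bisect, here is a toy, self-contained sketch of the approach. The repo, file names, and the "slow" marker are all made up for illustration; in the real session the test step was running the same LoRA workflow at each bisect point and watching CPU usage, not grepping a file.

```shell
#!/bin/sh
# Toy bisect demo: build a throwaway repo where the "regression"
# lands in commit 4, then let git bisect run find it automatically.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email you@example.com
git config user.name you

for i in 1 2 3 4 5; do
    echo "change $i" > code.txt
    # pretend the regression lands in commit 4 and persists afterwards
    if [ "$i" -ge 4 ]; then echo slow > perf.txt; else echo fast > perf.txt; fi
    git add .
    git commit -qm "commit $i"
done

first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first" > /dev/null

# bisect run marks a commit good when the command exits 0;
# extract the hash from the "<hash> is the first bad commit" line
bad=$(git bisect run sh -c 'grep -q fast perf.txt' \
      | sed -n 's/^\(.*\) is the first bad commit$/\1/p')
git bisect reset > /dev/null
echo "first bad commit: $bad"
```

In the real case the per-commit check is manual (run the workflow, compare CPU profiles), so plain `git bisect good` / `git bisect bad` replaces the `git bisect run` step.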

Before: (image)

After, with the same workload: (image)

Tracing results show that a large share of CPU time is spent on this line: https://github.com/comfyanonymous/ComfyUI/blob/67158994a4356d0ec54aaf3bbc5619c6c119f540/comfy/model_management.py#L851

Before this commit, the cast_to_device function basically called Tensor.to(), which I think is faster than the copy_ method.
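My guess at why the copy_ path costs more (an assumption, not verified against the commit): Tensor.to() is documented to return self with no copy when the tensor already has the requested device and dtype, whereas allocating a destination tensor and calling copy_ always performs an allocation plus a full element copy. With --gpu-only the weights are already on the GPU, so the old path could often be a no-op. A toy stand-in (plain Python, not real torch) for that behavioral difference:

```python
class FakeTensor:
    """Minimal stand-in for a torch tensor; only models device placement."""
    def __init__(self, device, payload):
        self.device = device
        self.payload = payload

    def to(self, device):
        # Mirrors torch semantics: Tensor.to() returns self (no copy)
        # when the tensor is already on the requested device.
        if device == self.device:
            return self
        return FakeTensor(device, list(self.payload))

    def copy_(self, src):
        # copy_ always transfers element data, even device-to-device.
        self.payload = list(src.payload)
        return self


def cast_via_to(t, device):
    # old-style path: may return the input unchanged
    return t.to(device)


def cast_via_copy(t, device):
    # new-style path: fresh allocation + unconditional copy
    out = FakeTensor(device, [None] * len(t.payload))
    return out.copy_(t)


w = FakeTensor("cuda:0", [1.0, 2.0, 3.0])
assert cast_via_to(w, "cuda:0") is w        # no-op: same object back
assert cast_via_copy(w, "cuda:0") is not w  # always a new buffer + copy
```

If this is right, the regression would be most visible exactly in setups like mine, where source and destination device already match for every LoRA weight.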

This regression significantly affects workloads with multiple LoRAs, especially on systems where CPU resources are already constrained.

Edit: I'm using the --gpu-only option, in case that's related.