Expected Behavior
n/a
Actual Behavior
n/a
Steps to Reproduce
Use the default workflow and add some Load LoRA nodes to it. Monitor CPU usage and execution time.
Debug Logs
n/a
Other
Recently I found an increase in CPU usage when loading LoRAs, especially when applying three or more of them. After some git bisect debugging, I'm sure the cause is commit 67158994a4356d0ec54aaf3bbc5619c6c119f540.
Before:
(screenshot)
After, with the same workload:
(screenshot)
Tracing results show that a lot of CPU seconds are consumed by this line: https://github.com/comfyanonymous/ComfyUI/blob/67158994a4356d0ec54aaf3bbc5619c6c119f540/comfy/model_management.py#L851
Before this commit, the cast_to_device function was basically calling Tensor.to(), which I think is faster than the copy_ method. This regression significantly affects workloads with multiple LoRAs, especially on systems where CPU resources are already constrained.
Edit: I'm using the --gpu-only option, just in case it's related.
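
For reference, here is a minimal micro-benchmark sketch of the two casting paths. It is illustrative only, not ComfyUI's actual cast_to/cast_to_device code: the helper names (cast_via_to, cast_via_copy, bench), the 4096x4096 fp16 tensor size, and the iteration count are my own choices, and it assumes a CUDA device with the weight already resident on the GPU in the target dtype (roughly the --gpu-only situation).

```python
import time
import torch

def cast_via_to(w, dtype, device):
    # Single call; PyTorch can return the same tensor when dtype/device already match.
    return w.to(device=device, dtype=dtype, non_blocking=True)

def cast_via_copy(w, dtype, device):
    # Allocate a new tensor, then copy into it; this is the copy_ path the trace points at.
    out = torch.empty_like(w, dtype=dtype, device=device)
    out.copy_(w, non_blocking=True)
    return out

def bench(fn, w, dtype, device, iters=100):
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(w, dtype, device)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

if __name__ == "__main__":
    # Roughly weight-sized tensor that is already on the GPU in the target dtype.
    w = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
    for name, fn in (("Tensor.to", cast_via_to), ("empty_like + copy_", cast_via_copy)):
        print(f"{name}: {bench(fn, w, torch.float16, 'cuda') * 1e6:.1f} us/iter")
```

My expectation is that Tensor.to() can be close to a no-op when nothing actually needs to change, while empty_like + copy_ always allocates and copies, which would explain the extra CPU time when several LoRAs are applied.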