mturnshek opened 1 day ago
Can you check if it's better now?
I can confirm that ComfyUI is way faster for both loading and model patching for me now, as of your commit 67158994a4356d0ec54aaf3bbc5619c6c119f540 from 40 minutes ago.
Loading the base model has gone from ~40s to about 5s for me, and patching and loading overall, which are huge bottlenecks, are probably 3-8x faster.
By the way, have you ever thought about / looked into combining similar layer patches into one tensor before casting them and transferring them to the device in this section of model_patcher.py?
```python
load_completely.sort(reverse=True)
for x in load_completely:
    n = x[1]
    m = x[2]
    weight_key = "{}.weight".format(n)
    bias_key = "{}.bias".format(n)
    if hasattr(m, "comfy_patched_weights"):
        if m.comfy_patched_weights == True:
            continue
    self.patch_weight_to_device(weight_key, device_to=device_to)
    self.patch_weight_to_device(bias_key, device_to=device_to)
    logging.debug("lowvram: loaded module regularly {} {}".format(n, m))
    m.comfy_patched_weights = True
```
The transfers are all done individually, since the layers have different shapes and that keeps the code simple. But if similarly shaped layers were combined and sent over as one stacked tensor, the number of transfer calls would drop from something like 429 to ~14 for Flux's unet, with much more data per call.
I'm not sure how efficient sending lots of small tensors is compared to grouping them and sending one larger block, but the difference could be significant.
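Roughly what I mean, as a toy sketch (this is not ComfyUI's actual API; `grouped_transfer` and its arguments are hypothetical, and the real code would also have to handle per-layer dtype casting and writing the results back into the modules):

```python
import torch
from collections import defaultdict

def grouped_transfer(tensors, device):
    """Group same-shape/same-dtype CPU tensors, move each group to the
    device as one stacked tensor, then unbind back into per-layer views.
    Illustrative only -- not the actual model_patcher.py code."""
    groups = defaultdict(list)
    for name, t in tensors.items():
        groups[(tuple(t.shape), t.dtype)].append((name, t))

    out = {}
    for (_shape, _dtype), items in groups.items():
        names = [n for n, _ in items]
        # One contiguous block per group -> one transfer instead of len(items)
        stacked = torch.stack([t for _, t in items]).to(device)
        for name, moved in zip(names, stacked.unbind(0)):
            out[name] = moved
    return out
```

The trade-off is an extra host-side copy for the `torch.stack`, so whether this wins overall depends on how much per-call overhead the individual transfers actually have.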
Feature Idea
In model_management.py:
https://github.com/comfyanonymous/ComfyUI/blob/7390ff3b1ec2e15017ba4a52d6eaabc4aa4636e3/comfy/model_management.py#L834
Changing this function back to its pre-TODO state results in a large speedup in model patching. (19s -> 6s for Flux LoRAs on my computer.) It probably also speeds up loading in other areas.
This is because the largest bottleneck is the one-by-one blocking transfer of each layer of the unet to the GPU, which is massively accelerated if non_blocking=True.
Are there still memory issues? Changes landed since the TODO was written (e.g. https://github.com/comfyanonymous/ComfyUI/commit/39f114c44bb99d4a221e8da451d4f2a20119c674) may mean the problems that originally caused memory issues are less relevant now, or gone entirely.
Please consider re-adding support for non_blocking=True as a launch argument so users can start trying it out again.
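For reference, the transfer pattern that `non_blocking=True` enables looks roughly like this (an illustrative sketch, not the actual model_management.py code; `async_to_device` is a hypothetical helper). Note that non-blocking host-to-device copies only actually overlap with other work when the source tensor is in pinned memory:

```python
import torch

def async_to_device(weights, device):
    """Queue non-blocking host-to-device copies for a dict of tensors,
    then synchronize once at the end instead of blocking per layer.
    Illustrative sketch only."""
    moved = {}
    for name, t in weights.items():
        # Pinned (page-locked) memory is required for a truly async copy.
        src = t.pin_memory() if device != "cpu" and not t.is_pinned() else t
        moved[name] = src.to(device, non_blocking=True)
    if device != "cpu":
        torch.cuda.synchronize()  # wait for all queued copies at once
    return moved
```

The memory concern behind the TODO, as I understand it, is that the source tensors must stay alive (and pinned memory must be available) until the copies complete, which is presumably what a launch argument would let users opt into testing.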
Existing Solutions
No response
Other
No response