comfyanonymous / ComfyUI

The most powerful and modular stable diffusion GUI, API, and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

Can't use LoRAs with SDXL anymore #1947

Closed: SERGEYDJUM closed this issue 8 months ago

SERGEYDJUM commented 8 months ago

ComfyUI applied SDXL LoRAs and the LCM LoRA fine before commit 4a8a839b40fcae9960a6107200b89dce6675895d, but after that commit it shows the message below during generation. With some combinations of checkpoints and LoRAs it still works, but memory usage jumps from 6 GB to 12 GB.

Error occurred when executing KSampler:

Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated : 6.11 GiB
Requested : 50.00 MiB
Device limit : 8.00 GiB
Free (according to CUDA): 0 bytes
PyTorch limit (set by user-supplied memory fraction) : 17179869184.00 GiB

File "G:\CodeApps\ComfyUI\execution.py", line 153, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\CodeApps\ComfyUI\execution.py", line 83, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\CodeApps\ComfyUI\execution.py", line 76, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\CodeApps\ComfyUI\nodes.py", line 1237, in sample
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\CodeApps\ComfyUI\nodes.py", line 1207, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\CodeApps\ComfyUI\comfy\sample.py", line 93, in sample
real_model, positive_copy, negative_copy, noise_mask, models = prepare_sampling(model, noise.shape, positive, negative, noise_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\CodeApps\ComfyUI\comfy\sample.py", line 86, in prepare_sampling
comfy.model_management.load_models_gpu([model] + models, comfy.model_management.batch_area_memory(noise_shape[0] * noise_shape[2] * noise_shape[3]) + inference_memory)
File "G:\CodeApps\ComfyUI\comfy\model_management.py", line 406, in load_models_gpu
cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\CodeApps\ComfyUI\comfy\model_management.py", line 289, in model_load
raise e
File "G:\CodeApps\ComfyUI\comfy\model_management.py", line 285, in model_load
self.real_model = self.model.patch_model(device_to=patch_model_to) #TODO: do something with loras and offloading to CPU
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\CodeApps\ComfyUI\comfy\model_patcher.py", line 182, in patch_model
temp_weight = comfy.model_management.cast_to_device(weight, device_to, torch.float32, copy=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\CodeApps\ComfyUI\comfy\model_management.py", line 513, in cast_to_device
return tensor.to(device, copy=copy).to(dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Context

GPU: RTX 4060 Laptop (8GB VRAM)
Workflow: default, but with SDXL LoRA and empty latent size set to 1024x1024
Args: --use-pytorch-cross-attention

Moxie1776 commented 8 months ago

I had not isolated this to the LoRA yet, but generation does seem to work without it.

Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated : 9.57 GiB
Requested : 25.00 MiB
Device limit : 11.76 GiB
Free (according to CUDA): 41.81 MiB
PyTorch limit (set by user-supplied memory fraction) : 17179869184.00 GiB

SERGEYDJUM commented 8 months ago

Workaround

Line 179 of comfy/model_patcher.py

if key not in self.backup:
-   self.backup[key] = weight.to(device=device_to, copy=inplace_update)
+   self.backup[key] = weight.to(device=self.offload_device, copy=inplace_update)

Now LoRA works
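
For anyone hitting the same OOM: the one-line change above keeps the backup copy of each original weight on the offload device (CPU) instead of on the GPU, so patching a LoRA no longer holds two full copies of the weights in VRAM. The sketch below illustrates that backup-then-patch pattern; it is a minimal, hypothetical stand-in for ComfyUI's ModelPatcher (the class name, most attributes, and the usage example are assumptions, not the project's actual API).

import torch
import torch.nn as nn

class TinyPatcher:
    # Hypothetical, simplified stand-in for ComfyUI's ModelPatcher (not the real API).
    def __init__(self, model, load_device, offload_device):
        self.model = model
        self.load_device = load_device        # e.g. torch.device("cuda")
        self.offload_device = offload_device  # e.g. torch.device("cpu")
        self.patches = {}                     # param name -> LoRA weight delta
        self.backup = {}                      # param name -> original (unpatched) weight

    def patch_model(self):
        params = dict(self.model.named_parameters())
        for key, delta in self.patches.items():
            weight = params[key]
            if key not in self.backup:
                # The workaround above: keep the pristine copy on the offload
                # device (CPU) instead of duplicating it in VRAM.
                self.backup[key] = weight.detach().to(device=self.offload_device, copy=True)
            # Apply the delta in fp32, then write the result back in place.
            patched = (weight.detach().float() + delta.to(weight.device).float()).to(weight.dtype)
            with torch.no_grad():
                weight.copy_(patched)
        return self.model.to(self.load_device)

    def unpatch_model(self):
        params = dict(self.model.named_parameters())
        with torch.no_grad():
            for key, original in self.backup.items():
                params[key].copy_(original.to(params[key].device))
        self.backup.clear()

# Example usage (hypothetical): patch a single linear layer with a zero delta.
if __name__ == "__main__":
    model = nn.Linear(4, 4)
    patcher = TinyPatcher(model, torch.device("cpu"), torch.device("cpu"))
    patcher.patches["weight"] = torch.zeros(4, 4)
    patcher.patch_model()
    patcher.unpatch_model()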

comfyanonymous commented 8 months ago

Should be fixed now.