comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

Very High VRAM usage when using lora with flux #4681

Open axel578 opened 2 weeks ago

axel578 commented 2 weeks ago

Expected Behavior

The LoRA should not eat 10 GB of VRAM.

Actual Behavior

I have Flux Schnell fp8 on a 3090. I apply two rank-64 LoRAs to the model, but it uses all VRAM until it starts offloading, and generation of course slows down.

Steps to Reproduce

Just add two rank-64 LoRAs onto Flux Schnell fp8.

Debug Logs

None

Other

No response

comfyanonymous commented 2 weeks ago

Try updating: update/update_comfyui.bat if you are on the standalone.

axel578 commented 2 weeks ago

> Try updating: update/update_comfyui.bat if you are on the standalone.

I updated it just now and it still uses 10 GB for a single rank-64 LoRA: [screenshot]

(Yes, it's really the LoRA in the screenshot taking the 10 GB; 12 GB are already occupied by the Schnell model.)

RandomGitUser321 commented 2 weeks ago

> I updated it just now and it still uses 10 GB for a single rank-64 LoRA: [screenshot]

It likely has to make a complete patched copy of the Flux model, and probably does it in whatever precision the LoRA is in, or maybe it can only do it in fp16 or higher. So even though you're using Flux fp8, it might have to upcast it to fp16/bf16 to apply the LoRA patch. I don't know if it can then downcast the patched model back to fp8, but in theory it should be able to after it's done.
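For illustration, here is a minimal PyTorch sketch of that idea (hypothetical names, not ComfyUI's actual API): the fp8 base weight is upcast, the LoRA delta is added, and the result is cast back down. Every intermediate is a full-size copy of the weight, which is where the extra VRAM would go.

```python
import torch

def apply_lora_patch(weight_fp8, lora_down, lora_up, alpha=1.0):
    # Hypothetical sketch of merging a LoRA into an fp8 weight:
    # upcast the base weight, add the low-rank delta, downcast again.
    w = weight_fp8.to(torch.bfloat16)              # full-size higher-precision copy
    delta = alpha * (lora_up.to(torch.bfloat16) @ lora_down.to(torch.bfloat16))
    w = w + delta                                  # patched weight, still bf16
    return w.to(weight_fp8.dtype)                  # cast back to fp8 (another copy)
```

If the higher-precision copies for many layers are held at once while patching, peak VRAM scales with the full model size rather than with the LoRA size.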

buzzjeux commented 2 weeks ago

Same issue here. Before the latest ComfyUI update, everything worked fine with exactly the same workflow as before (two small LoRAs loaded).

Current version: ComfyUI 2630ec28cd
GPU: RTX 4060 Ti
Total VRAM 16379 MB, total RAM 65463 MB
pytorch version: 2.4.0+cu121

Launch args: --normalvram --fast --use-pytorch-cross-attention

Flux.1 Dev / fp8_e4m3fn

Error occurred when executing KSampler:

Allocation on device

File "G:\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 317, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 192, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 169, in _map_node_over_list
process_inputs(input_dict, i)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 158, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "G:\StabilityMatrix\Data\Packages\ComfyUI\nodes.py", line 1429, in sample
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\nodes.py", line 1396, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "G:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\sample_error_enhancer.py", line 22, in informative_sample
raise e
File "G:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\sample_error_enhancer.py", line 9, in informative_sample
return original_sample(*args, **kwargs) # This code helps interpret error messages that occur within exceptions but does not have any impact on other operations.
File "G:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\ComfyUI-AnimateDiff-Evolved\animatediff\sampling.py", line 420, in motion_sample
return orig_comfy_sample(model, noise, *args, **kwargs)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Advanced-ControlNet\adv_control\sampling.py", line 116, in acn_sample
return orig_comfy_sample(model, *args, **kwargs)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Advanced-ControlNet\adv_control\utils.py", line 116, in uncond_multiplier_check_cn_sample
return orig_comfy_sample(model, *args, **kwargs)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\sample.py", line 43, in sample
samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\samplers.py", line 829, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\samplers.py", line 729, in sample
return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\samplers.py", line 706, in sample
self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\sampler_helpers.py", line 66, in prepare_sampling
comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required, minimum_memory_required=minimum_memory_required)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 542, in load_models_gpu
cur_loaded_model = loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 326, in model_load
raise e
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 322, in model_load
self.real_model = self.model.patch_model(device_to=patch_model_to, lowvram_model_memory=lowvram_model_memory, load_weights=load_weights, force_patch_weights=force_patch_weights)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_patcher.py", line 427, in patch_model
self.load(device_to, lowvram_model_memory=lowvram_model_memory, force_patch_weights=force_patch_weights, full_load=full_load)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_patcher.py", line 393, in load
self.patch_weight_to_device(weight_key, device_to=device_to)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_patcher.py", line 324, in patch_weight_to_device
out_weight = comfy.float.stochastic_rounding(out_weight, weight.dtype, seed=string_to_seed(key))
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\float.py", line 60, in stochastic_rounding
return manual_stochastic_round_to_float8(value, dtype, generator=generator)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\float.py", line 37, in manual_stochastic_round_to_float8
abs_x[:] = calc_mantissa(abs_x, exponent, normal_mask, MANTISSA_BITS, EXPONENT_BIAS, generator=generator)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\float.py", line 7, in calc_mantissa
(abs_x / (2.0 ** (exponent - EXPONENT_BIAS)) - 1.0) * (2**MANTISSA_BITS),
File "G:\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\_tensor.py", line 41, in wrapped
return f(*args, **kwargs)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\_tensor.py", line 991, in __rpow__
return torch.pow(other, self)

torch.OutOfMemoryError: Allocation on device 

Got an OOM, unloading all loaded models.
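As a side note on the traceback above: it ends inside comfy/float.py while stochastically rounding the patched weights back to fp8. A rough, simplified sketch of that kind of rounding (illustrative only, not ComfyUI's exact implementation) shows why it needs several full-size temporaries per weight tensor, which can push an already-full card over the edge.

```python
import torch

def stochastic_round_to_fp8_sketch(x, generator=None):
    # Simplified illustration: split into sign/exponent/mantissa, add uniform
    # noise to the mantissa before truncating (3 mantissa bits for e4m3), rebuild.
    # Each intermediate below is a tensor the same shape as the weight.
    abs_x = x.abs().to(torch.float32)                                   # copy 1
    exponent = torch.floor(torch.log2(abs_x.clamp(min=1e-12)))          # copy 2
    mantissa = abs_x / torch.exp2(exponent) - 1.0                       # copy 3
    noise = torch.rand(x.shape, generator=generator, device=x.device)   # copy 4
    mantissa = torch.floor(mantissa * 8 + noise) / 8
    out = torch.sign(x) * torch.exp2(exponent) * (1.0 + mantissa)
    return out.to(torch.float8_e4m3fn)
```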
dan4ik94 commented 2 weeks ago

I got the same issue; loading two LoRAs gives an OOM. I tried experimenting with --disable-smart-memory and --reserve-vram, but it doesn't seem to help.

Reverting to 9230f658232fd94d0beeddb94aed093a1eca82b5 helps.

@comfyanonymous this commit (7985ff88b9a7099378b5f2026bee5da63d3fc53f) breaks things; after it I started getting OOMs.

JorgeR81 commented 2 weeks ago

> I got the same issue; loading two LoRAs gives an OOM. I tried experimenting with --disable-smart-memory and --reserve-vram, but it doesn't seem to help.

I have only 8 GB of VRAM. --disable-smart-memory didn't work for me. --reserve-vram worked, but you need to keep increasing the value until it works for a specific task.
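For example (illustrative values, not from the original report), the flag takes the amount to reserve in GB and can be raised step by step until the task stops OOMing:

Launch args: --reserve-vram 1.5 (then 2.0, 2.5, … if it still OOMs)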

dan4ik94 commented 2 weeks ago

Well, in theory I can get rid of the OOMs by falling back to system memory, but that's 10x slower. I want the same performance as in c6812947e98eb384250575d94108d9eb747765d9.

axel578 commented 2 weeks ago

It's still problematic on the latest version @comfyanonymous

Archviz360 commented 2 weeks ago

Same problem here. Use Fooocus instead until they fix this annoying error.

buzzjeux commented 1 week ago

The bug is fixed for me with the latest update ("Lower fp8 lora memory usage"). Thanks!