Jannchie / ComfyUI-J

Jannchie's ComfyUI custom nodes.

VRAM issues #7

Open CyberLykan opened 6 months ago

CyberLykan commented 6 months ago

Tried the changing clothes workflow and didn't have enough VRAM. Switched to the base workflow to see if the changing clothes workflow was too demanding and still ran out of VRAM. Maybe there are some VRAM issues with the Diffusers nodes?


Jannchie commented 6 months ago

Hmm, I really didn't optimize for low VRAM. Can you provide the startup info from ComfyUI? Something like this:

Total VRAM 24576 MB, total RAM 65320 MB
xformers version: 0.0.25
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3090 : cudaMallocAsync
VAE dtype: torch.bfloat16

I'll see whether I can pick a more VRAM-efficient strategy automatically based on your environment.
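
Roughly, I'm thinking of something like this sketch (the 10 GiB threshold is just an illustrative guess, not final):

```python
import torch

def total_vram_gib(device_index: int = 0) -> float:
    """Total VRAM of the given GPU, in GiB."""
    props = torch.cuda.get_device_properties(device_index)
    return props.total_memory / 1024**3

# Hypothetical policy: switch to a more VRAM-efficient path on small GPUs.
LOW_VRAM_MODE = torch.cuda.is_available() and total_vram_gib() < 10
```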

NeedsMoar commented 6 months ago

https://huggingface.co/docs/diffusers/v0.27.2/en/optimization/memory#memory-efficient-attention

It just needs matching xformers and torch versions installed, and the method they list called: pipe.enable_xformers_memory_efficient_attention().
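
For anyone who wants to try it by hand, a minimal sketch (the checkpoint name is just an example; any SD 1.x model loads the same way):

```python
import torch
from diffusers import StableDiffusionPipeline

# Example checkpoint; substitute whatever model the workflow actually loads.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Swap the default attention processors for xformers' memory-efficient ones.
# Requires an xformers build that matches the installed torch version.
pipe.enable_xformers_memory_efficient_attention()
```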

Comfy tries to enable PyTorch's memory-efficient attention / flash attention, but I'm fairly sure it didn't work on Windows until the current point release, and it's still slower than xformers + flash-attn2. It didn't look like diffusers supports the torch version yet, and based on what Comfy is doing, I think it might need to be explicitly enabled anyway. I'm guessing torch didn't want to implement the amount of auto-selection code xformers has. (There are also plenty of methods in that library that could be useful for specific models but need to be set up and called more explicitly, like the rotary embedding variants. The original paper on those was mainly geared toward making newer, more realistic crapflood bots and wasting all the HBM2e/HBM3 memory ever manufactured doing it, but it mentioned they could be very useful for getting better correspondence between verbs / adjectives / nouns in sentences when generating video from text.)
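
(For completeness: diffusers can at least be pointed at torch's SDPA explicitly through its attention-processor API, and recent diffusers versions pick this processor by default on torch >= 2.0. Reusing the `pipe` object from the sketch above:)

```python
from diffusers.models.attention_processor import AttnProcessor2_0

# Route attention through torch.nn.functional.scaled_dot_product_attention
# instead of the xformers processor.
pipe.unet.set_attn_processor(AttnProcessor2_0())
pipe.vae.set_attn_processor(AttnProcessor2_0())
```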

Other than that, diffusers will always have moderately higher memory usage unless they've changed the architecture quite a bit. The last time I looked, there wasn't much thought given to unloading things mid-pipeline, because nobody involved had ever used a GPU with less than 48 GB of VRAM. :P

Jannchie commented 6 months ago

I know very little about VRAM optimization. I've already enabled enable_xformers_memory_efficient_attention. Maybe I should enable enable_sequential_cpu_offload or enable_model_cpu_offload when running on low-VRAM machines, though they might slow down generation.
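
Both are one-line calls on the diffusers pipeline, so a sketch would look like this (assuming a freshly loaded pipeline `pipe` that hasn't been moved to the GPU yet, since the offload hooks manage device placement themselves, and `accelerate` installed; the two modes shouldn't be combined):

```python
# Moderate savings, small slowdown: whole submodels (text encoder, UNet, VAE)
# live on the CPU and are moved to the GPU only while they run.
pipe.enable_model_cpu_offload()

# Much lower VRAM but considerably slower: offload at submodule granularity.
# pipe.enable_sequential_cpu_offload()
```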

CyberLykan commented 6 months ago


Total VRAM 8192 MB, total RAM 32702 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce GTX 1080 : cudaMallocAsync
VAE dtype: torch.float32
Using pytorch cross attention

AgeOfAlgorithms commented 6 months ago

I'm running it on an RX 6700 XT with 12 GB of VRAM and I run into the memory issue as well.

!!! Exception during processing !!!
Traceback (most recent call last):
  File "/home/sean/ComfyUI/execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/home/sean/ComfyUI/execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "/home/sean/ComfyUI/execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/home/sean/ComfyUI/custom_nodes/ComfyUI-J/__init__.py", line 748, in run
    result = pipeline(
  File "/home/sean/anaconda3/envs/py39/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sean/ComfyUI/custom_nodes/ComfyUI-J/pipelines/jannchie.py", line 413, in __call__
    input_latents = self.image_to_latents(
  File "/home/sean/ComfyUI/custom_nodes/ComfyUI-J/pipelines/jannchie.py", line 873, in image_to_latents
    image_latents = self.vae.encode(image).latent_dist.sample(
  File "/home/sean/anaconda3/envs/py39/lib/python3.9/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/home/sean/anaconda3/envs/py39/lib/python3.9/site-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 260, in encode
    h = self.encoder(x)
  File "/home/sean/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sean/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sean/anaconda3/envs/py39/lib/python3.9/site-packages/diffusers/models/autoencoders/vae.py", line 175, in forward
    sample = self.mid_block(sample)
  File "/home/sean/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sean/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sean/anaconda3/envs/py39/lib/python3.9/site-packages/diffusers/models/unets/unet_2d_blocks.py", line 738, in forward
    hidden_states = attn(hidden_states, temb=temb)
  File "/home/sean/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sean/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sean/anaconda3/envs/py39/lib/python3.9/site-packages/diffusers/models/attention_processor.py", line 522, in forward
    return self.processor(
  File "/home/sean/anaconda3/envs/py39/lib/python3.9/site-packages/diffusers/models/attention_processor.py", line 1279, in __call__
    hidden_states = F.scaled_dot_product_attention(
torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 3.52 GiB. GPU 0 has a total capacty of 11.98 GiB of which 2.08 GiB is free. Of the allocated memory 9.35 GiB is allocated by PyTorch, and 164.93 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF

I can't contribute much here, but I hope this info helps at least a little. I really hope we can fix this!
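
One thing worth trying in the meantime is the allocator hint at the end of that error; a minimal sketch (128 MB is just an example value, and the variable must be set before the first GPU allocation):

```python
import os

# Must run before torch allocates anything on the GPU, e.g. at the very top
# of ComfyUI's entry script. On NVIDIA/CUDA builds the equivalent variable is
# PYTORCH_CUDA_ALLOC_CONF.
os.environ["PYTORCH_HIP_ALLOC_CONF"] = "max_split_size_mb:128"
```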

Jannchie commented 6 months ago

@AgeOfAlgorithms @CyberLykan

I just pushed new code that corrects the dtype detection, so inference now runs in half precision when the hardware supports it.

It might help to reduce VRAM usage.
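
The selection logic amounts to something like this (an illustration of the idea, not the exact code from the commit):

```python
import torch

def pick_dtype() -> torch.dtype:
    if not torch.cuda.is_available():
        return torch.float32   # CPU path stays in full precision
    if torch.cuda.is_bf16_supported():
        return torch.bfloat16  # Ampere+ (and recent ROCm) hardware
    return torch.float16       # older GPUs: fp16 roughly halves weight VRAM
```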

AgeOfAlgorithms commented 6 months ago

@Jannchie I updated everything and tried again, but got the same memory error.

Phachu commented 6 months ago

It solved the out-of-memory issue for me, but the output image looks desaturated now. It only worked the first time I used it.
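
One way to check whether the desaturation is a half-precision artifact would be to re-run the same prompt with the pipeline loaded in full precision and compare (a sketch; the checkpoint name is just an example, and fp32 needs roughly twice the VRAM, so test at low resolution):

```python
import torch
from diffusers import StableDiffusionPipeline

# Same pipeline in fp32: if the colors come back, the half-precision path is
# the culprit rather than the workflow itself.
pipe_fp32 = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example checkpoint
    torch_dtype=torch.float32,
).to("cuda")
```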

CyberLykan commented 6 months ago


Updated but still running out of memory.

Error occurred when executing DiffusersGenerator:

Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated : 6.86 GiB
Requested : 512.00 MiB
Device limit : 8.00 GiB
Free (according to CUDA): 0 bytes
PyTorch limit (set by user-supplied memory fraction): 17179869184.00 GiB

File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 81, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 74, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\ComfyUI\custom_nodes\ComfyUI-J\__init__.py", line 748, in run
result = pipeline(
^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\python_embeded\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\ComfyUI\custom_nodes\ComfyUI-J\pipelines\jannchie.py", line 411, in __call__
input_latents = self.image_to_latents(
^^^^^^^^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\ComfyUI\custom_nodes\ComfyUI-J\pipelines\jannchie.py", line 874, in image_to_latents
image_latents = self.vae.encode(image).latent_dist.sample(
^^^^^^^^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\python_embeded\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl.py", line 260, in encode
h = self.encoder(x)
^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\python_embeded\Lib\site-packages\diffusers\models\autoencoders\vae.py", line 172, in forward
sample = down_block(sample)
^^^^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\python_embeded\Lib\site-packages\diffusers\models\unets\unet_2d_blocks.py", line 1465, in forward
hidden_states = resnet(hidden_states, temb=None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\My Stuff\Extra Stuff\StableSwarmUI\dlbackend\comfy\python_embeded\Lib\site-packages\diffusers\models\resnet.py", line 376, in forward
output_tensor = (input_tensor + hidden_states) / self.output_scale_factor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~