Jordach / comfy-consistency-vae

Implements OpenAI's new VAE decoder
MIT License
68 stars · 4 forks

Throws OOM on 8 GB #2

Open · f-rank opened this issue 8 months ago

f-rank commented 8 months ago

Probably needs some sort of optimization for 8 GB cards, if that's even possible; I don't know whether a tiled VAE approach is feasible here. It kept throwing this:

ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
  File "D:\WORK\conda_envs\ConfyUI\ComfyUI_windows_portable\ComfyUI\execution.py", line 153, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "D:\WORK\conda_envs\ConfyUI\ComfyUI_windows_portable\ComfyUI\execution.py", line 83, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "D:\WORK\conda_envs\ConfyUI\ComfyUI_windows_portable\ComfyUI\execution.py", line 76, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "D:\WORK\conda_envs\ConfyUI\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfy-consistency-vae\nodes.py", line 29, in decode
    consistent_latent = decoder_consistency(latent["samples"].to("cuda:0"))
  File "D:\WORK\conda_envs\ConfyUI\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\WORK\conda_envs\ConfyUI\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfy-consistency-vae\consistencydecoder\__init__.py", line 159, in __call__
    model_output = self.ckpt(c_in * xstart, t, features=features)
  File "D:\WORK\conda_envs\ConfyUI\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/dalle_runner_api/model_infra/modules/public_diff_vae.py", line 122, in forward
    _28 = (up_1_conv_2).forward(_27, _6, _0, )
    _29 = (up_1_conv_3).forward(_28, _5, _0, )
    _30 = (up_0_conv_0).forward((up_1_upsamp).forward(_29, _0, ), _4, _0, )
    _31 = (up_0_conv_1).forward(_30, _3, _0, )
    _32 = (up_0_conv_2).forward(_31, _2, _0, )
  File "code/__torch__/dalle_runner_api/model_infra/modules/public_diff_vae/___torch_mangle_210.py", line 24, in forward
    m = torch.cat([argument_1, argument_2], 1)
    x = torch.to(m, 5)
    _0, _1, = (gn_1).forward(x, )
               ~~~~~~~~~~~~~ <--- HERE
    _2 = (f_1).forward(_0, )
    x0 = torch.silu(argument_3)
  File "code/__torch__/dalle_runner_api/model_infra/modules/public_diff_vae/___torch_mangle_204.py", line 14, in forward
    x0 = torch.contiguous(x)
    input = torch.to(x0, 6)
    input0 = torch.to(torch.group_norm(input, 32, g, b), 5)
                      ~~~~~~~~~~~~~~~~ <--- HERE
    x1 = torch.silu(input0)
    return (x1, x0)

Traceback of TorchScript, original code (most recent call last):
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/functional.py(2558): group_norm
/root/code/dalle-runner-api/dalle_runner_api/model_infra/modules/public_diff_vae.py(201): forward
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1527): _call_impl
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/root/code/dalle-runner-api/dalle_runner_api/model_infra/modules/public_diff_vae.py(728): forward
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1527): _call_impl
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/root/code/dalle-runner-api/dalle_runner_api/model_infra/modules/public_diff_vae.py(1037): forward
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1527): _call_impl
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/jit/_trace.py(1065): trace_module
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/jit/_trace.py(798): trace
<ipython-input-1-15c8be35cc98>(442): test_diff_vae
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/fire/core.py(681): _CallAndUpdateTrace
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/fire/core.py(466): _Fire
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/fire/core.py(141): Fire
<ipython-input-1-15c8be35cc98>(470): <module>
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/core/interactiveshell.py(3526): run_code
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/core/interactiveshell.py(3466): run_ast_nodes
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/core/interactiveshell.py(3284): run_cell_async
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/core/async_helpers.py(129): _pseudo_sync_runner
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/core/interactiveshell.py(3079): _run_cell
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/core/interactiveshell.py(3024): run_cell
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/terminal/interactiveshell.py(881): interact
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/terminal/interactiveshell.py(888): mainloop
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/terminal/ipapp.py(317): start
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/traitlets/config/application.py(1053): launch_instance
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/__init__.py(129): start_ipython
/root/.pyenv/versions/3.11.5/bin/ipython(8): <module>
RuntimeError: Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated     : 5.86 GiB
Requested               : 658.36 MiB
Device limit            : 8.00 GiB
Free (according to CUDA): 0 bytes
PyTorch limit (set by user-supplied memory fraction)
                        : 17179869184.00 GiB
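
On the tiled-VAE question above: below is a minimal, untested sketch of what a naive tiled decode could look like, assuming `decoder_consistency` behaves like the call in nodes.py (a (B, C, h, w) latent in, an image tensor at 8x the latent resolution out). The helper name, the tile size, and the absence of overlap/blending are illustrative assumptions, not anything this repo ships; expect visible seams at tile borders, but peak usage drops to roughly one tile's worth of activations.

import torch

def tiled_consistency_decode(decoder, latents, tile=64, scale=8):
    # Hypothetical helper (not part of this repo): decode the latent in spatial
    # tiles so only one tile's activations are resident on the GPU at a time.
    # Assumes `decoder` maps a (B, C, h, w) latent to a (B, 3, h*scale, w*scale)
    # image, like the decoder_consistency call in nodes.py.
    b, c, h, w = latents.shape
    out = None
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            piece = decoder(latents[:, :, y:y + tile, x:x + tile].to("cuda:0"))
            if out is None:
                out = torch.zeros(b, piece.shape[1], h * scale, w * scale,
                                  dtype=piece.dtype, device=piece.device)
            # slicing clamps at the tensor edge, so partial border tiles still fit
            out[:, :, y * scale:(y + tile) * scale, x * scale:(x + tile) * scale] = piece
            torch.cuda.empty_cache()  # release the finished tile's workspace
    return out

There is no overlap or feathering here, so it only illustrates where the memory savings would come from, not a production-quality tiled VAE.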
txirrindulari commented 8 months ago

I have a similar problem, but with 24 GB of VRAM. Here is my information:

File "/xxx/ComfyUI/execution.py", line 153, in recursive_execute output_data, output_ui = get_output_data(obj, input_data_all) File "/xxx/ComfyUI/execution.py", line 83, in get_output_data return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True) File "/xxx/ComfyUI/execution.py", line 76, in map_node_over_list results.append(getattr(obj, func)(*slice_dict(input_data_all, i))) File "/xxx/ComfyUI/custom_nodes/comfy-consistency-vae/nodes.py", line 29, in decode consistent_latent = decoder_consistency(latent["samples"].to("cuda:0")) File "/xxx/ComfyUI/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(args, kwargs) File "/xxx/ComfyUI/custom_nodes/comfy-consistency-vae/consistencydecoder/init.py", line 159, in call model_output = self.ckpt(c_in xstart, t, features=features) File "/xxx/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(args, kwargs) RuntimeError: The following operation failed in the TorchScript interpreter. Traceback of TorchScript, serialized code (most recent call last): File "code/torch/dalle_runner_api/model_infra/modules/public_diff_vae.py", line 122, in forward _28 = (up_1_conv_2).forward(_27, _6, _0, ) _29 = (up_1_conv_3).forward(_28, _5, _0, ) _30 = (up_0_conv_0).forward((up_1_upsamp).forward(_29, _0, ), _4, _0, )


    _31 = (up_0_conv_1).forward(_30, _3, _0, )
    _32 = (up_0_conv_2).forward(_31, _2, _0, )
  File "code/__torch__/dalle_runner_api/model_infra/modules/public_diff_vae/___torch_mangle_203.py", line 45, in forward
    t = torch.to((f_t).forward(x1, ), torch.device("cuda:0"), 5)
    g, b, = torch.chunk(t, 2, 1)
    _21 = (f_2).forward((gn_2).forward(_20, b, g, ), )
                         ~~~~~~~~~~~~~ <--- HERE
    r = torch.to(_21, torch.device("cuda:0"), 6)
    y = torch.add(m, r)
  File "code/__torch__/dalle_runner_api/model_infra/modules/public_diff_vae/___torch_mangle_202.py", line 16, in forward
    x = torch.contiguous(argument_1)
    input = torch.to(x, 6)
    y = torch.to(torch.group_norm(input, 32, g0, b0), 5)
                 ~~~~~~~~~~~~~~~~ <--- HERE
    _0 = torch.slice(b, 0, 0, 9223372036854775807)
    _1 = torch.slice(_0, 1, 0, 9223372036854775807)

Traceback of TorchScript, original code (most recent call last):
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/functional.py(2558): group_norm
/root/code/dalle-runner-api/dalle_runner_api/model_infra/modules/public_diff_vae.py(201): forward
/root/code/dalle-runner-api/dalle_runner_api/model_infra/modules/public_diff_vae.py(229): forward
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1527): _call_impl
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/root/code/dalle-runner-api/dalle_runner_api/model_infra/modules/public_diff_vae.py(709): gn_2_fn
/root/code/dalle-runner-api/dalle_runner_api/model_infra/modules/public_diff_vae.py(746): forward
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1527): _call_impl
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/root/code/dalle-runner-api/dalle_runner_api/model_infra/modules/public_diff_vae.py(1041): forward
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1508): _slow_forward
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1527): _call_impl
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/jit/_trace.py(1065): trace_module
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/torch/jit/_trace.py(798): trace
<ipython-input-1-15c8be35cc98>(442): test_diff_vae
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/fire/core.py(681): _CallAndUpdateTrace
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/fire/core.py(466): _Fire
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/fire/core.py(141): Fire
<ipython-input-1-15c8be35cc98>(470): <module>
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/core/interactiveshell.py(3526): run_code
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/core/interactiveshell.py(3466): run_ast_nodes
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/core/interactiveshell.py(3284): run_cell_async
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/core/async_helpers.py(129): _pseudo_sync_runner
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/core/interactiveshell.py(3079): _run_cell
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/core/interactiveshell.py(3024): run_cell
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/terminal/interactiveshell.py(881): interact
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/terminal/interactiveshell.py(888): mainloop
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/terminal/ipapp.py(317): start
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/traitlets/config/application.py(1053): launch_instance
/root/.pyenv/versions/3.11.5/lib/python3.11/site-packages/IPython/__init__.py(129): start_ipython
/root/.pyenv/versions/3.11.5/bin/ipython(8): <module>
RuntimeError: Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated     : 17.44 GiB
Requested               : 3.75 GiB
Device limit            : 23.69 GiB
Free (according to CUDA): 13.94 MiB
PyTorch limit (set by user-supplied memory fraction)
                        : 17179869184.00 GiB
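
Both reports fail at the same spot: the serialized graph appears to upcast the activation to fp32 (dtype code 6) for group_norm and cast back to half (code 5), and that is where the allocation blows past the budget even at 23.69 GiB. Before any tiling work, a cheap experiment is to make sure no autograd state is kept and to request fp16 autocast around the node's decode call. Whether the TorchScript checkpoint honors autocast is untested, so treat this as a hedged sketch reusing the names from the trace (`decoder_consistency`, `latent`), not a confirmed fix.

import torch

# Sketch only: wraps the existing call from nodes.py (line 29 in the traces above).
# inference_mode() keeps autograd bookkeeping out of the picture; autocast *may*
# shrink intermediate activations, but the scripted decoder's own fp32 casts around
# group_norm could still dominate, so this is an experiment rather than a fix.
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    consistent_latent = decoder_consistency(latent["samples"].to("cuda:0"))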
everdrone commented 8 months ago

Same issue with 24 GB VRAM.

AlexD81 commented 8 months ago

Same issue with 24 GB VRAM here as well.