comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

Default arg "cuda-malloc" causes CUDA error: operation not supported on GTX 960M GPU #940

Open wklchris opened 1 year ago

wklchris commented 1 year ago

I have confirmed that the default arg --cuda-malloc causes an error on my computer. I must disable it by adding --disable-cuda-malloc to let ComfyUI work properly.

If I don't disable it, the following CUDA error occurs when I try to generate an image:

(venv) PS C:\Users\wklchris> python "${comfyuiDir}/main.py" --lowvram
Total VRAM 2048 MB, total RAM 8076 MB
Trying to enable lowvram mode because your GPU seems to have 4GB or less. If you don't want this use: --normalvram
Set vram state to: LOW_VRAM
Device: cuda:0 NVIDIA GeForce GTX 960M : cudaMallocAsync
Using pytorch cross attention
Adding extra search path checkpoints D:/Git-repos/stable-diffusion-webui\models/Stable-diffusion
Adding extra search path configs D:/Git-repos/stable-diffusion-webui\models/Stable-diffusion
Adding extra search path vae D:/Git-repos/stable-diffusion-webui\models/VAE
Adding extra search path loras D:/Git-repos/stable-diffusion-webui\models/Lora
Adding extra search path loras D:/Git-repos/stable-diffusion-webui\models/LyCORIS
Adding extra search path upscale_models D:/Git-repos/stable-diffusion-webui\models/ESRGAN
Adding extra search path upscale_models D:/Git-repos/stable-diffusion-webui\models/RealESRGAN
Adding extra search path upscale_models D:/Git-repos/stable-diffusion-webui\models/SwinIR
Adding extra search path embeddings D:/Git-repos/stable-diffusion-webui\embeddings
Adding extra search path hypernetworks D:/Git-repos/stable-diffusion-webui\models/hypernetworks
Adding extra search path controlnet D:/Git-repos/stable-diffusion-webui\models/ControlNet
Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
model_type EPS
adm 0
making attention of type 'vanilla-pytorch' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-pytorch' with 512 in_channels
left over keys: dict_keys(['model_ema.decay', 'model_ema.num_updates'])
!!! Exception during processing !!!
Traceback (most recent call last):
  File "D:\Git-repos\ComfyUI\execution.py", line 145, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "D:\Git-repos\ComfyUI\execution.py", line 75, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "D:\Git-repos\ComfyUI\execution.py", line 68, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "D:\Git-repos\ComfyUI\nodes.py", line 1082, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "D:\Git-repos\ComfyUI\nodes.py", line 1052, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "D:\Git-repos\ComfyUI\comfy\sample.py", line 75, in sample
    comfy.model_management.load_model_gpu(model)
  File "D:\Git-repos\ComfyUI\comfy\model_management.py", line 298, in load_model_gpu
    accelerate.dispatch_model(real_model, device_map=device_map, main_device=torch_dev)
  File "D:\Git-repos\stable-diffusion-webui\venv\lib\site-packages\accelerate\big_modeling.py", line 370, in dispatch_model
    attach_align_device_hook_on_blocks(
  File "D:\Git-repos\stable-diffusion-webui\venv\lib\site-packages\accelerate\hooks.py", line 498, in attach_align_device_hook_on_blocks
    add_hook_to_module(module, hook)
  File "D:\Git-repos\stable-diffusion-webui\venv\lib\site-packages\accelerate\hooks.py", line 155, in add_hook_to_module
    module = hook.init_hook(module)
  File "D:\Git-repos\stable-diffusion-webui\venv\lib\site-packages\accelerate\hooks.py", line 251, in init_hook
    set_module_tensor_to_device(module, name, self.execution_device)
  File "D:\Git-repos\stable-diffusion-webui\venv\lib\site-packages\accelerate\utils\modeling.py", line 147, in set_module_tensor_to_device
    new_value = old_value.to(device)
RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I run ComfyUI on Windows with torch 2.0 and I use a GTX 960M card.


Questions & suggestions:

  • I have no knowledge of CUDA malloc; I just guess the above error is a hardware compatibility problem (correct me if I am wrong) because I am using an old graphics card, a 960M. Is there any way ComfyUI can detect such an incompatibility during launch and automatically disable this argument, just as it enables lowvram when it knows the GPU has small VRAM?
  • If there is little we can do during launch, I suggest that ComfyUI tell the user to disable it manually in the terminal when this error occurs at runtime (if possible), or at least warn users in the argument description of --help. I am requesting this because when I checked the argument list with the --help option, I saw:

    --cuda-malloc         Enable cudaMallocAsync (enabled by default for torch 2.0 and up).

    The above description makes it look like users should enable cuda malloc whenever torch 2.0 is installed; however, some users like me have to disable it even though they have torch 2.0.

comfyanonymous commented 1 year ago

Just to make sure, are you on the latest nvidia drivers?

wklchris commented 1 year ago

Just to make sure, are you on the latest nvidia drivers?

No, the latest is v536 and I am still on v531. I didn't update because I saw many complaints that Nvidia drivers after 531 may have a slowdown issue (see vladmandic/automatic #1285), and it looks like newer drivers haven't solved that issue yet.

Is cuda-malloc something only supported by drivers after v531?

comfyanonymous commented 1 year ago

531 should support it. If anyone else running a GTX 9xx or older Nvidia GPU has the same issue, let me know so I know which GPUs are affected.

comfyanonymous commented 1 year ago

https://github.com/comfyanonymous/ComfyUI/commit/799c08a4ce01feb9e5b4aae8fec4347f2259f9c4#diff-1eb25131bac2fdf60f5ac5d483edd7f75f6654d6eb927ebb2b2c68aa71ebc351R40

I added a list of GPUs not to enable cuda malloc on, so if someone else has a similar issue with one I didn't put on the list, let me know.
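
A minimal sketch of how such a check can work (illustrative only, not the actual cuda_malloc.py code; the list entries, helper name, and use of pynvml are assumptions): read the GPU name before torch initializes CUDA, and only then opt in to the cudaMallocAsync allocator backend.

    import os
    import pynvml  # assumption: the nvidia-ml-py package is available

    # Illustrative name fragments only; the real list in cuda_malloc.py is longer and exact.
    CUDA_MALLOC_BLACKLIST = ["GTX 750", "GTX 960", "GTX 970", "GT 840M"]

    def cuda_malloc_supported():
        try:
            pynvml.nvmlInit()
            name = pynvml.nvmlDeviceGetName(pynvml.nvmlDeviceGetHandleByIndex(0))
            if isinstance(name, bytes):
                name = name.decode("utf-8", errors="ignore")
        except Exception:
            return False  # cannot query the GPU, so stay on the default allocator
        return not any(fragment in name for fragment in CUDA_MALLOC_BLACKLIST)

    # The allocator backend has to be chosen before torch creates its CUDA context,
    # so the environment variable is set before importing torch.
    if cuda_malloc_supported():
        os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "backend:cudaMallocAsync"

    import torch

Passing --disable-cuda-malloc skips this opt-in and leaves PyTorch on its default caching allocator.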

MaddyAurora commented 1 year ago

I get the same problem, GTX 750 Ti

comfyanonymous commented 1 year ago

It should be auto disabled on the 750 Ti now: https://github.com/comfyanonymous/ComfyUI/commit/39c58b227fa265f65c96ef133c580e790e64d8e7

Namnodorel commented 1 year ago

Same issue with the current code on a GeForce GTX 960, disabling the setting fixed it.

comfyanonymous commented 1 year ago

Should be disabled on the regular GTX 960 now: https://github.com/comfyanonymous/ComfyUI/commit/85a8900a148c881914ed16900108f08fd26981c1

andres885 commented 1 year ago

I have the same problem on a GTX 970

lilshippo commented 1 year ago

had to "--disable-cuda-malloc" as well, running on "NVIDIA GeForce GT 840M 2 GB"

comfyanonymous commented 1 year ago

Should be fixed: https://github.com/comfyanonymous/ComfyUI/commit/fc71cf656e1f26e6577c0a211b7460fc078b0c39

haoqiangyu commented 1 year ago

(screenshot) Lower versions of the Nvidia driver cannot use cudaMallocAsync; it is also necessary to check the driver version before enabling it.
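
A rough sketch of such a driver-version guard, complementing the name check above (the 531 threshold comes from this thread and is an assumption, not an official minimum):

    import pynvml  # assumption: nvidia-ml-py is installed

    MIN_DRIVER_MAJOR = 531  # placeholder threshold taken from this discussion

    def driver_supports_cuda_malloc_async():
        try:
            pynvml.nvmlInit()
            version = pynvml.nvmlSystemGetDriverVersion()  # e.g. "531.79"
            if isinstance(version, bytes):
                version = version.decode("utf-8", errors="ignore")
            return int(version.split(".")[0]) >= MIN_DRIVER_MAJOR
        except Exception:
            return False  # unknown driver, so keep the default allocator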

Chillnear commented 1 year ago

I have confirmed that the default arg --cuda-malloc causes an error on my computer. I must disable it by adding --disable-cuda-malloc to let ComfyUI work properly. […]

What is the program where I can type this to disable cuda-malloc? Thank you.

Veranith commented 1 year ago

I also have this issue with my NVIDIA GeForce GTX 950. Adding --disable-cuda-malloc worked for me.

lew1s commented 1 year ago

I have an Nvidia GeForce GTX 960M but it's not working for me either.

Where do I have to add or run this command? I tried running --disable-cuda-malloc in PowerShell but got an error:

At line:1 char:3
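
Note that --disable-cuda-malloc is not a standalone command, which is why PowerShell rejects it; it is an argument to main.py. Based on the OP's launch line, the invocation would look like:

    python "${comfyuiDir}/main.py" --lowvram --disable-cuda-malloc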

lew1s commented 1 year ago

I have confirmed that the default arg --cuda-malloc causes an error on my computer. I must disable it by adding --disable-cuda-malloc to let ComfyUI work properly. […]

How did you disable cuda malloc, and where?

Topzie commented 1 year ago

I have confirmed that the default arg --cuda-malloc causes an error on my computer. I must disable it by adding --disable-cuda-malloc to let ComfyUI work properly. […]

How did you disable cuda malloc, and where?

I had exactly the same issue and just added the Nvidia card to the blacklist in cuda_malloc.py, and it works.

comfyanonymous commented 1 year ago

Can you tell me which string you added to the blacklist so I can add it?

Topzie commented 1 year ago

Can you tell me which string you added to the blacklist so I can add it?

Oops, I misunderstood you. I thought you were the OP, so I deleted my post 😅 It's a GeForce GTX 1650.

lew1s commented 1 year ago

I have a GeForce 960M and added it to the blacklist in the cuda_malloc.py file, as you can see here.

https://www.screencast.com/t/PqOwuMCM

But I am still getting the same error:

Error occurred when executing KSampler:

CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 151, in recursive_execute
  output_data, output_ui = get_output_data(obj, input_data_all)
File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 81, in get_output_data
  return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "C:\Apps\ComfyUI_windows_portable\ComfyUI\execution.py", line 74, in map_node_over_list
  results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "C:\Apps\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1206, in sample
  return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
File "C:\Apps\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1176, in common_ksampler
  samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "C:\Apps\ComfyUI_windows_portable\ComfyUI\comfy\sample.py", line 75, in sample
  comfy.model_management.load_model_gpu(model)
File "C:\Apps\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 307, in load_model_gpu
  accelerate.dispatch_model(real_model, device_map=device_map, main_device=torch_dev)
File "C:\Apps\ComfyUI_windows_portable\python_embeded\lib\site-packages\accelerate\big_modeling.py", line 373, in dispatch_model
  attach_align_device_hook_on_blocks(
File "C:\Apps\ComfyUI_windows_portable\python_embeded\lib\site-packages\accelerate\hooks.py", line 523, in attach_align_device_hook_on_blocks
  add_hook_to_module(module, hook)
File "C:\Apps\ComfyUI_windows_portable\python_embeded\lib\site-packages\accelerate\hooks.py", line 155, in add_hook_to_module
  module = hook.init_hook(module)
File "C:\Apps\ComfyUI_windows_portable\python_embeded\lib\site-packages\accelerate\hooks.py", line 253, in init_hook
  set_module_tensor_to_device(module, name, self.execution_device)
File "C:\Apps\ComfyUI_windows_portable\python_embeded\lib\site-packages\accelerate\utils\modeling.py", line 165, in set_module_tensor_to_device
  new_value = old_value.to(device)

CamelliasW commented 1 year ago

In run_nvidia_gpu.bat, at the end of the line that reads "-s ComfyUI\main.py --windows-standalone-build", add --disable-cuda-malloc.
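
In other words, the launch line in run_nvidia_gpu.bat would end up looking roughly like the sketch below (the python_embeded interpreter path is how the portable build normally launches ComfyUI; adjust if your install differs):

    .\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --disable-cuda-malloc
    pause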

hanetyb commented 1 year ago

@comfyanonymous My GPU runs in VMware and has the same issue. Is it on the compatible list? Should I add it to the blacklist?

Please refer to the details below (screenshot): https://github.com/lllyasviel/Fooocus/issues/188

hanetyb commented 1 year ago

I made the change, but the file is auto-updated once I run Fooocus, and the disable parameter is removed automatically.

Could you please advise how I should disable the sync/update? Setting the file to read-only doesn't work.

In run_nvidia_gpu.bat, at the end of the line that reads "-s ComfyUI\main.py --windows-standalone-build", add --disable-cuda-malloc.

MGerckens commented 11 months ago

I believe I'm having the same issue on my 1060 Max-Q. Tried reinstalling torch, no change. I can only generate with --disable-cuda-malloc.

Terraphice commented 11 months ago

Same issue on the 1660ti (Notebook Ed.)

Edit: I had recently reinstalled my OS and my drivers were mistakenly outdated; after updating, this seems to be fixed for me. My bad.

offroadguy56 commented 8 months ago

I am also getting the same error and fixing it with the same solution on my Tesla P40. I didn't have any issues until I noticed the problem on 12/29/23. My old driver version was 528.89. I updated to the latest version 537.70. The problem was not fixed with a driver update. System still reports CUDA version 12.0 instead of 12.2 as listed on driver download page. The system is running in a VM if that means anything. I also tried to add the card to the blacklist in cuda_malloc.py but was still met with the same error with "Tesla P40" and "NVIDIA Tesla P40".

hkdemiralp commented 7 months ago

I also had to add "--disable-cuda-malloc" for my old 4 GB "NVIDIA GeForce GTX 850M" graphics card (Driver Version: 546.33, CUDA Version: 12.3).

BigYuanHead commented 5 months ago

In "NVIDIA Tesla M40 24G", "--disable-cuda-malloc" is also required ( Driver Version: 535.161.08)

codejach commented 5 months ago

In "NVIDIA Tesla M40 24G", "--disable-cuda-malloc" is also required ( Driver Version: 550.67)

efwfe commented 3 months ago

In "NVIDIA L4 24G", "--disable-cuda-malloc" is also required ( Driver Version: 535.171.04)

SergioKingOne commented 3 months ago

What are the consequences of disabling this? I'm running on a 4090 and I'm getting this issue (Nvidia driver is 550).

ssiwinter commented 3 months ago

Where am I supposed to put "--disable-cuda-malloc"? Since I'm on Linux, putting it in the Windows .bat file did nothing. Thanks.

SergioKingOne commented 3 months ago

Where am I supposed to put "--disable-cuda-malloc"? Since I'm on Linux, putting it in the Windows .bat file did nothing. Thanks.

You can run ComfyUI with the parameter appended (e.g. python main.py --disable-cuda-malloc).

melanie0901 commented 2 months ago

(screenshot) Lower versions of the Nvidia driver cannot use cudaMallocAsync; it is also necessary to check the driver version before enabling it.

My card is also a V100, but I have the same problem. Please help. (nvidia-smi screenshot)

Kimizhao commented 2 months ago

In "GRID A800D-80C" vGPU mode, "--disable-cuda-malloc" is also required ( Driver Version: 525.105.17)

brentonmallen1 commented 1 month ago

Encountered this on Unraid with a GTX 980 with driver version v550.100. I'm not sure where to put the --disable-cuda-malloc flag for it to be called within the container though.

robert-pattern commented 3 weeks ago

799c08a#diff-1eb25131bac2fdf60f5ac5d483edd7f75f6654d6eb927ebb2b2c68aa71ebc351R40

I added a list of GPUs not to enable cuda malloc on, so if someone else has a similar issue with one I didn't put on the list, let me know.

This ended up fixing my problem for "NVIDIA A10G"

fahadshery commented 3 weeks ago

799c08a#diff-1eb25131bac2fdf60f5ac5d483edd7f75f6654d6eb927ebb2b2c68aa71ebc351R40

I added a list of GPUs not to enable cuda malloc on, so if someone else has a similar issue with one I didn't put on the list, let me know.

I have the same issue on a Tesla P40. This is what I see:

RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

fahadshery commented 3 weeks ago

I am using: (screenshot)

fahadshery commented 3 weeks ago

Encountered this on Unraid with a GTX 980 with driver version v550.100. I'm not sure where to put the --disable-cuda-malloc flag for it to be called within the container though.

Did you find a solution?

fahadshery commented 3 weeks ago

I am using Docker, and this is how I am deploying it with my AI stack in docker-compose.yaml:


# stable diffusion

  stable-diffusion-download:
    build: ./stable-diffusion-webui-docker/services/download/
    image: comfy-download
    environment:
      - PUID=${PUID:-1000}
      - PGID=${PGID:-1000}
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
      - ./stable-diffusion-webui-docker/data:/data

  stable-diffusion-webui:
    build: ./stable-diffusion-webui-docker/services/comfy/
    image: comfy-ui
    environment:
      - PUID=${PUID:-1000}
      - PGID=${PGID:-1000}
      - CLI_ARGS=
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
      - ./stable-diffusion-webui-docker/data:/data
      - ./stable-diffusion-webui-docker/output:/output

    stop_signal: SIGKILL
    tty: true
    deploy:
      resources:
        reservations:
          devices:
              - driver: nvidia
                device_ids: ['0']
                capabilities: [compute, utility]
    restart: unless-stopped
    networks:
      - traefik
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.stable-diffusion.rule=Host(`stable-diffusion.local.example.com`)"
      - "traefik.http.routers.stable-diffusion.entrypoints=https"
      - "traefik.http.routers.stable-diffusion.tls=true"
      - "traefik.http.routers.stable-diffusion.tls.certresolver=cloudflare"
      - "traefik.http.services.stable-diffusion.loadbalancer.server.port=7860"
      - "traefik.http.routers.stable-diffusion.middlewares=default-headers@file"
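
If this compose file is used, the flag can presumably be passed through the CLI_ARGS variable of the comfy service (assuming the image forwards CLI_ARGS to main.py, as the stable-diffusion-webui-docker images do for their other CLI arguments):

  stable-diffusion-webui:
    environment:
      - CLI_ARGS=--disable-cuda-malloc
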
fahadshery commented 3 weeks ago

In "NVIDIA Tesla M40 24G", "--disable-cuda-malloc" is also required ( Driver Version: 535.161.08)

Where do you pass this option? I have a Tesla P40 24G and I think I have the same issue.