comfyanonymous / ComfyUI

The most powerful and modular Stable Diffusion GUI, API, and backend with a graph/nodes interface.
GNU General Public License v3.0

CUDA error: operation not supported #3386

Open kungfudante opened 2 months ago

kungfudante commented 2 months ago

Running on an A100, driver version 551.78, with the default workflow. Got the error message below:

got prompt
model_type EPS
Using pytorch attention in VAE
Using pytorch attention in VAE
clip missing: ['clip_l.logit_scale', 'clip_l.transformer.text_projection.weight']
Requested to load SD1ClipModel
Loading 1 new model
!!! Exception during processing !!!
Traceback (most recent call last):
  File "C:\ComfyUI\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ComfyUI\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ComfyUI\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ComfyUI\ComfyUI\nodes.py", line 58, in encode
    cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ComfyUI\ComfyUI\comfy\sd.py", line 135, in encode_from_tokens
    self.load_model()
  File "C:\ComfyUI\ComfyUI\comfy\sd.py", line 155, in load_model
    model_management.load_model_gpu(self.patcher)
  File "C:\ComfyUI\ComfyUI\comfy\model_management.py", line 453, in load_model_gpu
    return load_models_gpu([model])
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ComfyUI\ComfyUI\comfy\model_management.py", line 447, in load_models_gpu
    cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ComfyUI\ComfyUI\comfy\model_management.py", line 304, in model_load
    raise e
  File "C:\ComfyUI\ComfyUI\comfy\model_management.py", line 300, in model_load
    self.real_model = self.model.patch_model(device_to=patch_model_to, patch_weights=load_weights)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ComfyUI\ComfyUI\comfy\model_patcher.py", line 259, in patch_model
    self.model.to(device_to)
  File "C:\ComfyUI\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1152, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\ComfyUI\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\ComfyUI\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\ComfyUI\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "C:\ComfyUI\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 825, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\ComfyUI\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I checked other similar issues; most are caused by insufficient VRAM and are solved by adding --disable-cuda-malloc, but that would disable the GPU.
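
One way to narrow this down (a quick sketch, not from the original report) is to check whether plain PyTorch in the same python_embeded environment hits the same failure outside ComfyUI. The snippet only moves a small tensor to the GPU, which exercises the same .to(device) path the traceback fails in; if it also raises "operation not supported", the problem is the CUDA/driver/MIG setup rather than the workflow or the --disable-cuda-malloc flag.

```python
# Minimal check, independent of ComfyUI: if a bare tensor copy to the GPU
# raises the same RuntimeError, the environment (driver / MIG / CUDA build)
# is at fault, not the workflow.
import torch

print("torch:", torch.__version__, "cuda:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    x = torch.randn(8, 8)
    y = x.to("cuda")  # same code path Module.to() ends up in per the traceback
    print("ok:", y.sum().item())
```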

NeihTzxc commented 1 month ago

Same issue

PavanGandhii commented 1 month ago

Did you find a solution?

NeihTzxc commented 1 month ago

> Did you find a solution?

I think the cause lies in MIG, so I rented an instance at https://cloud.vast.ai instead, and it works very well; no errors occurred.
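
For anyone who wants to confirm the MIG theory on their own A100, the sketch below (an addition, using the pynvml package, which is not part of ComfyUI) queries whether MIG is enabled via NVML:

```python
# Sketch: query MIG mode through NVML using the pynvml bindings
# (pip install pynvml). Assumes a single NVIDIA GPU at index 0.
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    print("GPU:", pynvml.nvmlDeviceGetName(handle))
    try:
        current, pending = pynvml.nvmlDeviceGetMigMode(handle)
        print("MIG enabled (current/pending):",
              current == pynvml.NVML_DEVICE_MIG_ENABLE,
              pending == pynvml.NVML_DEVICE_MIG_ENABLE)
    except pynvml.NVMLError:
        # GPUs or drivers without MIG support land here.
        print("MIG not supported on this GPU")
finally:
    pynvml.nvmlShutdown()
```

If MIG turns out to be enabled, CUDA has to be pointed at a MIG device (one of the MIG UUIDs listed by nvidia-smi -L, via CUDA_VISIBLE_DEVICES) rather than the bare GPU; otherwise initialization can fail with errors like the one above, and renting a non-MIG instance as described here is a reasonable workaround.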