CUDA error on ZLUDA: CUBLAS_STATUS_NOT_SUPPORTED when calling 'cublasSgemm()'

Expected Behavior

Normally, when I run CUDA through ZLUDA, the prompt should execute. I am using an AMD Radeon Vega 8 Graphics GPU with an AMD Ryzen 5 3500U CPU. It should work normally... if it weren't for...

Actual Behavior

...this. The prompt fails during CLIP text encoding (the traceback ends in torch.nn.functional.linear) with the exception below. The full console output and traceback are under Debug Logs.

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

This is a cuBLAS error: CUBLAS_STATUS_NOT_SUPPORTED means the requested functionality is not supported. So why is this happening?

I am using Python 3.10.11, with PyTorch 2.0.0+cu118 and ZLUDA. And yes, I did apply the --disable-all-custom-nodes flag, to no avail.

Steps to Reproduce

I assume this issue is on my end only, but here is how it happened: select a model, enter the prompts, tweak some settings, and click 'Queue Prompt'. After a few seconds, the error occurs.
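
To check whether the failure is specific to ComfyUI, the last frame of the traceback (torch.nn.functional.linear) can be exercised on its own. The following is a minimal sketch, not part of ComfyUI or ZLUDA: the function name probe_linear, the tensor shapes, and the result dictionary are hypothetical choices for illustration. It assumes that a small float32 linear layer on the GPU goes through the same cuBLAS path as the failing call, which may not hold for every shape or PyTorch version.

```python
import importlib.util


def probe_linear(device_name=None, dtype_name="float32"):
    """Run one small torch.nn.functional.linear call and report the outcome.

    Hypothetical helper: the name, shapes, and return format are illustrative.
    """
    if importlib.util.find_spec("torch") is None:
        return {"ok": False, "device": None, "error": "torch is not installed"}
    import torch

    # Default to the GPU when PyTorch reports one, otherwise fall back to the CPU.
    if device_name is None:
        device_name = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = getattr(torch, dtype_name)

    try:
        x = torch.randn(4, 8, device=device_name, dtype=dtype)
        w = torch.randn(16, 8, device=device_name, dtype=dtype)
        b = torch.zeros(16, device=device_name, dtype=dtype)
        out = torch.nn.functional.linear(x, w, b)
        if device_name.startswith("cuda"):
            # GPU work is asynchronous; wait so a failure surfaces here.
            torch.cuda.synchronize()
        return {"ok": True, "device": device_name,
                "shape": tuple(out.shape), "error": None}
    except RuntimeError as exc:
        # PyTorch raises cuBLAS failures as RuntimeError, as in the traceback.
        return {"ok": False, "device": device_name, "error": str(exc)}


if __name__ == "__main__":
    print(probe_linear())       # GPU if available
    print(probe_linear("cpu"))  # same call on the CPU for comparison
```

If probe_linear() reports the same CUBLAS_STATUS_NOT_SUPPORTED error, the problem likely sits between PyTorch and ZLUDA rather than in the workflow. Comparing against probe_linear("cpu") shows whether the same call works off the GPU.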

Debug Logs

FETCH DATA from: C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json [DONE]
got prompt
model_type EPS
Using split attention in VAE
Using split attention in VAE
loaded straight to GPU
Requested to load BaseModel
Loading 1 new model
Requested to load SD1ClipModel
Loading 1 new model
!!! Exception during processing!!! CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Traceback (most recent call last):
  File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "C:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 58, in encode
    output = clip.encode_from_tokens(tokens, return_pooled=True, return_dict=True)
  File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd.py", line 115, in encode_from_tokens
    o = self.cond_stage_model.encode_token_weights(tokens)
  File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 567, in encode_token_weights
    out = getattr(self, self.clip).encode_token_weights(token_weight_pairs)
  File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 41, in encode_token_weights
    o = self.encode(to_encode)
  File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 228, in encode
    return self(tokens)
  File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 200, in forward
    outputs = self.transformer(tokens, attention_mask_model, intermediate_output=self.layer_idx, final_layer_norm_intermediate=self.layer_norm_hidden_state)
  File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 134, in forward
    x = self.text_model(*args, **kwargs)
  File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 109, in forward
    x, i = self.encoder(x, mask=mask, intermediate_output=intermediate_output)
  File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 68, in forward
    x = l(x, mask, optimized_attention)
  File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 49, in forward
    x += self.self_attn(self.layer_norm1(x), mask, optimized_attention)
  File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 16, in forward
    q = self.q_proj(x)
  File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 50, in forward
    return self.forward_comfy_cast_weights(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 46, in forward_comfy_cast_weights
    return torch.nn.functional.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Other

No response

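Over the past two days I have tried "segment anything" nodes from several different authors. Without exception, they raise "torch.cuda.OutOfMemoryError: Allocation on device" in any moderately complex workflow, yet they often work fine when used on their own. I am not sure whether the problem I am running into is related to this error.

Allocation on device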

  File "D:\ComfyUI-aki-v1.3\execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "D:\ComfyUI-aki-v1.3\execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "D:\ComfyUI-aki-v1.3\execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "D:\ComfyUI-aki-v1.3\custom_nodes\comfyui_segment_anything\node.py", line 317, in main
    boxes = groundingdino_predict(
  File "D:\ComfyUI-aki-v1.3\custom_nodes\comfyui_segment_anything\node.py", line 182, in groundingdino_predict
    boxes_filt = get_grounding_output(
  File "D:\ComfyUI-aki-v1.3\custom_nodes\comfyui_segment_anything\node.py", line 170, in get_grounding_output
    outputs = model(image[None], captions=[caption])
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\groundingdino.py", line 303, in forward
    hs, reference, hs_enc, ref_enc, init_box_proposal = self.transformer(
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\transformer.py", line 258, in forward
    memory, memory_text = self.encoder(
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\transformer.py", line 576, in forward
    output = checkpoint.checkpoint(
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\_dynamo\eval_frame.py", line 451, in _fn
    return fn(*args, **kwargs)
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\_dynamo\external_utils.py", line 36, in inner
    return fn(*args, **kwargs)
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\utils\checkpoint.py", line 487, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\autograd\function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\utils\checkpoint.py", line 262, in forward
    outputs = run_function(*args)
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\transformer.py", line 785, in forward
    src2 = self.self_attn(
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\ms_deform_attn.py", line 271, in forward
    output = multi_scale_deformable_attn_pytorch(
  File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\ms_deform_attn.py", line 70, in multi_scale_deformable_attn_pytorch
    (torch.stack(sampling_value_list, dim=-2).flatten(-2) * attention_weights)

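Since the out-of-memory error shows up only in larger workflows, it may help to log how much GPU memory is free just before the segment anything node runs. This is a rough sketch, assuming PyTorch's torch.cuda.mem_get_info() is usable on the setup in question. The helper names format_bytes and gpu_memory_report are hypothetical and do not come from ComfyUI or the custom nodes.

```python
import importlib.util


def format_bytes(n):
    """Render a byte count in GiB with two decimals."""
    return f"{n / 1024**3:.2f} GiB"


def gpu_memory_report():
    """Return free/total GPU memory as text, or None without a usable CUDA device.

    Hypothetical helper for illustration only.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None
    # mem_get_info() returns (free_bytes, total_bytes) for the current device.
    free, total = torch.cuda.mem_get_info()
    return f"free {format_bytes(free)} of {format_bytes(total)}"


if __name__ == "__main__":
    print(gpu_memory_report())
```

A low free figure at that point would suggest that models loaded earlier in the workflow are still resident, which would be consistent with the node working when run alone.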