comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

CUDA error: CUBLAS_STATUS_NOT_SUPPORTED #4556

Open: bbecausereasonss opened this issue 3 months ago

bbecausereasonss commented 3 months ago

Expected Behavior

I'm having a heck of a time finding a Torch version that just works. I don't know what happened, but I upgraded (all) and it borked my install. Now, when I run a Comfy LoRA/Flux workflow that used to work, I get this error.

Actual Behavior

CUDA error: CUBLAS_STATUS_NOT_SUPPORTED

Steps to Reproduce

Launch the workflow and the error appears. In the log below it happens as soon as sampling starts (0/30 steps).
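To check whether the failure comes from the PyTorch/cuBLAS stack rather than from ComfyUI or a custom node, the FP8 matrix multiply that fails in the traceback (`torch._scaled_mm` inside `fp8_linear`) can be called on its own. This is a minimal sketch, assuming a PyTorch build that has `torch.float8_e4m3fn` and `torch._scaled_mm`; the latter is a private API whose signature and return type have changed between releases. The function name `try_fp8_matmul` and the tensor shapes are illustrative, not taken from ComfyUI.

```python
def try_fp8_matmul() -> str:
    """Run one small FP8 matmul similar to the traceback's fp8_linear call.

    Returns "ok", or a short reason string if it could not run or failed.
    """
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device"

    try:
        dev = "cuda"
        # Activations: (tokens, in_features), row-major. Sizes are multiples
        # of 16, which _scaled_mm requires.
        a = torch.randn(16, 32, device=dev).to(torch.float8_e4m3fn)
        # Weight stored like nn.Linear: (out_features, in_features).
        w = torch.randn(16, 32, device=dev).to(torch.float8_e4m3fn)
        # Per-tensor scales as float32 scalars (1.0 means no rescaling).
        one = torch.ones((), device=dev, dtype=torch.float32)
        # w.t() is a column-major view, the layout _scaled_mm expects for mat2.
        out = torch._scaled_mm(a, w.t(), scale_a=one, scale_b=one, out_dtype=torch.bfloat16)
        # Some older releases returned (out, amax) instead of a single tensor.
        if isinstance(out, tuple):
            out = out[0]
        torch.cuda.synchronize()
    except (RuntimeError, AttributeError, TypeError) as exc:
        return f"failed: {exc}"
    return "ok" if tuple(out.shape) == (16, 16) else f"unexpected shape {tuple(out.shape)}"


if __name__ == "__main__":
    print(try_fp8_matmul())
```

If this prints a `CUBLAS_STATUS_NOT_SUPPORTED` failure outside ComfyUI, the problem most likely lies in the installed PyTorch/CUDA build or the GPU rather than in the workflow.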

Debug Logs

To see the GUI go to: http://127.0.0.1:8188
FETCH DATA from: C:\Users\xxxx\Deep\Comfy\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json [DONE]
[]
[]
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
C:\Users\xxxx\Deep\Comfy\ComfyUI\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Requested to load FluxClipModel_
Loading 1 new model
loaded completely 0.0 4777.53759765625 True
clip missing: ['text_projection.weight']
Requested to load FluxClipModel_
Loading 1 new model
C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\ldm\modules\attention.py:407: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
Requested to load Flux
Loading 1 new model
loaded completely 0.0 11350.048889160156 True
  0%|                                                                       | 0/30 [00:00<?, ?it/s]
!!! Exception during processing !!! CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasLtMatmulAlgoGetHeuristic( ltHandle, computeDesc.descriptor(), Adesc.descriptor(), Bdesc.descriptor(), Cdesc.descriptor(), Ddesc.descriptor(), preference.descriptor(), 1, &heuristicResult, &returnedResult)`
Traceback (most recent call last):
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\execution.py", line 317, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\execution.py", line 192, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy_extras\nodes_custom_sampler.py", line 612, in sample
    samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\samplers.py", line 716, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\samplers.py", line 695, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\samplers.py", line 600, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\k_diffusion\sampling.py", line 144, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\samplers.py", line 299, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\samplers.py", line 682, in __call__
    return self.predict_noise(*args, **kwargs)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\samplers.py", line 685, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\samplers.py", line 279, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\custom_nodes\ComfyUI-TiledDiffusion\.patches.py", line 4, in calc_cond_batch
    return calc_cond_batch_original_tiled_diffusion_1bb5a55e(model, conds, x_in, timestep, model_options)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\samplers.py", line 228, in calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\custom_nodes\ComfyUI-Advanced-ControlNet\adv_control\utils.py", line 68, in apply_model_uncond_cleanup_wrapper
    return orig_apply_model(self, *args, **kwargs)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\model_base.py", line 142, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\ldm\flux\model.py", line 159, in forward
    out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance, control)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\ldm\flux\model.py", line 104, in forward_orig
    img = self.img_in(img)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\ops.py", line 72, in forward
    return self.forward_comfy_cast_weights(*args, **kwargs)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\ops.py", line 291, in forward_comfy_cast_weights
    out = fp8_linear(self, input)
  File "C:\Users\xxxx\Deep\Comfy\ComfyUI\comfy\ops.py", line 273, in fp8_linear
    o = torch._scaled_mm(inn, w, out_dtype=input.dtype, bias=bias, scale_a=scale_input, scale_b=scale_weight)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasLtMatmulAlgoGetHeuristic( ltHandle, computeDesc.descriptor(), Adesc.descriptor(), Bdesc.descriptor(), Cdesc.descriptor(), Ddesc.descriptor(), preference.descriptor(), 1, &heuristicResult, &returnedResult)`

Prompt executed in 74.91 seconds

Other

No response

ltdrdata commented 3 months ago

Do you use ZLUDA? https://github.com/comfyanonymous/ComfyUI/issues/4132

benzhangdragonplus commented 1 month ago

I have the same problem with the UNet loader. When loading Flux fp8, setting weight_dtype to 'default' works normally. However, selecting fp8_e4m3fn produces this error:

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasLtMatmulAlgoGetHeuristic( ltHandle, computeDesc.descriptor(), Adesc.descriptor(), Bdesc.descriptor(), Cdesc.descriptor(), Ddesc.descriptor(), preference.descriptor(), 1, &heuristicResult, &returnedResult)`
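A plausible explanation for 'default' working while fp8_e4m3fn fails (not confirmed in this thread): the traceback in the original report shows the failing path is `fp8_linear` calling `torch._scaled_mm`, a native FP8 matmul, and PyTorch only supports `torch._scaled_mm` on CUDA GPUs with compute capability 8.9 (Ada) or 9.0 and newer. The helper below is a hypothetical sketch of that hardware check; `fp8_matmul_supported` is not a ComfyUI or PyTorch function, and passing it does not guarantee that the installed cuBLAS version supports FP8 on that GPU.

```python
def fp8_matmul_supported(major: int, minor: int) -> bool:
    """Hypothetical check: True if compute capability is 8.9 or higher.

    8.9 is Ada (e.g. RTX 40xx) and 9.0 is Hopper; Ampere GPUs (8.0, 8.6)
    have no FP8 tensor cores.
    """
    return (major, minor) >= (8, 9)


if __name__ == "__main__":
    import torch

    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        name = torch.cuda.get_device_name(0)
        print(f"{name}: sm_{major}{minor}, FP8 matmul supported: {fp8_matmul_supported(major, minor)}")
    else:
        print("No CUDA device visible to PyTorch")
```

On an RTX 30-series card (sm_86) the check returns False, in which case keeping weight_dtype on 'default' is presumably the safer choice.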

pytorch version: 2.4.0+cu118
xformers version: 0.0.27.post2+cu118
Stable version: 5f9d5a24

ltdrdata commented 1 month ago

cublasLtMatmulAlgoGetHeuristic( ltHandle, computeDesc.descriptor(), Adesc.descriptor(), Bdesc.descriptor(), Cdesc.descriptor(), Ddesc.descriptor(), preference.descriptor(), 1, &heuristicResult, &returnedResult)

It seems that you need to upgrade PyTorch to 2.5.0+cu124.
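To confirm which build is installed before and after upgrading, the PyTorch version string can be compared with the suggested target. This is a minimal sketch: `parse_torch_version` and `meets_minimum` are hypothetical helpers, and the 2.5.0 / CUDA 12.4 minimum is simply the suggestion above, not a documented requirement.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse_torch_version(text: str) -> Tuple[Version, Optional[Tuple[int, int]]]:
    """Split a string like "2.4.0+cu118" into ((2, 4, 0), (11, 8)).

    The CUDA part is None for builds without a "+cuXYZ" suffix (CPU, ROCm).
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:[^+]*)(?:\+cu(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    version = (int(match.group(1)), int(match.group(2)), int(match.group(3)))
    cuda = None
    if match.group(4):
        digits = match.group(4)
        # "118" -> (11, 8), "124" -> (12, 4): the last digit is the minor version.
        cuda = (int(digits[:-1]), int(digits[-1]))
    return version, cuda


def meets_minimum(text: str, min_version: Version = (2, 5, 0),
                  min_cuda: Tuple[int, int] = (12, 4)) -> bool:
    """True if the build is at least min_version and built against at least min_cuda."""
    version, cuda = parse_torch_version(text)
    return version >= min_version and cuda is not None and cuda >= min_cuda


if __name__ == "__main__":
    import torch

    print(torch.__version__, "->", "OK" if meets_minimum(torch.__version__) else "upgrade suggested")
```

For the build reported above, `meets_minimum("2.4.0+cu118")` returns False, on both the PyTorch version and the CUDA build.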