comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

Error for SDXL ControlNet on older GPUs (980ti) #1303

Open DougPP opened 1 year ago

DougPP commented 1 year ago

I've been following this issue, reported here: https://github.com/comfyanonymous/ComfyUI/issues/1289

It was reported as resolved, but I still receive the same error when running on my 980 Ti (6 GB VRAM).

Notes:

1) Performed all updates, rebooted, re-ran
2) SDXL models (LoRA/VAE) work fine on the same card
3) ControlNet 1.5 models work fine on the same card (depth, canny, openpose, etc. tested and used regularly)
4) ONLY SDXL fails when using these new models
5) The new models run fine on multiple other PCs with newer cards (and more VRAM)
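One plausible reason (my assumption, not confirmed in this thread) why the same workflow behaves differently across these cards: the 980 Ti is a Maxwell GPU with CUDA compute capability 5.2, which lacks fast fp16 math, so backends typically keep its weights in fp32, while newer cards default to fp16. A rough, torch-free sketch of that heuristic (all names here are illustrative):

```python
# Hypothetical sketch: fp16 preference by CUDA compute capability.
# Fast fp16 (tensor cores) arrived with Volta (capability >= 7.0);
# Maxwell cards like the GTX 980 Ti are (5, 2) and stay on fp32.
COMPUTE_CAPABILITY = {
    "GTX 980 Ti": (5, 2),   # Maxwell
    "RTX 3080": (8, 6),     # Ampere
}

def prefers_fp16(capability):
    """Return True if a device with this capability should default to fp16."""
    return capability >= (7, 0)
```

On a real install, `torch.cuda.get_device_capability()` reports this tuple for the active GPU.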

```
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 2048 and using 20 heads.
  0%| | 0/20 [00:00<?, ?it/s]
!!! Exception during processing !!!
Traceback (most recent call last):
  File "c:\AI\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "c:\AI\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "c:\AI\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "c:\AI\ComfyUI\nodes.py", line 1206, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "c:\AI\ComfyUI\nodes.py", line 1176, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "C:\AI\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\hacky.py", line 22, in informative_sample
    raise e
  File "C:\AI\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\hacky.py", line 9, in informative_sample
    return original_sample(*args, **kwargs)
  File "c:\AI\ComfyUI\comfy\sample.py", line 93, in sample
    samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "c:\AI\ComfyUI\comfy\samplers.py", line 733, in sample
    samples = getattr(k_diffusion_sampling, "sample_{}".format(self.sampler))(self.model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar)
  File "C:\AI\ComfyUI\.venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "c:\AI\ComfyUI\comfy\k_diffusion\sampling.py", line 137, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "C:\AI\ComfyUI\.venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "c:\AI\ComfyUI\comfy\samplers.py", line 323, in forward
    out = self.inner_model(x, sigma, cond=cond, uncond=uncond, cond_scale=cond_scale, cond_concat=cond_concat, model_options=model_options, seed=seed)
  File "C:\AI\ComfyUI\.venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "c:\AI\ComfyUI\comfy\k_diffusion\external.py", line 125, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "c:\AI\ComfyUI\comfy\k_diffusion\external.py", line 151, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "c:\AI\ComfyUI\comfy\samplers.py", line 311, in apply_model
    out = sampling_function(self.inner_model.apply_model, x, timestep, uncond, cond, cond_scale, cond_concat, model_options=model_options, seed=seed)
  File "c:\AI\ComfyUI\comfy\samplers.py", line 289, in sampling_function
    cond, uncond = calc_cond_uncond_batch(model_function, cond, uncond, x, timestep, max_total_area, cond_concat, model_options)
  File "c:\AI\ComfyUI\comfy\samplers.py", line 241, in calc_cond_uncond_batch
    c['control'] = control.get_control(input_x, timestep, c, len(cond_or_uncond))
  File "c:\AI\ComfyUI\comfy\sd.py", line 813, in get_control
    if control_prev is not None:
  File "C:\AI\ComfyUI\.venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "c:\AI\ComfyUI\comfy\cldm\cldm.py", line 283, in forward
    emb = self.time_embed(t_emb)
  File "C:\AI\ComfyUI\.venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\AI\ComfyUI\.venv\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
    input = module(input)
  File "C:\AI\ComfyUI\.venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "c:\AI\ComfyUI\comfy\sd.py", line 866, in forward
    return c
RuntimeError: self and mat2 must have the same dtype
```
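For context, that RuntimeError is what torch raises when a matmul receives operands of different dtypes, e.g. fp16 activations hitting fp32 weights. A torch-free sketch of the failure mode and the usual fix (all names here are illustrative stand-ins, not ComfyUI's actual code):

```python
from dataclasses import dataclass

@dataclass
class FakeTensor:
    """Stand-in for torch.Tensor; only the dtype matters for this sketch."""
    dtype: str

def matmul(a, b):
    # torch raises exactly this message on a mixed-dtype matmul
    if a.dtype != b.dtype:
        raise RuntimeError("self and mat2 must have the same dtype")
    return FakeTensor(a.dtype)

def linear(x, weight):
    # The usual fix: cast the activation to the layer's weight dtype
    # (in torch terms, `x = x.to(self.weight.dtype)`) before the matmul.
    x = FakeTensor(weight.dtype)
    return matmul(x, weight)
```

`matmul(FakeTensor("float16"), FakeTensor("float32"))` reproduces the error, while `linear` succeeds because of the cast.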

DougPP commented 1 year ago

So right after I posted this, I tried downloading the pre-made examples: depth, canny, sketch, recolor (from Hugging Face). Examples from: https://huggingface.co/stabilityai/control-lora/tree/main/comfy-control-LoRA-workflows

Depth now runs through the KSampler, but errors out on decode. So perhaps this is due to memory (error below).
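If the decode step really is memory-bound on 6 GB, one common workaround (an assumption on my part, not something confirmed here) is ComfyUI's tiled decode node (VAE Decode (Tiled)), which processes the latent in overlapping tiles to cap peak memory at the cost of speed. A rough sketch of the tiling arithmetic, with illustrative tile/overlap values:

```python
def tile_coords(height, width, tile=64, overlap=16):
    """Top-left corners of overlapping tiles covering a height x width latent grid."""
    step = tile - overlap
    ys = list(range(0, max(height - overlap, 1), step))
    xs = list(range(0, max(width - overlap, 1), step))
    return [(y, x) for y in ys for x in xs]
```

Each tile is decoded independently, so peak VRAM scales with the tile size rather than the full image size; the overlap regions are blended to hide seams.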

Very confused by this, as the exact same .json I created (from the StabilityAI tutorial posted yesterday) processes and runs on a newer GPU (30-series, 12 GB VRAM), but generates the "self and mat2 must have the same dtype" error above on the 980 Ti.

Error occurred when executing VAEDecode:

```
CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```

```
  File "c:\AI\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "c:\AI\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "c:\AI\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "c:\AI\ComfyUI\nodes.py", line 241, in decode
    return (vae.decode(samples["samples"]), )
  File "c:\AI\ComfyUI\comfy\sd.py", line 669, in decode
    pixel_samples[x:x+batch_number] = torch.clamp((self.first_stage_model.decode(samples) + 1.0) / 2.0, min=0.0, max=1.0).cpu().float()
```
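The error message itself suggests setting CUDA_LAUNCH_BLOCKING=1 so kernel launches are synchronous and the stack trace points at the call that actually failed. It has to be set before CUDA initializes; a minimal way to do that from Python (exporting it in the shell before launching ComfyUI works equally well):

```python
import os

# Make CUDA kernel launches synchronous so errors surface at the real call
# site. Must be set before torch initializes CUDA, i.e. before importing
# torch in the launch script (or exported in the shell beforehand).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

On 6 GB cards, launching ComfyUI with its `--lowvram` (or `--novram`) flag may also ease memory pressure at the decode stage.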