AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED and CUDA error: the launch timed out #7075

Open · TCNOco opened this issue 1 year ago

TCNOco commented 1 year ago

Is there an existing issue for this?

What happened?

All monitors freeze momentarily (the mouse too), sometimes dip to black and come back, and after the system resumes, the program reports a crash and stops generating new images.

This happens every time and stops me from using it completely. I'm lucky if I can generate one 512x512 image on a 3080 Ti.

I touched on this in #6790, but it has since drowned in the sea of issues. In the meantime I learned how to enable better logging for this; part of it was literally handed to me as the last line of the error output.

Steps to reproduce the problem

  1. Start SDUI. The same issue happens whether --xformers is used or not; nothing helps.
  2. Enter any prompt and click Generate. SDUI breaks and does not even try to generate further images.

What should have happened?

Images are generated

Commit where the problem happens

ff6a5bcec1ce25aa8f08b157ea957d764be23d8d

What platforms do you use to access the UI?

Windows

What browsers do you use to access the UI?

Mozilla Firefox, Google Chrome, Brave, Microsoft Edge

Command Line Arguments

No response

Additional information, context and logs

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Traceback (most recent call last):
  File "c:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "c:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\txt2img.py", line 52, in txt2img
    processed = process_images(p)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 479, in process_images
    res = process_images_inner(p)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 610, in process_images_inner
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 610, in <listcomp>
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 408, in decode_first_stage
    x = model.decode_first_stage(x)
  File "C:\Users\TCNO\anaconda3\envs\SD\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
    dec = self.decoder(z)
  File "C:\Users\TCNO\anaconda3\envs\SD\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 637, in forward
    h = self.up[i_level].block[i_block](h, temb)
  File "C:\Users\TCNO\anaconda3\envs\SD\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 141, in forward
    h = self.conv2(h)
  File "C:\Users\TCNO\anaconda3\envs\SD\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\anaconda3\envs\SD\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\TCNO\anaconda3\envs\SD\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([1, 512, 192, 192], dtype=torch.half, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(512, 512, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().half()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
    memory_format = Contiguous
    data_type = CUDNN_DATA_HALF
    padding = [1, 1, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0000029A821CD810
    type = CUDNN_DATA_HALF
    nbDims = 4
    dimA = 1, 512, 192, 192,
    strideA = 18874368, 36864, 192, 1,
output: TensorDescriptor 0000029A821CDE30
    type = CUDNN_DATA_HALF
    nbDims = 4
    dimA = 1, 512, 192, 192,
    strideA = 18874368, 36864, 192, 1,
weight: FilterDescriptor 0000029A637F5930
    type = CUDNN_DATA_HALF
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 512, 512, 3, 3,
Pointer addresses:
    input: 0000001ED8800000
    output: 0000001EDD000000
    weight: 0000001E7EE00000
Forward algorithm: 1

Any further generation attempts result in the following error until the program is restarted:

Traceback (most recent call last):
  File "c:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "c:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 33, in f
    shared.state.begin()
  File "c:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\shared.py", line 219, in begin
    devices.torch_gc()
  File "c:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\devices.py", line 59, in torch_gc
    torch.cuda.empty_cache()
  File "C:\Users\TCNO\anaconda3\envs\SD\lib\site-packages\torch\cuda\memory.py", line 125, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: the launch timed out and was terminated

Heck, I have even tried:

TCNOco commented 1 year ago

I am able to generate images, albeit twice as slow, using --no-half --precision full. Using --xformers does help.

Currently messing around happily in SDUI with the option set --opt-split-attention --no-half-vae --medvram --always-batch-cond-uncond, and it seems to work just fine. The --medvram option seems to help a lot with this issue, even though I don't run out of VRAM (why would that happen on one 512x512 image anyway?).
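
For reference, a minimal sketch (not from the thread) of the conv that fails in the traceback above, run once in half and once in full precision; full precision is roughly what --no-half / --precision full uses for this op. The shapes are taken from the ConvolutionParams dump.

import torch

# Same conv as the failing decode_first_stage call in the traceback:
# input [1, 512, 192, 192], 512 -> 512 channels, 3x3 kernel, padding 1.
def run_conv(dtype):
    data = torch.randn(1, 512, 192, 192, dtype=dtype, device='cuda')
    net = torch.nn.Conv2d(512, 512, kernel_size=3, padding=1).cuda().to(dtype)
    with torch.no_grad():            # inference only, like txt2img
        out = net(data)
    torch.cuda.synchronize()         # surface any asynchronous CUDA error here
    print(dtype, 'ok', tuple(out.shape))

run_conv(torch.half)     # default webui behaviour (fp16)
run_conv(torch.float32)  # roughly what --no-half runs instead

If only the fp16 pass hangs or errors, that points at half-precision kernels specifically; if the fp32 pass fails too, the problem is unlikely to be limited to half precision.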

mezotaken commented 1 year ago

So, did you try the suggested code snippet? Did it cause the error? If it causes the same error, it means something is wrong with your PyTorch/CUDA installation, and there's nothing we can do except reinstall it properly (but how?).

To run it: create a snippet.txt file in your sdwebui folder, paste the code in there, and change the file extension to .py. Then open cmd, navigate to your sdwebui folder using the cd command, run venv\Scripts\activate.bat, and then run python snippet.py.

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([1, 512, 192, 192], dtype=torch.half, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(512, 512, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().half()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

TCNOco commented 1 year ago

No issue running the script. I thought nothing had happened, so I added a little print at the beginning and end, yet everything worked fine with the snippet.

[Screenshot: Code_XkkbZhC497, showing the snippet completing without errors]

mezotaken commented 1 year ago

Welp, now I have no idea what to do. It looks like some internal CUDA error with half-precision operations.
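
One possible follow-up check (a sketch, not something suggested in the thread): the generated repro also runs a backward pass, but the traceback shows generation running under a no-grad context (note the grad_mode decorator), so a forward-only fp16 loop is closer to what decode_first_stage actually executes:

import torch

torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.allow_tf32 = True

net = torch.nn.Conv2d(512, 512, kernel_size=3, padding=1).cuda().half()
with torch.no_grad():                    # generation never calls backward
    for i in range(20):                  # repeat, in case the failure is intermittent
        data = torch.randn(1, 512, 192, 192, dtype=torch.half, device='cuda')
        out = net(data)
        torch.cuda.synchronize()         # report the error at the failing iteration
        print('iteration', i, 'ok')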

TCNOco commented 1 year ago

Sadly, same boat... I can thankfully generate images with the arguments that 'nerf' it, and they look just as good to me... It's just sad that I'm giving up performance for no reason, or otherwise have to face endless crashes for no reason.

mezotaken commented 1 year ago

Do you have access to a different GPU? Like a friend's or something? It may be caused by a faulty card or by overclocking.

TCNOco commented 1 year ago

Unfortunately not; I would assume it works fine... But as in #6790, I am far from alone. I was running it undervolted with the stock BIOS and everything, then turned that off so there were NO modifications, not even a higher voltage limit, and got the same issue. The errors above were after a clean reboot following a complete CUDA and Nvidia driver uninstall and a DDU session, right after installing the Studio drivers and CUDA 11.7. Same exact issue.

I will clean my PC today and hopefully that will change something, but I highly doubt it. I really doubt it has anything to do with heat.

mezotaken commented 1 year ago

What I might suggest, if you can't swap the GPU, is to try another repo, InvokeAI for example. If it breaks in the same way, there's nothing we can do here.

tanmay4269 commented 3 months ago

I set torch.backends.cudnn.enabled = False and the error stops appearing. I assume this comes at some cost to the quality of the computation, but can someone clarify why this works and how bad it is to use?
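
For what it's worth: setting torch.backends.cudnn.enabled = False makes PyTorch fall back to its own CUDA convolution kernels instead of cuDNN. The math is still done at the same precision, so output quality should not change; the usual cost is speed (and sometimes memory). A minimal sketch of the failing conv with cuDNN disabled, assuming the same shapes as above:

import torch

torch.backends.cudnn.enabled = False     # fall back to PyTorch's native CUDA conv kernels

data = torch.randn(1, 512, 192, 192, dtype=torch.half, device='cuda')
net = torch.nn.Conv2d(512, 512, kernel_size=3, padding=1).cuda().half()
with torch.no_grad():
    out = net(data)                      # same conv as the failing call, now without cuDNN
torch.cuda.synchronize()
print('ok', tuple(out.shape))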