AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: RuntimeError: CUDA error: the launch timed out and was terminated #6790

Closed TCNOco closed 1 year ago

TCNOco commented 1 year ago

Is there an existing issue for this?

What happened?

This started happening on a recent commit. I rolled back to a commit from the start of January, and it still happens. My GPU is an Nvidia 3080 Ti with more than enough VRAM. Adding --xformers seemed to help at first, but the same issue returned.

What solved it (very temporarily) was installing the Nvidia Studio drivers: it worked fine until a restart... Now it's back.

I am nowhere near running out of VRAM, and my display(s) seem to freeze entirely for a second or two before the error pops up.

I have tried disabling addons. I have tried DDU and a fresh driver install. Heck, I even tried cloning a new version of SDUI, reinstalling Python to be EXACTLY what the repo says, and running just plain old SD 1.5... Same exact issue, with NO modifications.

My Nvidia drivers are stock. Windows 11, stock.

Steps to reproduce the problem

  1. Start SDUI. I've tried no args, --xformers, --xformers --no-half, and a few others. Same issue.
  2. Enter literally anything and click generate. I tried just "vaporwave". Some models generate one image before SDUI breaks; others don't complete even one.

Disabling live previews seemed to help, but after generating 2 images it failed with the same, or almost exactly the same, errors...

What should have happened?

An image is generated

Commit where the problem happens

ff6a5bcec1ce25aa8f08b157ea957d764be23d8d

What platforms do you use to access UI ?

Windows

What browsers do you use to access the UI ?

Mozilla Firefox, Google Chrome

Command Line Arguments

None, `--xformers`, `--xformers --no-half`.

Additional information, context and logs

To create a public link, set `share=True` in `launch()`.
 30%|████████████████████████▉                                                          | 6/20 [00:05<00:11,  1.19it/s]
Error completing request█████████████████                                               | 6/20 [00:00<00:00, 17.01it/s]
Arguments: ('task(d7ww7dory5fc98w)', 'floral marble, solo, gradient background, gradient, vaporwave', '', 'None', 'None', 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 0, False, 'Denoised', 5.0, 0.0, False, 0.9, 5, '0.0001', False, 'None', '', 0.1, False, '', '', False, False, False, False, '', 10.0, True, 30.0, True, 'svg', True, True, False, 0.5, 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\txt2img.py", line 52, in txt2img
    processed = process_images(p)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 479, in process_images
    res = process_images_inner(p)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 608, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 797, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 542, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 445, in launch_sampling
    return func()
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 542, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 146, in sample_euler_ancestral
    sigma_down, sigma_up = get_ancestral_step(sigmas[i], sigmas[i + 1], eta=eta)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 56, in get_ancestral_step
    sigma_up = min(sigma_to, eta * (sigma_to ** 2 * (sigma_from ** 2 - sigma_to ** 2) / sigma_from ** 2) ** 0.5)
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
To create a public link, set `share=True` in `launch()`.
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  5.77it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:01<00:00, 13.02it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:01<00:00, 14.00it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:01<00:00, 12.65it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:01<00:00, 14.38it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:01<00:00, 12.83it/s]
 15%|████████████▍                                                                      | 3/20 [00:03<00:18,  1.06s/it]
Error completing request███▋                                                            | 2/20 [00:00<00:01, 11.30it/s]
Arguments: ('task(2bab39ucil7i5ck)', 'floral marble, solo, gradient background, gradient, vaporwave', '', 'None', 'None', 20, 0, False, False, 1, 4, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 0, False, 'Denoised', 5.0, 0.0, False, 0.9, 5, '0.0001', False, 'None', '', 0.1, False, '', '', False, False, False, False, '', 10.0, True, 30.0, True, 'svg', True, True, False, 0.5, 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\txt2img.py", line 52, in txt2img
    processed = process_images(p)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 479, in process_images
    res = process_images_inner(p)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 608, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 797, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 542, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 445, in launch_sampling
    return func()
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 542, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 337, in forward
    x_out = self.inner_model(x_in, sigma_in, cond={"c_crossattn": [cond_in], "c_concat": [image_cond_in]})
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1329, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 781, in forward
    h = module(h, emb, context)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 84, in forward
    x = layer(x, context)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 324, in forward
    x = block(x, context=context[i])
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_hijack_checkpoint.py", line 4, in BasicTransformerBlock_forward
    return checkpoint(self._forward, x, context)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 235, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 96, in forward
    outputs = run_function(*args)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 264, in _forward
    x = self.ff(self.norm3(x)) + x
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 73, in forward
    return self.net(x)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
    input = module(input)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 52, in forward
    x, gate = self.proj(x).chunk(2, dim=-1)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
To create a public link, set `share=True` in `launch()`.
 95%|█████████████████████████████████████████████████████████████████████████████▉    | 19/20 [00:06<00:00,  3.06it/s]
Error completing request████████████████████████████████████████████████████████▍      | 18/20 [00:01<00:00, 13.06it/s]
Arguments: ('task(1r9jvc6ldqf7uf1)', 'floral marble, solo, gradient background, gradient, vaporwave', '', 'None', 'None', 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 0, False, 'Denoised', 5.0, 0.0, False, 0.9, 5, '0.0001', False, 'None', '', 0.1, False, '', '', False, False, False, False, '', 10.0, True, 30.0, True, 'svg', True, True, False, 0.5, 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\txt2img.py", line 52, in txt2img
    processed = process_images(p)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 479, in process_images
    res = process_images_inner(p)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 608, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 797, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 542, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 445, in launch_sampling
    return func()
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 542, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 337, in forward
    x_out = self.inner_model(x_in, sigma_in, cond={"c_crossattn": [cond_in], "c_concat": [image_cond_in]})
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1329, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 781, in forward
    h = module(h, emb, context)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 82, in forward
    x = layer(x, emb)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_hijack_checkpoint.py", line 10, in ResBlock_forward
    return checkpoint(self._forward, x, emb)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 235, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 78, in forward
    ctx.fwd_gpu_devices, ctx.fwd_gpu_states = get_device_states(*args)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 44, in get_device_states
    fwd_gpu_states.append(torch.cuda.get_rng_state())
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\cuda\random.py", line 31, in get_rng_state
    return default_generator.get_state()
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Total progress:  95%|██████████████████████████████████████████████████████████████▋   | 19/20 [00:17<00:00, 13.06it/s]
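
The hint printed at the end of each traceback above, CUDA_LAUNCH_BLOCKING=1, has to be set in the environment before the web UI process starts. A minimal sketch of one way to do that from Python, assuming the UI is started through a `webui-user.bat` launcher in the current directory (the launcher name and both function names are assumptions for illustration, not part of the repo's documented tooling):

```python
import os
import subprocess


def build_debug_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Force each CUDA kernel launch to complete before the next call,
    # so an error is raised at the call that actually caused it.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def launch_webui(launcher="webui-user.bat"):
    """Start the web UI with the debug environment (launcher path is hypothetical)."""
    return subprocess.run([launcher], env=build_debug_env(), shell=True, check=False)
```

With synchronous launches, generation will likely be slower, but the last frame of the traceback should be a more reliable pointer to the failing call.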

If I simply reload the UI from the settings menu, I get this error and ABSOLUTELY NOTHING happens. Nothing is generated:

To create a public link, set `share=True` in `launch()`.
Error completing request
Arguments: ('task(40pf1k31mvr5s1b)', 'floral marble, solo, gradient background, gradient, vaporwave', '', 'None', 'None', 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 0, False, 'Denoised', 5.0, 0.0, False, 0.9, 5, '0.0001', False, 'None', '', 0.1, False, '', '', False, False, False, False, '', 10.0, True, 30.0, True, 'svg', True, True, False, 0.5, 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 33, in f
    shared.state.begin()
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\shared.py", line 219, in begin
    devices.torch_gc()
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\devices.py", line 59, in torch_gc
    torch.cuda.empty_cache()
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\cuda\memory.py", line 121, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

auraria commented 1 year ago

Experiencing the same with a 3090:

Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Commit hash: ff6a5bcec1ce25aa8f08b157ea957d764be23d8d
Installing requirements for Web UI
Installing requirements for scikit_learn

#######################################################################################################
Initializing Dreambooth
If submitting an issue on github, please provide the below text for debugging purposes:

Python revision: 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Dreambooth revision: 17c3864803ebb50615205271de687be96cfc96e8
SD-WebUI revision: ff6a5bcec1ce25aa8f08b157ea957d764be23d8d

Checking Dreambooth requirements...
[+] bitsandbytes version 0.35.0 installed.
[+] diffusers version 0.10.2 installed.
[+] transformers version 4.25.1 installed.
[+] xformers version 0.0.14.dev0 installed.
[+] torch version 1.12.1+cu116 installed.
[+] torchvision version 0.13.1+cu116 installed.
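
To compare environments between affected users, the "[+] ... version ... installed." entries in a startup log like the one above can be turned into a dictionary. A rough sketch, assuming the log keeps exactly that wording; the function name is made up for illustration:

```python
import re

# Matches entries such as "[+] torch version 1.12.1+cu116 installed."
INSTALLED_RE = re.compile(r"\[\+\] (\S+) version (\S+) installed\.")


def parse_installed_versions(log_text):
    """Map package name -> version string for every '[+]' entry in the log."""
    return {name: version for name, version in INSTALLED_RE.findall(log_text)}
```

Diffing two such dictionaries (one per user) gives a quick view of which package versions the failing setups share.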

TCNOco commented 1 year ago

On further testing, the crash happens very reliably even without xformers:

[!] xformers NOT installed.
...
To create a public link, set `share=True` in `launch()`.
 20%|████████████████▌                                                                  | 4/20 [00:05<00:21,  1.36s/it]
Error completing request███████                                                         | 3/20 [00:00<00:01, 15.78it/s]
Arguments: ('task(miihw76ip8t7qjt)', 'vaporwave', '', 'None', 'None', 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 0, False, 'Denoised', 5.0, 0.0, False, 0.9, 5, '0.0001', False, 'None', '', 0.1, False, '', '', False, False, False, False, '', 10.0, True, 30.0, True, 'svg', True, True, False, 0.5, 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\txt2img.py", line 52, in txt2img
    processed = process_images(p)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 479, in process_images
    res = process_images_inner(p)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 608, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 797, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 542, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 445, in launch_sampling
    return func()
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 542, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 337, in forward
    x_out = self.inner_model(x_in, sigma_in, cond={"c_crossattn": [cond_in], "c_concat": [image_cond_in]})
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1329, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 781, in forward
    h = module(h, emb, context)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 82, in forward
    x = layer(x, emb)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_hijack_checkpoint.py", line 10, in ResBlock_forward
    return checkpoint(self._forward, x, emb)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 235, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 96, in forward
    outputs = run_function(*args)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 262, in _forward
    h = self.in_layers(x)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
    input = module(input)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 219, in forward
    return super().forward(x.float()).type(x.dtype)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\normalization.py", line 272, in forward
    return F.group_norm(
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\functional.py", line 2516, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
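
The tracebacks in this thread end in different places (get_ancestral_step, F.linear, get_rng_state, empty_cache, group_norm), which fits the log's own warning that CUDA errors may be reported asynchronously at some later call. A small sketch for pulling the innermost frame out of a pasted log so that runs can be compared side by side; the function name is hypothetical and it assumes standard CPython traceback formatting:

```python
import re

# Matches lines such as:   File "C:\...\sampling.py", line 56, in get_ancestral_step
FRAME_RE = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')


def innermost_frames(log_text):
    """Return (file name, line number, function) of the last frame before each RuntimeError."""
    results = []
    last_frame = None
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            # Keep only the file name, whichever path separator the log uses.
            file_name = re.split(r"[\\/]", match.group("path"))[-1]
            last_frame = (file_name, int(match.group("line")), match.group("func"))
        elif line.startswith("RuntimeError") and last_frame is not None:
            results.append(last_frame)
            last_frame = None
    return results
```

If the reported frames keep changing between otherwise identical runs, the location in any single traceback probably says little about where the kernel actually stalled.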

I have also installed and moved different versions of CUDA to the start of my PATH, so they're used in the program. 11.8, 11.7, 11.6 and 11.3 all have the same issue for me.
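
A note on swapping toolkits: the pip wheels of PyTorch generally bundle their own CUDA runtime, identified by the local version suffix (the Dreambooth check earlier in this thread reports `torch version 1.12.1+cu116`), so the toolkit that comes first on PATH is usually not the one PyTorch runs against. A minimal sketch for reading that suffix, assuming the usual `+cuXYZ` naming; the helper name is made up for illustration:

```python
import re


def cuda_version_from_torch(version):
    """Map a torch version like '1.12.1+cu116' to '11.6'; return None without a CUDA tag."""
    match = re.search(r"\+cu(\d+)$", version)
    if not match:
        return None
    digits = match.group(1)
    # The last digit is the minor version: cu116 -> 11.6, cu102 -> 10.2.
    return f"{digits[:-1]}.{digits[-1]}"
```

Inside the web UI's venv, passing `torch.__version__` to this helper and comparing the result with `torch.version.cuda` shows which CUDA build the installed wheel was compiled for, independent of what is on PATH.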

Windows updated last night and still nothing different.

--

As mentioned on a different thread, I even downloaded cuDNN and dropped the DLL files into venv\Lib\site-packages\torch\lib. Same issue. I have since rolled this change back, and even deleted venv completely in a reinstall attempt.

Thought I'd try my hand at training in the Dreambooth tab with that extension... It crashes here as well, just after "Preparing Dataset", though this time with a different CUDA error (invalid argument):

Traceback (most recent call last):
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\scripts\dreambooth.py", line 561, in start_training
    result = main(config, use_txt2img=use_txt2img)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 973, in main
    return inner_loop()
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 116, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 861, in inner_loop
    accelerator.backward(loss)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\accelerate\accelerator.py", line 1314, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 253, in apply
    return user_fn(self, *args)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 146, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 253, in apply
    return user_fn(self, *args)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 399, in wrapper
    outputs = fn(ctx, *args)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 111, in backward
    grads = _memory_efficient_attention_backward(
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 382, in _memory_efficient_attention_backward
    grads = op.apply(ctx, inp, grad)
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\cutlass.py", line 184, in apply
    (grad_q, grad_k, grad_v,) = cls.OPERATOR(
  File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Steps:   0%|                                                                                                                   | 0/191100 [00:00<?, ?it/s]
Training completed, reloading SD Model.
Restored system models.
Returning result: Exception training model: 'CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.'.
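
The log suggests passing CUDA_LAUNCH_BLOCKING=1. A minimal sketch of one way to do that from Python is below: it copies the current environment, sets the variable, and starts a script with it. The helper names are hypothetical, and using `launch.py` as the entry point is an assumption about how the web UI is started on a given install. With the variable set, kernel launches run synchronously, so the traceback should point closer to the call that actually failed, at the cost of slower generation.

```python
import os
import subprocess
import sys


def blocking_env(base_env=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_blocking(script, *args):
    """Run a Python script with synchronous CUDA launches enabled.

    Hypothetical wrapper: the variable has to be in the environment
    before the CUDA context is created, so it is set for the child process.
    """
    command = [sys.executable, script, *args]
    return subprocess.run(command, env=blocking_env(), check=False).returncode


# Example (assumed entry point and flag from this thread):
# run_with_blocking("launch.py", "--xformers")
```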
TCNOco commented 1 year ago

@auraria A temporary solution, going off a hunch from my first post: reinstalling the latest Studio Drivers from Nvidia (and not restarting my PC) seems to make it work again. Do you experience similar results?

Nvidia driver downloads

Just select your OS, but make sure Studio Driver (SD) is selected. Open the installer, tick Clean Install, and let it install/reinstall the Studio Drivers.

Open Stable Diffusion without restarting and things seem to work fine.

I can push batch size to 4 and still no crash. After reinstalling xformers, it works well too.

mclsugi commented 1 year ago

Sorry to ask, but how did you install WebUI? My archaic 1070 Ti works just fine with custom xformers and cudnn 11.7, at optimal speed with a batch size of 8 for inference.

goldo72 commented 1 year ago

I have the same problem, so I thought of that yesterday and switched to the Studio drivers, but I still get the error. No problem with anything except when I try to train, where it gives me the CUDA error.

TCNOco commented 1 year ago

I have tried DDU once again. Pressing Win+Ctrl+Shift+B to reset the graphics driver does seem to help... but the crash comes around again. I was happy with 512x512 xformers generations, but I cranked the settings up to see how far it would go... Seems to be some kind of memory thing? I don't really know, and it's super sad that me and a handful of other people with powerful, expensive hardware are seemingly locked out.

catboxanon commented 1 year ago

Closing as stale.