Closed: TCNOco closed this issue 1 year ago.
I'm experiencing the same with a 3090:
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Commit hash: ff6a5bcec1ce25aa8f08b157ea957d764be23d8d
Installing requirements for Web UI
Installing requirements for scikit_learn
#######################################################################################################
Initializing Dreambooth
If submitting an issue on github, please provide the below text for debugging purposes:
Python revision: 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Dreambooth revision: 17c3864803ebb50615205271de687be96cfc96e8
SD-WebUI revision: ff6a5bcec1ce25aa8f08b157ea957d764be23d8d
Checking Dreambooth requirements...
[+] bitsandbytes version 0.35.0 installed.
[+] diffusers version 0.10.2 installed.
[+] transformers version 4.25.1 installed.
[+] xformers version 0.0.14.dev0 installed.
[+] torch version 1.12.1+cu116 installed.
[+] torchvision version 0.13.1+cu116 installed.
After further testing, the crash happens very reliably even without xformers:
[!] xformers NOT installed.
...
To create a public link, set `share=True` in `launch()`.
20%|████████████████▌ | 4/20 [00:05<00:21, 1.36s/it]
Error completing request
███████ | 3/20 [00:00<00:01, 15.78it/s]
Arguments: ('task(miihw76ip8t7qjt)', 'vaporwave', '', 'None', 'None', 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 0, False, 'Denoised', 5.0, 0.0, False, 0.9, 5, '0.0001', False, 'None', '', 0.1, False, '', '', False, False, False, False, '', 10.0, True, 30.0, True, 'svg', True, True, False, 0.5, 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 56, in f
res = list(func(*args, **kwargs))
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\txt2img.py", line 52, in txt2img
processed = process_images(p)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 479, in process_images
res = process_images_inner(p)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 608, in process_images_inner
samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 797, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 542, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 445, in launch_sampling
return func()
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 542, in <lambda>
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_samplers.py", line 337, in forward
x_out = self.inner_model(x_in, sigma_in, cond={"c_crossattn": [cond_in], "c_concat": [image_cond_in]})
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
return self.inner_model.apply_model(*args, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1329, in forward
out = self.diffusion_model(x, t, context=cc)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 781, in forward
h = module(h, emb, context)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 82, in forward
x = layer(x, emb)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\sd_hijack_checkpoint.py", line 10, in ResBlock_forward
return checkpoint(self._forward, x, emb)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 235, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 96, in forward
outputs = run_function(*args)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 262, in _forward
h = self.in_layers(x)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
input = module(input)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 219, in forward
return super().forward(x.float()).type(x.dtype)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\normalization.py", line 272, in forward
return F.group_norm(
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\functional.py", line 2516, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
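The error text suggests `CUDA_LAUNCH_BLOCKING=1`. A minimal sketch of setting it from Python, assuming you control the script that imports `torch` (for the WebUI you would more likely set it in the shell or launcher script instead). The variable has to be in the environment before CUDA initializes, which is why it is set before any `torch` import. It makes kernel launches synchronous, so expect slower generation while it is on. The helper `cuda_launch_blocking_enabled` is a hypothetical name used only for illustration.

```python
import os

# Must be set before torch initializes CUDA; importing torch afterwards is fine.
# With synchronous launches, the Python traceback is more likely to point at
# the operation whose kernel actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def cuda_launch_blocking_enabled() -> bool:
    """Hypothetical helper: True if synchronous CUDA kernel launches were requested."""
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"


# import torch  # import only after the environment variable has been set
```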
I have also installed several CUDA versions and moved each one to the start of my PATH so that the program picks it up. 11.8, 11.7, 11.6 and 11.3 all show the same issue for me.
Windows updated last night, and nothing changed.
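The `torch` listed in the log is a `+cu116` build, and PyTorch wheels like that generally ship their own CUDA runtime and cuDNN libraries, so the toolkit that comes first on PATH may not be what the WebUI actually uses (it mostly matters when compiling extensions). A sketch for checking what the installed PyTorch reports, assuming it is run with the WebUI venv's Python; `describe_torch_cuda` is a hypothetical helper name.

```python
def describe_torch_cuda() -> dict:
    """Hypothetical helper: collect the CUDA-related versions PyTorch reports.

    Values stay None (and cuda_available False) when torch or a GPU is unavailable.
    """
    info = {"torch": None, "cuda_build": None, "cudnn": None,
            "cuda_available": False, "device": None}
    try:
        import torch
    except ImportError:
        return info  # torch is not installed in this interpreter

    info["torch"] = str(torch.__version__)           # e.g. "1.12.1+cu116"
    info["cuda_build"] = torch.version.cuda          # CUDA version torch was built with
    info["cudnn"] = torch.backends.cudnn.version()   # None if cuDNN is unavailable
    info["cuda_available"] = bool(torch.cuda.is_available())
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```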
--
As mentioned in a different thread, I even downloaded cuDNN and dropped the DLL files into `venv\Lib\site-packages\torch\lib`: same issue. I have since rolled this change back, and even deleted `venv` completely in a reinstall attempt.
I thought I'd try my hand at training in the Dreambooth tab with that extension... It crashes there as well, just after "Preparing Dataset":
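To see which cuDNN binaries sit in the `torch\lib` folder mentioned above, a sketch that locates the installed `torch` package with `importlib.util.find_spec` (without importing it) and lists files whose names contain "cudnn". It assumes the usual `site-packages\torch\lib` layout and should be run with the venv's Python; `torch_lib_cudnn_files` is a hypothetical helper name.

```python
import importlib.util
from pathlib import Path


def torch_lib_cudnn_files() -> list[str]:
    """Hypothetical helper: list cuDNN files in the installed torch package's lib folder.

    find_spec locates torch without importing it. Returns an empty list when
    torch is not installed or the lib folder does not exist.
    """
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        return []
    lib_dir = Path(spec.origin).parent / "lib"  # .../site-packages/torch/lib
    if not lib_dir.is_dir():
        return []
    # Any file whose name contains "cudnn", e.g. cudnn*.dll on Windows.
    return sorted(p.name for p in lib_dir.iterdir() if "cudnn" in p.name.lower())


if __name__ == "__main__":
    print(torch_lib_cudnn_files() or "no cuDNN files found in torch/lib")
```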
Traceback (most recent call last):
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\scripts\dreambooth.py", line 561, in start_training
result = main(config, use_txt2img=use_txt2img)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 973, in main
return inner_loop()
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 116, in decorator
return function(batch_size, grad_size, prof, *args, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 861, in inner_loop
accelerator.backward(loss)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\accelerate\accelerator.py", line 1314, in backward
self.scaler.scale(loss).backward(**kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 253, in apply
return user_fn(self, *args)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 146, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 253, in apply
return user_fn(self, *args)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 399, in wrapper
outputs = fn(ctx, *args)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 111, in backward
grads = _memory_efficient_attention_backward(
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 382, in _memory_efficient_attention_backward
grads = op.apply(ctx, inp, grad)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\cutlass.py", line 184, in apply
(grad_q, grad_k, grad_v,) = cls.OPERATOR(
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\venv\lib\site-packages\torch\_ops.py", line 143, in __call__
return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Steps: 0%| | 0/191100 [00:00<?, ?it/s]
Training completed, reloading SD Model.
Restored system models.
Returning result: Exception training model: 'CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.'.
@auraria A temporary solution, going off a hunch from my first post... Reinstalling the latest Studio Drivers from Nvidia (and not restarting my PC) seems to make it work again. Do you see similar results?
Just select your OS, but make sure Studio Driver (SD) is selected. Open the installer, tick Clean Install, and let it install/reinstall the Studio Drivers.
Open SD without restarting and things seem to work fine.
I can push batch size to 4 and still no crash. After reinstalling xformers, it works well too.
Sorry to ask, but how did you install the WebUI? My archaic 1070 Ti works just fine with custom xformers and cudnn 11.7, at optimal speed with a batch size of 8 for inference.
I have the same problem, so I thought of that yesterday: I switched to the Studio drivers and I still get the error. Everything else works fine except training, where it gives me the CUDA error.
I have tried DDU once again. Pressing Win+Ctrl+Shift+B to reset the graphics driver does seem to help... but the crash comes back. I was happy with 512x512 xformers generations, but I cranked it up to see how far it would go... Seems to be some kind of memory thing? I don't really know, and it's super sad that I and a handful of other people with powerful, expensive hardware are seemingly locked out.
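To check whether it really is a memory problem, a sketch that compares PyTorch's peak allocation with the device's free and total memory. It assumes it runs inside the same process that does the generation (for example from a script or extension): peak allocation only covers tensors PyTorch allocated in that process since the last reset, while free and total are device-wide figures from the driver. Call `torch.cuda.reset_peak_memory_stats()` before a generation and `vram_report()` after it; `vram_report` is a hypothetical helper name.

```python
def vram_report(device: int = 0):
    """Hypothetical helper: (peak_allocated_mib, free_mib, total_mib), or None.

    Returns None when torch is missing or CUDA is unavailable.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    mib = 1024 ** 2
    # Peak bytes allocated by PyTorch tensors on this device in this process.
    peak = torch.cuda.max_memory_allocated(device) / mib
    # Device-wide free/total memory as reported by the CUDA driver.
    free, total = torch.cuda.mem_get_info(device)
    return peak, free / mib, total / mib


if __name__ == "__main__":
    print(vram_report() or "CUDA not available")
```

If the peak stays far below the total when the crash happens, that points away from running out of VRAM.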
Closing as stale.
Is there an existing issue for this?
What happened?
This started happening on a recent commit. I rolled back to the start of Jan, and it still happens. Nvidia 3080 Ti, more than enough VRAM. Adding `--xformers` seemed to help, but the same issue remains. What solved it (very temporarily) was installing the Studio drivers for Nvidia: it worked fine until a restart... Now it's back.
I am nowhere near running out of VRAM, and it seems to freeze my entire display(s) for a second or two before the error pops up.
I have tried disabling addons. I have tried DDU and a fresh driver install. Heck, I even tried cloning a new version of SDUI, reinstalling Python to be EXACTLY what the repo says, and running just plain old SD 1.5... Same exact issue, with NO modifications.
My Nvidia drivers are stock. Windows 11, stock.
Steps to reproduce the problem
`--xformers`, `--xformers --no-half`, and a few other launch arguments: same issue.
Prompt: `vaporwave`. Some models generate one image, then SDUI breaks; some don't complete even one.
Disabling live previews seemed to help, but after generating 2 images it failed with the same or almost exactly the same errors...
What should have happened?
An image is generated
Commit where the problem happens
ff6a5bcec1ce25aa8f08b157ea957d764be23d8d
What platforms do you use to access UI ?
Windows
What browsers do you use to access the UI ?
Mozilla Firefox, Google Chrome
Command Line Arguments
Additional information, context and logs
If I simply reload the UI from the settings menu, I get this error and ABSOLUTELY NOTHING happens; nothing is generated: