TCNOco opened 1 year ago
I am able to generate images, albeit twice as slow, using `--no-half --precision full`. Using `--xformers` does help.
Currently messing around happily in SDUI using the option set `--opt-split-attention --no-half-vae --medvram --always-batch-cond-uncond`, and it seems to work just fine. The `--medvram` option seems to help a lot with this issue, even though I don't run out of VRAM (why would that happen on one 512x512 image anyway?).
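(For anyone wanting to try the same set: assuming the standard `webui-user.bat` launcher, these flags go on its `COMMANDLINE_ARGS` line, e.g. `set COMMANDLINE_ARGS=--opt-split-attention --no-half-vae --medvram --always-batch-cond-uncond`; adjust accordingly if you start `launch.py` directly.)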
So, did you try the suggested code snippet? Did it cause the error? If it causes the same error, it means that something is wrong with your PyTorch/CUDA installation and there's nothing we can do except installing it properly (but how?).
To run it:
1. Create a `snippet.txt` file in your sdwebui folder, paste the code in there, and change the file extension to `.py`.
2. Open cmd and navigate to your sdwebui folder using the `cd` command.
3. Run `venv\Scripts\activate.bat`.
4. Run `python snippet.py`.
```python
import torch

# Backend flags in effect for the test.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True

# A half-precision Conv2d forward and backward pass on the GPU --
# the kind of op that raises CUDNN_STATUS_EXECUTION_FAILED in webui.
data = torch.randn([1, 512, 192, 192], dtype=torch.half, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(512, 512, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().half()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
```
No issue running the script. I thought nothing had happened, so I added a little print at the beginning and end, yet everything worked fine with the snippet.
Welp, now I have no idea what to do. Looks like some internal CUDA error with half-precision operations.
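A cheap way to test the "half precision in general" theory (a minimal sketch, assuming an otherwise working CUDA setup): run an fp16 op that never touches cuDNN, such as a matmul. If this also misbehaves, fp16 on the GPU is broken in general; if it's fine, the suspect stays cuDNN's convolution path.

```python
import torch

# A half-precision matmul is dispatched to cuBLAS, not cuDNN, so it
# exercises fp16 on the GPU without touching the suspect library.
a = torch.randn(1024, 1024, dtype=torch.half, device='cuda')
b = a @ a
torch.cuda.synchronize()
print('fp16 matmul OK:', bool(b.isfinite().all()))
```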
Sadly, same boat... I can thankfully generate images with the arguments that 'nerf' it, and they look just as good to me... It's just sad that I either give up performance for no reason or face endless crashes.
Do you have access to a different GPU? Like a friend's or something? It may be caused by a faulty card or overclocking.
Unfortunately not, and I would assume it works fine... but as in #6790, I am far from alone. I was running it undervolted with the stock BIOS; I then turned that off and ran with NO modifications, not even a higher voltage limit, and hit the same issue. The errors above were after a clean reboot following a complete CUDA and Nvidia driver uninstallation and a DDU session, right after installing the Studio drivers and CUDA 11.7. Same exact issue.
I will clean my PC today and hopefully that will change something, but I highly doubt it. I really doubt it has to do with heat.
What I might suggest if you can't swap the GPU: try another repo, InvokeAI for example. If it breaks in the same way, there's nothing we can do here.
I do this: `torch.backends.cudnn.enabled = False`, and it stops showing the error. I assume I'm doing it at the cost of computation quality, but can someone clarify why this works and how bad it is to use this?
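As far as I understand it (hedging here, not an authoritative answer): with cuDNN disabled, PyTorch falls back to its own native CUDA convolution kernels, so the cost is speed (and sometimes memory), not output quality. A minimal sketch that re-runs the repro conv from above without cuDNN in the path:

```python
import torch

# Skip cuDNN entirely; convolutions fall back to PyTorch's
# native CUDA kernels (slower, but numerically just as valid).
torch.backends.cudnn.enabled = False

# Same half-precision conv as the repro snippet above.
data = torch.randn([1, 512, 192, 192], dtype=torch.half, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(512, 512, kernel_size=3, padding=1).cuda().half()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
print('conv forward/backward ran without cuDNN')
```

If that runs while the original snippet fails, it points at cuDNN's kernels (or the driver paths they hit) rather than the GPU's fp16 support.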
Is there an existing issue for this?
What happened?
All monitors freeze momentarily, the mouse as well, and sometimes they dip to black and come back; after the system resumes, the program reports a crash and stops generating new images.
This happens every time and stops me using it completely. I'm lucky if I can generate 1 512x512 image on a 3080 Ti.
I touched on this in #6790, but it has since drowned in the sea of issues. In the meantime I learned how to enable better logging for this; part of it was literally handed to me as the last line.
Steps to reproduce the problem
Heck, I have even tried `--xformers` as well as not using it. Nothing helps.

What should have happened?
Images are generated
Commit where the problem happens
ff6a5bcec1ce25aa8f08b157ea957d764be23d8d
What platforms do you use to access the UI?
Windows
What browsers do you use to access the UI?
Mozilla Firefox, Google Chrome, Brave, Microsoft Edge
Command Line Arguments
No response
Additional information, context and logs
`RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED`
Any further generation attempts result in the following until restart: