AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: regular crashing on my 3090 #16089

Open MODOMison opened 2 months ago

MODOMison commented 2 months ago


What happened?

When running an image generation, the computer seems to shut off or "crash".

Steps to reproduce the problem

Running a generation, often with the ADetailer extension.

What should have happened?

It should have continued producing images without any trouble.

What browsers do you use to access the UI ?

Microsoft Edge

Sysinfo

I didn't have the sysinfo per se at this time.

I have other data, though; take a look:

(I hope the photo went through.) Screenshot 2024-06-25 013851

The alleged hardware error only happens with this program, so maybe it is incorrect?

Screenshot 2024-06-24 100116

This seems to cause an unexpected system "shutdown".

Console logs

Because of the way it crashes, the cmd logs are lost.
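Since the crash takes the console output with it, one workaround is to log GPU sensor readings to a file and force every line onto the disk as it is written, so the readings from just before a shutdown survive. A minimal Python sketch, assuming `nvidia-smi` is on the PATH (it normally ships with the NVIDIA driver); the file name `gpu_log.csv`, the one-second interval and the helper names are hypothetical choices, not part of webui:

```python
import os
import subprocess
import time

# Standard `nvidia-smi --query-gpu` fields; `nvidia-smi --help-query-gpu` lists them all.
# The helper names below are hypothetical, only the nvidia-smi flags and fields are real.
FIELDS = ["timestamp", "temperature.gpu", "power.draw", "clocks.sm", "clocks.mem", "memory.used"]


def read_sample(gpu_index=0):
    """Return one CSV line of current readings for a single GPU."""
    result = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def append_durable(path, line):
    """Append one line and force it onto the disk before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()             # move Python's buffer into the OS
        os.fsync(f.fileno())  # ask the OS to write it to the physical disk now


def log_forever(path="gpu_log.csv", interval_s=1.0):
    """Logging loop: one durable sample per interval until stopped with Ctrl+C."""
    if not os.path.exists(path):
        append_durable(path, ",".join(FIELDS))  # header row for a new file
    while True:
        append_durable(path, read_sample())
        time.sleep(interval_s)


# Start with log_forever() in its own console window before generating.
```

The `os.fsync` call costs a little disk I/O per line, but without it the last few seconds of readings could still be sitting in a buffer when the power cuts out.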

After the regular crashes I did a full reinstall of everything:
new Windows, new driver, new program.
After this it got "better", but it still happens.

Additional information

I have the newest driver, and I physically reinstalled it today after much trouble; maybe this will work.

SPECT0R1A commented 2 months ago

I have the same issue. AUTOMATIC1111 with 5 or 6 extensions installed turns my PC off completely after a few image generations.

AlUlkesh commented 2 months ago

Could be a heat issue. Did you check the GPU temperature when you run generations?

MODOMison commented 2 months ago

Yes, the temperature seems normal; I had a program called Speccy running.

CPU was around 62 with some spikes, oddly reading higher one second and then back in the 62 range the next. GPU had no trouble, around 65.

MODOMison commented 2 months ago

This is the link to the NVIDIA forum thread: https://forums.developer.nvidia.com/t/ai-art-cause-crash-automatic-11-11-with-3090/297492/1

w-e-w commented 2 months ago

TL;DR: my conclusion is that the issue is either a hardware issue specific to your device or a software issue in an upstream library such as PyTorch; in either case we can't really help, sorry.


Did this issue start all of a sudden, or after you did something specific? Have you had issues with webui in the past, or are you a new user?


I'm not an expert, but I have a feeling this is more of a hardware issue, possibly hardware instability or partial memory corruption.

If it's a hardware stability issue, I would try underclocking your GPU's core and memory clocks using a tool like MSI Afterburner; I might even give it a little more voltage to hopefully make it more stable. Have you run a stress test on your system? I searched around, and some people on Reddit mentioned the PSU; it is entirely possible that your PSU is supplying unstable voltage to your GPU.

> cpu was around 62 with some spike , oddly reading more one second then back to 62 range the next gpu no trouble like 65

I'm not sure what you're using to monitor the temperature, but I would use HWiNFO; it should show more sensors, such as memory temperature, not just the core. Sometimes your cores are fine but your VRAM chips have bad thermals.
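Whichever tool does the monitoring, logging the sensors to a CSV file (both HWiNFO and `nvidia-smi` can write one) means the peaks and the last readings before a shutdown can be checked afterwards. A small sketch, assuming a plain CSV with a single header row; HWiNFO's own export may need its columns or encoding adjusted first, and `summarize_log` is a hypothetical name:

```python
import csv
import io


def summarize_log(text, tail=5):
    """Return (peaks, last_rows) for a CSV sensor log passed in as a string.

    peaks maps every column that holds numbers to its maximum value;
    last_rows are the final `tail` rows, i.e. the readings logged just
    before the machine went down.
    """
    # skipinitialspace handles "a, b" style output such as nvidia-smi's CSV.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = list(reader)
    peaks = {}
    for row in rows:
        for name, value in row.items():
            try:
                number = float(value)
            except (TypeError, ValueError):
                continue  # timestamps, "N/A" and empty cells are skipped
            peaks[name] = max(peaks.get(name, number), number)
    return peaks, rows[-tail:]
```

For example, `summarize_log(open("sensor_log.csv", encoding="utf-8").read())` (hypothetical file name) returns the per-column peaks and the final five rows.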

It could also be that a portion of your VRAM is bad, and during applications with high memory usage the bad memory gets used and triggers the crash.


The thing is, I'm also using a 3090 and it's running perfectly fine now. The firmware, model, and make of our GPUs might be slightly different, so it's not really an apples-to-apples comparison, but all I'm saying is that it should run fine. So it's most likely some combination of software and your specific hardware that triggers this error.

If it's a software issue that should be fixed, then it most likely doesn't originate from webui but from PyTorch or some other GPU-related library, which is out of scope.

You could try a different version of PyTorch; maybe it would magically help.
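Before swapping PyTorch versions, it is worth recording what is currently installed so that runs can be compared; the same output also covers part of the missing sysinfo. A minimal sketch using `platform` and long-standing PyTorch attributes; `collect_env` is a hypothetical name, and the script must be run with the Python interpreter from webui's own virtual environment (by default a `venv` folder inside the webui directory), otherwise it reports whichever PyTorch, if any, the system Python has:

```python
import platform


def collect_env():
    """Collect version details worth noting before and after changing PyTorch."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "torch": None,
        "torch_cuda": None,
        "cudnn": None,
        "gpu": None,
    }
    try:
        import torch
    except ImportError:
        return info  # PyTorch is not installed for this interpreter
    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda  # CUDA version this PyTorch build targets
    info["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


for key, value in collect_env().items():
    print(f"{key}: {value}")
```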


> i have the newest driver

You said you have the newest driver, but have you tested older drivers? New does not equal good or bug-free.
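When testing older drivers, it helps to note exactly which driver is installed for each attempt, and, since firmware can differ between otherwise identical 3090s, the VBIOS version too. A minimal sketch that asks `nvidia-smi` for both; the helper names are hypothetical, while `name`, `driver_version` and `vbios_version` are standard `--query-gpu` fields:

```python
import subprocess

# Standard `nvidia-smi --query-gpu` fields (see `nvidia-smi --help-query-gpu`).
QUERY = ["name", "driver_version", "vbios_version"]


def parse_gpu_info(csv_text):
    """Turn `--format=csv,noheader` output into one dict per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        gpus.append(dict(zip(QUERY, values)))
    return gpus


def query_gpu_info():
    """Ask nvidia-smi for the GPU name, driver version and VBIOS version."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(QUERY), "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_info(result.stdout)


# Requires an installed NVIDIA driver: print(query_gpu_info())
```

Saving this output next to each crash report makes it easier to tell whether a driver rollback actually changed anything.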