AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Full computer crash when generating larger images #15219

Open OwOchle opened 5 months ago

OwOchle commented 5 months ago

What happened?

When I try to generate images larger than 512x512, my PC crashes completely, sometimes instantly and sometimes partway through generation. (Somewhat larger sizes can work, but a crash becomes likely.) Debugging on my side is very difficult because Linux marks the logs as corrupted; the PC goes down instantly, I think without even a kernel panic. I ran multiple stress tests on my PC, so a PSU problem is unlikely.

Steps to reproduce the problem

  1. txt2img
  2. Resolution of 1024x1024 (almost certain to crash)

What should have happened?

Maybe some visual lag, but no crashes.

What browsers do you use to access the UI?

Google Chrome

Sysinfo

sysinfo-2024-03-11-14-16.json

Console logs

glibc version is 2.39
Check TCMalloc: libtcmalloc_minimal.so.4
libtcmalloc_minimal.so.4 is linked with libc.so,execute LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4
Python 3.10.13 (main, Mar  9 2024, 23:21:18) [GCC 13.2.1 20230801]
Version: v1.8.0
Commit hash: bef51aed032c0aaa5cfd80445bc4cf0d85b408b5
Launching Web UI with arguments: 
WARNING:xformers:WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.2.0+cu121 with CUDA 1201 (you have 2.3.0.dev20240309+rocm6.0)
    Python  3.10.13 (you have 3.10.13)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
No module 'xformers'. Proceeding without it.
Loading weights [6d15e4ac22] from /nix/other/ai/stable-diffusion-webui/models/Stable-diffusion/furworldFurryYiffNSFW_hardfurry.safetensors
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 15.0s (prepare environment: 5.7s, import torch: 3.4s, import gradio: 0.8s, setup paths: 1.6s, other imports: 0.6s, load scripts: 0.4s, initialize extra networks: 0.1s, create ui: 0.5s, gradio launch: 1.7s).
Creating model from config: /nix/other/ai/stable-diffusion-webui/configs/v1-inference.yaml
Applying attention optimization: Doggettx... done.
Model loaded in 6.4s (load weights from disk: 2.7s, create model: 0.3s, apply weights to model: 2.9s, calculate empty prompt: 0.4s).
MIOpen(HIP): Error [Prefetch] Ill-formed record: key not found: /home/moreo/.config/miopen//gfx1030_36.HIP.3_00_0_f34a90f76-dirty.ufdb.txt#396
MIOpen(HIP): Error [Prefetch] Ill-formed record: key not found: /home/moreo/.config/miopen//gfx1030_36.HIP.3_00_0_f34a90f76-dirty.ufdb.txt#396
Reusing loaded model furworldFurryYiffNSFW_hardfurry.safetensors [6d15e4ac22] to load Anything-V3.0.safetensors [10f0bd7ade]
Loading weights [10f0bd7ade] from /nix/other/ai/stable-diffusion-webui/models/Stable-diffusion/Anything-V3.0.safetensors
Applying attention optimization: Doggettx... done.

Additional information

No response

w-e-w commented 5 months ago

TL;DR: I have no clue and can't help.


Long version, if you want to read it:

There's nothing fundamentally different about what the web UI does when generating a 512x512 image versus a 1024x1024 image. The only real difference is how heavily the hardware is utilized and how long it stays under load.
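To put a rough number on how much heavier the 1024x1024 case is, here is a back-of-the-envelope sketch. It assumes an SD v1 model (the log loads v1-inference.yaml), an 8x VAE downsampling factor, 8 attention heads, fp16 values, and a self-attention score matrix that is materialized all at once. That last assumption is a worst case: the Doggettx optimization shown in the log computes attention in slices, so the real peak should be lower. The function name is made up for illustration.

```python
def attention_scores_bytes(width, height, heads=8, bytes_per_value=2, vae_factor=8):
    """Return (tokens, bytes) for one un-sliced self-attention score matrix.

    Assumptions (not taken from the web UI code): the latent is the image
    size divided by vae_factor, and the scores form a tokens x tokens matrix
    per head at the highest-resolution attention layer.
    """
    tokens = (width // vae_factor) * (height // vae_factor)
    return tokens, tokens * tokens * heads * bytes_per_value


for side in (512, 1024):
    tokens, size = attention_scores_bytes(side, side)
    print(f"{side}x{side}: {tokens} tokens, {size / 2**20:.0f} MiB of attention scores")
```

Under these assumptions, doubling each side gives 4x the latent positions and 16x the attention scores, so the operations are the same but the memory and compute pressure are very different.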

In general it is quite hard to crash a computer completely, and this type of issue is hard enough to debug even with physical access to the PC. It becomes impossible with only logs that show no real errors.

If you want my hunch, my guess is that your GPU hardware is bad. It could be some combination of core utilization and memory load; it could be anything, really. Have you monitored your system's temperatures?

> I did multiple stress test of my pc so it's unlikely a psu problem.

What stress test was it? Were the CPU, RAM, GPU, and VRAM all at 100% at the same time? A stress test doesn't necessarily tell you much; it only tells you that the system can pass that particular stress test. Lots of things can go wrong, and they don't necessarily happen at 100% load.

For example, I have undervolted my CPU and GPU in the past. My system could run at full load with no issue, but it would suddenly crash when the stress test ended and it idled down.

I would try using an overclocking utility to underclock your GPU or add some voltage to it (and the CPU, for that matter) to see if it's some sort of voltage instability, while monitoring the temperature for any anomalies.

Is there a pattern to when the crash happens? Also, can you describe what stage the image generation is at when it crashes?


IIRC, I remember seeing a story on the internet about someone whose PC kept crashing when playing a game. They tried pretty much everything, including swapping hardware, and in the end they found that, for some reason, that GPU paired with that motherboard would crash in certain games, while the GPU seemed fine with other motherboards. Hardware is just weird.

OwOchle commented 5 months ago

My thermals are good. The VRAM usage seems to spike, causing the crash, but I could not find a reliable way to measure that. The stress test was a full one, with the CPU, GPU, RAM, and VRAM all stressed. I tried using an overclocking utility to limit my GPU power, but without success. I guess it's just bad luck. Someone suggested that Resizable BAR might be the issue, but that did nothing (just so you know, in case other users have a problem with that).
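One possible way to measure the VRAM spike, sketched under some assumptions: the log shows a ROCm build of PyTorch on a gfx1030 device, so this assumes the amdgpu kernel driver, which exposes `mem_info_vram_used` and `mem_info_vram_total` (in bytes) in sysfs. The `card0` path, the function names, and the log file name are illustrative and may need adjusting. Each sample is flushed and fsynced so that the last lines have a better chance of surviving an instant crash.

```python
import os
import time
from pathlib import Path

# Assumed location; the card index (card0) may differ on a given machine.
DEVICE_DIR = Path("/sys/class/drm/card0/device")


def read_vram_mib(device_dir=DEVICE_DIR):
    """Return (used, total) VRAM in MiB from the amdgpu sysfs counters."""
    used = int((Path(device_dir) / "mem_info_vram_used").read_text())
    total = int((Path(device_dir) / "mem_info_vram_total").read_text())
    return used // 2**20, total // 2**20


def log_vram(log_path, interval=0.1, samples=None, device_dir=DEVICE_DIR):
    """Append one timestamped sample per interval; samples=None runs until stopped."""
    count = 0
    with open(log_path, "a") as log:
        while samples is None or count < samples:
            used, total = read_vram_mib(device_dir)
            log.write(f"{time.time():.3f} {used}/{total} MiB\n")
            log.flush()
            os.fsync(log.fileno())  # ask the OS to push this line to disk now
            count += 1
            time.sleep(interval)
```

Running `log_vram("vram.log")` in a second terminal while generating would leave a trail of samples up to the crash. Whether the file itself survives still depends on the filesystem, so treat the last line as a lower bound on the peak rather than an exact reading.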

w-e-w commented 5 months ago

I can only wish you good luck

AlcantaraMC commented 5 months ago

> my thermals are good, the vram usage seems to spike up, causing the crash, but i could not find a reliable way to measure that. The stress test was a full stress test with like, CPU, GPU, RAM and VRAM stressed. I tried using an overclocking utility to limit my gpu power, but without success. I guess it's just bad luck, someone pointed me out ReSize Bar was maybe the issue, but it did nothing (just so you know, maybe other users might have problem with that).

A VRAM usage spike is normal when the loaded model begins inference on your prompt.