AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: PC shuts down unceremoniously when trying to generate images #15226

Open AlcantaraMC opened 6 months ago

AlcantaraMC commented 6 months ago


What happened?

Hi. Recently my PC has been shutting down on its own while I use A1111. In Event Viewer I see one Critical-level entry (Kernel-Power, Event ID 41) with the message: "The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly."
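To attach those Event Viewer entries as text, here is a minimal sketch (untested on this machine). It builds a `wevtutil` query for the newest Kernel-Power entries with Event ID 41 from the System log. `kernel_power_query` and `dump_events` are hypothetical helper names. The `wevtutil qe` switches used (`/q:`, `/c:`, `/rd:true`, `/f:text`) are standard ones, but the exact XPath filter is an assumption, so compare it against Event Viewer's own "Filter Current Log" results.

```python
import subprocess
import sys


def kernel_power_query(event_id=41, count=5):
    """Hypothetical helper: build a wevtutil command that lists the newest
    `count` Kernel-Power events with the given ID from the System log."""
    # Assumed XPath filter: provider name plus event ID.
    xpath = (
        "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power']"
        f" and (EventID={event_id})]]"
    )
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",  # event filter
        f"/c:{count}",  # maximum number of events to return
        "/rd:true",     # newest first
        "/f:text",      # human-readable output
    ]


def dump_events(cmd):
    """Hypothetical helper: run the query and return its text output."""
    # wevtutil only ships with Windows, so do nothing elsewhere.
    if sys.platform != "win32":
        return "wevtutil is only available on Windows"
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

On Windows, `print(dump_events(kernel_power_query()))` should print the most recent Event ID 41 entries, ready to paste into the issue.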

Here are my PC Specs:

OS: Windows 11 and Debian 12 (dual-boot)
CPU: AMD Ryzen™ 5 3600
GPU: NVIDIA GeForce RTX™ ASUS Megalodon 3070
RAM: 16GB 3600MHz Corsair DDR4
Mobo: Gigabyte Technology Co., Ltd. B550 AORUS ELITE AX V2
PSU: NZXT 850W Gold Rated

My temperature monitors do not show any abnormal readings before the shutdown. Here is my last reading: https://drive.google.com/file/d/1T0yc71N2m6bfGfCo8LonEod8Ishx5ItF/view?usp=sharing

=============================================================================

There is one interesting Information-level event right before the Critical one, which says:

ACPI thermal zone _TZ.UAD0 has been enumerated.
_PSV = 290K
_TC1 = 0
_TC2 = 0
_TSP = 1000ms
_AC0 = 0K
_AC1 = 0K
_AC2 = 0K
_AC3 = 0K
_AC4 = 0K
_AC5 = 0K
_AC6 = 0K
_AC7 = 0K
_AC8 = 0K
_AC9 = 0K
_CRT = 294K
_HOT = 293K
minimum throttle = 0
_CR3 = 0K

=====================================================================
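The trip points in that event are given in kelvin. A quick conversion shows where they fall in degrees Celsius. This is a small illustrative sketch. `kelvin_to_celsius`, `trip_points_celsius` and `TRIP_POINTS_K` are hypothetical names, and the values are copied from the event above.

```python
def kelvin_to_celsius(k):
    """Convert a temperature in kelvin to degrees Celsius."""
    return k - 273.15


def trip_points_celsius(points_k):
    """Map each ACPI trip-point name to its value in degrees Celsius, rounded to 0.01."""
    return {name: round(kelvin_to_celsius(k), 2) for name, k in points_k.items()}


# Values taken from the _TZ.UAD0 event reported above.
TRIP_POINTS_K = {"_PSV": 290, "_HOT": 293, "_CRT": 294}

for name, celsius in trip_points_celsius(TRIP_POINTS_K).items():
    print(f"{name} = {TRIP_POINTS_K[name]} K = {celsius} °C")
```

This puts the passive (_PSV), hot (_HOT) and critical (_CRT) trip points at roughly 17 to 21 °C, which is around or below ordinary room temperature. The ACPI specification defines _CRT as the temperature at which the OS should shut the system down. That makes 21 °C an unusually low value for a real critical trip point. However, the event log alone does not show whether Windows acted on this zone.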

What is the probable problem? Thanks!

Steps to reproduce the problem

  1. Start the webui and open the web UI in the browser.
  2. Load ANY model (it does not matter which model you load; it will crash).
  3. Type ANY prompt (my PC actually crashed when I prompted "cat" haha).
  4. The computer might generate successfully a few times before the screen goes black; sometimes the "Generate" button may as well be labeled "Shut Down" LMAO. When it goes black, you hear a click and the fans stop spinning immediately, but the power button light stays on.

What should have happened?

Generate images.

What browsers do you use to access the UI?

Google Chrome

Sysinfo

{ "Platform": "Windows-10-10.0.22631-SP0", "Python": "3.10.6", "Version": "1.8.0-RC", "Commit": "", "Script path": "C:\Users\\Downloads\stable-diffusion-webui-master\stable-diffusion-webui-master", "Data path": "C:\Users\\Downloads\stable-diffusion-webui-master\stable-diffusion-webui-master", "Extensions dir": "C:\Users\*****\Downloads\stable-diffusion-webui-master\stable-diffusion-webui-master\extensions", "Checksum": "6c73b970beebe1f3d2fe7b8801b96f916af660b2cdd632bac37e461b443e180f", "Commandline": [ "launch.py" ], "Torch env info": { "torch_version": "2.1.2+cu121", "is_debug_build": "False", "cuda_compiled_version": "12.1", "gcc_version": null, "clang_version": null, "cmake_version": null, "os": "Microsoft Windows 11 Pro", "libc_version": "N/A", "python_version": "3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] (64-bit runtime)", "python_platform": "Windows-10-10.0.22631-SP0", "is_cuda_available": "True", "cuda_runtime_version": null, "cuda_module_loading": "LAZY", "nvidia_driver_version": "551.61", "nvidia_gpu_models": "GPU 0: NVIDIA GeForce RTX 3070", "cudnn_version": null, "pip_version": "pip3", "pip_packages": [ "numpy==1.26.2", "open-clip-torch==2.20.0", "pytorch-lightning==1.9.4", "torch==2.1.2+cu121", "torchdiffeq==0.2.3", "torchmetrics==1.3.1", "torchsde==0.2.6", "torchvision==0.16.2+cu121" ], "conda_packages": null, "hip_compiled_version": "N/A", "hip_runtime_version": "N/A", "miopen_runtime_version": "N/A", "caching_allocator_config": "", "is_xnnpack_available": "True", "cpu_info": [ "Architecture=9", "CurrentClockSpeed=3600", "DeviceID=CPU0", "Family=107", "L2CacheSize=3072", "L2CacheSpeed=", "Manufacturer=AuthenticAMD", "MaxClockSpeed=3600", "Name=AMD Ryzen 5 3600 6-Core Processor ", "ProcessorType=3", "Revision=28928" ] }, "Exceptions": [], "CPU": { "model": "AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD", "count logical": 12, "count physical": 6 }, "RAM": { "total": "16GB", "used": "7GB", "free": "9GB" }, 
"Extensions": [], "Inactive extensions": [], "Environment": { "GRADIO_ANALYTICS_ENABLED": "False" }, "Config": { "ldsr_steps": 100, "ldsr_cached": false, "SCUNET_tile": 256, "SCUNET_tile_overlap": 8, "SWIN_tile": 192, "SWIN_tile_overlap": 8, "SWIN_torch_compile": false, "hypertile_enable_unet": false, "hypertile_enable_unet_secondpass": false, "hypertile_max_depth_unet": 3, "hypertile_max_tile_unet": 256, "hypertile_swap_size_unet": 3, "hypertile_enable_vae": false, "hypertile_max_depth_vae": 3, "hypertile_max_tile_vae": 128, "hypertile_swap_size_vae": 3, "sd_model_checkpoint": "v1-5-pruned-emaonly.safetensors", "sd_checkpoint_hash": "6ce0161689b3853acaa03779ec93eafe75a02f4ced659bee03f50797806fa2fa" }, "Startup": { "total": 453.1611487865448, "records": { "initial startup": 0.0, "prepare environment/checks": 0.015625715255737305, "prepare environment/git version info": 0.06250524520874023, "prepare environment/install torch": 181.77436566352844, "prepare environment/torch GPU test": 4.636427640914917, "prepare environment/install clip": 6.1235222816467285, "prepare environment/install open_clip": 6.677409410476685, "prepare environment/clone repositores": 17.74852228164673, "prepare environment/install requirements": 114.88381886482239, "prepare environment/run extensions installers": 0.0, "prepare environment": 331.92219710350037, "launcher": 0.0070362091064453125, "import torch": 5.56201171875, "import gradio": 1.8818750381469727, "setup paths": 2.4444172382354736, "import ldm": 0.05555319786071777, "import sgm": 0.0, "initialize shared": 0.34337615966796875, "other imports": 1.704927682876587, "opts onchange": 0.0, "setup SD model": 0.006754159927368164, "setup codeformer": 0.000232696533203125, "setup gfpgan": 0.013965129852294922, "set samplers": 0.0, "list extensions": 0.0, "restore config state file": 0.0, "list SD models": 107.50378751754761, "list localizations": 0.0, "load scripts/custom_code.py": 0.0, "load scripts/img2imgalt.py": 0.0,
"load scripts/loopback.py": 0.0, "load scripts/outpainting_mk_2.py": 0.01562643051147461, "load scripts/poor_mans_outpainting.py": 0.0, "load scripts/postprocessing_caption.py": 0.0, "load scripts/postprocessing_codeformer.py": 0.0, "load scripts/postprocessing_create_flipped_copies.py": 0.0, "load scripts/postprocessing_focal_crop.py": 0.0, "load scripts/postprocessing_gfpgan.py": 0.0, "load scripts/postprocessing_split_oversized.py": 0.015625953674316406, "load scripts/postprocessing_upscale.py": 0.0, "load scripts/processing_autosized_crop.py": 0.0, "load scripts/prompt_matrix.py": 0.0, "load scripts/prompts_from_file.py": 0.0, "load scripts/sd_upscale.py": 0.0, "load scripts/xyz_grid.py": 0.015625715255737305, "load scripts/ldsr_model.py": 0.6751272678375244, "load scripts/lora_script.py": 0.14063310623168945, "load scripts/scunet_model.py": 0.015627145767211914, "load scripts/swinir_model.py": 0.015626192092895508, "load scripts/hotkey_config.py": 0.0, "load scripts/extra_options_section.py": 0.0, "load scripts/hypertile_script.py": 0.04687762260437012, "load scripts/hypertile_xyz.py": 0.0, "load scripts/soft_inpainting.py": 0.015625476837158203, "load scripts/comments.py": 0.015625715255737305, "load scripts/refiner.py": 0.0, "load scripts/seed.py": 0.01562666893005371, "load scripts": 0.987647294998169, "load upscalers": 0.0, "refresh VAE": 0.0, "refresh textual inversion templates": 0.0, "scripts list_optimizers": 0.0, "scripts list_unets": 0.0, "reload hypernetworks": 0.0, "initialize extra networks": 0.0156252384185791, "scripts before_ui_callback": 0.0, "create ui": 0.5208632946014404, "gradio launch": 0.19087910652160645, "add APIs": 0.0, "app_started_callback/lora_script.py": 0.0, "app_started_callback": 0.0 } }, "Packages": [ "accelerate==0.21.0", "aenum==3.1.15", "aiofiles==23.2.1", "aiohttp==3.9.3", "aiosignal==1.3.1", "altair==5.2.0", "antlr4-python3-runtime==4.9.3", "anyio==3.7.1", "async-timeout==4.0.3", "attrs==23.2.0", "blendmodes==2022",
"certifi==2024.2.2", "charset-normalizer==3.3.2", "clean-fid==0.1.35", "click==8.1.7", "clip==1.0", "colorama==0.4.6", "contourpy==1.2.0", "cycler==0.12.1", "deprecation==2.1.0", "einops==0.4.1", "exceptiongroup==1.2.0", "facexlib==0.3.0", "fastapi==0.94.0", "ffmpy==0.3.2", "filelock==3.13.1", "filterpy==1.4.5", "fonttools==4.49.0", "frozenlist==1.4.1", "fsspec==2024.2.0", "ftfy==6.1.3", "gitdb==4.0.11", "gitpython==3.1.32", "gradio-client==0.5.0", "gradio==3.41.2", "h11==0.12.0", "httpcore==0.15.0", "httpx==0.24.1", "huggingface-hub==0.21.4", "idna==3.6", "imageio==2.34.0", "importlib-resources==6.1.3", "inflection==0.5.1", "jinja2==3.1.3", "jsonmerge==1.8.0", "jsonschema-specifications==2023.12.1", "jsonschema==4.21.1", "kiwisolver==1.4.5", "kornia==0.6.7", "lark==1.1.2", "lazy-loader==0.3", "lightning-utilities==0.10.1", "llvmlite==0.42.0", "markupsafe==2.1.5", "matplotlib==3.8.3", "mpmath==1.3.0", "multidict==6.0.5", "networkx==3.2.1", "numba==0.59.0", "numpy==1.26.2", "omegaconf==2.2.3", "open-clip-torch==2.20.0", "opencv-python==4.9.0.80", "orjson==3.9.15", "packaging==24.0", "pandas==2.2.1", "piexif==1.1.3", "pillow==9.5.0", "pip==22.2.1", "protobuf==3.20.0", "psutil==5.9.5", "pydantic==1.10.14", "pydub==0.25.1", "pyparsing==3.1.2", "python-dateutil==2.9.0.post0", "python-multipart==0.0.9", "pytorch-lightning==1.9.4", "pytz==2024.1", "pywavelets==1.5.0", "pyyaml==6.0.1", "referencing==0.33.0", "regex==2023.12.25", "requests==2.31.0", "resize-right==0.0.2", "rpds-py==0.18.0", "safetensors==0.4.2", "scikit-image==0.21.0", "scipy==1.12.0", "semantic-version==2.10.0", "sentencepiece==0.2.0", "setuptools==63.2.0", "six==1.16.0", "smmap==5.0.1", "sniffio==1.3.1", "spandrel==0.1.6", "starlette==0.26.1", "sympy==1.12", "tifffile==2024.2.12", "timm==0.9.16", "tokenizers==0.13.3", "tomesd==0.1.3", "toolz==0.12.1", "torch==2.1.2+cu121", "torchdiffeq==0.2.3", "torchmetrics==1.3.1", "torchsde==0.2.6", "torchvision==0.16.2+cu121", "tqdm==4.66.2", "trampoline==0.1.2", 
"transformers==4.30.2", "typing-extensions==4.10.0", "tzdata==2024.1", "urllib3==2.2.1", "uvicorn==0.28.0", "wcwidth==0.2.13", "websockets==11.0.3", "yarl==1.9.4" ] }

Console logs

I can't retrieve the console logs because the machine shuts down. If there is a way or a command to save A1111's execution logs, please tell me how and I'll provide them.
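One possible way to keep the console output even if the machine loses power is shown below. This minimal Python sketch is not an A1111 feature. It starts the web UI as a child process, echoes its output to the console, and appends every line to a log file, calling `os.fsync` after each line. `run_and_log` and the file name `webui-console.log` are hypothetical. The sketch assumes you run it from the stable-diffusion-webui folder with the Python interpreter inside its `venv` (on Windows, `venv\Scripts\python.exe`).

```python
import os
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Hypothetical helper (not part of A1111): run cmd, echo its combined
    stdout/stderr to this console, and append every line to log_path,
    syncing after each line so the tail of the log is more likely to
    survive a sudden power loss."""
    # Ask a Python child process to write output immediately instead of buffering it.
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
            encoding="utf-8",
            errors="replace",
            env=env,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()             # hand Python's buffer to the OS
            os.fsync(log.fileno())  # ask the OS to commit it to disk now
        return proc.wait()
```

Usage would be `run_and_log([sys.executable, "launch.py"], "webui-console.log")`. After a crash, the end of `webui-console.log` holds whatever was written before the power went out. Syncing on every line slows logging slightly, and a drive's own write cache can still lose the very last lines.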

Additional information

I have updated to NVIDIA driver version 551.61

nathan-skynet commented 6 months ago

Maybe it's the power supply...

codespearhead commented 6 months ago

Potentially related to #15219

AlcantaraMC commented 6 months ago

I've read that this could be a power supply issue. My 850W PSU should be more than enough for the GPU; I know this because I have used this same system without problems in the past. I am now considering that the PSU might be defective, but I am not entirely convinced yet, since I can play games that put about the same load on the system as SD inference. I have also read about GPU power transients, which are another possibility.
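The headroom argument can be made concrete with a rough estimate. This sketch uses assumed nominal figures, not measurements from this system. It takes 220 W board power for a reference RTX 3070 (partner cards such as this one may be configured higher) and 88 W package power (PPT) for a 65 W TDP Ryzen like the 3600. The 50 W for the rest of the system and the GPU transient multipliers are hypothetical, because the real size of the spikes is not known here.

```python
# Assumed nominal figures, not measured on this system.
GPU_BOARD_POWER_W = 220  # reference RTX 3070 board power; partner cards may differ
CPU_PPT_W = 88           # package power limit of a 65 W TDP Ryzen such as the 3600
REST_OF_SYSTEM_W = 50    # hypothetical: motherboard, RAM, drives, fans
PSU_RATING_W = 850


def peak_dc_load(gpu_w, cpu_w, rest_w, gpu_transient_factor=1.0):
    """Estimated DC load if the GPU briefly draws gpu_transient_factor times
    its board power while the CPU and the rest stay at their assumed figures."""
    return gpu_w * gpu_transient_factor + cpu_w + rest_w


# Hypothetical spike sizes; the real transient magnitude is unknown.
for factor in (1.0, 1.5, 2.0):
    load = peak_dc_load(GPU_BOARD_POWER_W, CPU_PPT_W, REST_OF_SYSTEM_W, factor)
    print(f"GPU spike x{factor}: ~{load:.0f} W load, ~{PSU_RATING_W - load:.0f} W headroom")
```

Under these assumptions, even a brief 2x GPU spike stays well under 850 W. That fits the view that a healthy 850 W unit should cope. It does not rule out a faulty PSU, or one whose protection trips on very short spikes.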

nathan-skynet commented 6 months ago

My system has 12 RX 7900 XTX GPUs running on three power supplies of 3200 W each. Because of a system crash I lost four of the 7900 XTX GPUs, around €4000. The problem was the power supply, which could not keep up with the demand. (That system was built for cryptocurrency mining.) Try on Windows and then on Debian; if the problem continues on both, I think you can be absolutely sure that it comes from the power supply.

nathan-skynet commented 6 months ago

Also try reducing the image resolution to something like 16x16 and then gradually increasing it with the same prompt, doubling each side (4x the pixels) at every step: 16x16, 32x32, 64x64, 128x128, 256x256, 512x512.
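The resolution ramp described above can be scripted. In this minimal sketch, `resolution_ladder` and `txt2img` are hypothetical helpers. The sketch assumes the web UI was started with the `--api` flag and listens on the default `http://127.0.0.1:7860`, and it posts to the `/sdapi/v1/txt2img` endpoint. The seed and step count stay fixed so that only the resolution changes between runs. The smallest sizes may be rejected or fail outright; if so, starting at 64x64 is a reasonable fallback.

```python
import json
import urllib.request


def resolution_ladder(start=16, stop=512):
    """Square sizes from start to stop, doubling each side (4x the pixels) per step."""
    sizes = []
    side = start
    while side <= stop:
        sizes.append((side, side))
        side *= 2
    return sizes


def txt2img(prompt, width, height, seed=1, steps=20,
            url="http://127.0.0.1:7860/sdapi/v1/txt2img"):
    """Hypothetical helper: send one generation request to a web UI started with --api."""
    # Fixed seed and steps so only the resolution varies between requests.
    payload = {"prompt": prompt, "width": width, "height": height,
               "seed": seed, "steps": steps}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

To run the test, call `txt2img("cat", w, h)` for each `(w, h)` in `resolution_ladder()` and print the size before each call. The last size printed before a shutdown then shows roughly where the load becomes too much.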

AlcantaraMC commented 6 months ago

Thanks boss.

I am considering either undervolting the GPU or replacing my PSU.

CodeHatchling commented 6 months ago

Here is a weird fix that seemed to help me with power issues: plug your PC directly into a wall outlet instead of a power bar.

I haven't experimented enough to be sure, but certain error codes indicated an insufficient power supply. They show up as CUDA errors, but you have to check Event Viewer for a proper diagnosis. When I looked them up online, another user said that plugging directly into the wall prevented these problems.

When I underclocked and undervolted my GPU, that seemed to prevent them too. (In that case I had no choice but to use a power bar.)

On my new PC with a 4090, I couldn't get the thing to boot without plugging it directly into the wall.

See if that helps.