Open AlcantaraMC opened 8 months ago
maybe power source...
Potentially related to #15219
I am reading about it being a potential power supply issue. I have an 850W PSU that can very well supply the GPU. I know this since I have used the same system flawlessly in the past. I am now looking towards the possibility that maybe the PSU is defective, but I am not entirely convinced yet. I can play games with the same amount of demand as SD inference. I am also reading about GPU power transients which is another possibility.
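One way to gather evidence for or against the power theory is to log GPU power draw to disk while generating, so the last samples before a shutdown are still there after reboot. Below is a minimal sketch, assuming `nvidia-smi` is on the PATH. The file name `gpu_power_log.csv` and the helper names are made up for illustration. Polling once per second will not catch millisecond-scale transients, so a clean log does not rule them out.

```python
import csv
import os
import subprocess
import sys

# Field names accepted by `nvidia-smi --query-gpu` (see `nvidia-smi --help-query-gpu`).
FIELDS = ["timestamp", "power.draw", "temperature.gpu", "clocks.sm"]


def parse_sample(line):
    """Turn one CSV line printed by nvidia-smi into a dict keyed by field name."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"unexpected nvidia-smi output: {line!r}")
    return dict(zip(FIELDS, values))


def log_gpu(path="gpu_power_log.csv", interval_s=1):
    """Append one sample per interval to `path` (hypothetical file name)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
        "-l", str(interval_s),  # repeat the query every interval_s seconds
    ]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc, \
            open(path, "a", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        for line in proc.stdout:
            if not line.strip():
                continue
            writer.writerow(parse_sample(line))
            # Push each row to disk so it has a chance of surviving a hard power-off.
            out.flush()
            os.fsync(out.fileno())


# Only start logging when explicitly asked, e.g. `python gpu_log.py --log`.
if __name__ == "__main__" and "--log" in sys.argv:
    log_gpu()
```

Start it in a second terminal before generating. If the last rows before the crash show power draw near the card's limit, that fits the PSU explanation, though it does not prove it.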
My system has 12 RX 7900 XTX GPUs with three power supplies of 3200 watts each. Because of a system crash I lost four of the 7900 XTX cards, or around €4000. The problem was the power supply, which could not keep up with the demand. I originally built the system for cryptocurrency mining. Try on Windows and then on Debian; if the problem continues on both, I think you can be absolutely sure that it comes from the power supply.
Also try reducing the image resolution to something like 16x16, then gradually increase it with the same prompt by multiples of 2 or 4 (16x16, 32x32, 64x64, 128x128, 256x256, 512x512) and note the size at which the shutdown happens.
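The resolution ladder can be scripted. This is a rough sketch, not a tested procedure: it assumes the web UI was started with `--api` and is listening on the default `http://127.0.0.1:7860`. The helper names, step count, and seed are placeholders I picked for illustration. The UI may reject very small sizes, in which case start higher.

```python
import json
import urllib.request


def resolution_ladder(start=16, stop=512, factor=2):
    """Square sizes from start up to stop, multiplying by factor each step."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    sizes = []
    size = start
    while size <= stop:
        sizes.append(size)
        size *= factor
    return sizes


def build_payloads(prompt, sizes, steps=20, seed=1234):
    """One txt2img request body per size, same prompt and seed throughout."""
    return [
        {"prompt": prompt, "width": s, "height": s, "steps": steps, "seed": seed}
        for s in sizes
    ]


def run_ladder(prompt, base_url="http://127.0.0.1:7860"):
    """Send the requests in order, printing the size before each one starts."""
    for payload in build_payloads(prompt, resolution_ladder()):
        # Printed first, so the last line on screen is the size that was running.
        print(f"generating {payload['width']}x{payload['height']}", flush=True)
        req = urllib.request.Request(
            base_url + "/sdapi/v1/txt2img",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()  # image data is discarded; only the crash point matters
```

Call `run_ladder("your prompt")` from a Python shell. Fixing the seed keeps the runs comparable, so the only thing that changes between steps is the load.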
Thanks boss.
I am considering either undervolting the GPU or replacing my PSU.
Here is a weird fix that seemed to help me with power issues: plug your PC directly into a wall outlet instead of a power bar.
I haven't experimented enough to be certain, but some error codes indicated an insufficient power supply. (They show up as CUDA errors, but you have to look at them in Event Viewer to get a proper diagnosis.) When I looked them up online, another user said that plugging directly into the wall prevented these problems.
Underclocking and undervolting my GPU seemed to prevent them too. (In that case I had no choice but to use a power bar.)
On my new PC with a 4090, I couldn't get the machine to boot without plugging directly into the wall.
See if that helps.
Checklist
What happened?
Hi. Recently my PC has been shutting down by itself while I use A1111. In the Event Viewer I see one Critical-level entry, Error Code 41, with the message: "The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly."
Here are my PC Specs:
OS: Windows 11 and Debian 12 (dual-boot)
CPU: AMD Ryzen™ 5 3600
GPU: NVIDIA GeForce RTX™ ASUS Megalodon 3070
RAM: 16GB 3600MHz Corsair DDR5
Mobo: Gigabyte Technology Co., Ltd. B550 AORUS ELITE AX V2
PSU: NZXT 850w Gold Rated
My temp monitors do not catch any above-normal heat readings prior to the shutdown. Here is my last reading: https://drive.google.com/file/d/1T0yc71N2m6bfGfCo8LonEod8Ishx5ItF/view?usp=sharing
=============================================================================
There is one interesting Information Level Event right before the Critical one, which says:
ACPI thermal zone _TZ.UAD0 has been enumerated.
_PSV = 290K
_TC1 = 0
_TC2 = 0
_TSP = 1000ms
_AC0 = 0K
_AC1 = 0K
_AC2 = 0K
_AC3 = 0K
_AC4 = 0K
_AC5 = 0K
_AC6 = 0K
_AC7 = 0K
_AC8 = 0K
_AC9 = 0K
_CRT = 294K
_HOT = 293K
minimum throttle = 0
_CR3 = 0K
=====================================================================
What is the probable problem? Thanks!
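For anyone reading the thermal zone event above: if the `K` suffix means kelvin, the non-zero trip points convert to roughly 17 °C to 21 °C. A quick conversion sketch, where the dictionary just copies the three non-zero values from the event:

```python
# Non-zero trip points copied from the ACPI thermal zone event, taken to be in kelvin.
TRIP_POINTS_K = {"_PSV": 290, "_HOT": 293, "_CRT": 294}


def kelvin_to_celsius(kelvin):
    """Convert kelvin to degrees Celsius, rounded to two decimals."""
    return round(kelvin - 273.15, 2)


for name, kelvin in TRIP_POINTS_K.items():
    print(f"{name}: {kelvin} K = {kelvin_to_celsius(kelvin)} °C")
```

Those are around room temperature, so this zone may just be reporting placeholder values rather than real limits. That is a guess on my part; the log itself does not say so.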
Steps to reproduce the problem
What should have happened?
Generate images.
What browsers do you use to access the UI ?
Google Chrome
Sysinfo
{ "Platform": "Windows-10-10.0.22631-SP0", "Python": "3.10.6", "Version": "1.8.0-RC", "Commit": "",
"Script path": "C:\Users\\Downloads\stable-diffusion-webui-master\stable-diffusion-webui-master",
"Data path": "C:\Users\\Downloads\stable-diffusion-webui-master\stable-diffusion-webui-master",
"Extensions dir": "C:\Users\*****\Downloads\stable-diffusion-webui-master\stable-diffusion-webui-master\extensions",
"Checksum": "6c73b970beebe1f3d2fe7b8801b96f916af660b2cdd632bac37e461b443e180f",
"Commandline": [
"launch.py"
],
"Torch env info": {
"torch_version": "2.1.2+cu121",
"is_debug_build": "False",
"cuda_compiled_version": "12.1",
"gcc_version": null,
"clang_version": null,
"cmake_version": null,
"os": "Microsoft Windows 11 Pro",
"libc_version": "N/A",
"python_version": "3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] (64-bit runtime)",
"python_platform": "Windows-10-10.0.22631-SP0",
"is_cuda_available": "True",
"cuda_runtime_version": null,
"cuda_module_loading": "LAZY",
"nvidia_driver_version": "551.61",
"nvidia_gpu_models": "GPU 0: NVIDIA GeForce RTX 3070",
"cudnn_version": null,
"pip_version": "pip3",
"pip_packages": [
"numpy==1.26.2",
"open-clip-torch==2.20.0",
"pytorch-lightning==1.9.4",
"torch==2.1.2+cu121",
"torchdiffeq==0.2.3",
"torchmetrics==1.3.1",
"torchsde==0.2.6",
"torchvision==0.16.2+cu121"
],
"conda_packages": null,
"hip_compiled_version": "N/A",
"hip_runtime_version": "N/A",
"miopen_runtime_version": "N/A",
"caching_allocator_config": "",
"is_xnnpack_available": "True",
"cpu_info": [
"Architecture=9",
"CurrentClockSpeed=3600",
"DeviceID=CPU0",
"Family=107",
"L2CacheSize=3072",
"L2CacheSpeed=",
"Manufacturer=AuthenticAMD",
"MaxClockSpeed=3600",
"Name=AMD Ryzen 5 3600 6-Core Processor ",
"ProcessorType=3",
"Revision=28928"
]
},
"Exceptions": [],
"CPU": {
"model": "AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD",
"count logical": 12,
"count physical": 6
},
"RAM": {
"total": "16GB",
"used": "7GB",
"free": "9GB"
},
"Extensions": [],
"Inactive extensions": [],
"Environment": {
"GRADIO_ANALYTICS_ENABLED": "False"
},
"Config": {
"ldsr_steps": 100,
"ldsr_cached": false,
"SCUNET_tile": 256,
"SCUNET_tile_overlap": 8,
"SWIN_tile": 192,
"SWIN_tile_overlap": 8,
"SWIN_torch_compile": false,
"hypertile_enable_unet": false,
"hypertile_enable_unet_secondpass": false,
"hypertile_max_depth_unet": 3,
"hypertile_max_tile_unet": 256,
"hypertile_swap_size_unet": 3,
"hypertile_enable_vae": false,
"hypertile_max_depth_vae": 3,
"hypertile_max_tile_vae": 128,
"hypertile_swap_size_vae": 3,
"sd_model_checkpoint": "v1-5-pruned-emaonly.safetensors",
"sd_checkpoint_hash": "6ce0161689b3853acaa03779ec93eafe75a02f4ced659bee03f50797806fa2fa"
},
"Startup": {
"total": 453.1611487865448,
"records": {
"initial startup": 0.0,
"prepare environment/checks": 0.015625715255737305,
"prepare environment/git version info": 0.06250524520874023,
"prepare environment/install torch": 181.77436566352844,
"prepare environment/torch GPU test": 4.636427640914917,
"prepare environment/install clip": 6.1235222816467285,
"prepare environment/install open_clip": 6.677409410476685,
"prepare environment/clone repositores": 17.74852228164673,
"prepare environment/install requirements": 114.88381886482239,
"prepare environment/run extensions installers": 0.0,
"prepare environment": 331.92219710350037,
"launcher": 0.0070362091064453125,
"import torch": 5.56201171875,
"import gradio": 1.8818750381469727,
"setup paths": 2.4444172382354736,
"import ldm": 0.05555319786071777,
"import sgm": 0.0,
"initialize shared": 0.34337615966796875,
"other imports": 1.704927682876587,
"opts onchange": 0.0,
"setup SD model": 0.006754159927368164,
"setup codeformer": 0.000232696533203125,
"setup gfpgan": 0.013965129852294922,
"set samplers": 0.0,
"list extensions": 0.0,
"restore config state file": 0.0,
"list SD models": 107.50378751754761,
"list localizations": 0.0,
"load scripts/custom_code.py": 0.0,
"load scripts/img2imgalt.py": 0.0,
"load scripts/loopback.py": 0.0,
"load scripts/outpainting_mk_2.py": 0.01562643051147461,
"load scripts/poor_mans_outpainting.py": 0.0,
"load scripts/postprocessing_caption.py": 0.0,
"load scripts/postprocessing_codeformer.py": 0.0,
"load scripts/postprocessing_create_flipped_copies.py": 0.0,
"load scripts/postprocessing_focal_crop.py": 0.0,
"load scripts/postprocessing_gfpgan.py": 0.0,
"load scripts/postprocessing_split_oversized.py": 0.015625953674316406,
"load scripts/postprocessing_upscale.py": 0.0,
"load scripts/processing_autosized_crop.py": 0.0,
"load scripts/prompt_matrix.py": 0.0,
"load scripts/prompts_from_file.py": 0.0,
"load scripts/sd_upscale.py": 0.0,
"load scripts/xyz_grid.py": 0.015625715255737305,
"load scripts/ldsr_model.py": 0.6751272678375244,
"load scripts/lora_script.py": 0.14063310623168945,
"load scripts/scunet_model.py": 0.015627145767211914,
"load scripts/swinir_model.py": 0.015626192092895508,
"load scripts/hotkey_config.py": 0.0,
"load scripts/extra_options_section.py": 0.0,
"load scripts/hypertile_script.py": 0.04687762260437012,
"load scripts/hypertile_xyz.py": 0.0,
"load scripts/soft_inpainting.py": 0.015625476837158203,
"load scripts/comments.py": 0.015625715255737305,
"load scripts/refiner.py": 0.0,
"load scripts/seed.py": 0.01562666893005371,
"load scripts": 0.987647294998169,
"load upscalers": 0.0,
"refresh VAE": 0.0,
"refresh textual inversion templates": 0.0,
"scripts list_optimizers": 0.0,
"scripts list_unets": 0.0,
"reload hypernetworks": 0.0,
"initialize extra networks": 0.0156252384185791,
"scripts before_ui_callback": 0.0,
"create ui": 0.5208632946014404,
"gradio launch": 0.19087910652160645,
"add APIs": 0.0,
"app_started_callback/lora_script.py": 0.0,
"app_started_callback": 0.0
}
},
"Packages": [
"accelerate==0.21.0",
"aenum==3.1.15",
"aiofiles==23.2.1",
"aiohttp==3.9.3",
"aiosignal==1.3.1",
"altair==5.2.0",
"antlr4-python3-runtime==4.9.3",
"anyio==3.7.1",
"async-timeout==4.0.3",
"attrs==23.2.0",
"blendmodes==2022",
"certifi==2024.2.2",
"charset-normalizer==3.3.2",
"clean-fid==0.1.35",
"click==8.1.7",
"clip==1.0",
"colorama==0.4.6",
"contourpy==1.2.0",
"cycler==0.12.1",
"deprecation==2.1.0",
"einops==0.4.1",
"exceptiongroup==1.2.0",
"facexlib==0.3.0",
"fastapi==0.94.0",
"ffmpy==0.3.2",
"filelock==3.13.1",
"filterpy==1.4.5",
"fonttools==4.49.0",
"frozenlist==1.4.1",
"fsspec==2024.2.0",
"ftfy==6.1.3",
"gitdb==4.0.11",
"gitpython==3.1.32",
"gradio-client==0.5.0",
"gradio==3.41.2",
"h11==0.12.0",
"httpcore==0.15.0",
"httpx==0.24.1",
"huggingface-hub==0.21.4",
"idna==3.6",
"imageio==2.34.0",
"importlib-resources==6.1.3",
"inflection==0.5.1",
"jinja2==3.1.3",
"jsonmerge==1.8.0",
"jsonschema-specifications==2023.12.1",
"jsonschema==4.21.1",
"kiwisolver==1.4.5",
"kornia==0.6.7",
"lark==1.1.2",
"lazy-loader==0.3",
"lightning-utilities==0.10.1",
"llvmlite==0.42.0",
"markupsafe==2.1.5",
"matplotlib==3.8.3",
"mpmath==1.3.0",
"multidict==6.0.5",
"networkx==3.2.1",
"numba==0.59.0",
"numpy==1.26.2",
"omegaconf==2.2.3",
"open-clip-torch==2.20.0",
"opencv-python==4.9.0.80",
"orjson==3.9.15",
"packaging==24.0",
"pandas==2.2.1",
"piexif==1.1.3",
"pillow==9.5.0",
"pip==22.2.1",
"protobuf==3.20.0",
"psutil==5.9.5",
"pydantic==1.10.14",
"pydub==0.25.1",
"pyparsing==3.1.2",
"python-dateutil==2.9.0.post0",
"python-multipart==0.0.9",
"pytorch-lightning==1.9.4",
"pytz==2024.1",
"pywavelets==1.5.0",
"pyyaml==6.0.1",
"referencing==0.33.0",
"regex==2023.12.25",
"requests==2.31.0",
"resize-right==0.0.2",
"rpds-py==0.18.0",
"safetensors==0.4.2",
"scikit-image==0.21.0",
"scipy==1.12.0",
"semantic-version==2.10.0",
"sentencepiece==0.2.0",
"setuptools==63.2.0",
"six==1.16.0",
"smmap==5.0.1",
"sniffio==1.3.1",
"spandrel==0.1.6",
"starlette==0.26.1",
"sympy==1.12",
"tifffile==2024.2.12",
"timm==0.9.16",
"tokenizers==0.13.3",
"tomesd==0.1.3",
"toolz==0.12.1",
"torch==2.1.2+cu121",
"torchdiffeq==0.2.3",
"torchmetrics==1.3.1",
"torchsde==0.2.6",
"torchvision==0.16.2+cu121",
"tqdm==4.66.2",
"trampoline==0.1.2",
"transformers==4.30.2",
"typing-extensions==4.10.0",
"tzdata==2024.1",
"urllib3==2.2.1",
"uvicorn==0.28.0",
"wcwidth==0.2.13",
"websockets==11.0.3",
"yarl==1.9.4"
]
}
Console logs
Additional information
I have updated to NVIDIA driver version 551.61