AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Press any key../Hard Fault - Unable to generate SDXL #16186

Open · BlackWyvern opened this issue 1 month ago

BlackWyvern commented 1 month ago

Checklist

What happened?

While trying to generate any image with any SDXL model, I am met with either a "Press any key to continue..." prompt or a hard memory-access fault in python.exe.

It'll also take out several other background applications when it crashes like this.

A post in this thread suggested checking Event Viewer after these crashes; it shows the following:

Faulting application name: python.exe, version: 3.10.11150.1013, time stamp: 0x642cc427
Faulting module name: c10.dll, version: 0.0.0.0, time stamp: 0x6578c6fe
Exception code: 0xc0000005
Fault offset: 0x0000000000055474
Faulting process id: 0x39bc
Faulting application start time: 0x01dad2c9928bde49
Faulting application path: I:\Python\Python3-10-6\python.exe
Faulting module path: I:\Stable Diffusion\venv\lib\site-packages\torch\lib\c10.dll
Report Id: fac5843a-d536-4688-b8ca-2ce2e46d2d27
Faulting package full name:
Faulting package-relative application ID:

Steps to reproduce the problem

Load any SDXL model. Hit generate. Doesn't even need a prompt.

What should have happened?

Make images.

Shouldn't nuke Discord, Steam, and DWM.exe all at once.

What browsers do you use to access the UI?

Mozilla Firefox

Sysinfo

sysinfo-2024-07-10-13-24.json

Console logs

venv "I:\Stable Diffusion\venv\Scripts\Python.exe"
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.9.4
Commit hash: feee37d75f1b168768014e4634dcb156ee649c05
Launching Web UI with arguments: --xformers --medvram --medvram-sdxl --autolaunch
*** "Disable all extensions" option was set, will not load any extensions ***
Loading weights [821aa5537f] from I:\Stable Diffusion\models\Stable-diffusion\SDXL\autismmixSDXL_autismmixPony.safetensors
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Creating model from config: I:\Stable Diffusion\repositories\generative-models\configs\inference\sd_xl_base.yaml
I:\Stable Diffusion\venv\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Startup time: 14.8s (prepare environment: 3.0s, import torch: 4.6s, import gradio: 1.1s, setup paths: 1.3s, initialize shared: 2.2s, other imports: 0.8s, load scripts: 0.9s, create ui: 0.4s, gradio launch: 0.6s).
Loading VAE weights specified in settings: I:\Stable Diffusion\models\VAE\sdxl_vae.safetensors
Applying attention optimization: xformers... done.
Model loaded in 10.3s (load weights from disk: 0.7s, create model: 0.7s, apply weights to model: 6.8s, load VAE: 0.1s, calculate empty prompt: 1.8s).
100%|██████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:08<00:00,  4.77it/s]
Press any key to continue . . . ███████████████████████████████████████████████████████████████████| 40/40 [00:06<00:00,  5.69it/s]

Additional information

No response

Allwhey commented 1 month ago

Hello, I made an account to speak about this issue. It is very frustrating, but I've at least found out a bit about it. It isn't related to xformers, as disabling or enabling it makes no difference. It happens on CUDA 11.8, CUDA 12.1, and CUDA 12.1.1, and with PyTorch 2.1.2 and 2.3.1.

It's a 0xc0000005 error, the Windows equivalent of a segfault (an access violation), and the faulting module is lib\site-packages\torch\lib\c10.dll.

Rolling back to torch 2.0.1+cu118 currently solves the issue, at least for me. This implies that some change made to torch since then, perhaps in the c10 library, is what causes the segfault. Unfortunately I don't have enough familiarity with PyTorch or AI programming to say which change that actually is.

If anyone has insight into what causes this sort of crash in the generation process and can open a concrete issue on the PyTorch GitHub, I would be grateful.

Allwhey commented 1 month ago

In case anyone doesn't know the actual commands to do so, I just ran this in the root dir of my webui install.

call venv/Scripts/activate.bat
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
# (if you want to rollback xformers)
pip install --pre -U xformers torch==2.0.1
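
After reinstalling, a quick sanity check (just a sketch, run inside the activated venv; not part of the webui itself) can confirm that the rolled-back cu118 build is the one actually being loaded:

```python
# check_torch.py -- run inside the activated venv after the reinstall
import torch

print("torch:", torch.__version__)            # expect something like 2.0.1+cu118
print("built for CUDA:", torch.version.cuda)  # expect 11.8
print("CUDA available:", torch.cuda.is_available())
print("loaded from:", torch.__file__)         # should point inside the webui venv
```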
BlackWyvern commented 1 month ago

Followed the above commands. Not sure how to roll back cuda versions though.

Faulting application name: bad_module_info, version: 0.0.0.0, time stamp: 0x00000000
Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
Exception code: 0xc0000005
Fault offset: 0x00007ffd09145474
Faulting process id: 0x3c34
Faulting application start time: 0x01dad37ae8a95888
Faulting application path: bad_module_info
Faulting module path: unknown
Report Id: a2da468a-7006-4c8c-9a92-6d02d8513f29
Faulting package full name:
Faulting package-relative application ID:

Application: mmc.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: exception code e0434352, exception address 00007FFD6E25BA99
Stack:

Faulting application name: dwm.exe, version: 10.0.19041.4355, time stamp: 0x6564cf4e
Faulting module name: KERNELBASE.dll, version: 10.0.19041.4522, time stamp: 0xf7a99bd4
Exception code: 0xc00001ad
Fault offset: 0x000000000012d332
Faulting process id: 0x1090
Faulting application start time: 0x01dad2cabb96cec4
Faulting application path: C:\Windows\system32\dwm.exe
Faulting module path: C:\Windows\System32\KERNELBASE.dll
Report Id: 36bd6217-4ba4-411f-8e2e-fce036a69f15
Faulting package full name:
Faulting package-relative application ID:

Allwhey commented 1 month ago

I haven't seen that one before. You don't necessarily need to roll back CUDA, you just need to make sure CUDA 11.8 is installed. In other words, at least for me, it works now even though CUDA 11.8 and 12.1.1 are installed simultaneously (see the CUDA 11.8 Download Archive). Restart your PC afterwards to be sure, but as long as CUDA 11.8 is on your system it should work.

--index-url https://download.pytorch.org/whl/cu118 ensures that pip installs the build of PyTorch 2.0.1 that was compiled against CUDA 11.8.

If it didn't work, try adding --force-reinstall to the first pip command.

AlexKaleda commented 1 month ago

Thank you for this solution, it works well! Interestingly, it began when I switched my graphics card from a 1070 to a 3060. I could generate once or twice and then crash at the end of the next run. At the same time I was using ComfyUI with '2.3.0+cu121' and had no problems.

Allwhey commented 1 month ago

> Thank you for this solution, it works well! Interestingly, it began when I switched my graphics card from a 1070 to a 3060. I could generate once or twice and then crash at the end of the next run. At the same time I was using ComfyUI with '2.3.0+cu121' and had no problems.

Thank you for mentioning that 2.3.0 still works on ComfyUI! In that case, the problem could still be related to sd-webui's image generation implementation, perhaps specifically for SDXL. I am curious why changing the torch version would cause a segfault at all, since every error should normally surface as a handled Python exception. Nonetheless, there might be a workaround on sd-webui's end to mitigate this.
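
One way to narrow it down: a minimal standalone smoke test outside the webui (a rough sketch; it assumes only torch in the same venv). If this also dies in c10.dll, the fault is in torch or the driver rather than in sd-webui's SDXL code path.

```python
# torch_smoke_test.py -- hypothetical standalone repro attempt, webui not involved
import torch

print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))
x = torch.randn(1, 4, 128, 128, device="cuda", dtype=torch.float16)
w = torch.randn(4, 4, 3, 3, device="cuda", dtype=torch.float16)
for step in range(500):
    x = torch.nn.functional.conv2d(x, w, padding=1)
    x = torch.tanh(x)  # keep values bounded so the loop can run for a long time
torch.cuda.synchronize()
print("no crash, mean:", x.float().mean().item())
```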

BlackWyvern commented 1 month ago

Installed the CUDA kit as instructed. Still getting faults, but at least it's showing c10.dll again? I also managed to get it to hard fault once, necessitating a full restart.

3asKl6mPhH

The error still and always occurs at 100% completion before the image is decoded/saved/displayed or whatever happens there. If that helps narrow anything down.

Faulting application name: python.exe, version: 3.10.11150.1013, time stamp: 0x642cc427
Faulting module name: c10.dll, version: 0.0.0.0, time stamp: 0x6578c6fe
Exception code: 0xc0000005
Fault offset: 0x0000000000055474
Faulting process id: 0x3724
Faulting application start time: 0x01dad3e9bcdcb909
Faulting application path: I:\Python\Python3-10-6\python.exe
Faulting module path: I:\Stable Diffusion\venv\lib\site-packages\torch\lib\c10.dll
Report Id: f392d87e-8bd7-4242-a5b8-d73f082cfa9c
Faulting package full name:
Faulting package-relative application ID:
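
Since the crash lands exactly where the VAE decode happens, a minimal hypothetical sketch like the one below (it assumes the diffusers package is installed in the venv; the VAE path is the one from the log above) exercises only that decode step and may help narrow things down:

```python
# vae_decode_only.py -- sketch that runs just the SDXL VAE decode that happens at 100%
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_single_file(
    r"I:\Stable Diffusion\models\VAE\sdxl_vae.safetensors"
).to("cuda").half()

# random 1024x1024 SDXL latent; we only care whether decode segfaults, not image quality
latents = torch.randn(1, 4, 128, 128, device="cuda", dtype=torch.float16)
with torch.no_grad():
    image = vae.decode(latents / vae.config.scaling_factor).sample
torch.cuda.synchronize()
print("decode ok:", tuple(image.shape))
```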

Allwhey commented 1 month ago

#15175

I found an issue that seems to be heavily or directly related to this. It looks like some combination of how sd-webui handles SDXL and SDXL LoRAs, together with more recent versions of PyTorch. But there is likely also a problem in the webui code itself, as SDXL barely works on my computer with 32GB of RAM running at 6200MHz and a 32GB page file. It might crash in part due to this memory mismanagement.

Allwhey commented 1 month ago

> The error still and always occurs at 100% completion, before the image is decoded/saved/displayed or whatever happens there.

https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion

You can also try disabling system memory fallback on recent NVIDIA drivers, which seems to have some extreme disagreements with PyTorch and Stable Diffusion. Not sure if your RAM can handle anything without it, though.
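
As a rough way to tell whether a generation is bumping against the VRAM limit that triggers the fallback, a sketch like this (assumes torch with CUDA; run it in a second terminal while generating) prints the device-wide headroom reported by the driver:

```python
# vram_headroom.py -- print free/total VRAM every couple of seconds while generating
import time
import torch

for _ in range(30):
    free, total = torch.cuda.mem_get_info()  # device-wide numbers, in bytes
    print(f"free {free / 2**30:.2f} GiB / total {total / 2**30:.2f} GiB")
    time.sleep(2)
```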

BlackWyvern commented 1 month ago

Did a clean driver reinstall and disabled memory fallback. I managed to get one generation to complete properly.

Then it segfaulted on c10.dll again.

fennecbutt commented 1 month ago

I was getting this error with all variants of this model (Ratatoskr) https://civitai.com/models/192854/ratatoskr-animal-creature-and-furry.

Same exception code/location and faulting module. After mucking about with other fixes people suggested, it was completely solved when I realised I had the swap file set to 0MB (because I'm using SSDs and have 32GB of RAM). I set a minimum swap of 8192MB and a maximum of 32768MB, and it's now working just fine. Haven't tried lower values yet.
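
For anyone who wants to confirm their page file configuration before changing it, a small sketch (it assumes the psutil package, `pip install psutil`) reports what Windows currently has set:

```python
# swap_check.py -- show the current page file (swap) and RAM figures
import psutil

swap = psutil.swap_memory()
print(f"swap total: {swap.total / 2**20:.0f} MB, used: {swap.used / 2**20:.0f} MB")

vm = psutil.virtual_memory()
print(f"RAM total: {vm.total / 2**30:.1f} GB, available: {vm.available / 2**30:.1f} GB")
```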

K-Max-Me commented 2 weeks ago

This could in fact be an NVIDIA issue, but oddly enough it doesn't happen in ComfyUI. I did notice it happening when VRAM is near or completely maxed out (even on a 3090).

I thought I'd post this in case someone has time to try it on a clean install. I did downgrade PyTorch to 2.0.1 with cu118 as @Allwhey mentioned, but it still crashed; that fix didn't work for me.

I seem to get a different fault than everyone else, though. Check your Windows event log in case you get that random "Press any key to continue..." message with no visible error.

Exception 0xc0000005 is an Access Violation Exception.

Faulting application name: python.exe, version: 3.10.10150.1013, time stamp: 0x63e2893e
Faulting module name: nvcuda64.dll, version: 32.0.15.5599, time stamp: 0x665baccb
Exception code: 0xc0000005
Fault offset: 0x00000000003e862f
Faulting process id: 0x0xEBF0
Faulting application start time: 0x0x1DAE3D33D3B2742
Faulting application path: C:\Users\kmax\AppData\Local\Programs\Python\Python310\python.exe
Faulting module path: C:\WINDOWS\system32\DriverStore\FileRepository\nv_dispsig.inf_amd64_e6cac7f31a92d62e\nvcuda64.dll
Report Id: 934672f3-76b6-4ef7-abc9-6803a77fd56e
Faulting package full name:
Faulting package-relative application ID:

cbodenberger commented 6 days ago

I have been having this issue on a multi-GPU rig: 3090, 2080 Ti, 3060, P102-100. Windows 11, 32GB of RAM, 5 GPUs + 2 CPUs (X99 motherboard, dual Xeon E5-2630 v4s, 20 cores/40 threads, 2.2GHz). I can get a pretty consistent crash. I'm using a Discord bot and two instances on my 2080 Tis with --api --nowebui. When I queue up multiple generations it crashes; when I queue one image at a time and give it some time between queues, it doesn't. Python, DWM, and the NVIDIA driver all show errors around the time of the crash. I've played with --lowram and --medvram. I never seem to run out of VRAM or RAM: about half of VRAM is used, and maybe 29 of 32GB of RAM when I queue a ton up.

I run a third instance on my 3090 in webui and it crashes, just not as often.

I don't know if my rambling helps narrow it down at all. I can provide logs and junk and a more detailed systeminfo if needed.

edit: I probably should have mentioned that it happens when the current generation hits 100% and says "Press any key to continue", unless it crashes DWM, in which case I can't see anything.
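
For what it's worth, the "one job at a time" workaround described above can be approximated client-side with something like this sketch (it assumes an instance running with --api; /sdapi/v1/txt2img is the txt2img endpoint, and the port below is an assumption, so adjust it to whatever your instance prints at startup):

```python
# throttle_queue.py -- submit generations one at a time with a pause between jobs
import time
import requests

API = "http://127.0.0.1:7861/sdapi/v1/txt2img"   # port is an assumption; check your startup log
PROMPTS = ["prompt one", "prompt two"]           # placeholder prompts

for prompt in PROMPTS:
    r = requests.post(API, json={"prompt": prompt, "steps": 40,
                                 "width": 1024, "height": 1024}, timeout=600)
    r.raise_for_status()
    time.sleep(15)  # breathing room between jobs; tune as needed
```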