AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Resolved] NVIDIA driver performance issues #11063

Closed w-e-w closed 10 months ago

w-e-w commented 1 year ago

Update (2023-10-31)

This issue should now be entirely resolved. NVIDIA has published a help article on disabling the system memory fallback behavior. Please upgrade to the latest driver (546.01 or newer) and follow the guide on their website: https://nvidia.custhelp.com/app/answers/detail/a_id/5490

Update (2023-10-19)

The issue has been reopened, as more and more reports indicate that it is not yet fixed.

Update (2023-10-17)

There are still some reports saying that the issue is not fixed.

Relevant comments:

https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11063#issuecomment-1764127635 https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11063#issuecomment-1764899003

Update (2023-10-14)

This issue has reportedly been fixed by NVIDIA as of 537.58 (537.42 if using Studio release). Please update your drivers to this version or later.

The original issue description follows.


Discussed in https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/11062

Originally posted by **w-e-w** June 7, 2023: some users have reported issues related to the latest NVIDIA drivers ([nVidia drivers change in memory management vladmandic#1285](https://github.com/vladmandic/automatic/discussions/1285), https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11050#issuecomment-1578731478). If you have been experiencing generation slowdowns or getting stuck, **consider downgrading to driver version 531 or below**: [NVIDIA Driver Downloads](https://www.nvidia.com/download/Find.aspx)

This issue will be closed when NVIDIA resolves it. NVIDIA's tracking number for the bug is [4172676].
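
If you are unsure which driver version you are currently running, nvidia-smi can report it from a command prompt (a quick check, assuming nvidia-smi is on your PATH, which the driver installer normally handles):

```bat
rem Print only the installed NVIDIA driver version, e.g. 531.79
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```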

tusharbhutt commented 1 year ago

Funny, I am randomly getting the issue where an output is stuck at 50% for an hour, and I am on 531.41 with an NVIDIA 3060 12GB.

chaewai commented 1 year ago

Strangely, mine seems to go at normal speed for the first gen on a checkpoint, or if I change the CLIP setting on a checkpoint, but subsequent gens go much slower. Annoyingly, Diablo won't run on 531.

designborg commented 1 year ago

I can confirm this bug. I was getting results (as expected) before I installed the latest Titan RTX drivers. I will try installing a previous build.

AIDungeonTester2 commented 1 year ago

Strangely, mine seems to go at normal speed for the first gen on a checkpoint, or if I change the CLIP setting on a checkpoint, but subsequent gens go much slower. Annoyingly, Diablo won't run on 531.

Yeah, that's exactly how it is for me. When I tried inpainting, the first gen runs through just fine, but any subsequent ones have massive hang-ups, necessitating a restart of the commandline window and rerunning webui-user.bat.

younyokel commented 1 year ago

I wasn't sure if the problem was with the drivers, so I reinstalled WebUI, but the problem didn't go away. Everything generates fine like before, but once the High Res Fix starts and finishes, there is roughly a minute-long pause. Edit: confirming it was the driver. I downgraded to 531.68 and now everything is as it was.

hearmeneigh commented 1 year ago

If you are stuck on a newer NVIDIA driver version, downgrading to Torch 1.13.1 seems to work too (see the sketch after this list):

  1. Add the following to webui-user.bat: set TORCH_COMMAND=pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
  2. Remove the <webui-root>/venv directory
  3. (Re)start WebUI
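
For reference, a minimal webui-user.bat sketch with that override; the TORCH_COMMAND value comes from step 1 above, and the rest is roughly the stock template:

```bat
@echo off

set PYTHON=
set GIT=
set VENV_DIR=
rem Pin Torch to 1.13.1 + cu117 so the rebuilt venv installs the older version
set TORCH_COMMAND=pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
set COMMANDLINE_ARGS=

call webui.bat
```
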
Shawdooow commented 1 year ago

I am having the opposite issue: on the newer drivers my first image generation is slow because of some clogged memory on my GPU, which frees itself as soon as it gets to the second one.

Downgrading Torch didn't seem to help at all. Downgrading from 536.23 to 531.79 fixes the problem instantly.

younyokel commented 1 year ago

Anyone, is this problem still relevant?

designborg commented 1 year ago

I haven't tried with the latest drivers, so I don't know if this issue is still ongoing.

PsychoGarlic commented 1 year ago

Extremely slow for me. Downgraded PyTorch and had a whole lot of new problems. What usually took 4h is taking 10+.

invaderxan1 commented 1 year ago

Please tell me there is a fix in the pipeline?

LabunskyA commented 1 year ago

For pro graphics (at least for my A4000), 531 is not going to eliminate the issue. You need to downgrade to at least 529 to get rid of the shared memory usage. And 529 / 531 / 535 / 536 on the production branch all work much worse than 531 on the new feature branch (which uses shared VRAM, but with a much smaller footprint for some reason).

RobotsHoldingHands commented 1 year ago

Can confirm this is still an issue. I have an RTX 3080 Ti, and downgrading to 531.68 solved it for me.

Detrian commented 1 year ago

I'm using a 3070, torch: 2.0.1+cu118, and can confirm that this is still an issue with the 536.40 driver. Using highres.fix in particular makes everything break once you reach 98% progress on an image.

PsychoGarlic commented 1 year ago

It got a tiny bit better here: torch 1.13.1+cu117, driver 531.79, CUDA compilation tools release 12.0, V12.0.76.

Still having issues with the duration of the generations. Usually 200 frames took 4h, and now it is taking 10 (720x1280, 30 steps, 2-3 ControlNets). I don't know how to fix it properly; every other fix I tried severely damaged the quality of the images. I now know that I was using version 1.2.1 of the webUI and that Torch was not 2.0. I don't remember the other settings. Now I have everything written down somewhere hahahah

dajusha commented 1 year ago

Did 536.67 fix this, or not?

PsychoGarlic commented 1 year ago

I did not try it. A lot of wasted time already hahaha

WhiteX commented 1 year ago

536.67 fixed it for me.

prescience-data commented 1 year ago

536.67 also worked for me, somewhat: it still seems to drop to shared memory, but not as aggressively (the latest versions seem to start using shared memory at 10GB rather than fully maxing out all available 12GB, which matters).

The 536.67 driver release notes still reference shared memory, and I started getting the "hanging at 50%" bug again today after updating some plugins, which prompted me to dig a bit deeper for solutions.

I often use 2 or 3 ControlNet 1.1 models + Hi-res Fix upscaling on a 12GB card, which is what triggers it: I can watch the GPU in my Performance tab begin to use shared CPU memory.

The ideal fix would be some way to add a --never-use-migraine-inducing-shared-memory flag, but after some light research I assume this would rely on a driver or operating system API that isn't available yet, as there doesn't seem to be a way to "block" a specific process from using shared memory.

However, the good news: I was able to massively reduce this >12GB memory usage without resorting to --medvram with the following steps:

Initial environment baseline

  1. Check your CLI to make sure you don't have any "using old xformers" WARN messages (not sure if this is actually related, but it was part of the process, so it makes sense to include it).
  2. Add set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512 to webui-user.bat (see the sketch after this list).
  3. I assume 12GB users are already running the flags --xformers and --opt-split-attention.
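
A minimal webui-user.bat sketch combining the allocator setting from step 2 with the flags assumed in step 3 (this exact flag set is just one setup, not a universal recommendation):

```bat
@echo off

set PYTHON=
set GIT=
set VENV_DIR=
rem Tune PyTorch's caching allocator: collect cached blocks earlier and cap block splitting
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512
rem Attention optimizations assumed in step 3
set COMMANDLINE_ARGS=--xformers --opt-split-attention

call webui.bat
```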

Biggest improvement

Assuming your environment already looks similar to the above, by far the biggest VRAM drop I found was switching from the 1.4GB unpruned .pth ControlNet 1.1 models to these 750MB pruned .safetensors versions: https://civitai.com/models/38784

Hope this helps anyone in a similar frustrating position 😁

catboxanon commented 1 year ago

From my understanding ComfyUI might've done something with CUDA's malloc to fix this. https://github.com/comfyanonymous/ComfyUI/commit/1679abd86d944521cad8a94a09d30fd5e238ae22

It looks like a lot of cards don't support this, though: https://github.com/search?q=repo%3Acomfyanonymous%2FComfyUI+malloc&type=commits&s=author-date&o=desc
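
For anyone who wants to experiment with something similar in this webui, PyTorch exposes an allocator backend switch through the same PYTORCH_CUDA_ALLOC_CONF variable mentioned a few comments up. This is only a sketch under the assumption that your PyTorch build, GPU, and driver support the async allocator backend; it is not something either project is confirmed to ship by default:

```bat
rem Experimental: switch PyTorch's CUDA caching allocator to CUDA's built-in async allocator
rem Assumption: requires a recent PyTorch build and a GPU/driver that support cudaMallocAsync
set PYTORCH_CUDA_ALLOC_CONF=backend:cudaMallocAsync
```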

catboxanon commented 1 year ago

536.67 also did not fix this, according to the release notes.

https://us.download.nvidia.com/Windows/536.67/536.67-win11-win10-release-notes.pdf

3.2 Open Issues in Version 536.67 WHQL

This driver implements a fix for creative application stability issues seen during heavy memory usage. We’ve observed some situations where this fix has resulted in performance degradation when running Stable Diffusion and DaVinci Resolve. This will be addressed in an upcoming driver release. [4172676]

david-trigo commented 1 year ago

I updated the drivers without thinking this might happen, and now I can't go back. I have tried removing the drivers with "Display Driver Uninstaller" and then installing v531.68 and v528.49, but it still doesn't go as fast as before. RTX 4080 (Laptop) 12GB. I seem to be missing something.

Edit: in the end my problem seems to be with the laptop itself. Yesterday I was testing 536.67 and 536.99 on my desktop with an RTX 3080 with no problems.

TeKett commented 1 year ago

After downgrading to 531.79 I noticed it was using very slightly less VRAM, but it was slower. So I downgraded to 531.18, but I can't see any difference from 536.67 other than the aforementioned lower VRAM usage.

FilipeF12 commented 1 year ago

win10 latest, sd-webui 1.5.1, model: sdxl 1.0, image size 1024x1024

My experience yesterday with NVIDIA 531.79 on a 3090: 4 images generated in under a minute.

My experience today with NVIDIA 536.57:
1 image: 23 sec
2 images: 8 minutes
3 images: 8 minutes
4 images: 8 minutes

Going to uninstall 536.57 and install 531.79.

catboxanon commented 1 year ago

536.99 just released, with the open issue mentioned prior still there, but the mention of Stable Diffusion has seemingly vanished. (It was given reference number 4172676, as mentioned here.)

https://us.download.nvidia.com/Windows/536.99/536.99-win11-win10-release-notes.pdf

3.2 Open Issues in Version 536.99 WHQL

[DaVinci Resolve] This driver implements a fix for creative application stability issues seen during heavy memory usage. We’ve observed some situations where this fix has resulted in performance degradation when running DaVinci Resolve. This will be addressed in an upcoming driver release. [4172676]

jieran233 commented 1 year ago

Has anyone tried 536.99?

TeKett commented 1 year ago

I have tried 536.99. It has the same higher VRAM usage. That's all I can say, since I don't share the issues you folks are having with the newer versions. After rolling back to 531 I am getting the freezing, but it's pretty sporadic; 536 works just fine, and I do XYZ plots of several hundred generations. I have an RTX 4070 Ti.

david-trigo commented 1 year ago

Has anyone tried 536.99?

I just installed 536.99 using RTX 3080 and so far it's working fine.

FilipeF12 commented 1 year ago

I have tried 536.99. It has the same higher VRAM usage. That's all I can say, since I don't share the issues you folks are having with the newer versions. After rolling back to 531 I am getting the freezing, but it's pretty sporadic; 536 works just fine, and I do XYZ plots of several hundred generations. I have an RTX 4070 Ti.

You mention "rolling back to 531". NVIDIA advises against using Roll Back; they say to uninstall the updated driver and then download and install the old driver.

FilipeF12 commented 1 year ago

win10 latest, sd-webui 1.5.1, model: sdxl 1.0, image size 1024x1024

My experience yesterday with NVIDIA 531.79 on a 3090: 4 images generated in under a minute.

My experience today with NVIDIA 536.57:
1 image: 23 sec
2 images: 8 minutes
3 images: 8 minutes
4 images: 8 minutes

Going to uninstall 536.57 and install 531.79.

After uninstalling 536 and installing 531, I am back to the speeds I had before.

TeKett commented 1 year ago

I have tried 536.99. It has the same higher VRAM usage. That's all I can say, since I don't share the issues you folks are having with the newer versions. After rolling back to 531 I am getting the freezing, but it's pretty sporadic; 536 works just fine, and I do XYZ plots of several hundred generations. I have an RTX 4070 Ti.

You mention "rolling back to 531". NVIDIA advises against using Roll Back; they say to uninstall the updated driver and then download and install the old driver.

I searched up the versions on NVIDIA's site and downloaded the older installers. I still decided to go with 536.99, FYI.

WhiteX commented 1 year ago

I thought everything was fine with 536.99 on my RTX 3050 (mobile GPU), but after uninstalling it and installing the 531.61 Studio edition, iterations became 20% faster in ComfyUI, which is not a lot, but very noticeable.

matichek commented 1 year ago

Discussed in #11062

Originally posted by w-e-w, June 7, 2023: some users have reported issues related to the latest NVIDIA drivers (nVidia drivers change in memory management vladmandic#1285, #11050 (comment)). If you have been experiencing generation slowdowns or getting stuck, consider downgrading to driver version 531 or below: NVIDIA Driver Downloads

Yeah, been wondering why the renders are now like 10x slower; will check an older version of the drivers.

Edit 1: downgraded to 536.67-desktop-win10-win11-64bit-international-dch-whql - seems to be much better - but I'm not sure whether downgrading further would be better still - I have an RTX 2070.

Edit 2: after 5 renders things are back to "very bad" - rendering times for a 512 image are back to 5 min :(

Edit 3: went way back to 531.68-desktop-win10-win11-64bit-international-dch-whql - and things seem back to normal - "super fast" in comparison.

Edit 4: now after 10 renders and 2x img2img upscales, I get a CUDA memory error no matter what I do. Strange.

AlienRenders commented 1 year ago

I can't go back to a previous driver because I wouldn't be able to train SDXL. At the start there's a huge spike in VRAM usage to about 14GB, which goes back down to 7GB during training. I have 11GB of VRAM, so the new drivers let me get past that initial stage. With the old drivers, I get an OOM error.

When it saves state, it doubles VRAM usage to 17GB and never releases it. From that point on it's using shared VRAM and slows down drastically.

Why does it need to use VRAM to save? Can't that all be done on the CPU? If saving didn't use 10GB of extra VRAM, I could train SDXL using only 7GB of VRAM.

gerroon commented 1 year ago

Installing 536.99 does not help.

--no-half makes it very slow.

--medvram --disable-nan-check doesn't do anything except give me an empty image.

I am using a 3080 Ti. Is there anything else I can try :( Something must have happened within the last couple of weeks, because this was working fine :(

FilipeF12 commented 1 year ago

Installing 536.99 does not help.

--no-half makes it very slow.

--medvram --disable-nan-check doesn't do anything except give me an empty image.

I am using a 3080 Ti. Is there anything else I can try :( Something must have happened within the last couple of weeks, because this was working fine :(

I went back to 531 and reinstalled SD 1.5.1 from scratch. Faster, but definitely not as fast as before installing 536.

TeKett commented 1 year ago

Installing 536.99 does not help.

--no-half makes it very slow.

--medvram --disable-nan-check doesn't do anything except give me an empty image.

I am using a 3080 Ti. Is there anything else I can try :( Something must have happened within the last couple of weeks, because this was working fine :(

"--disable-nan-check" only disables the warning. If things are working correctly, it should only be a problem when training with wrong settings, or when using images with alpha on a model that doesn't take alpha.

The a1111 args list says "Do not check if produced images/latent spaces have nans; useful for running without a checkpoint in CI." Not sure what "CI" is.

Romangelo commented 1 year ago

537.13 still hasn't fixed anything.

falsewinds commented 1 year ago

I downgraded the driver to 531.68 and use torch 1.13.1 with cu117, but I still get a CUDA out of memory error. Does anyone have the same issue? v1.6.0 and v1.5.2 both have it.

EDIT: RTX 4070 12gb

younyokel commented 1 year ago

537.13 still hasn't fixed anything.

Just updated; at first sight I don't notice any significant performance issues.

bluekght commented 11 months ago

Tried 537.42.

Absolutely not fixed yet. It takes me 3-4 minutes to generate a single 1920x1280 image on this driver.

Back to 531.68, and it's 20-second generation times.

Romangelo commented 11 months ago

Now what? Are we stuck on this old driver forever? Can't the SD devs fix it on their side instead of waiting for the NVIDIA devs?

KrisadaFantasy commented 11 months ago

537.42 on a 2060 and I got this issue.

I just updated from 531 without knowing about it. Normal generation is fine at first, at about 1-3 s/it. Then, after I use face restore in ReActor (roop fork) for the first time, later generations slow down to as much as 60 s/it. Disabling everything and going back to pure generation doesn't solve the issue; I need to restart the whole thing to free the VRAM.

Edit: clean installed 531.61 and everything is working fine as before now.

Aaron2550 commented 11 months ago

Can't confirm this

RTX 4070 12GB
WebUI Version: v1.5.1-1-g56236dfd
Python: 3.10.11
Torch: 2.0.1+cu118
Xformers: N/A
Gradio: 3.32.0

Driver 531.79:
Prompt 1: 4.0, 4.1, 4.0, 4.0, 4.0 seconds - A: 1.78 GB, R: 3.22 GB, Sys: 5.4/11.9941 GB (45.4%)
Prompt 2: 15.1, 15.0, 15.0, 15.0, 15.0 seconds - A: 3.38 GB, R: 7.11 GB, Sys: 9.3/11.9941 GB (77.9%)

Driver 537.42:
Prompt 1: 4.0, 4.0, 3.9, 4.0, 3.9 seconds - A: 1.78 GB, R: 3.22 GB, Sys: 5.4/11.9941 GB (45.4%)
Prompt 2: 15.1, 14.9, 14.9, 15.0, 14.8 seconds - A: 3.38 GB, R: 7.11 GB, Sys: 9.4/11.9941 GB (78.4%)

opy188 commented 11 months ago

It's only noticeable to me with SDXL: all other models can generate quite fast, but with SDXL it will slow down at random and stay that way, only resolving itself randomly and then breaking again. I can only guess this is the driver issue.

Michoko92 commented 11 months ago

537.58 got released, and the slowdowns have disappeared from the "known issues" in the release notes. So either they fixed it (which is a bit doubtful), or they simply gave up, which is pretty concerning...

younyokel commented 11 months ago

537.58 got released, and the slowdowns have disappeared from the "known issues" in the release notes. So either they fixed it (which is a bit doubtful), or they simply gave up, which is pretty concerning...

I haven't experienced the issue on my RTX 2060 since version 537.42. Maybe it varies from device to device? Hmm.

KrisadaFantasy commented 11 months ago

537.58 got released, and the slowdowns have disappeared from the "known issues" in the release notes. So either they fixed it (which is a bit doubtful), or they simply gave up, which is pretty concerning...

I haven't experienced the issue on my RTX 2060 since version 537.42. Maybe it varies from device to device? Hmm.

Did you have the issue, and was it solved by updating to 537.42? I'm also on a 2060 and got the issue by updating to 537.42!

Let's hope that 537.58 fixed it. I was planning to clean install my Windows this weekend after the latest Moment update anyway; might as well try the new driver before that.

Michoko92 commented 11 months ago

537.58 got released, and the slowdowns have disappeared from the "known issues" in the release notes. So either they fixed it (which is a bit doubtful), or they simply gave up, which is pretty concerning...

I haven't experienced the issue on my RTX 2060 since version 537.42. Maybe it varies from device to device? Hmm.

Interesting. So you had the issue with drivers between 531.x and 537.42, and then it got solved with 537.42? I'm surprised nobody else reported improvements. Still, it would be great news!

Edit: KrisadaFantasy was faster than me ;)

Romangelo commented 11 months ago

Actually, my issue disappeared with 537.42, too. I just didn't report it here.

I did use DDU to clean all the old drivers before installing 537.42, so that might be something you guys should try if you still keep having this issue.