w-e-w closed this issue 10 months ago.
Funny, I am randomly getting the issue where an output is stuck at 50% for an hour, and I am on 531.41 for an NVIDIA 3060 12GB model.
Strangely mine seems to go at normal speed for the first gen on a checkpoint, or if I change the clip on a checkpoint, but subsequent gens go muuuch slower. Annoyingly Diablo won't run on 531.
I can confirm this bug. I was getting results (as expected) before I installed the latest Titan RTX drivers. I will try installing a previous build.
Strangely mine seems to go at normal speed for the first gen on a checkpoint, or if I change the clip on a checkpoint, but subsequent gens go muuuch slower. Annoyingly Diablo won't run on 531.
Yeah, that's exactly how it is for me. When I tried inpainting, the first gen runs through just fine, but any subsequent ones have massive hang-ups, necessitating a restart of the commandline window and rerunning webui-user.bat.
I wasn't sure if there was a problem with the drivers, so I reinstalled WebUI, but the problem didn't go away. Everything generates fine like before, but once the High Res Fix starts and finishes there is what looks like a minute-long pause. Edit: confirming. Downgraded to 531.68 and now everything is as it was.
If you are stuck with a newer Nvidia driver version, downgrading to Torch 1.13.1 seems to work too. Add the following to webui-user.bat:
set TORCH_COMMAND=pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
then delete the <webui-root>/venv directory so Torch is reinstalled at the pinned version.
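After the venv is rebuilt, a quick sanity check like this (my own sketch, not part of the original instructions) confirms which Torch build actually ended up installed:

```python
# Sanity-check sketch (not from the original comment): confirm the downgraded
# Torch build and CUDA runtime are the ones actually loaded in the venv.
import torch

print("torch:", torch.__version__)           # expect 1.13.1+cu117 after the downgrade
print("cuda runtime:", torch.version.cuda)   # expect 11.7
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```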
I am having the opposite issue, where on the newer drivers my first image generation is slow because of some clogged memory on my GPU, which frees itself as soon as it gets to the second one.
Downgrading Torch didn't seem to help at all. Downgrading from 536.23 to 531.79 fixes the problem instantly.
Anyone, is this problem still relevant?
I haven't tried with the latest drivers, so I don't know if this issue is still ongoing.
Extremely slow for me. Downgraded the pytorch, and had a whole lotta new problems. What usually took 4h is taking 10+
Please tell me there is a fix in the pipeline?
For pro graphics (at least for my A4000), 531 is not going to eliminate the issue. You need to downgrade to at least 529 to get rid of the shared memory usage. And 529 / 531 / 535 / 536 in the production branch all work far worse than 531 in the new feature branch (which still uses shared VRAM, but with a much smaller footprint for some reason).
Can confirm this is still an issue. I have an RTX 3080 Ti and downgrading to 531.68 solved it for me.
I'm using a 3070, torch: 2.0.1+cu118, and can confirm that this is still an issue with the 536.40 driver. Using highres.fix in particular makes everything break once you reach 98% progress on an image.
It got a tiny bit better here: torch 1.13.1+cu117, driver 531.79, CUDA compilation tools release 12.0, V12.0.76.
Still having issues with the duration of the generations. Usually 200 frames took 4h, and now it is taking 10+ (720x1280, 30 steps, 2-3 ControlNets). I don't know how to fix it properly; every other fix I tried severely damaged the quality of the images. I now know that I was using version 1.2.1 of the webUI and that torch was not 2.0. I don't remember every other setting. Now I have everything written down somewhere hahahah
536.67 fixed this? or not?
I did not try it. A lot of wasted time already hahjaja
536.67 fixed it for me.
536.67 also worked for me somewhat, meaning it still seems to drop to shared memory but not as aggressively (the latest versions seem to start using shared memory at 10GB rather than after fully maxing out all available 12GB, which matters).
The 536.67 driver release notes still reference shared memory, and I recently started getting the "hanging at 50%" bug again today after updating some plugins, which prompted me to dig a bit deeper for solutions.
I often use 2 or 3 ControlNet 1.1 models + Hi-res Fix upscaling on a 12GB card, which is what triggers it; I can watch my Performance tab and see the GPU begin to use shared CPU memory.
The ideal fix would be finding some way to create a --never-use-migraine-inducing-shared-memory flag, but after some light research I assume this would rely on some driver or operating system API becoming available, as there doesn't seem to be a way to "block" a specific process from using shared memory.
However, for the good news: I was able to massively reduce this >12GB memory usage without resorting to --medvram with the following steps: adding
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512
to webui-user.bat, plus the --xformers and --opt-split-attention flags.
Assuming your environment already looks similar to the above, by far the biggest VRAM drop I found was switching from the 1.4GB unpruned .pth ControlNet 1.1 models to these 750MB pruned .safetensors versions: https://civitai.com/models/38784
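For reference, a rough sketch of how such pruned fp16 .safetensors conversions are typically produced (this is an assumption about how those files were made, and the filenames below are just placeholders):

```python
# Rough sketch of converting an unpruned .pth ControlNet to a pruned fp16 .safetensors
# file. Assumption: "pruned" here means weights-only, cast to fp16. Filenames are examples.
import torch
from safetensors.torch import save_file

state = torch.load("control_v11p_sd15_canny.pth", map_location="cpu")
state = state.get("state_dict", state)  # some checkpoints nest weights under "state_dict"
fp16_state = {k: v.half().contiguous() for k, v in state.items() if isinstance(v, torch.Tensor)}
save_file(fp16_state, "control_v11p_sd15_canny_fp16.safetensors")
```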
Hope this helps anyone in a similar frustrating position 😁
From my understanding ComfyUI might've done something with CUDA's malloc to fix this. https://github.com/comfyanonymous/ComfyUI/commit/1679abd86d944521cad8a94a09d30fd5e238ae22
Looks like a lot of cards also don't support this though: https://github.com/search?q=repo%3Acomfyanonymous%2FComfyUI+malloc&type=commits&s=author-date&o=desc
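If I'm reading that commit right (my interpretation, not a verified description of ComfyUI's code), the gist is switching PyTorch's caching allocator to the cudaMallocAsync backend before torch initializes, roughly like this:

```python
# Sketch of the idea (my interpretation of the linked commit, not ComfyUI's actual code):
# select the cudaMallocAsync allocator backend before torch touches the GPU. Only some
# card/driver combinations support it, hence the compatibility checks in ComfyUI.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "backend:cudaMallocAsync"

import torch  # imported after setting the env var so the allocator config is picked up
print(torch.cuda.get_allocator_backend())  # should report "cudaMallocAsync" if it took effect
```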
536.67 also did not fix this, according to the release notes.
https://us.download.nvidia.com/Windows/536.67/536.67-win11-win10-release-notes.pdf
3.2 Open Issues in Version 536.67 WHQL
This driver implements a fix for creative application stability issues seen during heavy memory usage. We’ve observed some situations where this fix has resulted in performance degradation when running Stable Diffusion and DaVinci Resolve. This will be addressed in an upcoming driver release. [4172676]
I updated the drivers without thinking this might happen and now I can't go back. I have tried removing the drivers with "Display Driver Uninstaller" and then installing v531.68 and v528.49, but it still doesn't go as fast as before. RTX 4080 (Laptop) 12GB. I seem to be missing something.
Edit: finally my problem seems to be with the laptop itself. Yesterday I was testing 536.67 and 536.99 on my desktop using RTX 3080 with no problems.
After downgrading to 531.79 I noticed that it was using very slightly less RAM, but it was slower. So I downgraded to 531.18 but can't see any difference from 536.67 other than the aforementioned lower RAM usage.
win10 latest, sd-webui 1.5.1, model: sdxl 1.0, image size 1024x1024
My experience yesterday with NVIDIA 531.79: 4 images generated in under a minute on a 3090.
My experience today with NVIDIA 536.57: 1 image in 23 sec, 2 images in 8 minutes, 3 images in 8 minutes, 4 images in 8 minutes.
Going to uninstall 536.57 and install 531.79.
536.99 was just released, with the open issue mentioned prior still there, but the mention of Stable Diffusion has seemingly vanished. (It was given the reference number 4172676, as mentioned here.)
https://us.download.nvidia.com/Windows/536.99/536.99-win11-win10-release-notes.pdf
3.2 Open Issues in Version 536.99 WHQL
[DaVinci Resolve] This driver implements a fix for creative application stability issues seen during heavy memory usage. We’ve observed some situations where this fix has resulted in performance degradation when running DaVinci Resolve. This will be addressed in an upcoming driver release. [4172676]
Has anyone tried 536.99?
I have tried 536.99. It has the same higher VRAM usage. That's all I can say, since I don't share the issues you people have with the newer versions. After rolling back to 531 I am getting the freezing, but it's pretty sporadic; 536 works just fine, and I do XYZ plots of several hundred generations. I have an RTX 4070 Ti.
Has anyone tried 536.99?
I just installed 536.99 using RTX 3080 and so far it's working fine.
I have tried 536.99. It has the same higher VRAM usage. That's all I can say, since I don't share the issues you people have with the newer versions. After rolling back to 531 I am getting the freezing, but it's pretty sporadic; 536 works just fine, and I do XYZ plots of several hundred generations. I have an RTX 4070 Ti.
You mention "rolling back to 531". NVIDIA advises against using rollback; they say to uninstall the updated driver and then download and install the old driver.
win10 latest, sd-webui 1.5.1, model: sdxl 1.0, image size 1024x1024
My experience yesterday with NVIDIA 531.79: 4 images generated in under a minute on a 3090.
My experience today with NVIDIA 536.57: 1 image in 23 sec, 2 images in 8 minutes, 3 images in 8 minutes, 4 images in 8 minutes.
Going to uninstall 536.57 and install 531.79.
after uninstalling 536 and installing 531 I am back to the speeds I had before
I have tried 536.99. It has the same higher VRAM usage. That's all I can say, since I don't share the issues you people have with the newer versions. After rolling back to 531 I am getting the freezing, but it's pretty sporadic; 536 works just fine, and I do XYZ plots of several hundred generations. I have an RTX 4070 Ti.
You mention "rolling back to 531". NVIDIA advises against using rollback; they say to uninstall the updated driver and then download and install the old driver.
I searched up the versions on NVIDIA's site and downloaded the older versions. I still decided to go with 536.99, FYI.
I thought it was all fine with 536.99 on my RTX 3050 (mobile GPU), but uninstalling it and installing 531.61 Studio edition made iterations 20% faster in ComfyUI, which is not a lot but very noticeable.
Discussed in #11062
Originally posted by w-e-w on June 7, 2023: Some users have reported issues related to the latest NVIDIA drivers ("nVidia drivers change in memory management", vladmandic#1285, #11050 (comment)). If you have been experiencing generation slowdowns or getting stuck, consider downgrading to driver version 531 or below (NVIDIA Driver Downloads).
Ya, been wondering why the renders are now like 10x slower; will check an older version of the drivers. Edit 1: downgraded to 536.67-desktop-win10-win11-64bit-international-dch-whql - seems to be much better, but I'm not sure if downgrading more would be better. I have an RTX 2070. Edit 2: after 5 renders things are back to "very bad" - rendering times for a 512 image are back to 5 min :(
Edit 3: went way back to "531.68-desktop-win10-win11-64bit-international-dch-whql" and things seem back to normal, so "super fast" in comparison.
Edit 4: now after 10 renders and 2x img2img upscales, I get a CUDA out of memory error no matter what I do. Strange.
I can't go back to a previous driver as I wouldn't be able to train SDXL. At the start there's a huge spike in VRAM usage to about 14GB, which goes back down to 7GB during training. I have 11GB of VRAM, so the new drivers let me get past that initial stage. With the old drivers, I get OOM.
When it saves state, it doubles VRAM usage to 17GB and never releases it. From this point on, it's using shared VRAM and slows down drastically.
Why does it need to use VRAM to save? Can't that all be done on the CPU? If saving didn't use 10GB of extra VRAM, I could train SDXL using only 7GB of VRAM.
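What I'm imagining is something like this (just a sketch of the idea, not how the trainer actually saves state):

```python
# Sketch of the idea (not the trainer's actual save path): copy weights to CPU first so
# serialisation doesn't keep a second full copy of the model in VRAM.
import torch

def save_state_on_cpu(model: torch.nn.Module, path: str) -> None:
    cpu_state = {name: tensor.detach().cpu() for name, tensor in model.state_dict().items()}
    torch.save(cpu_state, path)
```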
Installing 536.99 does not help,
--no-half makes it very slow,
--medvram --disable-nan-check does not do anything except give me an empty image.
I am using a 3080 Ti, is there anything else I can try :( Something must have happened within the last couple of weeks because this was working fine :(
Installing 536.99 does not help,
--no-half makes it very slow,
--medvram --disable-nan-check does not do anything except give me an empty image.
I am using a 3080 Ti, is there anything else I can try :( Something must have happened within the last couple of weeks because this was working fine :(
I went back to 531 and reinstalled sd 1.5.1 from scratch. Faster, but definitely not as fast as before installing 536.
Installing 536.99 does not help,
--no-half makes it very slow,
--medvram --disable-nan-check does not do anything except give me an empty image.
I am using a 3080 Ti, is there anything else I can try :( Something must have happened within the last couple of weeks because this was working fine :(
"--disable-nan-check" only disables the warning, If things are working correctly, should only be a problem when training with wrong settings, or images with alpha on a model that dont take alpha.
The a1111 args list says "Do not check if produced images/latent spaces have nans; useful for running without a checkpoint in CI." Not sure what "CI" is.
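As I understand it, the check that flag disables is roughly this kind of thing (a sketch, not webui's actual implementation):

```python
# Rough illustration of what the NaN check amounts to (not webui's actual code): if the
# latent coming out of the sampler is full of NaNs, the decoded image will be blank, so
# raising a clear error is more useful than silently returning an empty picture.
import torch

def check_for_nans(latent: torch.Tensor, description: str = "unet output") -> None:
    if torch.isnan(latent).any():
        raise RuntimeError(f"NaNs detected in {description}; "
                           "try --no-half or a different driver/torch combination")
```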
537.13 still hasn't fixed anything.
I downgraded the driver to 531.68 and use torch 1.13.1 with cu117, but I still get a CUDA out of memory error. Does anyone have the same issue? v1.6.0 and v1.5.2 both have this issue.
EDIT: RTX 4070 12GB
537.13 still hasn't fixed anything.
Just updated, at first sight I don't notice any significant performance issues.
Tried 537.42
Absolutely not yet fixed. Takes me 3-4 minutes to generate a single 1920x1280 image in this driver.
Back to 531.68, and it's 20 second generation times.
Now what? We're stuck on this old driver forever? Can't the SD devs fix it on this side instead of waiting for nVidia devs?
537.42 on a 2060 and I got this issue. I just updated from 531 without knowing about it. Normal generation is fine at first, at about 1-3 s/it. Then after I use face restore in ReActor (roop fork) for the first time, later generations slow down to as much as 60 s/it. Disabling everything and going back to pure generation doesn't solve the issue, and I need to restart the whole thing to free the VRAM.
Edit: clean installed 531.61 and everything is working fine as before now.
Can't confirm this
RTX 4070 12GB
WebUI Version: v1.5.1-1-g56236dfd
Python: 3.10.11
Torch: 2.0.1+cu118
Xformers: N/A
Gradio: 3.32.0
Driver 531.79:
Prompt 1: 4.0, 4.1, 4.0, 4.0, 4.0 seconds - A: 1.78 GB, R: 3.22 GB, Sys: 5.4/11.9941 GB (45.4%)
Prompt 2: 15.1, 15.0, 15.0, 15.0, 15.0 seconds - A: 3.38 GB, R: 7.11 GB, Sys: 9.3/11.9941 GB (77.9%)
Driver 537.42:
Prompt 1: 4.0, 4.0, 3.9, 4.0, 3.9 seconds - A: 1.78 GB, R: 3.22 GB, Sys: 5.4/11.9941 GB (45.4%)
Prompt 2: 15.1, 14.9, 14.9, 15.0, 14.8 seconds - A: 3.38 GB, R: 7.11 GB, Sys: 9.4/11.9941 GB (78.4%)
It's only noticeable to me with SDXL: all models can generate quite fast, but with SDXL it will slow down at random and stay that way, only resolving itself randomly and then breaking again. I can only guess this is the driver thing.
537.58 got released, and the slowdowns have disappeared from the "known issues" in the release notes. So either they fixed it (which is a bit doubtful), or they simply gave up, which is pretty concerning...
537.58 got released, and the slowdowns have disappeared from the "known issues" in the release notes. So either they fixed it (which is a bit doubtful), or they simply gave up, which is pretty concerning...
I haven’t experienced the issue on RTX 2060 since version 537.42. Device-to-device? hmm
537.58 got released, and the slowdowns have disappeared from the "known issues" in the release notes. So either they fixed it (which is a bit doubtful), or they simply gave up, which is pretty concerning...
I haven’t experienced the issue on RTX 2060 since version 537.42. Device-to-device? hmm
Did you have the issue and it was solved by updating to 537.42? I'm also on a 2060 and I got the issue by updating to 537.42!
Let's hope that 537.58 fixed it. I was planning to clean install Windows this weekend after the latest Moment update anyway. Might as well try the new driver before that.
537.58 got released, and the slowdowns have disappeared from the "known issues" in the release notes. So either they fixed it (which is a bit doubtful), or they simply gave up, which is pretty concerning...
I haven’t experienced the issue on RTX 2060 since version 537.42. Device-to-device? hmm
Interesting. So you had the issue with drivers between 531.x and 537.42, then it got solved with 537.42? I'm surprised nobody else reported improvements. However, it would be great news!
Edit: KrisadaFantasy was faster than me ;)
Actually, my issue disappeared with 537.42, too. I just didn't report it here.
I did use DDU to clean all the old drivers before installing 537.42, so that might be something you guys should try if you still keep having this issue.
Update (2023-10-31)
This issue should now be entirely resolved. NVIDIA has published a help article on disabling the system memory fallback behavior. Please upgrade to the latest driver (546.01 or newer) and follow the guide on their website: https://nvidia.custhelp.com/app/answers/detail/a_id/5490
Update (2023-10-19)
The issue has been reopened, as it seems more and more reports are saying that it is not yet fixed.
Update (2023-10-17)
There seem to be some reports saying that the issue is still not fixed.
comments
https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11063#issuecomment-1764127635 https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11063#issuecomment-1764899003
Update (2023-10-14)
This issue has reportedly been fixed by NVIDIA as of 537.58 (537.42 if using Studio release). Please update your drivers to this version or later.
The original issue description follows.
Discussed in https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/11062
This issue will be closed when NVIDIA resolves it. It currently has the tracking number [4172676].