My guess is that you're probably already running very close to the edge, and one more thing running in the background uses up the last bit of VRAM you need.
Lots of things, such as a web browser (depending on how it's configured and whether hardware acceleration is on), can use some of your VRAM.
I would also try using --medvram-sdxl:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Command-Line-Arguments-and-Settings
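For reference, a minimal sketch of how that flag could go into webui-user.bat (this assumes the stock launcher and that no other flags are already set):

@echo off
set PYTHON=
set GIT=
set VENV_DIR=
rem assumption: no other flags in use; --medvram-sdxl applies the low-VRAM
rem optimizations only when an SDXL checkpoint is loaded
set COMMANDLINE_ARGS=--medvram-sdxl
call webui.bat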
Me too, after the update I can't run it: Stable diffusion model failed to load.
Earlier even 8Gb models worked OK, as did SDXL models, so it's not the edge of this GPU's VRAM capacity; something is wrong with the latest updates after the very stable 1.5.2 version.
I also noticed that image generation has become longer; it seems to hang on the last few percent.
There are a bunch of errors in the console in different situations, e.g. when I stop the server (Ctrl+C) I get a lot of errors while it's closing; 1.5.2 and previous versions never had anything like this.
I've just made a clean install of 1.5.2 and it works perfectly, very fast and with no errors.
So guys, version 1.6.0 is still raw.
My generations won't even start; they hang at 0% and don't progress. My fans start spinning like before, but it's stuck at 0% forever.
Having the same issue on my 3090 24GB with the latest pull. I could generate SDXL + Refiner without any issues, but ever since the pull it's been OOM-ing like crazy.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 24.00 GiB total capacity; 10.66 GiB already allocated; 10.70 GiB free; 10.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
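The error text itself points at max_split_size_mb; one hedged way to try it is to set PYTORCH_CUDA_ALLOC_CONF before launch, e.g. near the top of webui-user.bat (the 512 value is just an assumption, not a verified fix):

rem assumption: placed before webui.bat is called, so PyTorch sees it at startup
set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512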
Same here Nvidia 1060 with 6Gb:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 0; 5.92 GiB total capacity; 5.01 GiB already allocated; 77.75 MiB free; 5.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Back to 1.5.2
Ran into this issue when I loaded the base and refiner checkpoints into VRAM. Checking the 'Only keep one model on device' setting fixed the issue.
Can confirm, this works for XL. Not sure why I can't keep both models loaded, given that I should have plenty of VRAM to spare with 24GB :/
PS: it now hangs quite a bit during the swap; image generation for 1024x1024 wildly fluctuates between 30 secs and 2 mins.
Thanks, I will try. But I don't think that's right even for my 6Gb GPU, not to mention GPUs with more VRAM, especially 24Gb - it's nonsense!
I have 16GB VRAM, however only 8 is shared, so it does appear to depend on how much memory your GPU is set up to share.
@wagontrader the card is dedicated only to SD; I run the iGPU for other tasks and my display. I was doing some reading, and the PyTorch "allocated" amount seems to be the VRAM set aside for SD to use, which would explain the OOM. Do you have any idea how I can just let SD use the whole VRAM?
There seems to be a PyTorch setting for this,
torch.cuda.set_per_process_memory_fraction()
but I have no idea how or where to set it for A1111.
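For what it's worth, that function caps how much of the device a single process may allocate rather than granting it more; by default PyTorch can already use the whole card. A minimal standalone Python sketch (not an A1111 setting, and the 0.9 value is only an example):

import torch

if torch.cuda.is_available():
    # limit this process's CUDA allocations to 90% of GPU 0's total memory;
    # raising or omitting the cap does not add VRAM beyond what the card has
    torch.cuda.set_per_process_memory_fraction(0.9, device=0)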
You can see in the Windows Task Manager how much VRAM A1111 uses while processing:
The screenshots above are from https://allthings.how/how-to-check-vram-usage-on-windows-10/
@Gourieff It does appear to use dedicated, not shared VRAM. As far as I know, the only way to change the amount dedicated to the GPU is through the BIOS. I will give it a try and see.
Edit: mine is in an expansion slot, so I can't change how it was set up. As I go further down the rabbit hole, it appears that I only have 8GB of VRAM and all 8 are being used. I will run some more tests and see if I can get any more insight.
@wagontrader thank you. Appreciate the reply.
This just further proves my point. Only about 12GB of VRAM is being used, and it gives me OOM errors as soon as it goes over that, so there's at least 10GB just sitting there doing nothing. I use the 24GB card only for SD; the system is using its own GPU. Which is why I can't quite understand what is going on.
Okay, good news: I managed to fix my issue. Here are the steps if anyone else needs them:
Take all the files from cudnn > libcudnn > bin
and copy them to stable-diffusion-webui\venv\Lib\site-packages\torch\lib
overwriting the existing ones. I can load both the refiner and the checkpoint again on my 24GB card now, and the PyTorch allocation scales as needed. 8-10 seconds to generate 1080x1080. Hope this helps anyone else who has been stuck with this too!
@Acephalia I restarted my PC so that I could take a look at the BIOS, and after the reboot I took another look to see if I could figure anything else out. I set up the UI to load 2 checkpoints and set them both to be stored in VRAM (unchecked the option to only keep one model on the device). Ran some tests using XL Base and XL Refiner and everything is working as it should now.
A simple reboot did the trick for me.
@wagontrader thanks, my friend! A reboot somehow helped 🤔 Also, I purged the old venv and reinstalled it before that (maybe that helped as well). I even tried running A1111 (5.3Gb model) and ComfyUI Portable (4Gb model) at the same time - working 🤷♂️
@Acephalia What version of torch did you have installed? Specifically, what I'm wondering is: did/does it have "+cu118" appended to the name, like 'torch-2.0.1+cu118'? If not, installing torch yourself via
pip install torch==2.0.1+cu118 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
might work better for you. The cuDNN files from PyTorch are supposed to be optimized for Torch. They have the same version number as NVIDIA's (6.14.11.11080) but a different file size and a much more recent build date.
SD webui's built-in torch install command doesn't append '+cu118', and so installs a slightly different version when installing from scratch.
I set up the UI to load 2 checkpoints and set them both to be stored in VRAM
Hi - how did you do this?
@aleph23 Can’t quite remember what version of torch I had but all my issues are now fixed. Here are the full steps for anyone else who needs them.
@lookbothways WebUI > Settings > Stable Diffusion. It's the first slider.
Is there an existing issue for this?
What happened?
Hi there. I have an RTX 3060 6Gb in my laptop. This GPU works great even with SDXL models (with some optimizations). Everything was fine before today's update (2023-09-02). I use an SD1.5-based model (5.3Gb), and until today there were no errors. This morning I ran the usual git pull to get the latest commits, and after that SD WebUI refuses to load the 5.3Gb model with a "CUDA out of memory" error.
Version of NVIDIA driver - latest, 537.13
So! I've just tried to git reset to the commit
e7965a5e
and all is fine now, the model loads with no errors. Please check the latest commit, something went wrong there.
Steps to reproduce the problem
Launch the web UI with webui-user.bat on a 6Gb NVIDIA GPU and try to load any model larger than 5Gb.
What should have happened?
The 5.3Gb SD1.5-based model should have loaded as before.
Sysinfo
sysinfo-2023-09-02-08-42.txt
What browsers do you use to access the UI ?
Mozilla Firefox
Console logs
Additional information
No response