jacquesfeng123 commented 1 year ago

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits

What happened?

I have set up a server for my team to use.

Config is as below: { "sd_checkpoint_cache": 0, "sd_vae_checkpoint_cache": 0, } however every time I switch a model, RAM increases. It never gets down unless the webui is killed/restarted.

This is observed on linux only, not on my windows installation. In the end, I have to kill the Linux server every night.

I have upgraded to Torch 2.0.0, but the same thing was observed upgrading. Also I multi-instanced my gpu server to 4 webui instances. Same thing is also observed on my T4 (single instance).

we start with this, preloaded with a safetensor model A. please look at the second line, this number is going to change.

in the ui, we switched the model to model B.

now we switch it back to model A.

now we switch to model B again.

this keeps on happening even on the same models, so no need to prove further with other/more models. the issue is that this continues untill OOM, which then freezes the entire server. We already have 200 GB of RAM, including 100GB of swap. but sigh. It could be great if this can be solved.

this issue has been mentioned

2180, someone in #7451, this remained unanswered. also in #6532, it seems it has been fixed, but it really hasn't.

Steps to reproduce the problem

install in linux
install all required components
add in two or more models
switch between models
observe VRAM fly high

What should have happened?

with no cache in model, switchign models should not increase RAM.

Commit where the problem happens

python: 3.8.10 • torch: 2.0.0+cu118 • xformers: N/A • gradio: 3.22.0 • commit: faeef6fc • checkpoint: 4a408d2491

What platforms do you use to access the UI ?

Linux

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

--api --listen

List of extensions

controlnet imagebrowser systeminfo

Console logs

no log error until OOM

Additional information

No response

Qhao6 commented 1 year ago

Me too, it always increases every time the model is switched, even if it is the one used before model. --xformers --opt-split-attention --no-half-vae --medvram.
So I added some parameters and found them to be of little use

dejl commented 1 year ago

I get this too.

"webui.sh" killed

when switching models every so often

AstralCupid commented 1 year ago

I get this too. For me it seems that roughly the full size of the model leaks into CPU RAM every time I switch models. Need to restart the python server frequently when switching models to prevent this. Reproduction is very consistent. Just switch models, generate one image, and switch models again.

Eventually, OOM will cause system instability eventually followed by webui.sh being killed.

manulsoftware commented 1 year ago

I am experiencing the exact same issue. Sometimes webui.sh gets killed after consuming all memory, sometimes my X session freezes and I have to reboot the entire thing.

Nyxeka commented 1 year ago

can confirm I get the same issue running docker desktop+WSL2, assign 14 gigs of ram, switch models a few times and observe ram go up until the container stops responding/crashes.

possible reasons?

WSL2 memory leak with pytorch stuff
torch isn't unloading models properly on linux
xformers is modifying the model or something to optimize it as it loads and this new model reference is never released? (only the old unoptimized one could be getting released?)
something up with new torch 2.0?

dejl commented 1 year ago

if it helps I'm running on Debian, not in a docker container Using torch 2.0.0 as well

python: 3.10.6 • torch: 2.0.0+cu118 • xformers: 0.0.18 • gradio: 3.23.0 • commit: [22bcc7be](https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/22bcc7be428c94e9408f589966c2040187245d81)

NamelessButler commented 1 year ago

Dunno if might help, but on collab I'm using....

wget -qq --show-progres http://launchpadlibrarian.net/367274644/libgoogle-perftools-dev_2.5-2.2ubuntu3_amd64.deb
wget -qq --show-progres https://launchpad.net/ubuntu/+source/google-perftools/2.5-2.2ubuntu3/+build/14795286/+files/google-perftools_2.5-2.2ubuntu3_all.deb
wget -qq --show-progres https://launchpad.net/ubuntu/+source/google-perftools/2.5-2.2ubuntu3/+build/14795286/+files/libtcmalloc-minimal4_2.5-2.2ubuntu3_amd64.deb
wget -qq --show-progres https://launchpad.net/ubuntu/+source/google-perftools/2.5-2.2ubuntu3/+build/14795286/+files/libgoogle-perftools4_2.5-2.2ubuntu3_amd64.deb
apt install -qq libunwind8-dev
dpkg -i *.deb
rm *.deb
os.environ["LD_PRELOAD"] = "libtcmalloc.so"

This fixed Mem leak issues on colab. Maybe this can be used as reference.

Kadah commented 1 year ago

I think this might be the source of my recent memory leak problems. Killing webui doesn't free all the consume ram either. Started after upgrading forward in to the gradio update, was previously on https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/a9fed7c364061ae6efb37f797b6b522cb3cf7aa2

nntaoli commented 1 year ago

me too, ubuntu 22.04

prurigro commented 1 year ago

@falsonerd Thanks for sharing your solution, it worked perfectly for me in the webui! I added export LD_PRELOAD=/usr/lib/libtcmalloc.so to the bash script I use to run launch.py and now memory doesn't increase when I switch checkpoints. They also load a LOT faster.

Here's the full script I run to launch the webui from a virtual env:

#!/usr/bin/env bash

export LD_PRELOAD=/usr/lib/libtcmalloc.so

env VIRTUAL_ENV=/var/lib/sdwebui/stable-diffusion-webui/venv /var/lib/sdwebui/stable-diffusion-webui/venv/bin/python launch.py

For any Arch Linux users looking to apply this fix, /usr/lib/libtcmalloc.so is part of the gperftools package.

Kadah commented 1 year ago

Seems libtcmalloc does help.

Ubuntu 20.04: Install libtcmalloc-minimal4 via apt Add export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 to web-user.sh

Seeing about half as much bloating after swaping through a dozen checkpoint. Loading seem to be about the same (when switching to a checkpoint that had been recently loaded, even in a previous instance, it'll load faster due to OS disk caching).

I'll need to reboot to see if this resolves the permanent mem leak I'm seeing within the first couple hours of booting and running the webui.

jacquesfeng123 commented 1 year ago

Dunno if might help, but on collab I'm using....

wget -qq --show-progres http://launchpadlibrarian.net/367274644/libgoogle-perftools-dev_2.5-2.2ubuntu3_amd64.deb
wget -qq --show-progres https://launchpad.net/ubuntu/+source/google-perftools/2.5-2.2ubuntu3/+build/14795286/+files/google-perftools_2.5-2.2ubuntu3_all.deb
wget -qq --show-progres https://launchpad.net/ubuntu/+source/google-perftools/2.5-2.2ubuntu3/+build/14795286/+files/libtcmalloc-minimal4_2.5-2.2ubuntu3_amd64.deb
wget -qq --show-progres https://launchpad.net/ubuntu/+source/google-perftools/2.5-2.2ubuntu3/+build/14795286/+files/libgoogle-perftools4_2.5-2.2ubuntu3_amd64.deb
apt install -qq libunwind8-dev
dpkg -i *.deb
rm *.deb
os.environ["LD_PRELOAD"] = "libtcmalloc.so"

This fixed Mem leak issues on colab. Maybe this can be used as reference.

this works, thanks!

gunjianpanxdd commented 1 year ago

I switch back and forth between the controllnet model, and the memory will continue to rise until it explodes。 I have already used libtcmalloc.so

salmon85 commented 1 year ago

having the same issue on mint 21, 32gb ram with a 8gb swap file system used to grind to a halt when it ate all my ram and swap

increased swap to 16gb thinking I hadn't set enough, it ate that too and caused a lock up had to ctrl alt backspace to kill my session.

just installed the libtcmalloc fix that Kadah mentioned earlier. seems to be ok at the moment. will report back in about an hour or two if my system locks up

pikatchu2k3 commented 1 year ago

The device is more stable now. I hade once a System freeze after switching models a lot. RAM Usage stays lower than before.

If you want to use this on Fedora 38 you have to:

sudo dnf install sudo dnf install gperftools-2.9.1-5.fc38.x86_64 (or your actual version that is available)
create in your stable-diffusion folder your custom batch file like Custom.sh and make it executable (right click on the file)
open the file with an editor and type

#!/usr/bin/env bash
python3.10 -m venv env
source env/bin/activate
export LD_PRELOAD=/usr/lib64/libtcmalloc.so
python launch.py --xformers --autolaunch --theme dark

notes: line: python3.10 -m venv env = needed for the correct python version line: export LD_PRELOAD=/usr/lib64/libtcmalloc.so = for loading the RAM fix line: python launch.py --xformers --autolaunch --theme dark = my setup with xfromers

Nyxeka commented 1 year ago

using libtcmalloc, I found the RAM usage goes down on its own over time, though the container can still build up and crash if you switch models quickly.

Kadah commented 1 year ago

Seems libtcmalloc does help.

Ubuntu 20.04: Install libtcmalloc-minimal4 via apt Add export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 to web-user.sh

Seeing about half as much bloating after swaping through a dozen checkpoint. Loading seem to be about the same (when switching to a checkpoint that had been recently loaded, even in a previous instance, it'll load faster due to OS disk caching).

I'll need to reboot to see if this resolves the permanent mem leak I'm seeing within the first couple hours of booting and running the webui.

Update: The leak from swapping models appears to be mostly fixed by using libtcmalloc, but I still have no clue on the cause of the mystery leak over time from just having it run idle. That one is worse as just restarting the webui does not free the mem, only rebooting will.

mack-w commented 1 year ago

Can confirm facing the same issue. Switching to libtcmalloc was a fix on Ubuntu 23.04. I would like to profile into which part of the server caused such a problem, anyone with a hint?

DutchComputerKid commented 1 year ago

The device is more stable now. I hade once a System freeze after switching models a lot. RAM Usage stays lower than before.

If you want to use this on Fedora 38 you have to:
1. sudo dnf install sudo dnf install gperftools-2.9.1-5.fc38.x86_64 (or your actual version that is available)

2. create in your stable-diffusion folder your custom batch file like Custom.sh and make it executable (right click on the file)

3. open the file with an editor and type
#!/usr/bin/env bash
python3.10 -m venv env
source env/bin/activate
export LD_PRELOAD=/usr/lib64/libtcmalloc.so
python launch.py --xformers --autolaunch --theme dark
notes: line: python3.10 -m venv env = needed for the correct python version line: export LD_PRELOAD=/usr/lib64/libtcmalloc.so = for loading the RAM fix line: python launch.py --xformers --autolaunch --theme dark = my setup with xfromers

For anyone using Debian 11:

sudo apt install google-perftools and/or sudo apt install libtcmalloc-minimal4 File locations are different, so this worked for me:

#!/bin/bash
#!/usr/bin/env bash
python3 -m venv env
source env/bin/activate
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
python3 launch.py --listen --no-half --medvram --upcast-sampling

And that should do the trick.

bigahega commented 1 year ago

using libtcmalloc, I found the RAM usage goes down on its own over time, though the container can still build up and crash if you switch models quickly.

Seconding this. tcmalloc does not fix the issue completely. If you switch models frequent enough, it crashes with an oom.

wangwenqiao666 commented 1 year ago

There are enough models out there that switching between models on both Windows and Linux can cause memory leaks.

mockinbirdy commented 1 year ago

Well, this is not about linux specifically, same thing happens on windows. It's painful actually, to restart webui every couple of minutes. Any fix? This problem was first discovered several updates ago and still here we are.

wangwenqiao666 commented 1 year ago

we can us su root command into root, then input echo 3 > /proc/sys/vm/drop_caches clear cache

MakingMadness commented 1 year ago

Same problem here. Some (but not all) issues that seem to be about the same problem:

dungeon1103 commented 1 year ago

No solution in sight for this problem?

hrkrx commented 1 year ago

it's even worse with sdxl, I have to restart the webui every two gens to not freeze the entire system...

mendhak commented 1 year ago

Just to echo that, with Stable Diffusion XL it's now common to switch between checkpoints. Once for base and once for refine. Doing so a few times, or switching to another checkpoint causes memory to shoot up and frequently get killed for out-of-memory.

My system - Ubuntu 22.04, 32GB RAM.
Launched automatic1111 with ./webui.sh --medvram

dmesg:

19253.424833] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-a6c45a68-500f-4286-9184-934514323b61.scope,task=python3,pid=159469,uid=1000 [19253.424897] Out of memory: Killed process 159469 (python3) total-vm:50746396kB, anon-rss:25113304kB, file-rss:71448kB, shmem-rss:16520kB, UID:1000 pgtables:64628kB oom_score_adj:0

lhw11 commented 1 year ago

I also encountered the same problem. Is there any way to solve it

towardmastered commented 1 year ago

Same on Windows I have a swap of 128gigs, which helps a bit, but still, it's shooting up to 80-90gigs easily

Nan-Do commented 1 year ago

Same happens to me, using libtcmalloc_minimal.so.4 on linux, it's not a major problem as long as I don't switch models, I haven't tried waiting a long time before switching models. As of right now it pretty much makes the workflow with the latest stable diffusion XL model (base + refiner) really hard to use.

duongnv0499 commented 11 months ago

same error when running on both colab pro (T4) and local linux machine (RTX 3090), doese anyone have solved it

tuxthepenguin84 commented 9 months ago

I believe this has resolved my OOM issues on Ubuntu.

Install libtcmalloc-minimal4 I then modified the custom systemd service I had and added Environment=LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4

mustofakamal1 commented 2 months ago

Looks like I've found the problem. My RAM usage now is quite stable and only have a big increase for a moment when loading the model (SSD > RAM > GPU).
Screenshot from 2024-07-15 07-23-01

Change the line 803 from: len(model_data.loaded_sd_models) > shared.opts.sd_checkpoints_limit > 0: to this: if len(model_data.loaded_sd_models) >= shared.opts.sd_checkpoints_limit > 0:

The problem with that code before is the condition never satisfied because the function count the number of current loaded model only which not include the new model, hence the unload function in line 805 - 806 not called. Also, it will going to the line 829 which looks like remove the old model from the loaded model data but somehow not removed it from memory like the unload function before.

The setting "Maximum number of checkpoints loaded at the same time" also tested working.

Tested on stable-diffusion-webui version 1.10.0-RC

magpie514 commented 2 months ago

@mustofakamal1 Thank you so much for figuring this out, this is a huge fix. Can you make a pull request for this? It might be able to get in before 1.10.0 final is released, and it's a simple one-liner with a huge benefit so it shouldn't find a lot of resistance to get merged.

MakingMadness commented 2 months ago

@mustofakamal1 I'd also like to extend thanks for your fix to a very annoying and very long lived bug! I can confirm it works.

mustofakamal1 commented 2 months ago

@magpie514 @MakingMadness You're welcome, happy to help. I will try create the PR.

mustofakamal1 commented 2 months ago

Looks like I've found the problem. My RAM usage now is quite stable and only have a big increase for a moment when loading the model (SSD > RAM > GPU).

Change the line 803 from: len(model_data.loaded_sd_models) > shared.opts.sd_checkpoints_limit > 0: to this: if len(model_data.loaded_sd_models) >= shared.opts.sd_checkpoints_limit > 0:

The problem with that code before is the condition never satisfied because the function count the number of current loaded model only which not include the new model, hence the unload function in line 805 - 806 not called. Also, it will going to the line 829 which looks like remove the old model from the loaded model data but somehow not removed it from memory like the unload function before.

The setting "Maximum number of checkpoints loaded at the same time" also tested working.

Tested on stable-diffusion-webui version 1.10.0-RC

An update: it only works for highvram uses, using lowvram/medvram may throw an error. Also, it's just a quick fix, the actual problem still exist and can be read in here.

viking1304 commented 2 months ago

@mustofakamal1 what is your torch version? I remember someone had problems with leaked memory on a Mac with torch 2.0.1 or 2.0.2. If I remember correctly, the problem disappeared after upgrading to 2.1.2. It might not be related to your problem, but I would try torch 2.3.1

mustofakamal1 commented 2 months ago

Already used the latest 2.3.1

AUTOMATIC1111 / stable-diffusion-webui

[Bug]: linux memory leak when switching models #9323

Is there an existing issue for this?

What happened?

2180, someone in #7451, this remained unanswered. also in #6532, it seems it has been fixed, but it really hasn't.

Steps to reproduce the problem

What should have happened?

Commit where the problem happens

What platforms do you use to access the UI ?

What browsers do you use to access the UI ?

Command Line Arguments

List of extensions

Console logs

Additional information