ifeelrobbed opened 2 years ago
Memory leak maybe?
From /var/log/syslog:
Oct 16 12:28:53 pop-os kernel: [69600.171513] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-36.scope,task=python3.10,pid=48807,uid=1000
Oct 16 12:28:53 pop-os kernel: [69600.171634] Out of memory: Killed process 48807 (python3.10) total-vm:32734412kB, anon-rss:13482224kB, file-rss:65752kB, shmem-rss:14340kB, UID:1000 pgtables:37700kB oom_score_adj:0
Oct 16 12:28:55 pop-os systemd[1]: session-36.scope: A process of this unit has been killed by the OOM killer.
Watching memory climb as I run it. From restart to crash, with a little of the middle not in the screenshots.
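A rough way to watch the same climb from a second terminal (just a sketch; it assumes launch.py is the entry point, so adjust the pgrep pattern if yours differs):
# refresh the webui process's resident/virtual memory every 5 seconds
watch -n 5 'ps -o pid,rss,vsz,etime,cmd -p "$(pgrep -f launch.py | head -n1)"'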
I had the same problem after an update (https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/2782). Restarting the computer seems to have fixed it. Try restarting!
(I have 32GB of memory and I don't think the amount of memory is the problem; I never hit the limit.)
Same issue here for about two days now. Running natively on Ubuntu. Sometimes the whole PC freezes completely and I have to hard reset. Sometimes it freezes for up to 40 seconds, and when I keep the console as the active window I get the same error output as yours and can restart the webui. 32GB RAM, i5-12600K, RX 6650 XT.
Edit: It has either been fixed or was related to "Radeon Profile" on Linux. No freezes since my last restart without Radeon Profile active. Edit 2: Spoke too soon. The PC crashed again on the run right after the first edit. So it is not Radeon Profile related and not fixed yet.
Unfortunately rebooting didn't seem to change anything.
Did you update Gradio and the other libraries? Recent updates seem to require newer versions. Run
pip install pip-upgrader
and then
pip-upgrade
which will update the Python dependencies from the new requirements.txt.
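A rough sketch of those commands, assuming the default ./venv that webui.sh creates:
# run the upgrade helper from inside the webui's virtual environment
source venv/bin/activate
pip install pip-upgrader
pip-upgrade requirements.txt   # review the suggested bumps before confirming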
Went through those steps. Gradio was already up to date. It did update 3 others: fairscale, timm, transformers.
Still maxed out memory and was killed.
Possibly a memory leak; as a stopgap I had to create dynamic swapfiles of up to 10GB on my system.
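For anyone who needs the same stopgap, roughly what that looks like (assumes root and a filesystem where fallocate works):
# create and enable a 10GB swapfile
fallocate -l 10G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# add "/swapfile none swap sw 0 0" to /etc/fstab to keep it across reboots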
Same here, there is some memory leak, probably introduced around the 14th-16th; older commits don't have that issue.
Memory usage increases right after batch generation starts, stays flat during generation, and increases again when the next batch is started by clicking the generate button.
Yes, same problem for me. It can eat up ~1GB of RAM per generation, which is never returned to the system, so shutting down Stable Diffusion and restarting it to reclaim that RAM becomes a regular necessity.
Running on an RTX 3060 12GB, 32GB RAM, Arch Linux.
Dang. Got excited when I saw the commit "fix bug for latest model merge RAM improvement".
However, I still maxed out memory, swap, and the process was killed after ~6 minutes.
Having the same issue: after some time generating, the process will die with "webui.sh: line 141: (pid) killed"
syslog:
Oct 18 21:43:09 DraxPC kernel: [ 5364.528101] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-688059d0-5100-4a81-983d-15c959b6b48a.scope,task=python3,pid=5445,uid=1000
Oct 18 21:43:09 DraxPC kernel: [ 5364.528182] Out of memory: Killed process 5445 (python3) total-vm:30828808kB, anon-rss:13574820kB, file-rss:70656kB, shmem-rss:14340kB, UID:1000 pgtables:35620kB oom_score_adj:0
Oct 18 21:43:09 DraxPC systemd[1]: user@1000.service: A process of this unit has been killed by the OOM killer.
Oct 18 21:43:09 DraxPC systemd[1163]: vte-spawn-688059d0-5100-4a81-983d-15c959b6b48a.scope: A process of this unit has been killed by the OOM killer.
Ryzen 5600X, 16GB RAM, GTX 1650 4GB VRAM, Linux Mint 21.whatever
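A convenience for anyone checking whether their crash is the same thing: the identical kernel messages can be pulled from the journal instead of /var/log/syslog.
# kernel messages from the current boot, filtered for OOM kills
journalctl -k -b | grep -iE 'out of memory|oom-kill'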
I found the problem: it is Gradio 3.5. The leak starts with commit 4ed99d599640bb86bc793aa3cbed31c6d0bd6957, and downgrading Gradio back to 3.4.1 solves it. I don't know what other changes were made for Gradio 3.5 that might break by downgrading, but it has been working well for me so far.
What do you think @AUTOMATIC1111 can you check it out?
How would one go about downgrading gradio for the time being?
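Not an official answer, but a rough sketch assuming the default ./venv created by webui.sh:
# pin gradio back inside the webui's virtual environment
source venv/bin/activate
pip install gradio==3.4.1
Note that launch.py may pull the newer version back in on the next start, so the approach described later in this thread (also editing requirements.txt and requirements_versions.txt) is probably the more durable route.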
I have this problem when I use --medvram (RAM fills up, then swap, until the system crashes), but not when I don't.
Interesting. I'm using the following arguments:
--medvram --opt-split-attention --force-enable-xformers
lowvram and medvram offload the model parts to the CPU when they are not being used by the GPU, so using them will use more RAM and less VRAM. That is not a leak per se, but you will need more RAM.
Hm, but when I start (and on the first generations) I have quite a lot of free RAM (about 6GB plus 10GB swap). Every image generated takes a little more, and after around 50 images it fills up. If it didn't leak, RAM usage should stay about the same and not build up over time.
@leandrodreamer as I said previously (https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/2858#issuecomment-1283381801), I found it is caused by the Gradio upgrade, and downgrading removes the leak.
I didn't say there is no leak; I said that by using lowvram/medvram you will use more RAM than without it, so the increase in memory due to lowvram/medvram is not a leak, it is supposed to happen.
I didn't identify any leak related to those options.
Oh, got it. What I find strange is that I don't have any leak problems without the --medvram param; I can make hundreds of images no problem (without downgrading gradio). Maybe it's a mix of the new gradio version and that param? Or maybe I have a completely different problem here, idk :b
@leandrodreamer yes, it may be a mix of settings. You can try reverting commit 4ed99d599640bb86bc793aa3cbed31c6d0bd6957 to test whether your problem is the same one I identified or something else.
I just deactivated and deleted venv, reverted to 7d6042b908c064774ee10961309d396eabdc6c4a, which is the last commit before Gradio 3.5, commented out the line in webui.sh that performs git pull
and let it just reinstall everything. Memory usage is steady and I am generating images just fine again.
Alright, after a day of no issues, I performed a git pull, modified requirements.txt and requirements_versions.txt back to gradio==3.4.1, and commented out the git pull line in webui.sh. So far so good. The only change from the latest commit should be the gradio downgrade and memory usage is steady.
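Roughly the same recipe as commands, in case it saves someone some typing (it assumes the pin appears as a line starting with gradio== in both files, so double-check before running the sed):
git pull
sed -i 's/^gradio==.*/gradio==3.4.1/' requirements.txt requirements_versions.txt
# comment out the git pull line in webui.sh by hand, then reinstall inside the venv
source venv/bin/activate
pip install -r requirements_versions.txt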
Haven't had issues since yesterday evening; seems to be fixed.
Still had the same problem, nothing changed after latest git pull. Decided to reinstall from scratch, and lo and behold, no more memory leaks.
Sadly it didn't work for me; I reinstalled everything and the leak persists with the latest master commit.
Ok, so I ran automatic1111 through this docker image: https://github.com/AbdBarho/stable-diffusion-webui-docker
And it had the same problem for me, eating ram. So I went back to compare my previous installation of automatic1111 (I backed it up when I reinstalled) and the only difference was that in webui-user.sh, I had the --medvram parameter
So I edited the docker-compose.yml in the docker image and removed --medvram, and now there are no more leaks. Then I added --medvram to my reinstalled local version and it leaks memory again. So for me, just like leandrodreamer stated in this thread, --medvram is the culprit.
Now, I have 12GB of VRAM, so not being able to use --medvram isn't much of a problem for me, but for those with less VRAM, not being able to use it might be a pain or even make it impossible to run.
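For reference, on a native install the equivalent toggle is just the COMMANDLINE_ARGS line in webui-user.sh; the example below simply drops --medvram, and the remaining flag is only a placeholder:
# webui-user.sh with --medvram removed (other flags are just examples)
export COMMANDLINE_ARGS="--listen"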
Yeah, with my 2060 I have to use --medvram for it to work at all. The only way I've found to prevent the memory leak, regardless of which commit I revert to, is to force Gradio 3.4.1.
Same thing happening to me. Manually downgrading gradio to 3.4.1 via pip seems to fix this problem.
Running in docker on linux, 32gb system ram, rx580 4gb.
Is this an issue in gradio (upstream) or an issue with how this repo uses gradio?
Downgrading gradio apparently fixes the issue, which strongly suggests that the issue is upstream.
did some further testing and this commit to gradio causes the leak: https://github.com/gradio-app/gradio/commit/a36dcb59750b1f4cd7e66d3b39ba0621ee89183b
Edit: I even tested running without --medvram with latest gradio and observed no leak, so the cause is --medvram option combined with https://github.com/gradio-app/gradio/commit/a36dcb59750b1f4cd7e66d3b39ba0621ee89183b or later.
still happening as of d61f0ded24b2b0c69b1b85293d6f71e4de0d1c63
Yeah I was hoping Gradio 3.8 with 17087e306d4f888b67037a528bc4cf161995e1c4 would work, but still have the same issue.
Downgraded to 3.4.1 and am back in business.
Still happening on 828438b. I can only generate like 5 images before it crashes if I'm using --medvram on my AMD card. I'm using COMMANDLINE_ARGS="--listen --precision full --no-half --medvram"
I also need to use "export HSA_OVERRIDE_GFX_VERSION=10.3.0" for pytorch to work (I'm using a 5700XT, 8GB vram)
With medvram I get around 3it/s and the process gets killed, but without it I only get 1.3it/s and my monitors sometimes disconnect lol, so medvram working correctly is super important.
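In case the full launch setup helps anyone reproduce this, roughly what the above amounts to in webui-user.sh (values taken from this comment, not a recommendation):
# RX 5700 XT / ROCm: the HSA override is needed for pytorch to see the card
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export COMMANDLINE_ARGS="--listen --precision full --no-half --medvram"
Then launch with ./webui.sh as usual.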
Yeah, I also solved it with the same solution. Thanks. https://hjlabs.in
gradio to 3.4.1 for ubuntu system downgrade, please
I've been generating a lot of X/Y plots in a single session, and I don't feel it's leaking memory in that use case. Possibly of note is that the selected Y axis is Checkpoint name.
I've not tried switching gradio version yet.
Haven't downgraded gradio yet but been experiencing this as well. 3050 runs okay without it but I don't think I can try training hypernetworks without --medvram so the leak is annoying.
Sorry for the close; lag made me touch the close button.
I downgraded to gradio 1.4.1 and triple-checked that it is still 1.4.1, but memory usage is still maxed out, even after all jobs are completed...
Gradio 3.16.2 still has a RAM leak when using --medvram
This is frustrating because I don't know if it's going to get fixed
Change gradio==[versionnumber] to gradio==3.4.1 in requirements.txt, then run:
pip install -r requirements.txt
Profit.
3.4.1 is incompatible but also extremely laggy.
yup. it lags a lot, but it fixed the problem for me to some extent
3.4.1 is incompatible; I removed --medvram for now and haven't noticed a leak.
[I am posting this in multiple places; it seems to be a common issue] I have had a similar problem and solved it, apparently permanently. Here's what I think is going on: the websockets layer between A1111 and SD is losing a message and hanging while waiting for a response from the other side. It appears to happen when there is a lot of data going back and forth, possibly overrunning a queue someplace. If you think about it, A1111 and SD are shovelling big amounts of image data across the websockets.

And here's how you exacerbate it: tell A1111 to display each image as it's created, then set the "new image display time" down around 200ms. If you do that, it'll start failing pretty predictably, at random. How to fix it: have it display the image every 30 iterations and set the display time to around 10 seconds. Poof, problem gone. [This problem resembles a bug in Sun RPC from back around 1986; plus ça change...]
This problem still exists. Removing --medvram stopped the memory leak when generating images, but switching between checkpoints does seem to do the same thing: after switching, RAM spikes and doesn't go back down.
Can confirm this is the case: with --medvram, sd-webui gradually consumes 31.8GB of memory and gets killed, with the OOM kill showing up in dmesg. I was doing SDXL image generation with refining on. Adding --lowram does not mitigate the issue.
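A quick way for anyone else to confirm that it was the kernel OOM killer (and not a crash elsewhere):
# look for recent OOM kills in the kernel log
sudo dmesg -T | grep -iE 'out of memory|oom-kill' | tail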
Same here. I was doing SDXL with the refiner, and the program quickly gets killed by the OOM killer as it switches between the base and refiner models.
Describe the bug
Consistently hangs after 6-7 minutes since yesterday (10/15). Hopping on the command line, the process is shown as killed. This happens both when starting with webui.sh and with launch.py.

To Reproduce
Steps to reproduce the behavior: 6-7 minutes of activity in the web UI. The UI hangs and eventually the process is killed.

Expected behavior
Not hang?