AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Bug]: Constant hanging/freezing on image generation #9085

Open · Ziehn opened this issue 1 year ago

Ziehn commented 1 year ago

Is there an existing issue for this?

What happened?

After the recent commits from the 25th onward, I get constant hang-ups/freezing on image generation, with no errors in the log. When it finally unfreezes, the reported iteration time is absurdly high, at 20 s/it or more.

Rolling back to a stable build from the 24th fixes all of the freezing.

GTX 1080 TI

Steps to reproduce the problem

Generate image

Watch as it freezes

Get image 2 mins later than usual

What should have happened?

Smooth and timely image generation

Commit where the problem happens

955df77

What platforms do you use to access the UI?

Windows

What browsers do you use to access the UI?

Mozilla Firefox

Command Line Arguments

--xformers --no-half-vae
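For context, these flags normally come from the COMMANDLINE_ARGS line in webui-user.bat; a minimal sketch of that file with the reporter's arguments (the surrounding lines are the stock template):

@echo off
set PYTHON=
set GIT=
set VENV_DIR=
rem the arguments reported above
set COMMANDLINE_ARGS=--xformers --no-half-vae
call webui.bat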

List of extensions

Happens with or without extensions

Stable-Diffusion-Webui-Civitai-Helper a1111-sd-webui-tagcomplete sd-dynamic-prompts sd-dynamic-thresholding sd-webui-controlnet stable-diffusion-webui-composable-lora stable-diffusion-webui-images-browser

Console logs

venv "D:\Stable Diffusion\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.10 (tags/v3.10.10:aad5f6a, Feb  7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)]
Commit hash: 955df7751eef11bb7697e2d77f6b8a6226b21e13
Installing requirements for Web UI
Installing sd-dynamic-prompts requirements.txt

Launching Web UI with arguments: --xformers --no-half-vae
D:\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
Civitai Helper: Get Custom Model Folder
Civitai Helper: Load setting from: D:\Stable Diffusion\stable-diffusion-webui\extensions\Stable-Diffusion-Webui-Civitai-Helper\setting.json
Civitai Helper: No setting file, use default
Loading weights [75bcab05df] from D:\Stable Diffusion\stable-diffusion-webui\models\Stable-diffusion\Z-Mix2.2.safetensors
Creating model from config: D:\Stable Diffusion\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: D:\Stable Diffusion\stable-diffusion-webui\models\VAE\Anything-V3.0.vae.pt
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(31): abcdef_mirajane, advntr, albino_style, aurate, charturnerv2, corneo_bowsette, easynegative, ng_deepnegative_v1_75t, pastel_style, RebeccaEdgerunners, rem_rezero, was-battletech, yoko v1
Model loaded in 4.4s (load weights from disk: 0.4s, create model: 0.4s, apply weights to model: 0.7s, apply half(): 0.6s, load VAE: 0.6s, move model to device: 0.7s, load textual inversion embeddings: 1.0s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 14.4s (import torch: 2.3s, import gradio: 1.1s, import ldm: 0.3s, other imports: 1.1s, load scripts: 2.3s, load SD checkpoint: 4.5s, create ui: 2.6s, gradio launch: 0.1s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:14<00:00,  3.73s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:27<00:00,  1.35s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:27<00:00,  1.32s/it]

Additional information

No response

chifeisoong commented 1 year ago

same issue here +1

Ziehn commented 1 year ago

It's starting to look like some sort of incompatibility was introduced with Firefox. I've been testing with Edge and haven't gotten any freezing so far.

> same issue here +1

Which web browser are you using?

oderwat commented 1 year ago

It freezes for me in Opera on OSX. I will try Google Chrome. I also already rolled back to a commit from a week ago; I never had this before, but I am not sure which commit I was using these last weeks (I just updated yesterday and everything broke). master is completely unusable for me right now; it hangs while loading the GUI.

Edit: I can confirm that loading master with Google Chrome (and Edge) actually works. So for me the incompatibility with the current master has to do with Opera. I did not check whether image generation still hangs.

Edit 2: It still hangs randomly after generating some images (in Chrome). Will test Edge next.

Edit 3: It actually does not seem to hang when using Edge. Wow... feels like the early 2000s, with every browser behaving differently on the same HTML.

Edit 4: Well, now it hung with Edge too. It just seems to happen much less often there.

notkmikiya commented 1 year ago

I would try disabling all extensions that aren't built-in first.

I know that sd-dynamic-prompts freezes my browser right now. Using the base Stable Diffusion install without extensions seems to work fine. If that works for you, I'd then re-enable extensions one at a time to isolate the issue.
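One blunt way to rule out user-installed extensions entirely (a common approach, not something spelled out in this comment) is to park the extensions folder while the webui is stopped; a sketch, assuming the install path from the reporter's log:

cd /d "D:\Stable Diffusion\stable-diffusion-webui"
rem built-in extensions live in extensions-builtin and are not touched by this
ren extensions extensions.disabled
mkdir extensions
rem to restore: remove the empty folder and rename extensions.disabled back
rem rmdir extensions
rem ren extensions.disabled extensions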

Ziehn commented 1 year ago

Already tested without extensions, first thing I tried, no dice.

notkmikiya commented 1 year ago

> Already tested without extensions, first thing I tried, no dice.

@Ziehn That's pretty rough; not sure what's causing it. Maybe there's something that can be picked up in the browser dev tools console log? Checking it helped some people who were having browser issues in #9027.

Also, it looks like you're using torch 2.0.0+cu118 and passing --xformers. Did you rebuild xformers against the new PyTorch?

I have a friend with a 1080 Ti, and for him using --xformers seemed to hurt performance more than help it. Instead, using --opt-sdp-attention or --opt-sdp-no-mem-attention gave a decent performance boost of about 1 it/s.
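A quick way to see which torch and xformers builds the venv is actually running (only a sketch, using the venv path from the console log above; torch.version.cuda and xformers.__version__ are the usual attributes, and pip show works if the import check fails):

rem print the torch build, its CUDA toolkit version, and the installed xformers build
"D:\Stable Diffusion\stable-diffusion-webui\venv\Scripts\python.exe" -c "import torch, xformers; print(torch.__version__, torch.version.cuda, xformers.__version__)"
rem same information via pip metadata
"D:\Stable Diffusion\stable-diffusion-webui\venv\Scripts\pip.exe" show torch xformers

If the xformers wheel was built against torch 1.13.x, reinstalling an xformers build that matches the installed torch (or dropping --xformers, as suggested above) is the usual fix.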

Ziehn commented 1 year ago

@notkmikiya Tried both of your suggested arguments; neither made a difference, with or without xformers. I can also confirm that generation is slower without xformers.

Also, generation speeds back up when I close the afflicted web browser. Closing Edge makes no difference.

oderwat commented 1 year ago

I found a backup of the version I used for a long time before everything started to fall apart and am back on a9fed7c364061ae6efb37f797b6b522cb3cf7aa2. It works both with all the code + venv from the backup and with my newest venv (torch 2) running just that commit. I'll check occasionally whether newer builds work, but that old version can do everything I want.

notkmikiya commented 1 year ago

This issue may have something to do with --no-half-vae on the recent updates. My friend hit this same issue today after adding that flag to his webui-user.bat; after removing it, the problem went away.

thot-experiment commented 1 year ago

I have the same issue as of a recent commit, and I am not using --no-half-vae. It happens on both a GV100 and a 1080 Ti, using torch 2 and xformers. I will try falling back to the commit mentioned by @oderwat.

Update: unfortunately I am still getting the same issue. Generation just hangs with no message, and it happens regardless of the browser used (tried Chrome and Firefox).

Update 2: this seems to be an issue with previews? I have not had the error since disabling previews. Frustrating, but it works for now. I will try bumping to the latest commit again and see if everything keeps working.

Update 3: no issues on the newest commit with previews disabled.

sliftist commented 1 year ago

@thot-experiment seems to have found the issue. Something to do with live previews is causing SD to loop indefinitely (100% GPU usage, never finishes).

It also only seems to happen with torch 2.0.0+cu118 and --opt-sdp-no-mem-attention. My GPU is a 4070 Ti.

I was running live previews with preview mode Full and a fairly low update period of 400 ms.

It always seems to hang just as an image is finishing, so I assume it is caused by some of the code that finalizes the image? I am not at all familiar with any of this, but when I attached with VS Code it seemed to be hung on the decode_first_stage call at around processing.py:655. Then again, this is literally the first time I've ever attached to a Python program, so that might be meaningless.

EDIT: Just remembered, I also had the live preview sample step rate set to 1, which might increase the chance of this happening.
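Not something tried in this thread, but for anyone wanting to see where a hang is actually sitting without attaching a debugger, py-spy can dump the Python stack of the running process; a sketch, assuming the reporter's venv path and with <PID> standing in for the hung python.exe process ID from Task Manager:

rem install py-spy into the webui's venv (assumed not to be there already)
"D:\Stable Diffusion\stable-diffusion-webui\venv\Scripts\pip.exe" install py-spy
rem prints each thread's current Python stack, e.g. whether it is stuck inside decode_first_stage
"D:\Stable Diffusion\stable-diffusion-webui\venv\Scripts\py-spy.exe" dump --pid <PID>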

thot-experiment commented 1 year ago

FWIW, I do not have --opt-sdp-no-mem-attention set explicitly, but perhaps it gets turned on implicitly by some other flag or configuration state? (I don't even see it listed in the docs.)

Daemonrat commented 1 year ago

> It also only seems to happen with torch 2.0.0+cu118 and --opt-sdp-no-mem-attention. My GPU is a 4070 Ti.

I'm also on torch 2.0.0+cu118 and experiencing this issue on an RTX 2070 with --opt-sdp-attention. The issue persists with and without live previews, but it only seems to happen sometimes. Some images process and upscale normally, while others finish processing and then hang when saving the image. After a couple of minutes it completes, but this is not normal behavior. On commit a9fed7c3, this is not an issue for me.

chille9 commented 1 year ago

This happens for me too on an RTX 2060. The general UI also responds super sluggishly, with a delay on many different actions.

Zullian commented 1 year ago

Same thing happening to me occasionally since updating to torch: 2.0.0+cu118, running with --opt-sdp-no-mem-attention, and without --xformers.

Stibo commented 1 year ago

Happens to me on a 2070 Super, but on torch 1.13.1+cu117. It happens more often when I'm doing other things on my PC; Photoshop or Lightroom in particular freezes SD completely, but other actions do too...

ThereforeGames commented 1 year ago

I started seeing this problem after upgrading my environment from Python 3.9 to Python 3.10 (and rebuilding the venv.) Notably, Xformers went from 0.0.14 to 0.0.17 as part of this upgrade. Using Brave browser.

andypotato commented 1 year ago

This issue still exists with the latest 1.1.1 version. GPU usage will go to 100% and stay there.

Deleting the venv and installing everything from scratch didn't help.

EDIT: It only happens when using torch 2.0.0+cu118 - never seen it happening on torch 1.13.1+cu117

DeonHolo commented 1 year ago

> EDIT: It only happens when using torch 2.0.0+cu118 - never seen it happening on torch 1.13.1+cu117

I'm pulling my hair out because of this. I'm not tech-savvy; how do I downgrade torch 2.0 to torch 1.13.1?

Ziehn commented 1 year ago

> EDIT: It only happens when using torch 2.0.0+cu118 - never seen it happening on torch 1.13.1+cu117

> I'm pulling my hair out because of this. I'm not tech-savvy; how do I downgrade torch 2.0 to torch 1.13.1?

Deleting your venv folder and letting it redownload is the easiest way.
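If deleting the whole venv feels too drastic, a manual downgrade inside the existing venv is another option; this is only a sketch, assuming the reporter's install path and the official cu117 wheel index, not an instruction from the thread:

cd /d "D:\Stable Diffusion\stable-diffusion-webui"
call venv\Scripts\activate.bat
rem swap torch 2.0.0+cu118 for the older 1.13.1 build and its matching torchvision
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
rem an xformers wheel built for torch 2.x may need to be reinstalled or removed afterwards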

andypotato commented 1 year ago

> Deleting your venv folder and letting it redownload is the easiest way.

This can help with dependency issues after upgrades but does NOT solve this issue.

For me the problem went away after I changed the number of previews generated from "every 3 steps" to "every 5 steps". Haven't seen the issue again after that.

Ziehn commented 1 year ago

> Deleting your venv folder and letting it redownload is the easiest way.

> This can help with dependency issues after upgrades but does NOT solve this issue.

> For me the problem went away after I changed the number of previews generated from "every 3 steps" to "every 5 steps". Haven't seen the issue again after that.

I'm aware? This was in response to a user wanting to downgrade to torch 1.13.1.

The only fix I've found for this hanging issue is to move on from AUTO1111. Vlad's fork seems to be working much better for me.

DeonHolo commented 1 year ago

> For me the problem went away after I changed the number of previews generated from "every 3 steps" to "every 5 steps". Haven't seen the issue again after that.

Is this in the Live Preview setting where it says "Show new live preview image every N sampling steps. Set to -1 to show after completion of batch."? Mine was set to the default of 10; it wasn't on 3.

EDIT: It still hangs even with this change. I'm just going to wait a while until an official fix is out. Please reply if there already is one.

andypotato commented 1 year ago

Yes that's the setting I was talking about. I think the value that works for you depends on what kind of GPU you are using. I'm using a 3060 / 12GB and I can get away with a preview image every 5 sampling steps.

You could try disabling the live preview completely by setting it to -1 first. If this makes the issue go away then at least you have a workaround.

DeonHolo commented 1 year ago

I have a 3060ti. I am going to try setting it to -1.

kimraven11 commented 1 year ago

Changing live preview from 1 to 0 fixed the freezing issue for me. 2070 Super 8GB. Thanks!

DeonHolo commented 1 year ago

Tried changing live preview to -1, 0, and 5. Still have the problem.

zer0mania commented 1 year ago

1.2.0 appears to fix it; I haven't had any issues after upgrading.

Never mind.

DeonHolo commented 1 year ago

Now I am not really sure. I had the exact same problem while playing Diablo 4: GPU goes to 100% utilization and everything becomes sluggish and slows down. Every action slows down, including Chrome, the game, File Explorer, etc. Maybe it's a GPU problem?

DeonHolo commented 1 year ago

OK, I'm not sure, but I removed the --no-half-vae argument from my command line and it seems to have fixed it?

Raivshard commented 1 year ago

Running a 2070 Super here. I have tried many things, but none of them fixed the problem. Every time Automatic upgrades, performance or stability seems to suffer. When I first installed it late last year it ran okay, but then this issue cropped up and it's never been fixed. It takes anywhere from 5-30 seconds just to start a generation, and then it spends 5-10 more at the end doing nothing. Like, what the actual fuck?

michael-imbeault commented 1 year ago

+1. Same here; no idea what causes it or how to fix it. It started acting up maybe a week ago?

michael-imbeault commented 1 year ago

Here's a trace:

File "D:\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 408, in run_predict output = await app.get_blocks().process_api( File "D:\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1315, in process_api result = await self.call_function( File "D:\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1043, in call_function prediction = await anyio.to_thread.run_sync( File "D:\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync return await get_asynclib().run_sync_in_worker_thread( File "D:\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread return await future File "D:\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\anyio_backends_asyncio.py", line 867, in run result = context.run(func, *args) File "D:\StableDiffusion\stable-diffusion-webui\modules\ui_extra_networks.py", line 320, in save_preview index = int(index) ValueError: invalid literal for int() with base 10: 'task(l1jpx429bqjpety)'

unsupport commented 1 year ago

I thought that adjusting the slider for "VRAM usage polls per second during generation" might resolve this issue, but sorry, it doesn't seem to be working.

michael-imbeault commented 1 year ago

For me it is resolved by changing the preference for the image preview from 'full' to something else.

andypotato commented 1 year ago

> Frequent fucking crashes changing models or generating images. Seems to me that show stopping bugs like this the fucking developers should be all over fixing it.

This thread contains many suggestions on how to solve the problem. Instead of writing an entitled comment like this, you could be constructive: test the suggestions and report back whether they work for you or not.

mjranum commented 1 year ago

[I am posting this in multiple places, sorry, but there is a lot of discussion around this issue]

I have had a similar problem and solved it, apparently permanently. Here's what I think is going on: the websockets layer between A1111 and SD is losing a message and hanging while waiting for a response from the other side. It appears to happen when there is a lot of data going back and forth, possibly overrunning a queue someplace. If you think about it, A1111 and SD are shovelling large amounts of image data across the websockets.

And here's how you exacerbate it: tell A1111 to display each image as it's created, then set the "new image display time" down around 200 ms. If you do that, it will start failing pretty predictably, at random. How to fix it: have it display the image every 30 iterations and set the display time to around 10 seconds. Poof, problem gone. [This problem resembles a bug in Sun RPC from back around 1986; plus ça change...]

noahjgreer commented 1 year ago

I tried everything in this thread and it still hangs at the very end. I'm using v1.6.0. Before, I was using a release from March of this year and it was working fine. I just re-pulled the repo to update it a few days ago. Wish I hadn't.