AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Generation just hangs forever before the last step #10110

Open Mozoloa opened 1 year ago

Mozoloa commented 1 year ago

Is there an existing issue for this?

What happened?

Since update 1.1, when I do batches of images, one of them will very often hang at one of the last steps and never complete.

Clicking Interrupt does nothing, and neither does Skip; reloading the UI doesn't help, the whole UI is stuck, and no other functionality seems to work. The console shows the total progress this way (I'm generating 100 batches of one 512x512 image):

100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.99it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  6.44it/s]
Total progress:   3%|█▉                                                              | 60/2000 [00:11<04:26,  7.27it/s]

I can't do anything but restart the whole thing.

Steps to reproduce the problem

  1. Go to TXT2IMG or IMG2IMG
  2. Do a large batch of images
  3. At some point the generation will hang and nothing will work anymore

What should have happened?

The generation should have continued like it did before

Commit where the problem happens

c3eced22fc7b9da4fbb2f55f2d53a7e5e511cfbd

What platforms do you use to access the UI?

Windows 11, RTX3090

What browsers do you use to access the UI?

Brave

Command Line Arguments

--ckpt-dir 'G:\AI\Models\Stable-diffusion\Checkpoints' --xformers --embeddings-dir 'G:\AI\Models\Stable-diffusion\Embeddings' --lora-dir 'G:\AI\Models\Stable-diffusion\Lora'

OR

--ckpt-dir 'G:\AI\Models\Stable-diffusion\Checkpoints' --opt-sdp-attention --embeddings-dir 'G:\AI\Models\Stable-diffusion\Embeddings' --lora-dir 'G:\AI\Models\Stable-diffusion\Lora'

List of extensions

ControlNet v1.1.134, Image browser

Console logs

venv "G:\AI\Image Gen\A1111\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: c3eced22fc7b9da4fbb2f55f2d53a7e5e511cfbd
Installing xformers
Collecting xformers==0.0.17
  Using cached xformers-0.0.17-cp310-cp310-win_amd64.whl (112.6 MB)
Installing collected packages: xformers
Successfully installed xformers-0.0.16
Installing requirements

Installing ImageReward requirement for image browser

Launching Web UI with arguments: --autolaunch --ckpt-dir G:\AI\Models\Stable-diffusion\Checkpoints --xformers --embeddings-dir G:\AI\Models\Stable-diffusion\Embeddings --lora-dir G:\AI\Models\Stable-diffusion\Lora --reinstall-xformers
ControlNet v1.1.134
ControlNet v1.1.134
Loading weights [3dcc66eccf] from G:\AI\Models\Stable-diffusion\Checkpoints\Men\Saruman.ckpt
Creating model from config: G:\AI\Image Gen\A1111\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: G:\AI\Image Gen\A1111\stable-diffusion-webui\models\VAE\NewVAE.vae.pt
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(15): bad-artist, bad-artist-anime, bad-hands-5, bad-image-v2-39000, bad-picture-chill-75v, bad_prompt, bad_prompt_version2, badhandv4, charturnerv2, easynegative, HyperStylizeV6, ng_deepnegative_v1_75t, pureerosface_v1, ulzzang-6500, ulzzang-6500-v1.1
Textual inversion embeddings skipped(4): 21charturnerv2, nartfixer, nfixer, nrealfixer
Model loaded in 7.2s (load weights from disk: 2.5s, create model: 0.4s, apply weights to model: 0.4s, apply half(): 0.3s, load VAE: 0.5s, move model to device: 0.6s, load textual inversion embeddings: 2.5s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 19.8s (import torch: 2.7s, import gradio: 2.2s, import ldm: 1.0s, other imports: 2.4s, list SD models: 0.4s, setup codeformer: 0.1s, load scripts: 1.8s, load SD checkpoint: 7.2s, create ui: 1.2s, gradio launch: 0.7s).
Loading weights [c6bbc15e32] from G:\AI\Models\Stable-diffusion\Checkpoints\0\1.5-inpainting.ckpt
Creating model from config: G:\AI\Image Gen\A1111\stable-diffusion-webui\configs\v1-inpainting-inference.yaml
LatentInpaintDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.54 M params.
Loading VAE weights specified in settings: G:\AI\Image Gen\A1111\stable-diffusion-webui\models\VAE\NewVAE.vae.pt
Applying xformers cross attention optimization.
Model loaded in 2.0s (create model: 0.4s, apply weights to model: 0.4s, apply half(): 0.3s, load VAE: 0.2s, move model to device: 0.6s).
Running DDIM Sampling with 19 timesteps
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:02<00:00,  9.21it/s]
Running DDIM Sampling with 19 timesteps                                              | 18/2000 [00:01<03:04, 10.77it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.87it/s]
Running DDIM Sampling with 19 timesteps                                              | 38/2000 [00:04<02:31, 12.94it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 12.92it/s]
Running DDIM Sampling with 19 timesteps                                              | 56/2000 [00:07<02:37, 12.31it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.33it/s]
Running DDIM Sampling with 19 timesteps                                              | 76/2000 [00:10<02:29, 12.88it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 12.03it/s]
Running DDIM Sampling with 19 timesteps                                              | 94/2000 [00:13<03:02, 10.43it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.91it/s]
Running DDIM Sampling with 19 timesteps                                             | 113/2000 [00:15<02:33, 12.31it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.84it/s]
Running DDIM Sampling with 19 timesteps                                             | 133/2000 [00:18<02:23, 13.03it/s]
Decoding image:  21%|██████████████                                                     | 4/19 [00:00<00:01, 11.32it/s]
Total progress:   7%|████▎                                                          | 137/2000 [00:21<04:56,  6.28it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.90it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.94it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  7.14it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  6.42it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.81it/s]
  0%|                                                                                           | 0/20 [00:00<?, ?it/s]
Total progress:   5%|███▏                                                           | 101/2000 [00:23<07:14,  4.37it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  7.10it/s]
 75%|█████████████████████████████████████████████████████████████▌                    | 15/20 [00:02<00:00,  6.22it/s]
Total progress:   2%|█▏                                                              | 36/2000 [00:07<06:58,  4.69it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  6.17it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.89it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  7.07it/s]
 10%|████████▎                                                                          | 2/20 [00:00<00:03,  4.87it/s]
Total progress:   3%|██                                                              | 63/2000 [00:14<07:18,  4.42it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  7.57it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.99it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  6.44it/s]
Total progress:   3%|█▉                                                              | 60/2000 [00:11<04:26,  7.27it/s]

Additional information

I remember that at some point it hung but somehow got unstuck, and I got an error I don't remember, but it did say to use --no-half-vae. I haven't tested that, and I never needed it before on torch 1.13.1 across tens of thousands of gens. I'm exclusively using the new 840000 mse VAE.
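For context on why --no-half-vae can matter, here is a minimal sketch (an illustration of half-precision limits in general, not taken from this thread): float16 saturates around 65504, so a VAE decode run entirely in half precision can overflow to inf/NaN on some checkpoints, which --no-half-vae works around by keeping the VAE in float32.

```python
import numpy as np

# Minimal sketch: float16's largest finite value is ~65504, so any
# intermediate value beyond that overflows to inf, while float32
# represents it without trouble.
overflow = np.float16(70000.0)  # exceeds the float16 range
safe = np.float32(70000.0)      # well within the float32 range
print(np.isinf(overflow), np.isinf(safe))  # True False
```

This is also why fp16-safe VAE variants (like the one linked in a later comment) exist: they are tuned so intermediate activations stay inside the float16 range.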

willpnelson commented 11 months ago

Since changing my settings to display every other step using TAESD I haven't had a freeze!

9-16-2023: This absolutely worked for me. I'd been having issues since reinstalling and, I guess, changing those settings a few days ago. Single image generations were randomly freezing in the UI, the console, or both, and Deforum animations were randomly freezing. Changed this yesterday to a 5-frame preview and Approx NN and haven't had an issue since. This was the fix for me.

Just so we're clear, this is not a fix for the full preview; it's just using another preview engine that isn't full (and it shows), and we already knew the other ones worked. It's not a solution.

This isn't it either, at least for me. Watching the traffic go through when the interval is set to 1ms and 9999ms yields the same result - the last packet stalls due to python.exe hanging upon image completion. If packets throughout the image build stalled in the same fashion with the 1ms interval, that would support your theory.

dr3nn commented 6 months ago

This has become an issue for me recently. It happens every gen now. Renaming the venv and changing lines 318 and 319 in \modules\launch_utils.py from:

torch_index_url = os.environ.get('TORCH_INDEX_URL', "https://download.pytorch.org/whl/cu121")
torch_command = os.environ.get('TORCH_COMMAND', f"pip install torch==2.1.2 torchvision==0.16.2 --extra-index-url {torch_index_url}")

to:

torch_index_url = os.environ.get('TORCH_INDEX_URL', "https://download.pytorch.org/whl/cu117")
torch_command = os.environ.get('TORCH_COMMAND', f"pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url {torch_index_url}")

and, if you use xformers, changing line 343 from:

xformers_package = os.environ.get('XFORMERS_PACKAGE', 'xformers==0.0.23.post1')

to:

xformers_package = os.environ.get('XFORMERS_PACKAGE', 'xformers==0.0.16rc425')

fixed the hanging at the cost of a pretty big performance hit. Would like a better workaround since I go from around 2.7it/s to 1.4it/s on my card.
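Since those defaults are read with os.environ.get, a sketch of an alternative (assuming your webui version reads the same variable names) is to set the environment variables before launch instead of editing launch_utils.py, so the change survives updates:

```python
import os

# Hypothetical sketch: os.environ.get only falls back to its default
# when the variable is unset, so exporting these before launch
# overrides the pinned versions without touching launch_utils.py.
os.environ["TORCH_INDEX_URL"] = "https://download.pytorch.org/whl/cu117"
os.environ["XFORMERS_PACKAGE"] = "xformers==0.0.16rc425"

# This mirrors the lookups launch_utils.py performs at startup:
torch_index_url = os.environ.get("TORCH_INDEX_URL", "https://download.pytorch.org/whl/cu121")
xformers_package = os.environ.get("XFORMERS_PACKAGE", "xformers==0.0.23.post1")
print(torch_index_url)   # -> https://download.pytorch.org/whl/cu117
print(xformers_package)  # -> xformers==0.0.16rc425
```

On Windows the same effect comes from `set TORCH_INDEX_URL=...` lines in webui-user.bat; TORCH_COMMAND can be overridden the same way.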

Update: I downloaded the fp16 SDXL VAE from https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl.vae.safetensors, removed --no-half-vae from my launch arguments, and added --medvram. The problem is seemingly gone now, so the version-change fix above can be ignored if this works for you too.