AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Bug]: Generation just hangs for ever before last step #10110

Open · Mozoloa opened this issue 1 year ago

Mozoloa commented 1 year ago

Is there an existing issue for this?

What happened?

Since the 1.1 update, very often when I do batches of images, one of them will hang at one of the last steps and never complete.

Clicking Interrupt does nothing, and neither does Skip; reloading the UI doesn't help either. The whole UI is stuck and no other functionality seems to work. The console shows the total progress this way (I'm generating 100 batches of one 512x512 image each):

100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.99it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  6.44it/s]
Total progress:   3%|█▉                                                              | 60/2000 [00:11<04:26,  7.27it/s]

I can't do anything but restart the whole thing.

Steps to reproduce the problem

  1. Go to TXT2IMG or IMG2IMG
  2. Do a large batch of images
  3. At some point the generation will hang and nothing will work anymore

What should have happened?

The generation should have continued like it did before

Commit where the problem happens

c3eced22fc7b9da4fbb2f55f2d53a7e5e511cfbd

What platforms do you use to access the UI ?

Windows 11, RTX3090

What browsers do you use to access the UI ?

Brave

Command Line Arguments

--ckpt-dir 'G:\AI\Models\Stable-diffusion\Checkpoints' --xformers --embeddings-dir 'G:\AI\Models\Stable-diffusion\Embeddings' --lora-dir 'G:\AI\Models\Stable-diffusion\Lora'

OR

--ckpt-dir 'G:\AI\Models\Stable-diffusion\Checkpoints' --opt-sdp-attention --embeddings-dir 'G:\AI\Models\Stable-diffusion\Embeddings' --lora-dir 'G:\AI\Models\Stable-diffusion\Lora'

List of extensions

ControlNet v1.1.134, Image Browser

Console logs

venv "G:\AI\Image Gen\A1111\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: c3eced22fc7b9da4fbb2f55f2d53a7e5e511cfbd
Installing xformers
Collecting xformers==0.0.17
  Using cached xformers-0.0.17-cp310-cp310-win_amd64.whl (112.6 MB)
Installing collected packages: xformers
Successfully installed xformers-0.0.16
Installing requirements

Installing ImageReward requirement for image browser

Launching Web UI with arguments: --autolaunch --ckpt-dir G:\AI\Models\Stable-diffusion\Checkpoints --xformers --embeddings-dir G:\AI\Models\Stable-diffusion\Embeddings --lora-dir G:\AI\Models\Stable-diffusion\Lora --reinstall-xformers
ControlNet v1.1.134
ControlNet v1.1.134
Loading weights [3dcc66eccf] from G:\AI\Models\Stable-diffusion\Checkpoints\Men\Saruman.ckpt
Creating model from config: G:\AI\Image Gen\A1111\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: G:\AI\Image Gen\A1111\stable-diffusion-webui\models\VAE\NewVAE.vae.pt
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(15): bad-artist, bad-artist-anime, bad-hands-5, bad-image-v2-39000, bad-picture-chill-75v, bad_prompt, bad_prompt_version2, badhandv4, charturnerv2, easynegative, HyperStylizeV6, ng_deepnegative_v1_75t, pureerosface_v1, ulzzang-6500, ulzzang-6500-v1.1
Textual inversion embeddings skipped(4): 21charturnerv2, nartfixer, nfixer, nrealfixer
Model loaded in 7.2s (load weights from disk: 2.5s, create model: 0.4s, apply weights to model: 0.4s, apply half(): 0.3s, load VAE: 0.5s, move model to device: 0.6s, load textual inversion embeddings: 2.5s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 19.8s (import torch: 2.7s, import gradio: 2.2s, import ldm: 1.0s, other imports: 2.4s, list SD models: 0.4s, setup codeformer: 0.1s, load scripts: 1.8s, load SD checkpoint: 7.2s, create ui: 1.2s, gradio launch: 0.7s).
Loading weights [c6bbc15e32] from G:\AI\Models\Stable-diffusion\Checkpoints\0\1.5-inpainting.ckpt
Creating model from config: G:\AI\Image Gen\A1111\stable-diffusion-webui\configs\v1-inpainting-inference.yaml
LatentInpaintDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.54 M params.
Loading VAE weights specified in settings: G:\AI\Image Gen\A1111\stable-diffusion-webui\models\VAE\NewVAE.vae.pt
Applying xformers cross attention optimization.
Model loaded in 2.0s (create model: 0.4s, apply weights to model: 0.4s, apply half(): 0.3s, load VAE: 0.2s, move model to device: 0.6s).
Running DDIM Sampling with 19 timesteps
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:02<00:00,  9.21it/s]
Running DDIM Sampling with 19 timesteps                                              | 18/2000 [00:01<03:04, 10.77it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.87it/s]
Running DDIM Sampling with 19 timesteps                                              | 38/2000 [00:04<02:31, 12.94it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 12.92it/s]
Running DDIM Sampling with 19 timesteps                                              | 56/2000 [00:07<02:37, 12.31it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.33it/s]
Running DDIM Sampling with 19 timesteps                                              | 76/2000 [00:10<02:29, 12.88it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 12.03it/s]
Running DDIM Sampling with 19 timesteps                                              | 94/2000 [00:13<03:02, 10.43it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.91it/s]
Running DDIM Sampling with 19 timesteps                                             | 113/2000 [00:15<02:33, 12.31it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.84it/s]
Running DDIM Sampling with 19 timesteps                                             | 133/2000 [00:18<02:23, 13.03it/s]
Decoding image:  21%|██████████████                                                     | 4/19 [00:00<00:01, 11.32it/s]
Total progress:   7%|████▎                                                          | 137/2000 [00:21<04:56,  6.28it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.90it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.94it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  7.14it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  6.42it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.81it/s]
  0%|                                                                                           | 0/20 [00:00<?, ?it/s]
Total progress:   5%|███▏                                                           | 101/2000 [00:23<07:14,  4.37it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  7.10it/s]
 75%|█████████████████████████████████████████████████████████████▌                    | 15/20 [00:02<00:00,  6.22it/s]
Total progress:   2%|█▏                                                              | 36/2000 [00:07<06:58,  4.69it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  6.17it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.89it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  7.07it/s]
 10%|████████▎                                                                          | 2/20 [00:00<00:03,  4.87it/s]
Total progress:   3%|██                                                              | 63/2000 [00:14<07:18,  4.42it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  7.57it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.99it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  6.44it/s]
Total progress:   3%|█▉                                                              | 60/2000 [00:11<04:26,  7.27it/s]

Additional information

I remember that at some point it hung but somehow got unstuck, and I got an error I don't remember exactly, but it did say to use --no-half-vae. I haven't tested that, and I never needed it before on torch 1.13.1 across tens of thousands of gens. I'm exclusively using the new 840000 MSE VAE.

Mozoloa commented 1 year ago

New information: I've tried --no-half-vae and it doesn't change anything. The hang also seems to happen when I try to interrupt some gens, still with no information in the console.

begon123 commented 1 year ago

It started happening more and more often; restarting the webui and Chrome no longer helps. It gives 1-2 generations and then errors again.

Please help! [screenshot: 2023-05-05_12-18-42] P.S. Windows 11, RTX 3060 (latest drivers)

Sorry, I'm actually on vladmandic/automatic, so this bug report isn't for this repo, but the error is exactly the same and I haven't found a similar one anywhere.

VRArt1 commented 1 year ago

I've also been having this issue since one of the recent updates.

Chem1ce commented 1 year ago

I'm also having the same issue.

Mozoloa commented 1 year ago

This has to do with cu118 or Torch 2.0, I reverted to 1.13.1+cu117 and I never get it

ChenNdG commented 1 year ago

This has to do with cu118 or Torch 2.0, I reverted to 1.13.1+cu117 and I never get it

So did you go back to version 1.0 of A1111? Or do you just use 1.13.1+cu117 with A1111 v1.1?

Mozoloa commented 1 year ago

This has to do with cu118 or Torch 2.0, I reverted to 1.13.1+cu117 and I never get it

So did you go back to version 1.0 of A1111? Or do you just use 1.13.1+cu117 with A1111 v1.1?

The UI still works with 1.13.1. I changed the line in launch.py that sets the torch version back to 1.13.1+cu117 (I don't remember exactly how anymore since I'm on my phone) and added --reinstall-torch to the command line arguments. But tbh now I just renamed venv to venv2, so I have both versions of torch at the ready just by renaming venv.
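(A rough sketch of that venv-swap approach from a cmd prompt, using the install path from the logs above and example folder names; not an exact recipe:)

rem Keep the torch 2 venv around under another name; the next launch rebuilds a
rem fresh venv with whatever torch version launch.py currently pins.
cd /d "G:\AI\Image Gen\A1111\stable-diffusion-webui"
ren venv venvTorch2

rem To switch back later, swap the names the other way around:
rem ren venv venvTorch1
rem ren venvTorch2 venv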

ChenNdG commented 1 year ago

This has to do with cu118 or Torch 2.0, I reverted to 1.13.1+cu117 and I never get it

So did you go back to version 1.0 of A1111? Or do you just use 1.13.1+cu117 with A1111 v1.1?

The UI still works with 1.13.1. I changed the line in launch.py that sets the torch version back to 1.13.1+cu117 (I don't remember exactly how anymore since I'm on my phone) and added --reinstall-torch to the command line arguments. But tbh now I just renamed venv to venv2, so I have both versions of torch at the ready just by renaming venv.

If I understand correctly, I have to change this line here: [screenshot]

Mozoloa commented 1 year ago

Found the commit that changed it: https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/d5063e07e8b4737621978feffd37b18077b9ea64. Just revert that change in launch.py.
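(If you'd rather not touch launch.py, roughly the same revert can be done by hand from a cmd prompt in the stable-diffusion-webui folder. A sketch only, reusing the exact versions from the launch.py lines quoted later in this thread:)

rem Activate the webui's venv, then pin the old torch/xformers builds in place.
call venv\Scripts\activate.bat
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install xformers==0.0.16rc425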

ChenNdG commented 1 year ago

Found the commit that changed it: d5063e0. Just revert that change in launch.py.

Thanks !

halr9000 commented 1 year ago

Having the same issue. I can't reliably reproduce it, it just happens when it happens, and there are no hints in the console for troubleshooting.

quasiblob commented 1 year ago

I get this too. I had checked out a commit from around this date (can't remember which one, as today I noticed I had for some reason switched to the latest commit, so I had to do a checkout again). This one doesn't seem to stop when batch-generating images, at least it hasn't so far:

[screenshot]

marcsyp commented 1 year ago

I also have this problem, and reverting to the master branch deployment at 22bcc7be428c94e9408f589966c2040187245d81 does indeed solve it. But of course this is far from ideal as a solution, as there has been a lot of development in the last 5 weeks and we are out in the wind...

Mozoloa commented 1 year ago

For those looking for a temp fix who already have torch 2.0+cu118 (you can see it at the bottom of the UI):

* Rename the _venv_ folder inside the _stable-diffusion-webui_ folder to _venvTorch2_ or something

* Modify launch.py by replacing the following lines with what comes after the ":" _**(check the warning below if you can't find them)**_
  225: `torch_command = os.environ.get('TORCH_COMMAND', "pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117")`
  228: `xformers_package = os.environ.get('XFORMERS_PACKAGE', 'xformers==0.0.16rc425')`

⚠️ In recent commits those lines moved to 240 and 243; this can vary from version to version, so search for them if you don't see them at those numbers.

* Then add `--reinstall-torch` and `--reinstall-xformers` (if you use the latter) to `set COMMANDLINE_ARGS=` in the _webui-user.bat_ file in the _stable-diffusion-webui_ folder, or add them to the additional arguments if you use my easy launcher, and save

* Relaunch the UI via _webui-user.bat_ or my launcher

* This will create a new _venv_ folder with the old torch versions, which still work perfectly well

* Now, if you ever want to go back to torch 2.0 when it's fixed, just rename the new _venv_ folder to _venvTorch1_ and rename _venvTorch2_ back to _venv_

* You can switch back to torch 1 by doing it the other way around, of course

nickr61 commented 1 year ago

Same issue after updating to torch 2. It seems to hang on simple prompts for me; more complex ones run fine in Generate forever, but if I use very few words it hangs after a few image generations and I have to close the cmd window and restart with webui-user.bat. Just reloading the web UI doesn't work. I also upgraded pip, but it still happens on occasion. The problem never occurred before the torch 2 upgrade.

poisenbery commented 1 year ago

Also getting this issue.

oliverban commented 1 year ago

Mozoloa, thanks for the workaround but just a patch from devs seems like a must. Why the wait?

Mozoloa commented 1 year ago

Mozoloa, thanks for the workaround but just a patch from devs seems like a must. Why the wait?

I'm not sure I understand what you're saying

ostap667inbox commented 1 year ago

Same problem. Solved it with advice from Reddit: in Settings, on the 'Live preview' tab, I increased 'every N sampling steps' to 5 (it was 1 before), and for 'Image creation progress preview mode' I chose 'Approx cheap'. After these changes the problem did not reappear. Previously, every 10th-20th generation ended in a hang and I had to restart the WebUI completely.

Mozoloa commented 1 year ago

Same problem. Solved it on the advice from reddit. In settings on 'live preview' tab I increased number of 'every N sampling steps' to 5 (it was 1 before). Also for 'Image creation progress preview mode' I chose the option 'Approx cheap'. After these actions, the problem did not appear. Previously, every 10-20 generation ended with a hang and had to restart WebUI completely

That's a good find, altho I like to see the preview as soon as possible and in full so I'll stay on torch 1 for now

NathanBonnet30 commented 1 year ago

I'm also sticking to torch 1, I get even slightly better performance on it.

poisenbery commented 1 year ago

I just deleted the entire venv.

I also included --skip-version-check because it shows a message saying "This was tested to work with Torch 2.0" which is obviously a lie.
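(In cmd terms that's roughly the following; a sketch, run from the stable-diffusion-webui folder, assuming you've already pinned the old torch as described above:)

rem Delete the venv so the next launch reinstalls whatever torch version is currently pinned.
rmdir /s /q venv
rem And in webui-user.bat, silence the version warning:
rem set COMMANDLINE_ARGS=--skip-version-check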

Mozoloa commented 1 year ago

Has anyone tried with 1.2.0 yet? Wondering if it still does this, but I'm on torch 1.

ecker00 commented 1 year ago

Still happens on 1.2.0 for me; I had to revert to the old torch as described above.

NathanBonnet30 commented 1 year ago

I've just discovered something very weird. I often have this hanging bug when I use Hires. fix. I just checked my task manager and Discord was taking like 50% of my GPU while I'm in SD; quitting Discord fixed the bug. Why the fuck does Discord take that much, is it only me?

Kadah commented 1 year ago

xformers being enabled or not has no effect on this.

The hang has been rather random. Most of the time it happens within 3-5 gens from launch, but sometimes it goes for many dozens, while other times it happens on the first. Prompt+seed doesn't matter; running the same settings each time still has a chance of triggering it.

An XYZ grid of more than a few images is highly risky.

elchupacabrinski commented 1 year ago

I just can't get this fixed somehow. Time to check Vlad again smh

Zuckonit commented 1 year ago

Meeting the same problem here.

UtopiaEditorial commented 1 year ago

`pip install image-reward`

Mozoloa commented 1 year ago

Same problem after full reset of the UI

cyberofficial commented 1 year ago

Same problem. Solved it on the advice from reddit. In settings on 'live preview' tab I increased number of 'every N sampling steps' to 5 (it was 1 before). Also for 'Image creation progress preview mode' I chose the option 'Approx cheap'. After these actions, the problem did not appear. Previously, every 10-20 generation ended with a hang and had to restart WebUI completely

That's a good find, altho I like to see the preview as soon as possible and in full so I'll stay on torch 1 for now

This worked for me, but I decided to test things out.

[screenshot: opera_g0TnMZRh7Z]

These settings work for me and it doesn't hang.

Kadah commented 1 year ago

Same problem. Solved it on the advice from reddit. In settings on 'live preview' tab I increased number of 'every N sampling steps' to 5 (it was 1 before). Also for 'Image creation progress preview mode' I chose the option 'Approx cheap'. After these actions, the problem did not appear. Previously, every 10-20 generation ended with a hang and had to restart WebUI completely

That's a good find, altho I like to see the preview as soon as possible and in full so I'll stay on torch 1 for now

This worked for me, but decided to test things out

[screenshot: opera_g0TnMZRh7Z]

These settings work for me and doesn't hang.

Did you change Progress/preview update period? The rest of those settings mirror what I use except I have sampling steps set to 5 and update period at the default(?) of 1000.

cyberofficial commented 1 year ago

Same problem. Solved it on the advice from reddit. In settings on 'live preview' tab I increased number of 'every N sampling steps' to 5 (it was 1 before). Also for 'Image creation progress preview mode' I chose the option 'Approx cheap'. After these actions, the problem did not appear. Previously, every 10-20 generation ended with a hang and had to restart WebUI completely

That's a good find, altho I like to see the preview as soon as possible and in full so I'll stay on torch 1 for now

This worked for me, but decided to test things out

[screenshot: opera_g0TnMZRh7Z]

These settings work for me and doesn't hang.

Did you change Progress/preview update period? The rest of those settings mirror what I use except I have sampling steps set to 5 and update period at the default(?) of 1000.

Yeah, it was set to 1 ms for me; I changed it to 500 ms and that seemed to fix it. I made about 30 generations of random images; usually the issue happened after around 7 generations.

Kadah commented 1 year ago

Hmm. I wonder if this might be caused by something like a race condition, where the last preview hasn't finished displaying before the generation cycle completes.

morecatplease commented 1 year ago

It looks like adjusting the live preview settings fixed it for me, too. Set it to 5 images and 1000 ms. Version: v1.3.0  •  python: 3.10.9  •  torch: 2.0.1+cu118  •  xformers: 0.0.17  •  gradio: 3.31.0

Gyramuur commented 1 year ago

For those looking for a temp fix who already have torch 2.0+cu118 (you can see it at the bottom of the UI):

* Rename the _venv_ folder inside the _stable-diffusion-webui_ folder to _venvTorch2_ or something

* Modify launch.py by replacing the following lines with what comes after the ":" _**(check the warning below if you can't find them)**_
  225: `torch_command = os.environ.get('TORCH_COMMAND', "pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117")`
  228: `xformers_package = os.environ.get('XFORMERS_PACKAGE', 'xformers==0.0.16rc425')`

⚠️ In recent commits those lines moved to 240 and 243; this can vary from version to version, so search for them if you don't see them at those numbers.

* Then add `--reinstall-torch` and `--reinstall-xformers` (if you use the latter) to `set COMMANDLINE_ARGS=` in the _webui-user.bat_ file in the _stable-diffusion-webui_ folder, or add them to the additional arguments if you use my easy launcher, and save

* Relaunch the UI via _webui-user.bat_ or my launcher

* This will create a new _venv_ folder with the old torch versions, which still work perfectly well

* Now, if you ever want to go back to torch 2.0 when it's fixed, just rename the new _venv_ folder to _venvTorch1_ and rename _venvTorch2_ back to _venv_

* You can switch back to torch 1 by doing it the other way around, of course

I want to say that this worked, but the launch.py inside of my stable-diffusion-webui folder only has 39 lines, so I'm not sure what to do, lol. This is what it shows:

from modules import launch_utils

args = launch_utils.args
python = launch_utils.python
git = launch_utils.git
index_url = launch_utils.index_url
dir_repos = launch_utils.dir_repos

commit_hash = launch_utils.commit_hash
git_tag = launch_utils.git_tag

run = launch_utils.run
is_installed = launch_utils.is_installed
repo_dir = launch_utils.repo_dir

run_pip = launch_utils.run_pip
check_run_python = launch_utils.check_run_python
git_clone = launch_utils.git_clone
git_pull_recursive = launch_utils.git_pull_recursive
run_extension_installer = launch_utils.run_extension_installer
prepare_environment = launch_utils.prepare_environment
configure_for_tests = launch_utils.configure_for_tests
start = launch_utils.start

def main():
    if not args.skip_prepare_environment:
        prepare_environment()

    if args.test_server:
        configure_for_tests()

    start()

if __name__ == "__main__":
    main()

Mozoloa commented 1 year ago

Yeah code is changing rapidly I can't keep up lmao

Gwynei commented 1 year ago

It looks like adjusting the live preview settings fixed it for me, too. Set it to 5 images and 1000 ms Version: v1.3.0  •  python: 3.10.9  •  torch: 2.0.1+cu118  •  xformers: 0.0.17  •  gradio: 3.31.0

Just wanted to confirm changing the live preview settings solved this issue for me as well.

Mozoloa commented 1 year ago

Just so we're clear, you are all talking about workarounds, not fixes; a fix means making sure something that broke after an update isn't broken anymore. Reducing live preview frequency is inconvenient, and an option shouldn't be there if it just bricks the UI after 3 gens

halr9000 commented 1 year ago

an option shouldn't be there if it just bricks the UI after 3 gens

Well, one fix is to "prevent fast previews", ie remove a feature. Another would be to fix what seems like an upstream regression that broke fast previews. Or did I hear you backwards?

Mozoloa commented 1 year ago

Ideally it should still be there but work lmao

lenkunz commented 1 year ago

For those looking for a temp fix who already have torch 2.0+cu118 (you can see it at the bottom of the UI):

* Rename the _venv_ folder inside the _stable-diffusion-webui_ folder to _venvTorch2_ or something

* Modify launch.py by replacing the following lines with what comes after the ":" _**(check the warning below if you can't find them)**_
  225: `torch_command = os.environ.get('TORCH_COMMAND', "pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117")`
  228: `xformers_package = os.environ.get('XFORMERS_PACKAGE', 'xformers==0.0.16rc425')`

⚠️ In recent commits those lines moved to 240 and 243; this can vary from version to version, so search for them if you don't see them at those numbers.

* Then add `--reinstall-torch` and `--reinstall-xformers` (if you use the latter) to `set COMMANDLINE_ARGS=` in the _webui-user.bat_ file in the _stable-diffusion-webui_ folder, or add them to the additional arguments if you use my easy launcher, and save

* Relaunch the UI via _webui-user.bat_ or my launcher

* This will create a new _venv_ folder with the old torch versions, which still work perfectly well

* Now, if you ever want to go back to torch 2.0 when it's fixed, just rename the new _venv_ folder to _venvTorch1_ and rename _venvTorch2_ back to _venv_

* You can switch back to torch 1 by doing it the other way around, of course

I want to say that this worked, but the launch.py inside of my stable-diffusion-webui folder only has 39 lines, so I'm not sure what to do, lol. This is what it shows:


from modules import launch_utils

args = launch_utils.args
python = launch_utils.python
git = launch_utils.git
index_url = launch_utils.index_url
dir_repos = launch_utils.dir_repos

commit_hash = launch_utils.commit_hash
git_tag = launch_utils.git_tag

run = launch_utils.run
is_installed = launch_utils.is_installed
repo_dir = launch_utils.repo_dir

run_pip = launch_utils.run_pip
check_run_python = launch_utils.check_run_python
git_clone = launch_utils.git_clone
git_pull_recursive = launch_utils.git_pull_recursive
run_extension_installer = launch_utils.run_extension_installer
prepare_environment = launch_utils.prepare_environment
configure_for_tests = launch_utils.configure_for_tests
start = launch_utils.start

def main():
    if not args.skip_prepare_environment:
        prepare_environment()

    if args.test_server:
        configure_for_tests()

    start()

if __name__ == "__main__":
    main()

This can also be done by setting the TORCH_COMMAND and XFORMERS_PACKAGE environment variables to the desired values in the bat file.
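For example, a minimal sketch of that in webui-user.bat, reusing the versions from the workaround above and assuming the refactored code still reads these names via os.environ.get the way the old launch.py did (adjust to your own arguments):

rem Override the pinned packages without editing any Python files.
set TORCH_COMMAND=pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
set XFORMERS_PACKAGE=xformers==0.0.16rc425
rem Add the reinstall flags to your existing COMMANDLINE_ARGS for one launch, then remove them.
set COMMANDLINE_ARGS=--xformers --reinstall-torch --reinstall-xformers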

oliverban commented 1 year ago

Since changing my settings to display every other step using TAESD I haven't had a freeze!

Mozoloa commented 1 year ago

This still happens in 1.4.0, using Full preview, every 2 samples, refresh period 500 ms. This is very annoying.

Mozoloa commented 1 year ago

I've now tested with several settings and the generations seem to hang midway then resume; interrupting them then freezes the UI for 10s. This is very frustrating.

pookienumnums commented 1 year ago

Since changing my settings to display every other step using TAESD I haven't had a freeze!

9-16-2023

This absolutely worked for me. I'd been having issues since reinstalling and, I guess, changing those settings a few days ago: single image generations randomly freezing in the UI, the console, or both, and Deforum animations randomly freezing.

Changed this yesterday to a 5-frame preview and Approx NN and haven't had an issue since.

This was the fix for me~

Mozoloa commented 1 year ago

Since changing my settings to display every other step using TAESD I haven't had a freeze!

9-16-2023

This absolutely worked for me. Been having issues since reinstalling and i guess changing those settings, a few day ago. Single image generations randomly freezing, in the ui, the console, or both. Deforum animations randomly freezing.

Changed this yesterday to 5 frame preview and Approx NN and havent had an issue since.

This was the fix for me~

Just so we're clear, this is not a fix for the full preview; it's just using another preview engine, one that's not full (and it shows), and we already knew the other ones worked. It's not a solution.

mjranum commented 1 year ago

[I am posting this in multiple places; it seems to be a common issue] I have had a similar problem, and solved it. Apparently permanently. Here's what I think is going on: the websockets layer between A1111 and SD is losing a message and hanging while waiting for a response from the other side. It appears to happen when there is a lot of data going back and forth, possibly overrunning a queue someplace. If you think about it, A1111 and SD are shovelling big amounts of image data across the websockets. And here's how you exacerbate it: tell A1111 to display each image as it's created, then set the "new image display time" down around 200ms. If you do that, it'll start failing pretty predictably, at random. How to fix: have it display the image every 30 iterations and set the display time at around 10 seconds. Poof, problem gone. [This problem resembles a bug in Sun RPC from back around 1986; plus ça change...]

Mozoloa commented 1 year ago

[I am posting this in multiple places; it seems to be a common issue] I have had a similar problem, and solved it. Apparently, permanently. Here's what I think is going on: the websockets layer between A1111 and SD is losing a message and hanging waiting for a response from the other side. It appears to be a result of when there is a lot of data going back and forth, possibly overrunning a queue someplace. If you think about it, A1111 and SD are shovelling big amounts of image data across the websockets. And here's how you exacerbate it: tell A1111 to display each image as its created, then set a "new image display time" down around 200ms. If you do that, it'll start failing pretty predictably, at random. How to fix: have it display the image every 30 iterations and set the display time at around 10 seconds. Poof. Problem gone. [This problem resembles a bug in Sun RPC from back around 1986; plus ca change...]

Again, not a fix; the problem is not gone, you just used different settings that make the preview refresh more than 10 times slower, which defeats the point. By the time 10 seconds have passed, 4 images have been generated on a 4090. I mostly use 20 samples, so this would equate to not using the preview at all on quick GPUs. Fast full preview used to work with torch 1 for the longest time; switching to 2 brought this problem. Still not fixed afaik.

mjranum commented 1 year ago

That's exactly my point: if the problem is a race condition, then a fast card that does not have contention is not likely to trigger a problem in the websockets queueing layer.

I made the image refresh slower on my slower card/system and it fixed the problem. It also does not happen on my system that's running a 25gb 4090. Also, the image refresh slowdown doesn't matter at all on that system because it kicks images out pretty darn quick.

I'm offering a hypothetical diagnosis on what I believe to be a race condition/synchronization problem. Saying "you slowed it down" is ... dramatically missing the point.