Mozoloa opened this issue 1 year ago
New information: I've tried --no-half-vae and it doesn't change anything. The hanging also seems to happen when I try to interrupt some gens; still no information in the console.
It started happening more and more often; rebooting the SD webui and Chrome no longer helps. It gives 1-2 generations and then the error again.
Please help! p.s windows 11, rtx 3060 (last drivers)
Sorry, I have vladmandic/automatic - the bug report is not for you, but the error is exactly the same and I have not found a similar one anywhere.
I've also been having this issue since one of the recent updates.
I'm also having the same issue.
This has to do with cu118 or Torch 2.0, I reverted to 1.13.1+cu117 and I never get it
So you went back to the 1.0 version of A1111? Or do you just use 1.13.1+cu117 in A1111 v1.1?
The UI still works with 1.13.1. I changed a line in launch.py that referenced torch to set it back to 1.13.1+cu117 (I don't remember exactly how anymore since I'm on my phone) and added --reinstall-torch to the command line arguments. But tbh now I just renamed venv to venv2, so I have both versions of torch at the ready just by renaming venv.
if I understand correctly, I have to change this line here :
Found the commit that changed it: https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/d5063e07e8b4737621978feffd37b18077b9ea64 - just revert the change from launch.py
Thanks !
Having same issue, can't reliably reproduce, it just happens when it happens, and there's no hints in the console for troubleshooting.
I get this too. I checked out a commit from around this date; I can't remember which one, as today I noticed I had for some reason switched to the latest commit, so I had to do a checkout again. This one doesn't seem to stop when batch-generating images, at least it hasn't so far:
I also have this problem, and reverting to the master branch deployment at commit 22bcc7be428c94e9408f589966c2040187245d81 does indeed solve it -- but of course this is far from ideal as a solution, as there has been a lot of development in the last 5 weeks and we are out in the wind...
For those looking for a temp fix that already have torch 2.0+cu118 (you can see it at the bottom of the UI):

* Rename the _venv_ folder inside the _stable-diffusion-webui_ folder to _venvTorch2_ or something
* Modify launch.py by replacing the following lines with what comes after the ":" (**check the warning below if you can't find them**):

  225: `torch_command = os.environ.get('TORCH_COMMAND', "pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117")`
  228: `xformers_package = os.environ.get('XFORMERS_PACKAGE', 'xformers==0.0.16rc425')`

  ⚠️ In recent commits those lines moved to 240 and 243; this can vary from version to version, so search for them if you don't see them right away.

* Then add `--reinstall-torch` and `--reinstall-xformers` (if you use the latter) next to `set COMMANDLINE_ARGS=` in the _webui-user.bat_ file in the _stable-diffusion-webui_ folder, or add them to the additional arguments if you use my easy launcher, and save
* Relaunch the UI via _webui-user.bat_ or my launcher
* This will create a new _venv_ folder with the old torch versions, which still work perfectly well
* If you ever want to go back to torch 2.0 when it's fixed, just rename the new _venv_ folder to _venvTorch1_ and rename _venvTorch2_ back to _venv_
* You can switch back to torch 1 by doing it the other way around, of course

Same issue after updating to torch 2. It seems to hang on simple prompts for me; more complex ones can generate forever without problems, but if I use very few words it hangs after a few image generations and I have to close the cmd window and restart with webui-user.bat. Just reloading the web UI doesn't work. I also upgraded pip, but it still happens on occasion. The problem never occurred before the torch 2 upgrade.
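For anyone who'd rather script the launch.py edit from the workaround above than hunt for line numbers, here is a rough sketch. The regex patterns and the `downgrade_defaults` helper are mine, not part of the webui, and they assume the defaults still look like the quoted lines, which varies between versions:

```python
import re
from pathlib import Path

# Hypothetical helper: rewrite launch.py's default TORCH_COMMAND and
# XFORMERS_PACKAGE so that --reinstall-torch pulls torch 1.13.1+cu117
# instead of 2.x+cu118. Patterns assume the defaults keep their usual shape.
OLD_TORCH = re.compile(r"'TORCH_COMMAND',\s*\"pip install [^\"]+\"")
OLD_XFORMERS = re.compile(r"'XFORMERS_PACKAGE',\s*'xformers==[^']+'")

NEW_TORCH = ("'TORCH_COMMAND', \"pip install torch==1.13.1+cu117 "
             "torchvision==0.14.1+cu117 "
             "--extra-index-url https://download.pytorch.org/whl/cu117\"")
NEW_XFORMERS = "'XFORMERS_PACKAGE', 'xformers==0.0.16rc425'"


def downgrade_defaults(source: str) -> str:
    """Return launch.py source text with the torch/xformers defaults swapped."""
    source = OLD_TORCH.sub(NEW_TORCH, source)
    return OLD_XFORMERS.sub(NEW_XFORMERS, source)


if __name__ == "__main__":
    path = Path("launch.py")  # run from inside the stable-diffusion-webui folder
    if path.exists():
        path.write_text(downgrade_defaults(path.read_text(encoding="utf-8")),
                        encoding="utf-8")
```

Back up launch.py first; if the script changes nothing, the defaults have moved again and the manual search described above still applies.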
Also getting this issue.
Mozoloa, thanks for the workaround but just a patch from devs seems like a must. Why the wait?
I'm not sure I understand what you're saying
Same problem. Solved it on advice from reddit. In settings, on the 'Live previews' tab, I increased 'every N sampling steps' to 5 (it was 1 before). For 'Image creation progress preview mode' I chose the option 'Approx cheap'. After these changes the problem did not appear. Previously, every 10-20th generation ended with a hang and I had to restart the WebUI completely.
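For reference, those UI settings are persisted in the webui's config.json, so they can also be changed with the UI stopped. A minimal sketch; the key names (`show_progress_every_n_steps`, `show_progress_type`) are my reading of the settings above and may differ between webui versions, so verify them against your own config.json:

```python
import json
from pathlib import Path

# Assumed keys (check your own config.json, names can change across versions):
#   show_progress_every_n_steps -> "every N sampling steps" (was 1, now 5)
#   show_progress_type          -> preview mode ("Approx cheap", "Full", ...)
PREVIEW_WORKAROUND = {
    "show_progress_every_n_steps": 5,
    "show_progress_type": "Approx cheap",
}


def apply_preview_workaround(config_path: Path) -> dict:
    """Merge the workaround settings into config.json and return the result."""
    cfg = json.loads(config_path.read_text(encoding="utf-8"))
    cfg.update(PREVIEW_WORKAROUND)
    config_path.write_text(json.dumps(cfg, indent=4), encoding="utf-8")
    return cfg

# Usage (with the webui stopped, from the stable-diffusion-webui folder):
#   apply_preview_workaround(Path("config.json"))
```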
That's a good find, although I like to see the preview as soon as possible and in full, so I'll stay on torch 1 for now.
I'm also sticking to torch 1, I get even slightly better performance on it.
I just deleted the entire venv.
I also included --skip-version-check because it shows a message saying "This was tested to work with Torch 2.0" which is obviously a lie.
Has anyone tried with 1.2.0 yet? Wondering if it still does this, but I'm on torch 1.
Still happens on 1.2.0 for me, had to revert to old torch like described above.
I've just discovered something very weird. I often get this hanging bug when I use Hires. fix. I just checked my task manager and Discord was taking about 50% of my GPU while I was in SD; quitting Discord fixed the bug. Why the fuck does Discord take that much? Is it only me?
Whether xformers is enabled or not has no effect on this.
This hang has been rather random. Most of the time it happens within 3-5 gens from launch, but sometimes it goes for many dozens, while other times it happens on the first. Prompt+seed doesn't matter; running the same settings each time still has a chance of triggering it.
An XYZ grid of more than a few images is highly risky.
I just cant get this fixed somehow. Time to check Vlad again smh
Met with the same problem.
(`pip install image-reward`)
Same problem after full reset of the UI
This worked for me, but decided to test things out
These settings work for me and it doesn't hang.
Did you change 'Progress/preview update period'? The rest of those settings mirror what I use, except I have sampling steps set to 5 and the update period at the default(?) of 1000.
Yeah, it was set to 1 ms for me; I changed it to 500 ms and that seemed to fix it. I made about 30 generations of random images; previously the issue usually happened after around 7 generations.
Hmm. I wonder if this might be caused by something like a race condition between the last preview not finishing/displaying before the generation cycle has completed.
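If it is a race like that, it would also explain why raising the preview period helps: it throttles how often a preview can be pushed, giving the consumer time to catch up. A generic sketch of that throttling idea (my own illustration, not the webui's actual code):

```python
import time


class PreviewThrottle:
    """Allow a preview push at most once per `period_ms`; skip the rest.

    At a 1 ms period essentially every sampling step pushes a frame, which
    can flood whatever sits between the sampler and the browser. At 500 to
    1000 ms most frames are skipped and the consumer gets slack.
    """

    def __init__(self, period_ms: float, clock=time.monotonic):
        self.period = period_ms / 1000.0
        self.clock = clock            # injectable for testing
        self._last = float("-inf")

    def should_emit(self) -> bool:
        now = self.clock()
        if now - self._last >= self.period:
            self._last = now
            return True
        return False
```

With a 500 ms period and roughly 100 ms per sampling step, only one step in five would push a preview, which matches the reports above that raising the period avoids the hang.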
It looks like adjusting the live preview settings fixed it for me, too. Set it to 5 images and 1000 ms Version: v1.3.0 • python: 3.10.9 • torch: 2.0.1+cu118 • xformers: 0.0.17 • gradio: 3.31.0
I want to say that this worked, but the launch.py inside of my stable-diffusion-webui folder only has 39 lines, so I'm not sure what to do, lol. This is what it shows:
```python
from modules import launch_utils

args = launch_utils.args
python = launch_utils.python
git = launch_utils.git
index_url = launch_utils.index_url
dir_repos = launch_utils.dir_repos

commit_hash = launch_utils.commit_hash
git_tag = launch_utils.git_tag

run = launch_utils.run
is_installed = launch_utils.is_installed
repo_dir = launch_utils.repo_dir

run_pip = launch_utils.run_pip
check_run_python = launch_utils.check_run_python
git_clone = launch_utils.git_clone
git_pull_recursive = launch_utils.git_pull_recursive
run_extension_installer = launch_utils.run_extension_installer
prepare_environment = launch_utils.prepare_environment
configure_for_tests = launch_utils.configure_for_tests
start = launch_utils.start


def main():
    if not args.skip_prepare_environment:
        prepare_environment()

    if args.test_server:
        configure_for_tests()

    start()


if __name__ == "__main__":
    main()
```
Yeah code is changing rapidly I can't keep up lmao
Just wanted to confirm changing the live preview settings solved this issue for me as well.
Just so we're clear, you are all talking about workarounds, not fixes. A fix means making sure that something which broke after an update isn't broken anymore. Reducing live preview frequency is inconvenient; an option shouldn't be there if it just bricks the UI after 3 gens.
> an option shouldn't be there if it just bricks the UI after 3 gens
Well, one fix is to "prevent fast previews", i.e. remove a feature. Another would be to fix what seems like an upstream regression that broke fast previews. Or did I hear you backwards?
Ideally it should still be there but work lmao
These can also be set by pointing the TORCH_COMMAND and XFORMERS_PACKAGE environment variables at the desired values in the bat file.
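That works because launch.py reads these values with `os.environ.get(name, default)`, so a `set TORCH_COMMAND=...` line in webui-user.bat (before the launch call) overrides the default without editing any code. A tiny illustration of the lookup; the fallback string here is a placeholder, not the real default:

```python
import os

# In webui-user.bat this would be:
#   set TORCH_COMMAND=pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
os.environ["TORCH_COMMAND"] = (
    "pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 "
    "--extra-index-url https://download.pytorch.org/whl/cu117"
)

# Same pattern launch.py uses: the environment variable wins over the default.
torch_command = os.environ.get("TORCH_COMMAND", "pip install torch (placeholder default)")
print(torch_command)
```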
Since changing my settings to display every other step using TAESD I haven't had a freeze!
This still happens in 1.4.0, using Full preview, every 2 samples, refresh period 500 ms. This is very annoying.
I've now tested with several settings; the generations seem to hang midway then resume, and interrupting them freezes the UI for 10 s. This is very frustrating.
9-16-2023
This absolutely worked for me. I'd been having issues since reinstalling and, I guess, changing those settings a few days ago: single image generations randomly freezing in the UI, the console, or both, and Deforum animations randomly freezing.
I changed this yesterday to a 5-frame preview and Approx NN and haven't had an issue since.
This was the fix for me~
Just so we're clear, this is not a fix for the full preview; it's just using another preview engine, one that's not full, and it shows. We already knew the other ones worked. It's not a solution.
[I am posting this in multiple places; it seems to be a common issue] I have had a similar problem, and solved it, apparently permanently. Here's what I think is going on: the websockets layer between A1111 and SD is losing a message and hanging while waiting for a response from the other side. It appears to happen when there is a lot of data going back and forth, possibly overrunning a queue someplace. If you think about it, A1111 and SD are shovelling big amounts of image data across the websockets. And here's how you exacerbate it: tell A1111 to display each image as it's created, then set the "new image display time" down around 200 ms. If you do that, it'll start failing pretty predictably, at random. How to fix: have it display the image every 30 iterations and set the display time to around 10 seconds. Poof, problem gone. [This problem resembles a bug in Sun RPC from back around 1986; plus ça change...]
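Whatever the real transport turns out to be, the failure mode described here (a producer outrunning a stalled consumer until something blocks) is the textbook case for a bounded buffer that drops stale frames rather than blocking. A generic sketch of that policy, my own illustration and not A1111 code:

```python
from collections import deque
from threading import Lock


class LatestFrameBuffer:
    """Bounded preview buffer: when full, drop the oldest frame instead of
    blocking the producer."""

    def __init__(self, maxlen: int = 2):
        self._frames = deque(maxlen=maxlen)  # full deque discards the oldest
        self._lock = Lock()

    def push(self, frame) -> None:
        """Producer side: never blocks, no matter how far ahead it runs."""
        with self._lock:
            self._frames.append(frame)

    def pop_latest(self):
        """Consumer side: take only the newest frame, discard the stale rest."""
        with self._lock:
            if not self._frames:
                return None
            frame = self._frames.pop()
            self._frames.clear()
            return frame
```

For live previews only the newest frame matters, so dropping old ones is invisible to the user, while blocking the producer is exactly the hang people are reporting.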
Again, not a fix; the problem is not gone, you just used different settings that make the preview refresh more than 10 times slower, which defeats the point. By the time 10 seconds have passed, 4 images have been generated on a 4090. I mostly use 20 samples, so this would equate to not using preview at all on quick GPUs. Fast full preview worked on torch 1 for the longest time; switching to 2 brought this problem. Still not fixed afaik.
That's exactly my point: if the problem is a race condition, then a fast card that does not have contention is not likely to trigger a problem in the websockets queueing layer.
I made the image refresh slower on my slower card/system and it fixed the problem. It also does not happen on my system running a 24 GB 4090, and the refresh slowdown doesn't matter at all there because it kicks images out pretty darn quick.
I'm offering a hypothetical diagnosis on what I believe to be a race condition/synchronization problem. Saying "you slowed it down" is ... dramatically missing the point.
Is there an existing issue for this?
What happened?
Since the update 1.1, very often when I do batches of images, one of them will hang at one of the latest steps and never complete.
Clicking interrupt does nothing, so does skip, and reloading the UI doesn't help; the whole UI is stuck and it seems no other functionality works. The console shows the total progress this way (I'm generating 100 batches of one 512x512 image):
I can't do anything but restart the whole thing.
Steps to reproduce the problem
What should have happened?
The generation should have continued like it did before
Commit where the problem happens
c3eced22fc7b9da4fbb2f55f2d53a7e5e511cfbd
What platforms do you use to access the UI?
Windows 11, RTX3090
What browsers do you use to access the UI?
Brave
Command Line Arguments
List of extensions
ControlNet v1.1.134 Image browser
Console logs
Additional information
I remember that at some point it hung but got unstuck somehow, and I got an error which I don't remember, but it did say to use --no-half-vae. I haven't tested that and never needed it before on torch 1.13.1 across tens of thousands of gens. I'm exclusively using the new 840000 mse VAE.