brkirch / stable-diffusion-webui

Stable Diffusion web UI

[Bug]: NansException #3

Closed hoonlight closed 1 year ago

hoonlight commented 1 year ago

Is there an existing issue for this?

What happened?

While creating the image, the process aborted with the following error.

100%|███████████████████████████████████████████| 40/40 [01:00<00:00,  1.52s/it]
SwinIR tiles: 100%|█████████████████████████████| 15/15 [00:51<00:00,  3.45s/it]
100%|███████████████████████████████████████████| 40/40 [10:23<00:00, 15.59s/it]
 20%|████████▊                                   | 8/40 [00:14<00:56,  1.77s/it]
Error completing request

Also, this error doesn't seem to happen every time; it occurs intermittently, so I'm still monitoring it.

Steps to reproduce the problem

- M1 Pro 14", macOS 13.3
- ChilloutMix fp16 model (pruned, non-EMA)
- 512x768, DPM++ 2M Karras, CFG 7, 40 steps
- Hires. fix 2x, 40 steps, SwinIR_4x

What should have happened?

The image should have been created normally through the hires.fix process.

Commit where the problem happens

4b15929

What platforms do you use to access the UI ?

MacOS

What browsers do you use to access the UI ?

MS Edge

Command Line Arguments

No

List of extensions

Dynamic prompts

Console logs

Traceback (most recent call last):
  File "/Users/hoon/Documents/stable-diffusion-webui/modules/call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "/Users/hoon/Documents/stable-diffusion-webui/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/Users/hoon/Documents/stable-diffusion-webui/modules/txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "/Users/hoon/Documents/stable-diffusion-webui/modules/processing.py", line 486, in process_images
    res = process_images_inner(p)
  File "/Users/hoon/Documents/stable-diffusion-webui/modules/processing.py", line 636, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "/Users/hoon/Documents/stable-diffusion-webui/modules/processing.py", line 836, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "/Users/hoon/Documents/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 351, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/Users/hoon/Documents/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 227, in launch_sampling
    return func()
  File "/Users/hoon/Documents/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 351, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/Users/hoon/Documents/stable-diffusion-webui/python/3.10.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/hoon/Documents/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/sampling.py", line 594, in sample_dpmpp_2m
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "/Users/hoon/Documents/stable-diffusion-webui/python/3.10.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/hoon/Documents/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 145, in forward
    devices.test_for_nans(x_out, "unet")
  File "/Users/hoon/Documents/stable-diffusion-webui/modules/devices.py", line 152, in test_for_nans
    raise NansException(message)
modules.devices.NansException: A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

Additional information

I've never had anything like this happen before (with the AUTOMATIC1111 version).
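
For context on what actually aborts the run: the last two frames of the traceback show `devices.test_for_nans(x_out, "unet")` raising `NansException` when the UNet output is entirely NaN. Below is a minimal, hedged sketch of that check, reconstructed from the traceback and error message above rather than copied from the upstream `modules/devices.py`, which also shows why `--disable-nan-check` lets the run continue (it only skips the test, it doesn't prevent the NaNs):

```python
# Hedged reconstruction of the check seen in the traceback (not the exact
# upstream modules/devices.py code). The flag name mirrors the error message.
import torch


class NansException(Exception):
    pass


def test_for_nans(x: torch.Tensor, where: str, disable_nan_check: bool = False) -> None:
    """Abort generation if the tensor produced by `where` is entirely NaN."""
    if disable_nan_check:
        # --disable-nan-check only skips this test; it does not stop NaNs
        # from being produced in the first place.
        return
    if torch.isnan(x).all():
        raise NansException(f"A tensor with all NaNs was produced in {where}.")


# Example: a fully NaN "UNet output" triggers the exception.
try:
    test_for_nans(torch.full((1, 4, 64, 64), float("nan")), "unet")
except NansException as e:
    print(e)
```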

jrittvo commented 1 year ago

I was just about to open an issue with pretty much the same info as above:

After updating to macOS 13.3, when using version 1.5 models, there are very infrequent failures mid-generation (~1 in 20, clustered rather than evenly distributed), throwing this error:

File "/Users/jrittvo/git/Automatic1111-WebUI/modules/devices.py", line 152, in test_for_nans raise NansException(message) modules.devices.NansException: A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

Running the generation again right after, with all settings left the same, works fine as often as not.

Enabling "Upcast cross attention layer to float32" seems to prevent the error from happening completely, even though these are 1.5 models.

I believe that all my 1.5 version models are ema-only fp16.

Version 2.1 models work as expected with "Upcast cross attention layer to float32" enabled, and fail to generate from the start by erroring out with it disabled, again as expected.

I think the error being reported is itself in error, and that it is interrupting what would otherwise be a successful generation. I will test later today with using the "--disable-nan-check" argument. I assume there is no downside to enabling it.
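
As a side note on why upcasting cross attention to float32 (or --no-half) can help: fp16 tops out around 65504, so large intermediate values overflow to inf and then turn into NaN as soon as an inf is subtracted or normalized. A tiny illustration in plain PyTorch (my own example, not webui code):

```python
import torch

x = torch.tensor([70000.0])

# fp16 cannot represent 70000, so it overflows to inf; inf - inf is NaN.
print(x.half())             # tensor([inf], dtype=torch.float16)
print(x.half() - x.half())  # tensor([nan], dtype=torch.float16)

# The same computation in float32 has plenty of headroom and stays finite.
print(x - x)                # tensor([0.])
```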

hoonlight commented 1 year ago

> I think the error being reported is itself in error, and that it is interrupting what would otherwise be a successful generation. I will test later today with using the "--disable-nan-check" argument. I assume there is no downside to enabling it.

It seems that you are right. Generations are proceeding without interruption after adding --disable-nan-check. Thank you.

MDMAchine commented 1 year ago

I want to add that I also had this issue, even on the other version I had set up, which was pretty similar to this one.

A few things I noticed:

I had better success avoiding most errors with --no-half-vae and --disable-nan-check

brkirch commented 1 year ago

@comienzo2093 @jrittvo @MDMAchine Please let me know if 20230416_experimental still gives you NansException with Upcast cross attention layer to float32 set to Automatic.

jrittvo commented 1 year ago

No NansException with Automatic (or with Enabled). With Disabled it errors, as expected. Tested with tomesd enabled for both the initial and Hires. fix passes, Restore faces also enabled, and a negative prompt included with "Disable negative guidance minimum sigma when token merging is active" turned on.

I'm only just starting to play with this build. I spent a bit of time not getting anywhere in the other thread you solved. Thank you for that!

On this new build, a run of 4 generations with Hires fix that took 285 seconds in the previous build (and in my souped up regular Auto1111) is down to 240 seconds. That's 15% faster. Pretty impressive! I'll take a look at memory load tomorrow. No bugs at all, so far. You fixed a nasty one from the last build that crashed the app when I tried to make any selection in Maximum down sampling on the Token Merge page.

jrittvo commented 1 year ago

Spoke too soon on "You fixed a nasty one from the last build that crashed the app when I tried to make any selection in Maximum down sampling on the Token Merge page." It still errors out when I hit Apply settings. Looks like an easy one to fix: AssertionError: Bad value for setting token_merging_maximum_down_sampling: 1; expecting int.

jrittvo commented 1 year ago

I can generate a 1536x1536 with this build on my MacBook M1Pro w/ 16GB. I bet I can do 2048x2048 if I close all my other apps and Firefox tabs. Gonna have to try the same with the official build. I've never tried anything above 1024x1024 until just now.

brkirch commented 1 year ago

> Spoke too soon on "You fixed a nasty one from the last build that crashed the app when I tried to make any selection in Maximum down sampling on the Token Merge page." It still errors out when I hit Apply settings. Looks like an easy one to fix: AssertionError: Bad value for setting token_merging_maximum_down_sampling: 1; expecting int.

@jrittvo Fixed in 20230416_experimental.

> I can generate a 1536x1536 with this build on my MacBook M1Pro w/ 16GB. I bet I can do 2048x2048 if I close all my other apps and Firefox tabs. Gonna have to try the same with the official build. I've never tried anything above 1024x1024 until just now.

You will probably find that 2048x2048 doesn't work, but any resolution below that should work (sometimes I generate at 2000x2000).

hoonlight commented 1 year ago

> @comienzo2093 @jrittvo @MDMAchine Please let me know if 20230416_experimental still gives you NansException with Upcast cross attention layer to float32 set to Automatic.

After testing for several hours, I no longer get that error. Great!

hoonlight commented 1 year ago

> @comienzo2093 @jrittvo @MDMAchine Please let me know if 20230416_experimental still gives you NansException with Upcast cross attention layer to float32 set to Automatic.

RuntimeError: MPS backend out of memory (MPS allocated: 5.63 GB, other allocations: 13.12 GB, max allowed: 18.13 GB). Tried to allocate 768.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

A different problem is occurring. I'll open a separate issue after observing it a bit more.

jrittvo commented 1 year ago

What size image or batch is that with? I hit that error @ 2048x2048. 1536x1536 was slow, but it did complete. Doing multiple images in a batch, as opposed to multiple batches of 1 image each, also taxes memory.

hoonlight commented 1 year ago

> What size image or batch is that with? I hit that error @ 2048x2048. 1536x1536 was slow, but it did complete. Doing multiple images in a batch, as opposed to multiple batches of 1 image each, also taxes memory.

- MBP 14", fp16 checkpoint
- Upcast cross attention layer to float32: Automatic
- DPM++ SDE Karras, 512x768
- Hires. fix, Latent, upscale by 2x
- Batch size: 1

Thank you. I'll keep testing with different options.

jrittvo commented 1 year ago

Have you tried exporting PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 as suggested in the error message? It could be a fix, or it could cause a big crash. Also, for me in general (not yet with this version), there seem to be times when something "gets stuck" and generation slows down a lot. I assume it's because memory needs to swap. Shutting down Automatic1111 completely and restarting returns things to normal for me when that happens. It's not often, maybe once every few days.
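
If anyone wants to try the PYTORCH_MPS_HIGH_WATERMARK_RATIO suggestion from the error message, my understanding (an assumption on my part, not something confirmed in this thread) is that the variable has to be in the environment before PyTorch sets up the MPS allocator, for example:

```python
# Set the variable before torch touches the MPS backend (e.g. at the very top
# of the launch script); 0.0 removes the upper memory limit and, per the error
# message, may cause system instability.
import os
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"

import torch
print(torch.backends.mps.is_available())  # sanity check that MPS is usable
```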

jrittvo commented 1 year ago

One more thing: I've been using a 2.6 GB version of the SD-2.1-768 model (and the .yaml that pairs with it), which is half the size of the default version. I don't know whether a smaller model means lower memory pressure, but it might be worth a try. It is from one of the groups linked to for getting the set of ControlNet models, and it is a safetensors file, so I trust it. It produces the exact same results as the 5.2 GB models. https://huggingface.co/webui/stable-diffusion-2-1/tree/main

brkirch commented 1 year ago

This is potentially mostly fixed in Experimental Offline Standalone Mac Installer for Stable Diffusion Web UI (unofficial) 20230727, via commit dfb904cb753adab19d3a632da5be401295f44f28. Please give it a try and let me know whether NansException still occurs as often, and which version of macOS you are currently running.

brkirch commented 1 year ago

Since there have been no new reports, I'm closing this issue. If anyone still has this issue, please comment here and I'll reopen this.