AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Bug]: "A tensor with all NaN's was produced in Unet" #15872

Open MagyTheMage opened 3 months ago

MagyTheMage commented 3 months ago

What happened?

Whenever attempting to generate an image, there is a seemingly random chance that an error appears stating the following:

"NansException: A tensor with all NaN's was produced in Unet. This could be either because there is not enough precision to represent the picture of because your video card does not support the half type, try setting "Upcast cross attention layer to float 32" option in settings >Stable difussion or using the --no-half commandline to fix this, Use --disable-nan-check commandline argument to disable this check"

Steps to reproduce the problem

  1. Open the WebUI
  2. Type in any prompt, e.g. "1girl" (any prompt will suffice)
  3. Click Generate
  4. The error may appear shortly after (it's heavily inconsistent)

What should have happened?

The WebUI should have generated the image, but it didn't. Based on my testing, the longer the prompt the more likely the error is to occur; long prompts simply fail even after repeated attempts.

What browsers do you use to access the UI?

Other

Sysinfo

sysinfo-2024-05-23-17-46.json

Console logs

venv "D:\Stable diffusion\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.9.3
Commit hash: 1c0a0c4c26f78c32095ebc7f8af82f5c04fca8c0
Launching Web UI with arguments: --lowvram --xformers --opt-split-attention
*** Error loading script: comments.py
    Traceback (most recent call last):
      File "D:\Stable diffusion\stable-diffusion-webui\modules\scripts.py", line 518, in load_scripts
        script_module = script_loading.load_module(scriptfile.path)
      File "D:\Stable diffusion\stable-diffusion-webui\modules\script_loading.py", line 10, in load_module
        module_spec.loader.exec_module(module)
      File "<frozen importlib._bootstrap_external>", line 883, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "D:\Stable diffusion\stable-diffusion-webui\modules/processing_scripts\comments.py", line 30, in <module>
        def before_token_counter(params: script_callbacks.BeforeTokenCounterParams):
    AttributeError: module 'modules.script_callbacks' has no attribute 'BeforeTokenCounterParams'

---
Loading weights [7eb674963a] from D:\Stable diffusion\stable-diffusion-webui\models\Stable-diffusion\hassakuHentaiModel_v13.safetensors
*** Error calling: D:\Stable diffusion\stable-diffusion-webui\modules/processing_scripts\sampler.py/ui
    Traceback (most recent call last):
      File "D:\Stable diffusion\stable-diffusion-webui\modules\scripts.py", line 538, in wrap_call
        return func(*args, **kwargs)
      File "D:\Stable diffusion\stable-diffusion-webui\modules/processing_scripts\sampler.py", line 20, in ui
        sampler_names = [x.name for x in sd_samplers.visible_samplers()]
    AttributeError: module 'modules.sd_samplers' has no attribute 'visible_samplers'

---
*** Error calling: D:\Stable diffusion\stable-diffusion-webui\modules/processing_scripts\sampler.py/ui
    Traceback (most recent call last):
      File "D:\Stable diffusion\stable-diffusion-webui\modules\scripts.py", line 538, in wrap_call
        return func(*args, **kwargs)
      File "D:\Stable diffusion\stable-diffusion-webui\modules/processing_scripts\sampler.py", line 20, in ui
        sampler_names = [x.name for x in sd_samplers.visible_samplers()]
    AttributeError: module 'modules.sd_samplers' has no attribute 'visible_samplers'

---
Creating model from config: D:\Stable diffusion\stable-diffusion-webui\configs\v1-inference.yaml
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 213.7s (initial startup: 0.3s, prepare environment: 59.7s, import torch: 63.5s, import gradio: 24.3s, setup paths: 18.1s, import ldm: 0.3s, initialize shared: 3.2s, other imports: 17.5s, setup gfpgan: 0.5s, list SD models: 0.6s, load scripts: 19.1s, reload hypernetworks: 0.3s, initialize extra networks: 0.3s, create ui: 4.2s, gradio launch: 3.0s).
Applying attention optimization: xformers... done.
Model loaded in 152.7s (load weights from disk: 6.0s, create model: 3.5s, apply weights to model: 124.3s, apply half(): 1.5s, apply dtype to VAE: 0.1s, load VAE: 0.9s, load weights from state dict: 0.1s, hijack: 0.7s, load textual inversion embeddings: 7.6s, calculate empty prompt: 7.7s).
  0%|                                                                                           | 0/20 [02:42<?, ?it/s]
*** Error completing request
*** Arguments: ('task(fsze88fxe1m09yn)', <gradio.routes.Request object at 0x0000026E230A6590>, '<lora:Abiogenesis_v1:1>, alien bipedal animal, green colored', 'ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, bad anatomy, watermark, signature, cut off, low contrast, underexposed, overexposed, bad art, beginner, amateur, distorted face, smeared, signature, text, inscription, logo', [], 20, 'Euler a', 1, 1, 7, 768, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', ['Model hash: 7eb674963a'], 0, False, '', 0.8, 1060426381, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "D:\Stable diffusion\stable-diffusion-webui\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "D:\Stable diffusion\stable-diffusion-webui\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "D:\Stable diffusion\stable-diffusion-webui\modules\txt2img.py", line 110, in txt2img
        processed = processing.process_images(p)
      File "D:\Stable diffusion\stable-diffusion-webui\modules\processing.py", line 782, in process_images
        res = process_images_inner(p)
      File "D:\Stable diffusion\stable-diffusion-webui\modules\processing.py", line 944, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "D:\Stable diffusion\stable-diffusion-webui\modules\processing.py", line 1274, in sample
        samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
      File "D:\Stable diffusion\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 235, in sample
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "D:\Stable diffusion\stable-diffusion-webui\modules\sd_samplers_common.py", line 261, in launch_sampling
        return func()
      File "D:\Stable diffusion\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 235, in <lambda>
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "D:\Stable diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\Stable diffusion\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
        denoised = model(x, sigmas[i] * s_in, **extra_args)
      File "D:\Stable diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\Stable diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\Stable diffusion\stable-diffusion-webui\modules\sd_samplers_cfg_denoiser.py", line 217, in forward
        devices.test_for_nans(x_out, "unet")
      File "D:\Stable diffusion\stable-diffusion-webui\modules\devices.py", line 231, in test_for_nans
        raise NansException(message)
    modules.devices.NansException: A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

---
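
For reference, the traceback bottoms out in `devices.test_for_nans`. A simplified sketch of what that kind of check appears to do, pieced together from the error text above (this is my paraphrase, not the actual `modules/devices.py` source):

```python
import torch

class NansException(Exception):
    pass

def test_for_nans(x: torch.Tensor, where: str, disable_nan_check: bool = False) -> None:
    # If every element of the UNet/VAE output is NaN, abort with a hint instead of
    # silently producing a black image. Simplified; the real module has more branches.
    if disable_nan_check:          # roughly what --disable-nan-check does
        return
    if not torch.isnan(x).all():
        return
    if where == "unet":
        message = ("A tensor with all NaNs was produced in Unet. "
                   "Try --no-half or 'Upcast cross attention layer to float32'.")
    else:
        message = f"A tensor with all NaNs was produced in {where}."
    raise NansException(message)
```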

Additional information

Although my PC is relatively low-end and somewhat old (I'm currently working towards an upgrade), I have used Stable Diffusion for about a year, if not more, and it never had any issues until a few months ago, when this problem started happening. I have not experienced other issues with the computer, including with other generative AI tools such as Voice.AI and KoboldCpp.

The error is heavily inconsistent: repeatedly clicking the Generate button may eventually let the image generate. It feels like roughly a 50/50 chance whether it errors out or not.

Using --disable-nan-check suppresses the error, but the image may or may not come out as a black screen, wasting a lot of time just to generate a black image. It is faster to simply keep letting it error out until it starts to generate.

Using --no-half stops the issue from happening, but it increases RAM usage, causing the computer to struggle to load the model or crash the program outright. When it doesn't crash, generation takes about 10× as long, going from 1-2 minutes for a 512x768 image to 15-20 minutes for a single image. (Note that the computer only has 8 GB of RAM and 4 GB of VRAM; perhaps this solution could work on a stronger computer, although searches about this problem show that even people with very good video cards experience heavy slowdowns when using --no-half.)
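
For a rough sense of why --no-half hurts so much on this hardware: float32 weights take twice the memory of float16, and back-of-the-envelope numbers for an SD 1.5-class UNet (parameter count approximate, my own estimate) nearly fill a 4 GB card on their own, which is presumably why everything starts spilling into system RAM:

```python
# Rough memory math; 860M parameters is an approximate figure for an SD 1.5 UNet.
params = 860_000_000
fp16_gb = params * 2 / 1024**3   # ~1.6 GB of weights in half precision
fp32_gb = params * 4 / 1024**3   # ~3.2 GB in full precision, before activations
print(f"fp16: {fp16_gb:.1f} GB, fp32: {fp32_gb:.1f} GB")
```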

Enabling "Upcast cross attention layer to float32" will sometimes help, in which case I see the following in the console:

A tensor with all NaNs was produced in VAE. Web UI will now convert VAE into 32-bit float and retry. To disable this behavior, disable the 'Automatically revert VAE to 32-bit floats' setting. To always start with 32-bit VAE, use --no-half-vae commandline flag.

but this also seems to be inconsistent, as it won't always fix it right away and the generation will simply error out.
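
The message above describes a retry-in-float32 fallback. As I understand it, the pattern is roughly the following (a sketch only, with a placeholder `vae.decode` API, not the actual webui implementation):

```python
import torch

def decode_with_fp32_fallback(vae, latents):
    # First try the fast half-precision decode; if the result is all NaN,
    # upcast the VAE to float32 and decode again, as the console message describes.
    out = vae.decode(latents.half())
    if torch.isnan(out).all():
        vae = vae.float()
        out = vae.decode(latents.float())
    return out
```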

I have googled endless solutions: attempted reinstalls, removed all extensions and all LoRAs, used different launch arguments, disabled all extensions, used different models, reverted to old versions that I know worked fine, reinstalled drivers, updated drivers, performed several health checks on the computer, attempted completely clean installs, etc. So far I have not figured out a way to solve the issue, hence why I would like some help.

OtenMoten commented 1 month ago

Same for me.

computer_info.json

Using PyCharm 2024.1.1 (Professional Edition), Build #PY-241.15989.155, built on April 29, 2024; Runtime version: 17.0.10+1-b1207.14 amd64; VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o.

heyalexchoi commented 1 month ago

There is a known issue with the SDXL VAE producing NaNs in fp16. Most fixes I've seen involve forcing Automatic1111 to use full precision, but that approach has performance drawbacks.

You can fix the NaN issue more directly by using this fixed half-precision VAE here: https://huggingface.co/madebyollin/sdxl-vae-fp16-fix

You can follow the instructions to install and switch to this VAE.
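
In case it helps, one way to fetch that VAE locally (the repo id comes from the link above; the destination folder is an example, and you still need to select the downloaded file as the SD VAE in the webui settings afterwards):

```python
from huggingface_hub import snapshot_download

# Download the repo, then copy the VAE .safetensors file from local_dir into the
# webui's models/VAE folder and pick it in the UI (paths/filenames may vary).
local_dir = snapshot_download(repo_id="madebyollin/sdxl-vae-fp16-fix")
print("Downloaded to:", local_dir)
```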

iSupremacy commented 1 month ago

> There is a known issue with the SDXL VAE producing NaNs in fp16. Most fixes I've seen involve forcing Automatic1111 to use full precision, but that approach has performance drawbacks.
>
> You can fix the NaN issue more directly by using this fixed half-precision VAE here: https://huggingface.co/madebyollin/sdxl-vae-fp16-fix
>
> You can follow the instructions to install and switch to this VAE.

But the thing is I was just inpainting fine, and then this started happening out of nowhere. Same session, so what the heck broke in the blink of an eye?

It started happening when I switched to an inpainting model, one I've used before but only for a few days. The base model is the one I've been using for the last month; it has a built-in VAE and is a Pony XL model.