camenduru / stable-diffusion-webui-colab

[Bug]: A tensor with all NaNs was produced in VAE. #361

Closed · camenduru closed this issue 1 year ago

camenduru commented 1 year ago

What happened?

80s_darksouls_vibes_webui_colab (thanks to Lothricel for the suggestion ❤)
Model: mahmood.mohssin99320/80sdarksoulsvibes

dark_souls_diffusion_webui_colab (thanks to Lothricel for the suggestion ❤)
Model: guizmus/dark-souls-diffusion

Colab cell output

Download cloudflared...: 100% 34.5M/34.5M [00:00<00:00, 192MB/s]
Calculating sha256 for /content/stable-diffusion-webui/models/Stable-diffusion/80sdarksoulsvibes_v1.ckpt: f367698d6acd12e62ac60d5e0ae4eaa53ef58a7ba127202342daf8cf45166a77
Loading weights [f367698d6a] from /content/stable-diffusion-webui/models/Stable-diffusion/80sdarksoulsvibes_v1.ckpt
Error verifying pickled file from /content/stable-diffusion-webui/models/Stable-diffusion/80sdarksoulsvibes_v1.ckpt:
Traceback (most recent call last):
  File "/content/stable-diffusion-webui/modules/safe.py", line 135, in load_with_extra
    check_pt(filename, extra_handler)
  File "/content/stable-diffusion-webui/modules/safe.py", line 93, in check_pt
    unpickler.load()
  File "/content/stable-diffusion-webui/modules/safe.py", line 62, in find_class
    raise Exception(f"global '{module}/{name}' is forbidden")
Exception: global 'torch/BFloat16Storage' is forbidden

The file may be malicious, so the program is not going to read it.
You can skip this check with --disable-safe-unpickle commandline argument.

loading stable diffusion model: AttributeError
Traceback (most recent call last):
  File "/content/stable-diffusion-webui/webui.py", line 111, in initialize
    modules.sd_models.load_model()
  File "/content/stable-diffusion-webui/modules/sd_models.py", line 383, in load_model
    state_dict = get_checkpoint_state_dict(checkpoint_info, timer)
  File "/content/stable-diffusion-webui/modules/sd_models.py", line 238, in get_checkpoint_state_dict
    res = read_state_dict(checkpoint_info.filename)
  File "/content/stable-diffusion-webui/modules/sd_models.py", line 224, in read_state_dict
    sd = get_state_dict_from_checkpoint(pl_sd)
  File "/content/stable-diffusion-webui/modules/sd_models.py", line 197, in get_state_dict_from_checkpoint
    pl_sd = pl_sd.pop("state_dict", pl_sd)
AttributeError: 'NoneType' object has no attribute 'pop'

Stable diffusion model failed to load, exiting
Error in atexit._run_exitfuncs:
TypeError: kill() missing 1 required positional argument: 'self'
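
Note for context: the AttributeError is a downstream effect of the safe-unpickle rejection above. When the check refuses the checkpoint, the loader ends up with no state dict at all, and the later pop() call fails on None. A hypothetical guard illustrating the failure mode (not the actual webui code):

def get_state_dict_from_checkpoint(pl_sd):
    # Hypothetical simplification: the real loader assumes pl_sd is a dict, so when
    # the safe unpickler rejects the file and nothing is returned, pl_sd is None and
    # pl_sd.pop(...) raises the AttributeError shown in the log above.
    if pl_sd is None:
        raise ValueError("checkpoint could not be read (rejected by the safe unpickler?)")
    return pl_sd.pop("state_dict", pl_sd)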

Which colab and model(s) were you using when the error occurred?

https://civitai.com/models/2859/80sdarksoulsvibes

Which Public WebUI Colab URL were you using when the error occurred?

remote.moe

If you used HiRes mode when the error occurred, please provide the Hires info

No response

camenduru commented 1 year ago

revAnimated_v121.safetensors

Error completing request
Arguments: ('task(dvbeaprxr2k4pn7)', '1girl', '', [], 20, 0, False, False, 1, 1, 7, 284904639.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, False, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, None, 'Refresh models', False, '1:1,1:2,1:2', '0:0,0:0,0:1', '0.2,0.8,0.8', 150, 0.2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, False, False, 'positive', 'comma', 0, False, False, '', '', 1, '', 0, '', 0, '', True, False, False, False, 0) {}
Traceback (most recent call last):
  File "/content/stable-diffusion-webui/modules/call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "/content/stable-diffusion-webui/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/content/stable-diffusion-webui/modules/txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "/content/stable-diffusion-webui/modules/processing.py", line 486, in process_images
    res = process_images_inner(p)
  File "/content/stable-diffusion-webui/modules/processing.py", line 636, in process_images_inner
    devices.test_for_nans(x, "vae")
  File "/content/stable-diffusion-webui/modules/devices.py", line 152, in test_for_nans
    raise NansException(message)
modules.devices.NansException: A tensor with all NaNs was produced in VAE. This could be because there's not enough precision to represent the picture. Try adding --no-half-vae commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.
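
For reference, the check that raises this is an all-NaN test on the decoded tensor. A rough sketch of the idea, simplified from what modules/devices.py appears to do (the real test_for_nans also honors --disable-nan-check and builds the flag-specific hint text):

import torch

def test_for_nans(x: torch.Tensor, where: str) -> None:
    # Rough sketch only, not the webui implementation. It trips only when every
    # element is NaN, hence the wording "A tensor with all NaNs was produced".
    if torch.all(torch.isnan(x)).item():
        raise RuntimeError(f"A tensor with all NaNs was produced in {where}.")
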
Anonimouche commented 1 year ago

(quoting the revAnimated_v121.safetensors NaN-in-VAE traceback from the previous comment)

Got the same issue using JASMix and a clear VAE. What's weird is that it only appears for me after 4-5 renders, and sometimes rendering again after getting this error works perfectly well.

Anonimouche commented 1 year ago

For the first issue, the checkpoint got flagged as dangerous by picklescan. I feel it would be fine to add the "--disable-safe-unpickle" argument for this particular model, since the integrated picklescan on both CivitAI and Hugging Face, plus another scan I did, say this model is safe.
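
For anyone who wants to verify this themselves before bypassing the check, a local scan can be run with the picklescan package. A hypothetical example, assuming picklescan is pip-installed in the Colab and the checkpoint sits at the path from the log earlier in this issue:

import subprocess

# Hypothetical local scan; the package name and checkpoint path are assumptions
# taken from this thread, not something the webui runs for you.
subprocess.run(["pip", "install", "picklescan"], check=True)
result = subprocess.run([
    "picklescan", "--path",
    "/content/stable-diffusion-webui/models/Stable-diffusion/80sdarksoulsvibes_v1.ckpt",
])
print("picklescan exit code:", result.returncode)  # non-zero usually means something was flagged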

G-force78 commented 1 year ago

Added the SadTalker extension to the nightly Colab with cn1.1 and got a slew of errors. Also a timm module-not-found error for Deforum.

Error verifying pickled file from /content/stable-diffusion-webui/extensions/SadTalker/checkpoints/hub/checkpoints/s3fd-619a316812.pth:
Traceback (most recent call last):
  File "/content/stable-diffusion-webui/modules/safe.py", line 81, in check_pt
    with zipfile.ZipFile(filename) as z:
  File "/usr/lib/python3.9/zipfile.py", line 1266, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.9/zipfile.py", line 1333, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/stable-diffusion-webui/modules/safe.py", line 135, in load_with_extra
    check_pt(filename, extra_handler)
  File "/content/stable-diffusion-webui/modules/safe.py", line 102, in check_pt
    unpickler.load()
  File "/content/stable-diffusion-webui/modules/safe.py", line 62, in find_class
    raise Exception(f"global '{module}/{name}' is forbidden")
Exception: global 'torch._utils/_rebuild_tensor' is forbidden

The file may be malicious, so the program is not going to read it.
You can skip this check with --disable-safe-unpickle commandline argument.
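
On the Deforum timm error mentioned at the top of this comment: a module-not-found error for timm is usually just a missing pip package in the Colab environment. A hypothetical one-liner for the notebook, assuming pip installs into the same environment the webui imports from:

import subprocess
# Hypothetical fix for the "timm module not found" error; timm is the package
# Deforum tries to import.
subprocess.run(["pip", "install", "timm"], check=True)
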
Anonimouche commented 1 year ago

(quoting the revAnimated_v121.safetensors NaN traceback and JASMix note from the earlier comment)

Managed to recreate the issue 100%. When creating a large number of images like that, it errors out around 70% of the way through the task. After adding the "--no-half-vae" argument to the launch code, the issue seems fixed and I'm now able to generate batches of images without any errors. I still don't know whether the error comes from the VAE or the model, though.
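
For anyone else hitting this, a minimal sketch of what "adding --no-half-vae to the launch code" can look like in a Colab cell, assuming the notebook starts the webui via launch.py from the repo root (the real cell in these notebooks passes several additional flags):

import subprocess
# Hypothetical relaunch; --no-half-vae (the flag suggested by the NansException
# message above) keeps the VAE in full precision.
subprocess.run(
    ["python", "launch.py", "--no-half-vae"],
    cwd="/content/stable-diffusion-webui",
    check=True,
)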

G-force78 commented 1 year ago

(quoting the SadTalker / safe-unpickle error report from the earlier comment)

Which colab/model did you use?

This one here https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/nightly/stable_diffusion_1_5_webui_colab.ipynb

camenduru commented 1 year ago

Hi @G-force78 👋 please open a new issue

Anonimouche commented 1 year ago

I can't get the error using your rev_animated lite colab. Did you fix it?

EDIT: Tried all the colabs (lite, stable, and nightly), tried intensive workloads, tried txt2img, img2img, etc. Nothing seems to trigger the NaN issue.

G-force78 commented 1 year ago

I can't get the error using your rev_animated lite colab. Did you fix it?

I just added cn1.1 in place of 1.0 on the original Colab and it works fine now. Other than modelscope, but that's another issue I will open separately.

camenduru commented 1 year ago

thanks @Anonimouche ❤

Anonimouche commented 1 year ago

Actually, I don't know what I did now, but... no problem?