[Bug]: fp8 makes model loading fail with Promotion for Float8 types is not supported

Vektor8298 commented 2 months ago

Checklist

[X] The issue exists after disabling all extensions
[X] The issue exists on a clean installation of webui
[ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
[X] The issue exists in the current version of the webui
[X] The issue has not been reported before recently
[ ] The issue has been reported before but has not been fixed yet

What happened?

When running sd-webui-reforge with one of the fp8 UNet flags, the model fails to load with a RuntimeError, so the webui never becomes usable.

Steps to reproduce the problem

Enable fp8 with --unet-in-fp8-e5m2 or --unet-in-fp8-e4m3fn in webui-user.sh
Run webui.sh
Observe the error when loading the SD Model

What should have happened?

It should load the model and let me run the webui.

What browsers do you use to access the UI?

No response

Sysinfo

sysinfo-2024-08-10-14-04.json

Console logs

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye), Fedora 34+ and openSUSE Leap 15.4 or newer.
################################################################

################################################################
Running on voivod user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
glibc version is 2.40
Check TCMalloc: libtcmalloc_minimal.so.4
libtcmalloc_minimal.so.4 is linked with libc.so,execute LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4
Python 3.10.6 (main, May  5 2024, 11:48:59) [GCC 13.2.1 20240417]
Version: f1.0.1-v1.10.1RC-latest-677-g6c01cab4
Commit hash: 6c01cab47e6a151ee595334e960016aea87f2d4e
Installing requirements
loading WD14-tagger reqs from /home/voivod/stable-diffusion-webui-forge/extensions/stable-diffusion-webui-wd14-tagger/requirements.txt
Checking WD14-tagger requirements.
Launching Web UI with arguments: --device-id=0 --disable-xformers --attention-pytorch --cuda-malloc --cuda-stream --unet-in-fp8-e4m3fn --enable-insecure-extension-access --no-gradio-queue --listen --port 7861
Using cudaMallocAsync backend.
Total VRAM 3806 MB, total RAM 15717 MB
Trying to enable lowvram mode because your GPU seems to have 4GB or less. If you don't want this use: --always-normal-vram
Set vram state to: LOW_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3050 Laptop GPU : cudaMallocAsync
Hint: your device supports --pin-shared-memory for potential speed improvements.
VAE dtype: torch.bfloat16
CUDA Stream Activated:  True
2024-08-10 11:00:21.835997: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-08-10 11:00:21.949190: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-10 11:00:22.011715: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-10 11:00:22.022164: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-10 11:00:22.108505: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-10 11:00:23.062633: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Using pytorch cross attention
ControlNet preprocessor location: /home/voivod/stable-diffusion-webui-forge/models/ControlNetPreprocessor
Tag Autocomplete: Could not locate model-keyword extension, Lora trigger word completion will be limited to those added through the extra networks menu.
[-] ADetailer initialized. version: 24.6.0, num models: 15
== WD14 tagger /gpu:0, uname_result(system='Linux', node='terminalredux', release='6.9.9-arch1-1.2-g14', version='#1 SMP PREEMPT_DYNAMIC Tue, 16 Jul 2024 05:59:42 +0000', machine='x86_64') ==
Loading model SDXL/4thTailAnimeHentai_v045.safetensors [45a5febab2] (1 of 1)
Loading weights [45a5febab2] from /home/voivod/stable-diffusion-webui-forge/models/Stable-diffusion/SDXL/4thTailAnimeHentai_v045.safetensors
2024-08-10 11:00:30,557 - ControlNet - INFO - ControlNet UI callback registered.
/home/voivod/stable-diffusion-webui-forge/modules/gradio_extensons.py:25: GradioDeprecationWarning: `height` is deprecated in `Interface()`, please use it within `launch()` instead.
  res = original_IOComponent_init(self, *args, **kwargs)
Scanning <DirEntry 'deepdanbooru-v3-20211112-sgd-e28'> as deepdanbooru project
Scanning <DirEntry 'deepdanbooru-v4-20200814-sgd-e30'> as deepdanbooru project
Running on local URL:  http://0.0.0.0:7861

To create a public link, set `share=True` in `launch()`.
IIB Database file has been successfully backed up to the backup folder.
Startup time: 22.4s (prepare environment: 9.3s, import torch: 2.6s, import gradio: 0.7s, setup paths: 3.7s, other imports: 0.5s, load scripts: 4.5s, create ui: 0.6s, gradio launch: 0.4s).
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Loading VAE weights specified in settings: /home/voivod/stable-diffusion-webui-forge/models/VAE/sdxl.vae.safetensors
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.01 seconds
loading stable diffusion model: RuntimeError
Traceback (most recent call last):
  File "/home/voivod/stable-diffusion-webui-forge/launch.py", line 51, in <module>
    main()
  File "/home/voivod/stable-diffusion-webui-forge/launch.py", line 47, in main
    start()
  File "/home/voivod/stable-diffusion-webui-forge/modules/launch_utils.py", line 550, in start
    main_thread.loop()
  File "/home/voivod/stable-diffusion-webui-forge/modules_forge/main_thread.py", line 37, in loop
    task.work()
  File "/home/voivod/stable-diffusion-webui-forge/modules_forge/main_thread.py", line 26, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "/home/voivod/stable-diffusion-webui-forge/modules/sd_models.py", line 569, in get_sd_model
    load_model()
  File "/home/voivod/stable-diffusion-webui-forge/modules/sd_models.py", line 720, in load_model
    sd_model.cond_stage_model_empty_prompt = get_empty_cond(sd_model)
  File "/home/voivod/stable-diffusion-webui-forge/modules/sd_models.py", line 596, in get_empty_cond
    d = sd_model.get_learned_conditioning([""])
  File "/home/voivod/stable-diffusion-webui-forge/modules/sd_models_xl.py", line 37, in get_learned_conditioning
    c = self.conditioner(sdxl_conds, force_zero_embeddings=['txt'] if force_zero_negative_prompt else [])
  File "/home/voivod/stable-diffusion-webui-forge/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/voivod/stable-diffusion-webui-forge/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/voivod/stable-diffusion-webui-forge/repositories/generative-models/sgm/modules/encoders/modules.py", line 168, in forward
    output[out_key] = torch.cat(
RuntimeError: Promotion for Float8 Types is not supported, attempted to promote Float and Float8_e4m3fn

Stable diffusion model failed to load

Additional information

I recently upgraded the webui to the latest version. I tried upgrading torch to the latest version, and I tried both --xformers and --attention-pytorch.

Panchovix commented 2 months ago

Hi there, thanks for the report. Did this happen on OG Forge as well (before the new updates), or did it work there?

Does it work on dev_upstream branch?

Music4Dogs commented 2 months ago

Both unet-fp8 flags worked fine for me on an older ReForge commit from before controlnet was updated, on main branch. Worked fine on og forge dev branch as well.

Panchovix commented 2 months ago

Can you point me to that older reForge commit, please? Also, maybe you can try upgrading torch to 2.3.1.

Vektor8298 commented 2 months ago

@Panchovix Hi, it worked on vanilla Forge, but that was months ago, back when I had an AMD GPU. I will try reinstalling reForge (I upgraded to reForge from vanilla Forge) and see if that helps. EDIT: It works fine with SD 1.5 but fails with the same error on SDXL.

model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'}
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.01 seconds
Traceback (most recent call last):
  File "/home/voivod/stable-diffusion-webui-reForge/modules_forge/main_thread.py", line 37, in loop
    task.work()
  File "/home/voivod/stable-diffusion-webui-reForge/modules_forge/main_thread.py", line 26, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "/home/voivod/stable-diffusion-webui-reForge/modules/sd_models.py", line 752, in reload_model_weights
    return load_model(info)
  File "/home/voivod/stable-diffusion-webui-reForge/modules/sd_models.py", line 720, in load_model
    sd_model.cond_stage_model_empty_prompt = get_empty_cond(sd_model)
  File "/home/voivod/stable-diffusion-webui-reForge/modules/sd_models.py", line 596, in get_empty_cond
    d = sd_model.get_learned_conditioning([""])
  File "/home/voivod/stable-diffusion-webui-reForge/modules/sd_models_xl.py", line 37, in get_learned_conditioning
    c = self.conditioner(sdxl_conds, force_zero_embeddings=['txt'] if force_zero_negative_prompt else [])
  File "/home/voivod/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/voivod/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/voivod/stable-diffusion-webui-reForge/repositories/generative-models/sgm/modules/encoders/modules.py", line 168, in forward
    output[out_key] = torch.cat(
RuntimeError: Promotion for Float8 Types is not supported, attempted to promote Float and Float8_e5m2
Promotion for Float8 Types is not supported, attempted to promote Float and Float8_e5m2
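
The trace above names Float8_e5m2, while the original report named Float8_e4m3fn, which suggests the failing dtype follows whichever fp8 flag is set. When comparing several saved console logs, the dtype pair can be pulled out of the error text with a small helper. This is only a sketch: the function name `promoted_dtypes` and the regular expression are illustrative and are not part of reForge or PyTorch.

```python
import re

# Matches the RuntimeError text seen in the logs of this issue.
PROMOTION_ERROR = re.compile(
    r"Promotion for Float8 Types is not supported, "
    r"attempted to promote (\w+) and (\w+)"
)


def promoted_dtypes(log_text):
    """Return the unique dtype pairs named in Float8 promotion errors,
    in order of first appearance."""
    pairs = []
    for match in PROMOTION_ERROR.finditer(log_text):
        pair = match.groups()
        if pair not in pairs:  # the webui prints the message twice
            pairs.append(pair)
    return pairs


sample = (
    "RuntimeError: Promotion for Float8 Types is not supported, "
    "attempted to promote Float and Float8_e5m2\n"
    "Promotion for Float8 Types is not supported, "
    "attempted to promote Float and Float8_e5m2\n"
)
print(promoted_dtypes(sample))  # [('Float', 'Float8_e5m2')]
```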

Panchovix commented 2 months ago

Hi there, thanks for the report. That's a weird issue. Which torch version do you have installed?

I also made a new branch with some experimental changes, if you want to test there as well: https://github.com/Panchovix/stable-diffusion-webui-reForge/commits/dev_upstream_experimental
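
For the torch version question, one way to read the installed version without importing torch is the standard-library `importlib.metadata`. This is a minimal sketch; the helper name `installed_version` is made up for illustration, and the list of packages checked is an assumption about what is relevant here.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this with the webui's own interpreter (for example venv/bin/python);
    # otherwise the system-wide packages are reported instead.
    for name in ("torch", "torchvision", "xformers"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running it from inside the webui's venv matters, because the venv can hold a different torch build than the system Python.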

Vektor8298 commented 2 months ago

> Hi there, thanks for the report. That's a weird issue. Which torch version do you have installed?
>
> I also made a new branch with some experimental changes, if you want to test there as well: https://github.com/Panchovix/stable-diffusion-webui-reForge/commits/dev_upstream_experimental

Thanks @Panchovix, I will try it, since I have also been getting lots of TypeError: 'NoneType' object is not iterable errors on my 4 GB 3050 laptop GPU, which I did not get before. EDIT: It still won't work under dev_upstream_experimental.

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye), Fedora 34+ and openSUSE Leap 15.4 or newer.
################################################################

################################################################
Running on voivod user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
python venv already activate or run without venv: /home/voivod/stable-diffusion-webui-reForge/venv
################################################################

################################################################
Launching launch.py...
################################################################
glibc version is 2.40
Check TCMalloc: libtcmalloc_minimal.so.4
libtcmalloc_minimal.so.4 is linked with libc.so,execute LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4
Python 3.10.6 (main, May  5 2024, 11:48:59) [GCC 13.2.1 20240417]
Version: f1.0.2dev-experimental-v1.10.1RC-latest-1094-gb1b58871
Commit hash: b1b588713b17ab5b585b56e95475ebc9e3cf765e
Launching Web UI with arguments: --skip-install --device-id=0 --xformers --cuda-malloc --cuda-stream --unet-in-fp8-e4m3fn --enable-insecure-extension-access --no-gradio-queue --listen --port 7861
Using cudaMallocAsync backend.
/home/voivod/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
/home/voivod/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
Device: cuda:0 NVIDIA GeForce RTX 3050 Laptop GPU : cudaMallocAsync
Hint: your device supports --pin-shared-memory for potential speed improvements, but may cause issues.
Set vram state to: NORMAL_VRAM
CUDA Stream Activated:  True
/home/voivod/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
2024-08-18 10:08:02.579554: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-08-18 10:08:02.591073: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-18 10:08:02.603373: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-18 10:08:02.607152: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-18 10:08:02.618153: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-18 10:08:03.624708: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Using VAE dtype: torch.bfloat16
ControlNet preprocessor location: /home/voivod/stable-diffusion-webui-reForge/models/ControlNetPreprocessor
Tag Autocomplete: Could not locate model-keyword extension, Lora trigger word completion will be limited to those added through the extra networks menu.
[-] ADetailer initialized. version: 24.8.0, num models: 15
== WD14 tagger /gpu:0, uname_result(system='Linux', node='terminalredux', release='6.10.2-arch1-1.1-g14', version='#1 SMP PREEMPT_DYNAMIC Tue, 30 Jul 2024 06:22:55 +0000', machine='x86_64') ==
Loading model SDXL/4thTailAnimeHentai_v045.safetensors [45a5febab2] (1 of 1)
Loading weights [45a5febab2] from /home/voivod/stable-diffusion-webui-reForge/models/Stable-diffusion/SDXL/4thTailAnimeHentai_v045.safetensors
2024-08-18 10:08:13,635 - ControlNet - INFO - ControlNet UI callback registered.
/home/voivod/stable-diffusion-webui-reForge/modules/gradio_extensons.py:25: GradioDeprecationWarning: `height` is deprecated in `Interface()`, please use it within `launch()` instead.
  res = original_IOComponent_init(self, *args, **kwargs)
Scanning <DirEntry 'deepdanbooru-v3-20211112-sgd-e28'> as deepdanbooru project
Scanning <DirEntry 'deepdanbooru-v4-20200814-sgd-e30'> as deepdanbooru project
Running on local URL:  http://0.0.0.0:7861

To create a public link, set `share=True` in `launch()`.
IIB Database file has been successfully backed up to the backup folder.
Startup time: 19.6s (prepare environment: 2.0s, import torch: 4.3s, import gradio: 0.7s, setup paths: 3.1s, initialize shared: 0.3s, other imports: 0.7s, load scripts: 6.7s, scripts list_optimizers: 0.1s, create ui: 1.3s, gradio launch: 0.1s).
Using VAE dtype: torch.bfloat16
/home/voivod/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
WARNING:root:clip missing: ['clip_l.text_projection', 'clip_l.logit_scale', 'clip_g.text_projection']
Loading VAE weights specified in settings: /home/voivod/stable-diffusion-webui-reForge/models/VAE/sdxl.vae.safetensors
loading stable diffusion model: RuntimeError
Traceback (most recent call last):
  File "/home/voivod/stable-diffusion-webui-reForge/launch.py", line 51, in <module>
    main()
  File "/home/voivod/stable-diffusion-webui-reForge/launch.py", line 47, in main
    start()
  File "/home/voivod/stable-diffusion-webui-reForge/modules/launch_utils.py", line 550, in start
    main_thread.loop()
  File "/home/voivod/stable-diffusion-webui-reForge/modules_forge/main_thread.py", line 37, in loop
    task.work()
  File "/home/voivod/stable-diffusion-webui-reForge/modules_forge/main_thread.py", line 26, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "/home/voivod/stable-diffusion-webui-reForge/modules/sd_models.py", line 569, in get_sd_model
    load_model()
  File "/home/voivod/stable-diffusion-webui-reForge/modules/sd_models.py", line 730, in load_model
    sd_model.cond_stage_model_empty_prompt = get_empty_cond(sd_model)
  File "/home/voivod/stable-diffusion-webui-reForge/modules/sd_models.py", line 596, in get_empty_cond
    d = sd_model.get_learned_conditioning([""])
  File "/home/voivod/stable-diffusion-webui-reForge/modules/sd_models_xl.py", line 37, in get_learned_conditioning
    c = self.conditioner(sdxl_conds, force_zero_embeddings=['txt'] if force_zero_negative_prompt else [])
  File "/home/voivod/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/voivod/stable-diffusion-webui-reForge/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/voivod/stable-diffusion-webui-reForge/repositories/generative-models/sgm/modules/encoders/modules.py", line 168, in forward
    output[out_key] = torch.cat(
RuntimeError: Promotion for Float8 Types is not supported, attempted to promote Float and Float8_e4m3fn

Stable diffusion model failed to load

Vektor8298 commented 2 months ago

@Panchovix I tried torch 2.4.0 and torch 2.3.1 (CUDA 12), and neither makes it work. SD 1.5 works fine, as I stated. Vanilla Forge at the latest commit before it went experimental seems to work with both 1.5 and XL.
