AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Refiner Module Bug in WebUI 1.6 #13020

Open Karen-the-Fantasist opened 1 year ago

Karen-the-Fantasist commented 1 year ago

Is there an existing issue for this?

What happened?

I chose sd_xl_base_1.0.safetensors as the checkpoint and sd_xl_refiner_1.0.safetensors as the refiner, used the "txt2img" module, and clicked the "generate" button. It worked.

But it doesn't work when I click "generate" AGAIN.

I checked the console and found that the second time the refiner model is loaded, it shows Creating model from config: D:\NovelAI\stable-diffusion-webui\configs\v1-inference.yaml, which looks wrong.

It should show Creating model from config: D:\NovelAI\stable-diffusion-webui\repositories\generative-models\configs\inference\sd_xl_refiner.yaml.
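
The mismatch described above can be spotted mechanically in a saved console log. The following is a minimal sketch, not part of the WebUI: the function name `find_config_mismatches` and the rule "an sd_xl checkpoint should be created from an sd_xl config" are assumptions made for illustration, and the sample lines are copied from the console output in this report with the progress-bar text trimmed.

```python
import re

# "Reusing loaded model A to load B" names the checkpoint about to be built.
LOAD_RE = re.compile(r"to load (\S+\.safetensors)")
CONFIG_RE = re.compile(r"Creating model from config: (.+)$")


def find_config_mismatches(log_text):
    """Return (checkpoint, config file name) pairs where an sd_xl checkpoint
    was created from a config whose file name lacks "sd_xl"."""
    mismatches = []
    pending = None  # checkpoint still waiting for its config line
    for line in log_text.splitlines():
        load = LOAD_RE.search(line)
        if load:
            pending = load.group(1)
            continue
        created = CONFIG_RE.search(line)
        if created and pending is not None:
            config = created.group(1).strip()
            config_name = config.replace("\\", "/").rsplit("/", 1)[-1]
            # Heuristic (assumption): sd_xl checkpoints need sd_xl configs.
            if "sd_xl" in pending and "sd_xl" not in config_name:
                mismatches.append((pending, config_name))
            pending = None
    return mismatches


SAMPLE_LOG = r"""
Reusing loaded model sd_xl_base_1.0.safetensors [31e35c80fc] to load sd_xl_refiner_1.0.safetensors [7440042bbd]
Loading weights [7440042bbd] from D:\NovelAI\stable-diffusion-webui\models\Stable-diffusion\sd_xl_refiner_1.0.safetensors
Creating model from config: D:\NovelAI\stable-diffusion-webui\repositories\generative-models\configs\inference\sd_xl_refiner.yaml
Reusing loaded model sd_xl_base_1.0.safetensors [31e35c80fc] to load sd_xl_refiner_1.0.safetensors [7440042bbd]
Loading weights [7440042bbd] from cache
Creating model from config: D:\NovelAI\stable-diffusion-webui\configs\v1-inference.yaml
"""

print(find_config_mismatches(SAMPLE_LOG))
```

On the sample, only the second refiner load is flagged, which is also the one where the weights come "from cache".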

Steps to reproduce the problem

  1. choose sd_xl_base_1.0.safetensors as the checkpoint and sd_xl_refiner_1.0.safetensors as the refiner, use the "txt2img" module, and enter any prompt
  2. click the "generate" button
  3. when generation has finished, click the "generate" button AGAIN

What should have happened?

It should load "sd_xl_refiner.yaml" and generate successfully.

Sysinfo

sysinfo-2023-09-03-05-20.txt

What browsers do you use to access the UI ?

Microsoft Edge

Console logs

Creating model from config: D:\NovelAI\stable-diffusion-webui\repositories\generative-models\configs\inference\sd_xl_base.yaml
Startup time: 8.6s (prepare environment: 1.7s, import torch: 2.6s, import gradio: 0.9s, setup paths: 0.6s, initialize shared: 0.2s, other imports: 0.4s, load scripts: 0.6s, create ui: 0.5s, gradio launch: 0.9s).
Applying attention optimization: xformers... done.
Model loaded in 6.2s (load weights from disk: 1.2s, create model: 0.7s, apply weights to model: 1.9s, calculate empty prompt: 2.1s).
 80%|███████████████████████████████████████████████████████████████▏               | 16/20 [00:07<00:01,  2.72it/s]Reusing loaded model sd_xl_base_1.0.safetensors [31e35c80fc] to load sd_xl_refiner_1.0.safetensors [7440042bbd]4it/s]
Loading weights [7440042bbd] from D:\NovelAI\stable-diffusion-webui\models\Stable-diffusion\sd_xl_refiner_1.0.safetensors
Creating model from config: D:\NovelAI\stable-diffusion-webui\repositories\generative-models\configs\inference\sd_xl_refiner.yaml
Applying attention optimization: xformers... done.
Model loaded in 2.1s (create model: 0.1s, apply weights to model: 1.5s, calculate empty prompt: 0.4s).
100%|███████████████████████████████████████████████████████████████████████████████| 20/20 [00:14<00:00,  1.38it/s]
Total progress: 100%|███████████████████████████████████████████████████████████████| 20/20 [00:18<00:00,  1.09it/s]
Reusing loaded model sd_xl_refiner_1.0.safetensors [7440042bbd] to load sd_xl_base_1.0.safetensors [31e35c80fc]it/s]
Loading weights [31e35c80fc] from D:\NovelAI\stable-diffusion-webui\models\Stable-diffusion\sd_xl_base_1.0.safetensors
Creating model from config: D:\NovelAI\stable-diffusion-webui\repositories\generative-models\configs\inference\sd_xl_base.yaml
Applying attention optimization: xformers... done.
Model loaded in 2.9s (create model: 0.5s, apply weights to model: 1.7s, calculate empty prompt: 0.6s).
 80%|█████████████████████████████████████████████████████████████████▌                | 16/20 [00:07<00:01,  2.71it/s]Reusing loaded model sd_xl_base_1.0.safetensors [31e35c80fc] to load sd_xl_refiner_1.0.safetensors [7440042bbd]2.73it/s]
Loading weights [7440042bbd] from cache
Creating model from config: D:\NovelAI\stable-diffusion-webui\configs\v1-inference.yaml
Applying attention optimization: xformers... done.
Model loaded in 2.2s (create model: 1.4s, apply half(): 0.6s).
 80%|█████████████████████████████████████████████████████████████████▌                | 16/20 [00:12<00:03,  1.32it/s]
*** Error completing request
*** Arguments: ('task(47qmi2mfbpp7frr)', 'masterpiece, best quality, school life, campus, sunset, ultra detailed,students walking on the road, beautiful sky, extreme wide shot', 'misshapen, blurry, crowds, moon', [], 20, 'Euler a', 1, 1, 7, 576, 1024, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], <gradio.routes.Request object at 0x0000022AA9DA77F0>, 0, True, 'sd_xl_refiner_1.0.safetensors [7440042bbd]', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "D:\NovelAI\stable-diffusion-webui\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "D:\NovelAI\stable-diffusion-webui\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "D:\NovelAI\stable-diffusion-webui\modules\txt2img.py", line 55, in txt2img
        processed = processing.process_images(p)
      File "D:\NovelAI\stable-diffusion-webui\modules\processing.py", line 732, in process_images
        res = process_images_inner(p)
      File "D:\NovelAI\stable-diffusion-webui\modules\processing.py", line 867, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "D:\NovelAI\stable-diffusion-webui\modules\processing.py", line 1140, in sample
        samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
      File "D:\NovelAI\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 235, in sample
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "D:\NovelAI\stable-diffusion-webui\modules\sd_samplers_common.py", line 261, in launch_sampling
        return func()
      File "D:\NovelAI\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 235, in <lambda>
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "D:\NovelAI\stable-diffusion-webui\VENV_DIR\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\NovelAI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
        denoised = model(x, sigmas[i] * s_in, **extra_args)
      File "D:\NovelAI\stable-diffusion-webui\VENV_DIR\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\NovelAI\stable-diffusion-webui\modules\sd_samplers_cfg_denoiser.py", line 201, in forward
        devices.test_for_nans(x_out, "unet")
      File "D:\NovelAI\stable-diffusion-webui\modules\devices.py", line 136, in test_for_nans
        raise NansException(message)
    modules.devices.NansException: A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

---

Additional information

I believe the bug is closely related to the function find_checkpoint_config() or guess_model_config_from_state_dict(), or to a misuse of the variable sd_default_config.
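
To make the suspicion concrete, here is a minimal sketch of how a config guesser that inspects state-dict keys can silently fall back to a default. It is an illustration only, not the WebUI's code: `guess_config`, the `MARKERS` table, and the key names are assumptions, and whether the cached state dict really differs from the one read from disk is not confirmed in this report. What the logs do show is that the wrong config appears exactly when the weights are loaded "from cache".

```python
# Fallback config, playing the role that sd_default_config appears to play.
DEFAULT_CONFIG = "v1-inference.yaml"

# Hypothetical marker keys; the real detection logic may differ.
MARKERS = [
    ("conditioner.embedders.1.model.ln_final.weight", "sd_xl_base.yaml"),
    ("conditioner.embedders.0.model.ln_final.weight", "sd_xl_refiner.yaml"),
]


def guess_config(state_dict):
    """Pick a config by looking for marker keys; fall back to the default."""
    for key, config in MARKERS:
        if key in state_dict:
            return config
    return DEFAULT_CONFIG


# State dict as it might look when read from disk (values omitted).
fresh_refiner = {"conditioner.embedders.0.model.ln_final.weight": None}
# Hypothetical cached copy whose keys no longer match any marker.
altered_cache = {"some.other.prefix.ln_final.weight": None}

print(guess_config(fresh_refiner))
print(guess_config(altered_cache))
```

If something like this happens, no error is raised at load time; the model is simply built from the SD 1.x config and fails later during sampling.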

kamtorocks commented 1 year ago

This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

Karen-the-Fantasist commented 1 year ago

Note: I restarted the WebUI with the command-line arguments --lowvram --xformers --no-half-vae --no-half and repeated the steps to reproduce. Unfortunately, the bug still occurs.

My GPU is an RTX 3070 Laptop with 8 GB of VRAM, so "does not support half type" cannot be the cause, and the VRAM is certainly sufficient. If the --disable-nan-check argument is used, the result is a black picture, which is not what I want.

I still maintain that the cause is the WebUI loading the wrong yaml file.

console logs

Note: I disabled the refiner for the first two generations and then enabled it.


Python 3.10.6 | packaged by conda-forge | (main, Oct  7 2022, 20:14:50) [MSC v.1916 64 bit (AMD64)]
Version: v1.6.0
Commit hash: 5ef669de080814067961f28357256e8fe27544f4
Launching Web UI with arguments: --lowvram --xformers --no-half-vae --no-half
Loading weights [31e35c80fc] from D:\NovelAI\stable-diffusion-webui\models\Stable-diffusion\sd_xl_base_1.0.safetensors
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Creating model from config: D:\NovelAI\stable-diffusion-webui\repositories\generative-models\configs\inference\sd_xl_base.yaml
Startup time: 8.7s (prepare environment: 1.8s, import torch: 2.7s, import gradio: 0.9s, setup paths: 0.6s, initialize shared: 0.2s, other imports: 0.4s, load scripts: 0.6s, create ui: 0.5s, gradio launch: 0.9s).
Applying attention optimization: xformers... done.
Model loaded in 9.4s (load weights from disk: 1.2s, create model: 0.7s, apply weights to model: 2.0s, apply float(): 1.8s, calculate empty prompt: 3.6s).
100%|███████████████████████████████████████████████████████████████████████████████| 20/20 [01:28<00:00,  4.45s/it]
Total progress: 100%|███████████████████████████████████████████████████████████████| 20/20 [01:23<00:00,  4.19s/it]
100%|███████████████████████████████████████████████████████████████████████████████| 20/20 [01:30<00:00,  4.53s/it]
Total progress: 100%|███████████████████████████████████████████████████████████████| 20/20 [01:26<00:00,  4.33s/it]
 80%|███████████████████████████████████████████████████████████████▏               | 16/20 [01:08<00:17,  4.27s/it]Reusing loaded model sd_xl_base_1.0.safetensors [31e35c80fc] to load sd_xl_refiner_1.0.safetensors [7440042bbd]6s/it]
Loading weights [7440042bbd] from D:\NovelAI\stable-diffusion-webui\models\Stable-diffusion\sd_xl_refiner_1.0.safetensors
Creating model from config: D:\NovelAI\stable-diffusion-webui\repositories\generative-models\configs\inference\sd_xl_refiner.yaml
Applying attention optimization: xformers... done.
Model loaded in 3.9s (create model: 0.1s, apply weights to model: 1.4s, apply float(): 1.4s, calculate empty prompt: 0.9s).
100%|███████████████████████████████████████████████████████████████████████████████| 20/20 [01:29<00:00,  4.46s/it]
Total progress: 100%|███████████████████████████████████████████████████████████████| 20/20 [01:25<00:00,  4.26s/it]
Reusing loaded model sd_xl_refiner_1.0.safetensors [7440042bbd] to load sd_xl_base_1.0.safetensors [31e35c80fc]s/it]
Loading weights [31e35c80fc] from D:\NovelAI\stable-diffusion-webui\models\Stable-diffusion\sd_xl_base_1.0.safetensors
Creating model from config: D:\NovelAI\stable-diffusion-webui\repositories\generative-models\configs\inference\sd_xl_base.yaml
Applying attention optimization: xformers... done.
Model loaded in 5.0s (create model: 0.6s, apply weights to model: 1.7s, apply float(): 1.7s, calculate empty prompt: 1.0s).
 80%|███████████████████████████████████████████████████████████████▏               | 16/20 [01:08<00:16,  4.23s/it]Reusing loaded model sd_xl_base_1.0.safetensors [31e35c80fc] to load sd_xl_refiner_1.0.safetensors [7440042bbd]3s/it]
Loading weights [7440042bbd] from cache
Creating model from config: D:\NovelAI\stable-diffusion-webui\configs\v1-inference.yaml
Applying attention optimization: xformers... done.
Model loaded in 1.7s (create model: 1.4s, calculate empty prompt: 0.2s).
 80%|███████████████████████████████████████████████████████████████▏               | 16/20 [01:13<00:18,  4.59s/it]
*** Error completing request
*** Arguments: ('task(wvbhqkl7z1ltw2c)', 'masterpiece, best quality, school life, campus, sunset, ultra detailed,students walking on the road, beautiful sky, extreme wide shot', 'misshapen, ((blurry)), crowds, moon', [], 20, 'Euler a', 1, 1, 7, 576, 576, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], <gradio.routes.Request object at 0x0000016BEF771630>, 0, True, 'sd_xl_refiner_1.0.safetensors [7440042bbd]', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "D:\NovelAI\stable-diffusion-webui\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "D:\NovelAI\stable-diffusion-webui\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "D:\NovelAI\stable-diffusion-webui\modules\txt2img.py", line 55, in txt2img
        processed = processing.process_images(p)
      File "D:\NovelAI\stable-diffusion-webui\modules\processing.py", line 732, in process_images
        res = process_images_inner(p)
      File "D:\NovelAI\stable-diffusion-webui\modules\processing.py", line 867, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "D:\NovelAI\stable-diffusion-webui\modules\processing.py", line 1140, in sample
        samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
      File "D:\NovelAI\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 235, in sample
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "D:\NovelAI\stable-diffusion-webui\modules\sd_samplers_common.py", line 261, in launch_sampling
        return func()
      File "D:\NovelAI\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 235, in <lambda>
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "D:\NovelAI\stable-diffusion-webui\VENV_DIR\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\NovelAI\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
        denoised = model(x, sigmas[i] * s_in, **extra_args)
      File "D:\NovelAI\stable-diffusion-webui\VENV_DIR\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\NovelAI\stable-diffusion-webui\modules\sd_samplers_cfg_denoiser.py", line 201, in forward
        devices.test_for_nans(x_out, "unet")
      File "D:\NovelAI\stable-diffusion-webui\modules\devices.py", line 136, in test_for_nans
        raise NansException(message)
    modules.devices.NansException: A tensor with all NaNs was produced in Unet. Use --disable-nan-check commandline argument to disable this check.

---
InResponse commented 1 year ago

I'm experiencing the same thing on a 4080, so VRAM isn't the issue.

Notably, the issue does not occur when using an SD1.5 model for the initial pass.

Karen-the-Fantasist commented 1 year ago

Of course. The yaml file "v1-inference.yaml" is made exclusively for SD 1.5. In step 3, if I choose sd_1.5.ckpt as the refiner before clicking the "generate" button again, it runs.

Strangely, changing the variable sd_default_config in paths_internal.py from os.path.join(sd_configs_path, "v1-inference.yaml") to os.path.join(sd_configs_path, "sd_xl_refiner.yaml") does not help. Replacing the entire content of "v1-inference.yaml" with the content of "sd_xl_refiner.yaml" does not help either.
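
One thing worth checking about the first attempt: in the console logs, "v1-inference.yaml" is loaded from the WebUI's configs directory, while "sd_xl_refiner.yaml" is loaded from repositories\generative-models\configs\inference. The sketch below compares the two locations. It assumes sd_configs_path points at the configs directory (not confirmed here) and only manipulates path strings, so it does not touch the disk.

```python
from pathlib import PureWindowsPath

webui_root = PureWindowsPath(r"D:\NovelAI\stable-diffusion-webui")

# Assumption: sd_configs_path is the directory that holds v1-inference.yaml.
sd_configs_path = webui_root / "configs"

# Path produced by the edited sd_default_config line.
edited_default = sd_configs_path / "sd_xl_refiner.yaml"

# Path the console prints when the refiner loads correctly.
logged_refiner_config = (
    webui_root / "repositories" / "generative-models"
    / "configs" / "inference" / "sd_xl_refiner.yaml"
)

print(edited_default)
print(logged_refiner_config)
print("same file:", edited_default == logged_refiner_config)
```

If the assumption holds, the edited default names a file in a different directory from the one the WebUI normally reads, which may be one reason the change had no visible effect.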

I've found a series of cumbersome steps to avoid the problem.

  1. choose sd_xl_base_1.0.safetensors as the checkpoint and sd_xl_refiner_1.0.safetensors as the refiner, use the "txt2img" module with SD VAE set to automatic, enter any prompt, then click the "generate" button
  2. choose any model based on sd_1.5 as the refiner, then click the "generate" button
  3. choose any model based on sd_1.5 as the checkpoint and sd_xl_refiner_1.0.safetensors as the refiner, then click the "generate" button
  4. choose sd_xl_base_1.0.safetensors as the checkpoint and sd_xl_refiner_1.0.safetensors as the refiner, then click the "generate" button
  5. Finished! You can now click the "generate" button as many times as you want.

Note: the steps above are not a permanent fix. After the WebUI is restarted, the problem comes back.

Addition:

The VRAM usage is unusual after those steps, and the picture from step 4 looks very similar to the one from step 3, which was generated with the sd 1.5 checkpoint. That is strange, and I believe something is still wrong.

Karen-the-Fantasist commented 1 year ago

I see now: the bug seems to be related to the setting "Maximum number of checkpoints loaded at the same time". When its value is greater than 1, the problem goes away.
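
The setting can be changed in the Settings tab. For anyone who prefers to script it, below is a minimal sketch that edits the WebUI's config.json while the WebUI is not running. The key name `sd_checkpoints_limit` and the file location are assumptions that should be verified against your own config.json, and the helper `set_checkpoint_limit` is hypothetical.

```python
import json
from pathlib import Path


def set_checkpoint_limit(config_path, limit=2, key="sd_checkpoints_limit"):
    """Set the checkpoint limit in a JSON settings file and return the settings.

    `key` is an assumed option name; check it against your own config.json.
    """
    path = Path(config_path)
    # Start from the existing settings so other options are preserved.
    settings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    settings[key] = limit
    path.write_text(json.dumps(settings, indent=4), encoding="utf-8")
    return settings


# Example call (path is an assumption based on the install directory in the logs):
# set_checkpoint_limit(r"D:\NovelAI\stable-diffusion-webui\config.json", limit=2)
```

Keeping two checkpoints loaded means the base and the refiner no longer replace each other on every generation, at the cost of more memory.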