AUTOMATIC1111 / stable-diffusion-webui


[Bug]: SDXL inpainting results in 'NansException' when a VAE is present on macOS #15649

Open chr1st0ph3rGG opened 3 weeks ago

chr1st0ph3rGG commented 3 weeks ago

What happened?

On my MacBook Pro, img2img with SDXL checkpoints does not work whenever a VAE is involved, whether baked into the checkpoint or selected in the web UI. As soon as one is active and I try to generate an image, this error occurs: NansException: A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

Using --no-half does work around the problem, but generation then takes ages, so I would not call it a fix: SD1.5 checkpoints work fine with a VAE enabled, and even SDXL img2img works as long as no VAE is active (which would be fine for me if checkpoint creators would stop baking the VAE in all the time ._.). Plain txt2img with an SDXL checkpoint and a selected VAE also works without a problem.
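For what it's worth, here is a minimal, webui-independent sketch of the failure mode that --no-half works around: fp16 has a very small dynamic range, so intermediate activations can overflow to inf and collapse to NaN, while the same arithmetic in float32 stays finite. This is plain PyTorch, not webui code:

import torch

# fp16's largest finite value is 65504; doubling it overflows to inf,
# and inf - inf yields NaN -- the same class of overflow that can hit
# the UNet in half precision.
x = torch.full((4,), 65504.0, dtype=torch.float16)
y = x * 2                        # overflows to inf in fp16
print(torch.isnan(y - y).any())  # tensor(True)

# The identical arithmetic in float32 (what --no-half forces) is fine:
x32 = x.float()
print(torch.isnan(x32 * 2 - x32 * 2).any())  # tensor(False)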

Steps to reproduce the problem

  1. Select an SDXL model
  2. Set the VAE to a usable SDXL VAE (or choose a model with a baked-in VAE)
  3. Go to img2img
  4. Write a prompt
  5. Upload an image
  6. Click Generate (a scripted equivalent via the webui API is sketched below)
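If it helps with triage, a scripted repro through the HTTP API should hit the same code path. This is a sketch, assuming webui was launched with --api; the checkpoint and VAE names are placeholders for whatever you have locally:

import base64
import requests

URL = "http://127.0.0.1:7860"

with open("input.png", "rb") as f:
    init_image = base64.b64encode(f.read()).decode()

payload = {
    "prompt": "eating cookies",
    "init_images": [init_image],
    "denoising_strength": 0.75,
    "steps": 20,
    "override_settings": {
        # placeholder names -- substitute an SDXL checkpoint and VAE you have
        "sd_model_checkpoint": "matrixPony_v4.safetensors",
        "sd_vae": "sdxl_vae.safetensors",
    },
}

r = requests.post(f"{URL}/sdapi/v1/img2img", json=payload)
r.raise_for_status()  # on affected Macs this fails once the NansException is raised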

What should have happened?

A new image should have been generated instead of a NansException being thrown.

What browsers do you use to access the UI?

No response

Sysinfo

sysinfo-2024-04-28-08-27.json

Console logs

################################################################
Launching launch.py...
################################################################
Python 3.10.13 (main, Mar 17 2024, 20:31:43) [Clang 15.0.0 (clang-1500.3.9.4)]
Version: v1.9.3
Commit hash: 1c0a0c4c26f78c32095ebc7f8af82f5c04fca8c0
Launching Web UI with arguments: --skip-torch-cuda-test --opt-sub-quad-attention --upcast-sampling --no-half-vae --use-cpu interrogate
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
Loading weights [15dc93da84] from /Users/c/stable-diffusion-webui/models/Stable-diffusion/pony/matrixPony_v4.safetensors
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 4.3s (import torch: 2.2s, import gradio: 0.5s, setup paths: 0.6s, initialize shared: 0.1s, other imports: 0.4s, create ui: 0.2s, gradio launch: 0.1s).
Creating model from config: /Users/c/stable-diffusion-webui/repositories/generative-models/configs/inference/sd_xl_base.yaml
Applying attention optimization: sdp... done.
Model loaded in 54.4s (load weights from disk: 0.5s, create model: 0.6s, apply weights to model: 52.3s, apply dtype to VAE: 0.1s, move model to device: 0.2s, calculate empty prompt: 0.5s).
  0%|                                                                                                                                                                                                   | 0/16 [00:01<?, ?it/s]
*** Error completing request
*** Arguments: ('task(npe6u0644pbgbjj)', <gradio.routes.Request object at 0x2dd4975e0>, 0, 'eating cookies', '', [], <PIL.Image.Image image mode=RGBA size=512x768 at 0x2DD4673D0>, None, None, None, None, None, None, 4, 0, 1, 1, 1, 7, 1.5, 0.75, 0.0, 512, 512, 1, 0, 0, 32, 0, '', '', '', [], False, [], '', 0, 20, 'DPM++ 2M', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, '* `CFG Scale` should be 2 or lower.', True, True, '', '', True, 50, True, 1, 0, False, 4, 0.5, 'Linear', 'None', '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, 'positive', 'comma', 0, False, False, 'start', '', '<p style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width and height sliders to set tile size</p>', 64, 0, 2, 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "/Users/c/stable-diffusion-webui/modules/call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "/Users/c/stable-diffusion-webui/modules/call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "/Users/c/stable-diffusion-webui/modules/img2img.py", line 232, in img2img
        processed = process_images(p)
      File "/Users/c/stable-diffusion-webui/modules/processing.py", line 845, in process_images
        res = process_images_inner(p)
      File "/Users/c/stable-diffusion-webui/modules/processing.py", line 981, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "/Users/c/stable-diffusion-webui/modules/processing.py", line 1741, in sample
        samples = self.sampler.sample_img2img(self, self.init_latent, x, conditioning, unconditional_conditioning, image_conditioning=self.image_conditioning)
      File "/Users/c/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 172, in sample_img2img
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "/Users/c/stable-diffusion-webui/modules/sd_samplers_common.py", line 272, in launch_sampling
        return func()
      File "/Users/c/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 172, in <lambda>
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "/Users/c/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/Users/c/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/sampling.py", line 594, in sample_dpmpp_2m
        denoised = model(x, sigmas[i] * s_in, **extra_args)
      File "/Users/c/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/Users/c/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/c/stable-diffusion-webui/modules/sd_samplers_cfg_denoiser.py", line 269, in forward
        devices.test_for_nans(x_out, "unet")
      File "/Users/c/stable-diffusion-webui/modules/devices.py", line 255, in test_for_nans
        raise NansException(message)
    modules.devices.NansException: A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

---
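For context on where this is raised: the check lives in modules/devices.py. Paraphrased from memory (not the verbatim source), it only fires when every element of the tensor is NaN, which is also why --disable-nan-check merely hides the symptom rather than fixing it:

import torch

# Paraphrased sketch of modules/devices.py::test_for_nans -- not the
# exact source. It raises only when ALL elements are NaN, so
# --disable-nan-check suppresses the error without addressing the
# underlying half-precision overflow.
def test_for_nans(x: torch.Tensor, where: str) -> None:
    if not torch.isnan(x).all().item():
        return
    raise RuntimeError(f"A tensor with all NaNs was produced in {where}.")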

Additional information

I tried updating/downgrading torch but that did not make any difference...
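In case it helps to compare setups against the attached sysinfo, a quick environment dump (standard PyTorch and stdlib calls only):

import platform
import torch

print("python:", platform.python_version())
print("torch:", torch.__version__)
print("mps built:", torch.backends.mps.is_built())
print("mps available:", torch.backends.mps.is_available())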

Asherathe commented 1 week ago

Did you try using the fixed VAE that's linked from the SDXL section? That eliminated this problem for me.
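For anyone else landing here, a sketch of fetching an fp16-safe SDXL VAE into webui's VAE folder. The URL and filename are my assumption of which "fixed VAE" is meant (the commonly circulated fp16-fix build); prefer the exact link from the wiki's SDXL section:

import pathlib
import urllib.request

# Assumed URL/filename for the fp16-fix SDXL VAE -- verify against the
# wiki's SDXL section before relying on this.
url = "https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/resolve/main/sdxl_vae.safetensors"
dest = pathlib.Path("models/VAE/sdxl-vae-fp16-fix.safetensors")
dest.parent.mkdir(parents=True, exist_ok=True)
urllib.request.urlretrieve(url, str(dest))
print("saved to", dest, "- select it under Settings > SD VAE")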