AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Bug]: NaNs was produced in Unet and CUDA out of memory #14415

Open · Yevrey921 opened this issue 11 months ago

Yevrey921 commented 11 months ago

Checklist

What happened?

I hadn't opened this app in the last 10 days, and I understand it has been updated since then. I tried to generate a 768×1280 image the same way I did 10 days ago. The only flag I had set was "set COMMANDLINE_ARGS=--medvram", because I have an Nvidia GeForce GTX 1650 with 4 GB of VRAM. Everything had worked fine for a long time, no matter what I was generating.

Now, when generating, it first says: "A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card doesn't support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use the --disable-nan-check commandline argument to disable this check."

I first tried enabling "Upcast cross attention layer to float32", and then added "--no-half --disable-nan-check". In both cases it started failing with: "CUDA out of memory. Tried to allocate 960.00 MiB (GPU 0; 4.00 GiB total capacity; 1.50 GiB already allocated; 630.64 MiB free; 1.78 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF".
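
For reference, the error's max_split_size_mb hint is applied through the PYTORCH_CUDA_ALLOC_CONF environment variable, which must be set before PyTorch initializes CUDA (in webui-user.bat that would be a line like "set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128"). Below is a minimal standalone sketch of the same idea with a memory readout; the 128 value is only an illustrative example, not a recommendation from this thread:

    import os

    # Must be set before torch is imported / CUDA is initialized.
    # max_split_size_mb caps the allocator's block size to reduce fragmentation;
    # 128 is an illustrative value.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch

    free, total = torch.cuda.mem_get_info()  # bytes free / total on device 0
    print(f"free: {free / 2**20:.0f} MiB of {total / 2**20:.0f} MiB")
    print(f"allocated by tensors: {torch.cuda.memory_allocated() / 2**20:.0f} MiB")
    print(f"reserved by PyTorch:  {torch.cuda.memory_reserved() / 2**20:.0f} MiB")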

Steps to reproduce the problem

1. Launch webui.bat, open txt2img, type "girl" in the positive prompt -> "A tensor with all NaNs was produced in Unet."
2. Close the webui and edit webui.bat to use --lowvram --no-half --disable-nan-check.
3. Launch again, open txt2img, type "girl" in the positive prompt -> CUDA out of memory.
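
For what it's worth, here is a quick probe (outside the webui) to see whether the card returns NaNs from a bare half-precision op. This is only an illustrative sanity test, not the webui's actual check, which inspects the Unet's outputs:

    import torch

    # Run a simple fp16 matmul on the GPU and check the result for NaNs.
    print("compute capability:", torch.cuda.get_device_capability(0))
    x = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
    y = x @ x
    print("fp16 matmul produced NaNs:", torch.isnan(y).any().item())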

What should have happened?

It should generate an image without any problems, for example like this earlier output: 01063-3470751644-masterpiece, best quality, (colorful), (delicate eyes and face), volumetric light, ray tracing, extremely detailed CG unity 8k w

What browsers do you use to access the UI?

Google Chrome

Sysinfo

sysinfo-2023-12-24-02-58.json

Console logs

From https://github.com/AUTOMATIC1111/stable-diffusion-webui
 * branch              master     -> FETCH_HEAD
Already up to date.
venv "venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Version: 1.7.0
Commit hash: cf2772fab0af5573da775e7437e6acdca424f26e
Launching Web UI with arguments: --lowvram --no-half --disable-nan-check
No module 'xformers'. Proceeding without it.
Style database not found: E:\StableDiffusion\styles.csv
Loading weights [9be2111a39] from E:\StableDiffusion\models\Stable-diffusion\futanariFactor_alphaV10.safetensors
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 83.2s (initial startup: 0.2s, prepare environment: 22.7s, import torch: 21.1s, import gradio: 8.6s, setup paths: 8.7s, initialize shared: 1.2s, other imports: 7.9s, setup codeformer: 1.1s, setup gfpgan: 0.3s, list SD models: 0.8s, load scripts: 3.6s, load upscalers: 0.2s, initialize extra networks: 0.5s, scripts before_ui_callback: 0.7s, create ui: 3.1s, gradio launch: 3.0s).
Creating model from config: E:\StableDiffusion\configs\v1-inference.yaml
Loading VAE weights specified in settings: E:\StableDiffusion\models\VAE\color101VAE_v1.pt
Applying attention optimization: Doggettx... done.
Model loaded in 156.3s (load weights from disk: 7.5s, create model: 0.8s, apply weights to model: 117.2s, apply float(): 0.3s, load VAE: 4.6s, hijack: 0.2s, load textual inversion embeddings: 0.8s, calculate empty prompt: 24.7s).
 10%|████████▎                                                                          | 2/20 [01:00<09:05, 30.32s/it]
*** Error completing request                                                            | 2/20 [00:12<01:48,  6.04s/it]
*** Arguments: ('task(naisbmk5tyw9wby)', 'girl ', '(deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, (mutated hands and fingers:1.4), disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation', [], 20, 'Euler a', 1, 1, 8, 1280, 768, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], <gradio.routes.Request object at 0x0000023D96DD1F30>, 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "E:\StableDiffusion\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "E:\StableDiffusion\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "E:\StableDiffusion\modules\txt2img.py", line 55, in txt2img
        processed = processing.process_images(p)
      File "E:\StableDiffusion\modules\processing.py", line 734, in process_images
        res = process_images_inner(p)
      File "E:\StableDiffusion\modules\processing.py", line 875, in process_images_inner
        x_samples_ddim = decode_latent_batch(p.sd_model, samples_ddim, target_device=devices.cpu, check_for_nans=True)
      File "E:\StableDiffusion\modules\processing.py", line 596, in decode_latent_batch
        sample = decode_first_stage(model, batch[i:i + 1])[0]
      File "E:\StableDiffusion\modules\sd_samplers_common.py", line 76, in decode_first_stage
        return samples_to_images_tensor(x, approx_index, model)
      File "E:\StableDiffusion\modules\sd_samplers_common.py", line 58, in samples_to_images_tensor
        x_sample = model.decode_first_stage(sample.to(model.first_stage_model.dtype))
      File "E:\StableDiffusion\modules\sd_hijack_utils.py", line 17, in <lambda>
        setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
      File "E:\StableDiffusion\modules\sd_hijack_utils.py", line 28, in __call__
        return self.__orig_func(*args, **kwargs)
      File "E:\StableDiffusion\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "E:\StableDiffusion\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
        return self.first_stage_model.decode(z)
      File "E:\StableDiffusion\modules\lowvram.py", line 71, in first_stage_model_decode_wrap
        return first_stage_model_decode(z)
      File "E:\StableDiffusion\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
        dec = self.decoder(z)
      File "E:\StableDiffusion\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\StableDiffusion\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 641, in forward
        h = self.up[i_level].upsample(h)
      File "E:\StableDiffusion\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\StableDiffusion\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 64, in forward
        x = self.conv(x)
      File "E:\StableDiffusion\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\StableDiffusion\extensions\a1111-sd-webui-lycoris\lycoris.py", line 753, in lyco_Conv2d_forward
        return torch.nn.Conv2d_forward_before_lyco(self, input)
      File "E:\StableDiffusion\extensions-builtin\Lora\networks.py", line 501, in network_Conv2d_forward
        return originals.Conv2d_forward(self, input)
      File "E:\StableDiffusion\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "E:\StableDiffusion\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 960.00 MiB (GPU 0; 4.00 GiB total capacity; 1.50 GiB already allocated; 630.64 MiB free; 1.78 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

---
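
As an aside, the 960.00 MiB allocation in the traceback is exactly the size of one float32 activation tensor at the full 768×1280 output resolution with 256 channels, which is plausible for the VAE decoder's upsampling path where the error is raised (which tensor actually failed is an assumption; only the arithmetic is exact):

    # Assumed shape: (1, 256, 1280, 768), float32 (4 bytes per element).
    channels, height, width, bytes_per_fp32 = 256, 1280, 768, 4
    print(channels * height * width * bytes_per_fp32 / 2**20)  # 960.0 MiB

Running the model in float32 via --no-half doubles every such activation compared to float16, which would explain why the decode step is where a 4 GB card tips over.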

Additional information

The last thing I can remember is that I used the "StableDiffusion InvokeAI Base Cloud version" in Google Colab.

pyphan1 commented 2 months ago

@Yevrey921 Did you ever fix this problem? Does the GTX 1650 work without --no-half, and how many it/s do you get?