[Bug]: SDPA doesn't work

MoonRide303 commented 1 year ago

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits

What happened?

SDPA (one of the cross-attention optimizations, enabled by adding --opt-sdp-attention or --opt-sdp-no-mem-attention to COMMANDLINE_ARGS) doesn't work: txt2img fails with a RuntimeError:

Traceback (most recent call last):
  File "D:\tools\Stable-Diffusion-web-UI\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "D:\tools\Stable-Diffusion-web-UI\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "D:\tools\Stable-Diffusion-web-UI\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "D:\tools\Stable-Diffusion-web-UI\modules\processing.py", line 515, in process_images
    res = process_images_inner(p)
  File "D:\tools\Stable-Diffusion-web-UI\modules\processing.py", line 671, in process_images_inner
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "D:\tools\Stable-Diffusion-web-UI\modules\processing.py", line 671, in <listcomp>
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "D:\tools\Stable-Diffusion-web-UI\modules\processing.py", line 444, in decode_first_stage
    x = model.decode_first_stage(x)
  File "D:\tools\Stable-Diffusion-web-UI\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "D:\tools\Stable-Diffusion-web-UI\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "D:\anaconda3\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\tools\Stable-Diffusion-web-UI\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "D:\tools\Stable-Diffusion-web-UI\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
    dec = self.decoder(z)
  File "D:\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\tools\Stable-Diffusion-web-UI\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 631, in forward
    h = self.mid.attn_1(h)
  File "D:\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\tools\Stable-Diffusion-web-UI\modules\sd_hijack_optimizations.py", line 490, in sdp_attnblock_forward
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)
RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: struct c10::Half instead.

Steps to reproduce the problem

Add --opt-sdp-attention or --opt-sdp-no-mem-attention to COMMANDLINE_ARGS (in webui-user.bat).
Launch webui-user.bat.
Try to generate any image using txt2img.

No problems without SDPA.

What should have happened?

Both SDPA options should work (txt2img should be able to generate images).

Commit where the problem happens

https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/72cd27a13587c9579942577e9e3880778be195f6

What platforms do you use to access the UI?

Windows

What browsers do you use to access the UI?

Microsoft Edge

Command Line Arguments

--opt-sdp-attention

List of extensions

No

Console logs

(base) D:\tools\Stable-Diffusion-web-UI>webui-user-sdpa.bat
Python 3.10.10 | packaged by Anaconda, Inc. | (main, Mar 21 2023, 18:39:17) [MSC v.1916 64 bit (AMD64)]
Commit hash: 72cd27a13587c9579942577e9e3880778be195f6
Installing requirements
Launching Web UI with arguments: --opt-sdp-attention
No module 'xformers'. Proceeding without it.
*** "Disable all extensions" option was set, will only load built-in extensions ***
Loading weights [009eed2ef1] from D:\tools\Stable-Diffusion-web-UI\models\Stable-diffusion\v1-5-pruned.safetensors
Creating model from config: D:\tools\Stable-Diffusion-web-UI\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying scaled dot product cross attention optimization.
Textual inversion embeddings loaded(17): bad-artist, bad-artist-anime, bad-hands-5, bad-image-9600, bad-image-v2-11000, bad-image-v2-27000, bad-image-v2-39000, bad_prompt, bad_prompt_version2, bad_quality, boring_e621, EasyNegative, EasyNegativeV2, ng_deepnegative_v1_32t, ng_deepnegative_v1_64t, ng_deepnegative_v1_75t, pureerosface_v1
Textual inversion embeddings skipped(15): GTA768, InkPunk768, InkPunkHeavy768, InkPunkLandscapes768, InkPunkLite768, nartfixer, Neg_Facelift768, nfixer, nrealfixer, PaintStyle4, pinup768, rev2-badprompt, SCG768-Euphoria, SCG768-Nebula, SDA768
Model loaded in 2.4s (load weights from disk: 0.3s, create model: 0.3s, apply weights to model: 0.3s, apply half(): 0.3s, move model to device: 0.3s, load textual inversion embeddings: 0.8s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 14.4s (import torch: 3.6s, import gradio: 0.9s, import ldm: 0.4s, other imports: 0.8s, list SD models: 0.4s, load scripts: 1.0s, load SD checkpoint: 2.6s, create ui: 2.5s, gradio launch: 2.2s).
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  9.08it/s]
Error completing request█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏       | 19/20 [00:00<00:00, 22.13it/s]
Arguments: ('task(djamhlamkhsdoh9)', 'whatever', '', [], 20, 0, False, False, 1, 1, 7, 1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0) {}
Traceback (most recent call last):
  File "D:\tools\Stable-Diffusion-web-UI\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "D:\tools\Stable-Diffusion-web-UI\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "D:\tools\Stable-Diffusion-web-UI\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "D:\tools\Stable-Diffusion-web-UI\modules\processing.py", line 515, in process_images
    res = process_images_inner(p)
  File "D:\tools\Stable-Diffusion-web-UI\modules\processing.py", line 671, in process_images_inner
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "D:\tools\Stable-Diffusion-web-UI\modules\processing.py", line 671, in <listcomp>
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "D:\tools\Stable-Diffusion-web-UI\modules\processing.py", line 444, in decode_first_stage
    x = model.decode_first_stage(x)
  File "D:\tools\Stable-Diffusion-web-UI\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "D:\tools\Stable-Diffusion-web-UI\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "D:\anaconda3\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\tools\Stable-Diffusion-web-UI\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "D:\tools\Stable-Diffusion-web-UI\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
    dec = self.decoder(z)
  File "D:\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\tools\Stable-Diffusion-web-UI\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 631, in forward
    h = self.mid.attn_1(h)
  File "D:\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\tools\Stable-Diffusion-web-UI\modules\sd_hijack_optimizations.py", line 490, in sdp_attnblock_forward
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)
RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: struct c10::Half instead.

Total progress: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:18<00:00, 22.13it/s]

Additional information

No response

ClashSAN commented 1 year ago

What's the graphics card you're using?

MoonRide303 commented 1 year ago

@ClashSAN RTX 4080

MoonRide303 commented 1 year ago

PS: I am not sure if it matters (it caused no issues without SDPA), but since the error message is about dtypes, I will mention it just in case: I am using txt2img with fp16 models (converted using sd-webui-model-converter).
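
For reference, the dtype mismatch from the traceback can be reproduced outside the web UI. This is only a minimal sketch: the tensor shapes are arbitrary, and the assumption that q and k are float32 while v stays float16 is taken from the error message, not from the web UI code.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the attention inputs: q and k in float32,
# v left in float16, mirroring the dtypes named in the error message.
q = torch.randn(1, 1, 4, 8, dtype=torch.float32)
k = torch.randn(1, 1, 4, 8, dtype=torch.float32)
v = torch.randn(1, 1, 4, 8).to(torch.float16)

try:
    # Same call as in sdp_attnblock_forward from the traceback.
    F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)
    error_message = None
except RuntimeError as exc:
    # SDPA validates that q, k and v share one dtype before computing.
    error_message = str(exc)

print(error_message)
```

If the call raises, the problem is in the inputs handed to SDPA rather than in the GPU or the driver.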

meishild commented 1 year ago

I also get this error when using SDPA. When I use xformers, generation finishes without problems.

RTX 4090

2.0.0+cu118 autocast half
Stable Diffusion: [cf1d67a] 2023-03-25
Taming Transformers: [2426893] 2022-01-13
CodeFormer: [c5b4593] 2022-09-09
BLIP: [48211a1] 2022-06-07
k_diffusion: [5b3af03] 2022-11-23

jmp909 commented 1 year ago

Getting this same error since updating earlier today:

RuntimeError: Expected query, key, and value to have the same dtype,

3080, using COMMANDLINE_ARGS=--opt-sdp-no-mem-attention --api

It was fine yesterday.

dep commented 1 year ago

I'm getting this error today after pulling 1.3.0:

Expected query, key, and value to have the same dtype,
but got query.dtype: float key.dtype: float and value.dtype: struct c10::Half instead

jmp909 commented 1 year ago

Does it work if you turn off Upcast in settings->Stable Diffusion?

D-Ogi commented 1 year ago

I fixed it with any one of the following:

Turning off xformers in the A1111 Cross Attention Optimization setting.
Turning off 32-bit upcasting.
Keeping 32-bit upcasting but using the Doggettx Cross Attention Optimization.
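
The workarounds above all avoid handing SDPA inputs of mixed precision. The same idea can be sketched in code by casting k and v to q's dtype before the call. This is an illustration only, not the fix applied in the web UI: `sdpa_same_dtype` is a hypothetical helper, and the tensors are made-up stand-ins.

```python
import torch
import torch.nn.functional as F


def sdpa_same_dtype(q, k, v):
    """Hypothetical helper: cast k and v to q's dtype, then call SDPA."""
    k = k.to(q.dtype)
    v = v.to(q.dtype)
    return F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)


# Made-up inputs with the dtypes from the error message:
# q and k in float32, v in float16.
q = torch.randn(1, 1, 4, 8)
k = torch.randn(1, 1, 4, 8)
v = torch.randn(1, 1, 4, 8).to(torch.float16)

out = sdpa_same_dtype(q, k, v)
print(out.dtype, tuple(out.shape))
```

Casting to q's dtype keeps the upcast precision when q is float32, at the cost of extra memory for the converted k and v.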
