Mikubill / sd-webui-controlnet

WebUI extension for ControlNet

[Bug]: IPAdapter, RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: c10::Half key.dtype: float and value.dtype: float instead. #2208

Closed: frankjiang closed this issue 10 months ago

frankjiang commented 1 year ago

Is there an existing issue for this?

What happened?

IPAdapter does not run correctly.

Steps to reproduce the problem

  1. Img2Img
  2. ControlNet (latest)
  3. Choose IPAdapter
  4. Choose ip-adapter_clip_sd15 (default)
  5. Choose ip-adapter-plus-face_sd15 [71693645] (default)
  6. Add prompts
  7. Generate

What should have happened?

Generation should complete without raising a RuntimeError.

Commit where the problem happens

webui: 5ef669de080814067961f28357256e8fe27544f4
controlnet: 3011ff6e706d3fdd0cc7d2ac8ff0d59020b8f767

What browsers do you use to access the UI?

No response

Command Line Arguments

None

List of enabled extensions

[screenshot of enabled extensions]

Console logs

*** Error completing request
*** Arguments: ('task(510w65ya0s7jt96)', 0, '', '', ['Asian Boy Portrait'], <PIL.Image.Image image mode=RGBA size=512x512 at 0x2A90DE920>, None, None, None, None, None, None, 20, 'DPM++ 2M Karras', 4, 0, 1, 1, 1, 7, 1.5, 0.75, 0, 512, 512, 1, 0, 0, 32, 0, '', '', '', [], False, [], '', <gradio.routes.Request object at 0x32fc100a0>, 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, 'MultiDiffusion', False, True, 1024, 1024, 96, 96, 48, 4, 'None', 2, False, 10, 1, 1, 64, False, False, False, False, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 512, 64, True, True, True, False, <scripts.animatediff_ui.AnimateDiffProcess object at 0x36fc1c880>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x32fb90400>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x32fb902b0>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x2afde87f0>, '* `CFG Scale` should be 2 or lower.', True, True, '', '', True, 50, True, 1, 0, False, 4, 0.5, 'Linear', 'None', '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, 'positive', 'comma', 0, False, False, '', '<p style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width and height sliders to set tile size</p>', 64, 0, 2, 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False, None, None, False, None, None, False, None, None, False, 50) {}
    Traceback (most recent call last):
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/img2img.py", line 208, in img2img
        processed = process_images(p)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/processing.py", line 732, in process_images
        res = process_images_inner(p)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/batch_hijack.py", line 42, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/processing.py", line 867, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/hook.py", line 451, in process_sample
        return process.sample_before_CN_hack(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/processing.py", line 1528, in sample
        samples = self.sampler.sample_img2img(self, self.init_latent, x, conditioning, unconditional_conditioning, image_conditioning=self.image_conditioning)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 188, in sample_img2img
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/sd_samplers_common.py", line 261, in launch_sampling
        return func()
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 188, in <lambda>
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/sampling.py", line 594, in sample_dpmpp_2m
        denoised = model(x, sigmas[i] * s_in, **extra_args)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/sd_samplers_cfg_denoiser.py", line 169, in forward
        x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 112, in forward
        eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 138, in get_eps
        return self.inner_model.apply_model(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/sd_hijack_utils.py", line 17, in <lambda>
        setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/sd_hijack_utils.py", line 26, in __call__
        return self.__sub_func(self.__orig_func, *args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/modules/sd_hijack_unet.py", line 48, in apply_model
        return orig_func(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs).float()
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
        x_recon = self.model(x_noisy, t, **cond)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1335, in forward
        out = self.diffusion_model(x, t, context=cc)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/hook.py", line 858, in forward_webui
        raise e
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/hook.py", line 855, in forward_webui
        return forward(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/hook.py", line 762, in forward
        h = module(h, emb, context)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 84, in forward
        x = layer(x, context)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 334, in forward
        x = block(x, context=context[i])
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 269, in forward
        return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/util.py", line 121, in checkpoint
        return CheckpointFunction.apply(func, len(inputs), *args)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
        return super().apply(*args, **kwargs)  # type: ignore[misc]
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/util.py", line 136, in forward
        output_tensors = ctx.run_function(*ctx.input_tensors)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 273, in _forward
        x = self.attn2(self.norm2(x), context=context) + x
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/controlmodel_ipadapter.py", line 246, in attn_forward_hacked
        out = out + f(self, x, q)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/Users/frank/git/thirdparty/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/controlmodel_ipadapter.py", line 406, in forward
        ip_out = torch.nn.functional.scaled_dot_product_attention(q, ip_k, ip_v, attn_mask=None, dropout_p=0.0, is_causal=False)
    RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: c10::Half key.dtype: float and value.dtype: float instead.

---

Additional information

The error also occurs with other ip-adapter models, e.g. ip-adapter-plus_sd15 [c817b455].
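
For context, the failing call is PyTorch's torch.nn.functional.scaled_dot_product_attention, which requires query, key, and value to share a single dtype. The mismatch reproduces in isolation (a minimal sketch with made-up shapes, independent of the webui):

    import torch
    import torch.nn.functional as F

    # fp16 query (as produced by the half-precision UNet) against fp32 key/value
    # (as produced by the IP-Adapter projections) trips the same dtype check:
    q = torch.rand(1, 8, 64, 40, dtype=torch.float16)
    k = torch.rand(1, 8, 16, 40, dtype=torch.float32)
    v = torch.rand(1, 8, 16, 40, dtype=torch.float32)

    try:
        F.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False)
    except RuntimeError as e:
        print(e)  # Expected query, key, and value to have the same dtype ...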

Seal-Pavel commented 1 year ago

Same issue.

undeadx1 commented 1 year ago

Same issue here. Environment: M1 Mac.

Idmon commented 1 year ago

Same here. IP-Adapter has been buggy and I can't get it to work.

Osato28 commented 12 months ago

Same here. M1 Mac 8GB, Sonoma 14.1.1.

Information that might be related: Sonoma has previously caused an fp16-related issue with NeuralNet on PyTorch 2.1.0, but that particular problem was solved by updating to 2.2.0.dev20231012. (Issue AUTOMATIC1111/stable-diffusion-webui#13419)

Attempted solutions: Launching SD with --no-half "fixes" the problem by forcing all fp16 values into fp32, but it also slows each iteration down by a factor of 8-12 (from 2 to 16-20 seconds in my case).

UPD: Tried enabling the "Upcast cross attention layer to float32" option in Settings -> Stable Diffusion. It didn't work.

beltonk commented 11 months ago

Same here. M1 Max

beltonk commented 11 months ago

This works for me:

Patching https://github.com/Mikubill/sd-webui-controlnet/blob/main/scripts/controlmodel_ipadapter.py#L430 to

    ip_out = torch.nn.functional.scaled_dot_product_attention(q, ip_k.half(), ip_v.half(), attn_mask=None, dropout_p=0.0, is_causal=False)

i.e. converting ip_k and ip_v from float to c10::Half by adding .half() to each.

Although I'm not sure if this is the right thing to do, I'm able to generate images with SD 1.5 and SDXL with style transfer using ControlNet + IP Adapter.
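
A dtype-agnostic variant of the same idea is sketched below. ip_attn is a hypothetical standalone helper, not the extension's actual function, and it assumes q carries the UNet's working dtype; casting to q.dtype instead of hard-coding .half() would also leave fp32-only (--no-half) runs untouched:

    import torch
    import torch.nn.functional as F

    def ip_attn(q, ip_k, ip_v):
        # Cast key/value to the query's dtype: fp16 under --upcast-sampling on MPS,
        # fp32 under --no-half. No unconditional .half().
        return F.scaled_dot_product_attention(
            q, ip_k.to(q.dtype), ip_v.to(q.dtype),
            attn_mask=None, dropout_p=0.0, is_causal=False)

    # Mixed dtypes as in the reported error: fp16 query, fp32 key/value.
    q = torch.rand(1, 8, 64, 40, dtype=torch.float16)
    out = ip_attn(q, torch.rand(1, 8, 16, 40), torch.rand(1, 8, 16, 40))
    print(out.dtype)  # torch.float16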

huchenlei commented 11 months ago

> This works for me:
>
> Patching https://github.com/Mikubill/sd-webui-controlnet/blob/main/scripts/controlmodel_ipadapter.py#L430 to
>
>     ip_out = torch.nn.functional.scaled_dot_product_attention(q, ip_k.half(), ip_v.half(), attn_mask=None, dropout_p=0.0, is_causal=False)
>
> i.e. converting ip_k and ip_v from float to c10::Half by adding .half() to each.
>
> Although I'm not sure if this is the right thing to do, I'm able to generate images with SD 1.5 and SDXL with style transfer using ControlNet + IP Adapter.

Can anyone verify this solution on their Mac? I do not have a macOS machine to verify the patch. I will merge it to the main branch once it is verified.

Osato28 commented 11 months ago

> This works for me:
>
> Patching https://github.com/Mikubill/sd-webui-controlnet/blob/main/scripts/controlmodel_ipadapter.py#L430 to
>
>     ip_out = torch.nn.functional.scaled_dot_product_attention(q, ip_k.half(), ip_v.half(), attn_mask=None, dropout_p=0.0, is_causal=False)
>
> i.e. converting ip_k and ip_v from float to c10::Half by adding .half() to each. Although I'm not sure if this is the right thing to do, I'm able to generate images with SD 1.5 and SDXL with style transfer using ControlNet + IP Adapter.
>
> Can anyone verify this solution on their Mac? I do not have a macOS machine to verify the patch. I will merge it to the main branch once it is verified.

I can't compare the results to an Nvidia machine, so I'm going to post a detailed report with image samples just in case this fix caused some weirdness that I can't detect.

My apologies if this response is a bit long; I'd rather be thorough than miss something that an Nvidia owner would notice.

TL;DR:

1) Tested on txt2img and img2img; didn't find any issues.
2) Outputs in both modes are highly accurate and reproducible.
3) The slowdown due to IPAdapter seems to be within 15% of the original s/it value.


Testing parameters:

Processor: M1 8GB.

OS: Sonoma 14.1.1.

PyTorch version: 2.2.0.dev20231012

Webui arguments on launch: --skip-torch-cuda-test --upcast-sampling --opt-sub-quad-attention --use-cpu interrogate.

Resolutions: 512x512 and 512x768.

IPAdapter settings: ip-adapter_clip -> ip-adapter-plus-face_sd15, Low VRAM, Control Weight 0.7, Steps 0.5-1.0.


Attaching XY grids below to display the results.

Model: Deliberate v2.

Sampler: DPM++ 2M Karras, sampling steps: 20.

Prompt: female nurse, black hair.

Negative prompt: nsfw, disfigured, (deformed), ugly, saturated, doll, cgi, calligraphy, mismatched eyes, poorly drawn, b&w, blurry, missing, ((malformed)), ((out of frame)), model, letters, mangled, old, surreal, ((bad anatomy)), ((deformed legs)), ((deformed arms)).

IPAdapter image:

[attachment: image (22)]

1) 512x512. No issues. Average time per iteration: 1.555 s/it without ControlNet, 1.6 s/it with IPAdapter.

[XY grid: xyz_grid-0001-2734938831]

2) 512x768. No issues. Average time per iteration: 2.75 s/it without ControlNet, 2.965 s/it with IPAdapter.

[XY grid: xyz_grid-0002-2734938831]

3) Reproducibility test: generating from the same seed three times, IPAdapter turned on, to see if outputs will differ from each other. No issues.

[XY grid: xyz_grid-0003-2734938831]

4) img2img test (using only one seed, testing for accuracy and reproducibility at the same time). No issues.

[XY grid: xyz_grid-0001-2734938831]

beltonk commented 11 months ago

@Osato28 So the fix works for you too, right? Do you spot anything weird in your generations?

Your generations look pretty cool to me. I'm bad at tuning settings for nice outputs...

If the output works on Apple Silicon, my only concern is the --upcast-sampling and --no-half settings, etc. I have a feeling they are related to the error, and simply typecasting with .half() might break users who are not on Apple Silicon. I only have an M1 Max, so I'm unable to test on other PCs / GPUs / CPUs...
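
One way to address that portability concern, sketched under the assumption that the fix casts to the query's dtype rather than to fp16 unconditionally: torch.Tensor.to returns the tensor unchanged when the dtype already matches, so fp32-only runs would not be disturbed.

    import torch

    q = torch.rand(2, 4, dtype=torch.float32)
    k = torch.rand(2, 4, dtype=torch.float32)
    # .to() returns self when the dtype already matches, so casting ip_k / ip_v
    # to q.dtype would be a no-op for --no-half (all-fp32) users.
    assert k.to(q.dtype) is k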

By the way, my COMMANDLINE_ARGS is:

"--skip-torch-cuda-test --upcast-sampling --opt-sub-quad-attention --medvram --use-cpu Interrogate --no-half-vae --disable-safe-unpickle --autolaunch",

which I thought was optimized for Apple Silicon.

Osato28 commented 11 months ago

@beltonk I didn't spot anything weird and I can't test it on non-Apple Silicon.

Hence the overly detailed test results: I'm hoping that if there is anything weird, it will be caught by someone with a more traditional GPU.

Thank you for posting that fix, by the way. I couldn't make heads or tails of how IPAdapter worked, and I didn't have the courage to blindly typecast values until the error message went away.


Offtopic:

1) Prettiness is not due to prompt engineering but to the model, Deliberate v2. It's as stable and balanced as models get; it would probably give better results with a shorter negative prompt, I just stopped optimizing it halfway.

2) As for COMMANDLINE_ARGS, I simply kept the most minimal set that prevented crashes and kept performance reasonably high. I didn't optimize it besides that. --medvram does seem to improve performance with heavier ControlNet models, though; added it to my args, thank you.

But I'm afraid that both of those discussions are outside the scope of this issue.

If you wish to initiate testing on several Apple Silicon machines to find an optimal set of COMMANDLINE_ARGS, I think it would be better to start a separate discussion issue in the main AUTOMATIC1111 repo.

axeldelafosse commented 11 months ago

Thank you @beltonk -- your fix worked for me too!

Lichtfabrik commented 11 months ago

Thx @beltonk -- works for me as well!

Osniackal commented 10 months ago

@beltonk's fix worked for me on an M2 Mac mini.

MrSegundus commented 10 months ago

Worked here! (Mac M2 / A1111 v1.7)

alamyrjunior commented 9 months ago

> This works for me:
>
> Patching https://github.com/Mikubill/sd-webui-controlnet/blob/main/scripts/controlmodel_ipadapter.py#L430 to
>
>     ip_out = torch.nn.functional.scaled_dot_product_attention(q, ip_k.half(), ip_v.half(), attn_mask=None, dropout_p=0.0, is_causal=False)
>
> i.e. converting ip_k and ip_v from float to c10::Half by adding .half() to each.
>
> Although I'm not sure if this is the right thing to do, I'm able to generate images with SD 1.5 and SDXL with style transfer using ControlNet + IP Adapter.

Which file should I change? I can't find controlmodel_ipadapter.py.

xuyang16 commented 3 months ago

Thank you, @huchenlei: https://github.com/Mikubill/sd-webui-controlnet/pull/2348