cubiq / Diffusers_IPAdapter

implementation of the IPAdapter models for HF Diffusers
MIT License
160 stars 6 forks

KeyError: 'image_proj' with SDXL models #1

Closed Vargol closed 10 months ago

Vargol commented 10 months ago

Hi, which IP-Adapter models are we supposed to use for SDXL with this?

I've tried all three SDXL models at https://huggingface.co/h94/IP-Adapter/tree/main/sdxl_models and they've all failed with...

/content/Diffusers_IPAdapter/ip_adapter/ip_adapter.py in __init__(self, pipe, ipadapter_ckpt_path, image_encoder_path, device, dtype, resample)
     35 
     36         # detect features
---> 37         self.is_plus = "latents" in ipadapter_model["image_proj"]
     38         self.output_cross_attention_dim = ipadapter_model["ip_adapter"]["1.to_k_ip.weight"].shape[1]
     39         self.is_sdxl = self.output_cross_attention_dim == 2048

KeyError: 'image_proj'
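
The traceback shows the loader indexing the checkpoint as a dict with top-level `image_proj` and `ip_adapter` keys. Below is a minimal sketch of a pre-flight check that reports a clearer error than a bare `KeyError`. It assumes the checkpoint has already been loaded into a plain dict (for example with `torch.load`); the helper name `check_ipadapter_keys` is hypothetical and not part of the repository.

```python
# Top-level keys that the traceback shows being read from the checkpoint.
REQUIRED_KEYS = ("image_proj", "ip_adapter")


def check_ipadapter_keys(state_dict, source="<checkpoint>"):
    """Return state_dict unchanged, or raise if expected top-level keys are missing.

    Hypothetical helper for illustration; not part of Diffusers_IPAdapter.
    """
    if not isinstance(state_dict, dict):
        raise TypeError(f"{source}: expected a dict, got {type(state_dict).__name__}")

    missing = [key for key in REQUIRED_KEYS if key not in state_dict]
    if missing:
        # Show a few of the keys that are present to help spot a wrong file.
        found = [str(key) for key in list(state_dict)[:5]]
        raise ValueError(
            f"{source}: missing top-level keys {missing}; first keys found: {found}. "
            "This file may not be an IP-Adapter checkpoint."
        )
    return state_dict
```

If a check like this fails, it is worth confirming that the path being loaded really is the IP-Adapter checkpoint and not some other model file.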
cubiq commented 10 months ago

try to redownload them, the files might be corrupted.

what workflow are you using?

Vargol commented 10 months ago

It's a Diffusers script in colab.

https://colab.research.google.com/drive/17iZwTmnsIh6foIUXWzQpuCTYdHTxXZNp?usp=sharing

Note it's being adapted from a script that runs the Tencent version; anything below the call to IPAdapter is still old code.

cubiq commented 10 months ago

does it work with sd1.5 PLUS models?

also remember that vit-h models require the sd1.5 image encoder

Vargol commented 10 months ago

Same error with https://huggingface.co/h94/IP-Adapter/resolve/main/models/ip-adapter_sd15.bin


Keyword arguments {'add_watermarker': False} are not expected by StableDiffusionPipeline and will be ignored.
Loading pipeline components...: 100%
7/7 [00:01<00:00, 6.05it/s]
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["bos_token_id"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["eos_token_id"]` will be overriden.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-5-f7bd81792156> in <cell line: 48>()
     46 pipe.enable_vae_slicing()
     47 
---> 48 ip_model = IPAdapter(pipe, image_encoder_path, ip_ckpt, device)

/content/Diffusers_IPAdapter/ip_adapter/ip_adapter.py in __init__(self, pipe, ipadapter_ckpt_path, image_encoder_path, device, dtype, resample)
     35 
     36         # detect features
---> 37         self.is_plus = "latents" in ipadapter_model["image_proj"]
     38         self.output_cross_attention_dim = ipadapter_model["ip_adapter"]["1.to_k_ip.weight"].shape[1]
     39         self.is_sdxl = self.output_cross_attention_dim == 2048

KeyError: 'image_proj'

Vargol commented 10 months ago

Hmmm, okay, I think I've spotted my mistake: you've got the parameters for the models the other way around compared to the Tencent code.
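
For reference, the signature in the traceback is `__init__(self, pipe, ipadapter_ckpt_path, image_encoder_path, device, dtype, resample)`, while the failing call passed `image_encoder_path` before `ip_ckpt`. Here is a sketch of how keyword arguments sidestep the ordering difference. `IPAdapterStub` is a hypothetical stand-in that mirrors only the parameter names from the traceback (`dtype` and `resample` are omitted), and the paths are placeholders.

```python
class IPAdapterStub:
    """Stand-in mirroring the parameter order seen in the traceback (illustration only)."""

    def __init__(self, pipe, ipadapter_ckpt_path, image_encoder_path, device):
        self.pipe = pipe
        self.ipadapter_ckpt_path = ipadapter_ckpt_path
        self.image_encoder_path = image_encoder_path
        self.device = device


pipe = object()                              # placeholder for a real Diffusers pipeline
image_encoder_path = "models/image_encoder"  # placeholder path
ip_ckpt = "models/ip-adapter_sd15.bin"       # placeholder path

# Positional call in the order used by the failing script:
# the encoder path lands in the checkpoint slot.
swapped = IPAdapterStub(pipe, image_encoder_path, ip_ckpt, "cpu")

# Keyword call: correct regardless of the order the parameters are declared in.
correct = IPAdapterStub(
    pipe,
    ipadapter_ckpt_path=ip_ckpt,
    image_encoder_path=image_encoder_path,
    device="cpu",
)

print(swapped.ipadapter_ckpt_path)
print(correct.ipadapter_ckpt_path)
```

Passing the two paths by keyword makes a script portable between implementations that declare them in a different order, provided the parameter names match.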

Vargol commented 10 months ago

That would have done the trick, I think, but I ran out of system RAM (not VRAM) before it finished loading the models.

Vargol commented 10 months ago

Okay, a bit of adjustment in my script and I've got it working now. Thank you for your time.