MrForExample / ComfyUI-AnimateAnyone-Evolved

Improved AnimateAnyone implementation that lets you use a pose image sequence and a reference image to generate stylized video
MIT License
499 stars 42 forks

mat1 and mat2 shapes cannot be multiplied (2x1024 and 768x320) #17

Closed: zymox closed this issue 9 months ago

zymox commented 9 months ago

Trying to run the example workflow (with the provided example video + image), I get an error with the sampler:

Error occurred when executing [ComfyUI-3D] Animate Anyone Sampler:

mat1 and mat2 shapes cannot be multiplied (2x1024 and 768x320)

File "E:\ComfyUI\execution.py", line 155, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "E:\ComfyUI\execution.py", line 85, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "E:\ComfyUI\execution.py", line 78, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "E:\ComfyUI\custom_nodes\ComfyUI-AnimateAnyone-Evolved\nodes.py", line 152, in animate_anyone
samples = diffuser(
File "C:\Dev\Python3.10\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\ComfyUI\custom_nodes\ComfyUI-AnimateAnyone-Evolved\src\models\main_diffuser.py", line 440, in __call__
latents = self.denoise_loop(
File "E:\ComfyUI\custom_nodes\ComfyUI-AnimateAnyone-Evolved\src\models\main_diffuser.py", line 315, in denoise_loop
self.reference_unet(
File "C:\Dev\Python3.10\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\ComfyUI\custom_nodes\ComfyUI-AnimateAnyone-Evolved\src\models\unet_2d_condition.py", line 1197, in forward
sample, res_samples = downsample_block(
File "C:\Dev\Python3.10\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\ComfyUI\custom_nodes\ComfyUI-AnimateAnyone-Evolved\src\models\unet_2d_blocks.py", line 657, in forward
hidden_states, ref_feature = attn(
File "C:\Dev\Python3.10\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\ComfyUI\custom_nodes\ComfyUI-AnimateAnyone-Evolved\src\models\transformer_2d.py", line 357, in forward
hidden_states = block(
File "C:\Dev\Python3.10\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\ComfyUI\custom_nodes\ComfyUI-AnimateAnyone-Evolved\src\models\mutual_self_attention.py", line 241, in hacked_basic_transformer_inner_forward
attn_output = self.attn2(
File "C:\Dev\Python3.10\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Dev\Python3.10\lib\site-packages\diffusers\models\attention_processor.py", line 527, in forward
return self.processor(
File "C:\Dev\Python3.10\lib\site-packages\diffusers\models\attention_processor.py", line 1246, in __call__
key = attn.to_k(encoder_hidden_states, *args)
File "C:\Dev\Python3.10\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Dev\Python3.10\lib\site-packages\diffusers\models\lora.py", line 430, in forward
out = super().forward(hidden_states)
File "C:\Dev\Python3.10\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)

No idea how to approach this ... any help?
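
For context on the numbers in that message: the 768x320 factor is the transposed to_k weight of a cross-attention layer in the SD 1.5 reference UNet (in_features = 768, out_features = 320), and the 2x1024 factor is the image embedding coming from the loaded CLIP Vision encoder. In other words, the encoder is producing 1024-dimensional embeddings where the UNet expects 768-dimensional ones. A minimal diagnostic sketch, assuming the encoder is stored as a transformers-style image_encoder folder (the path is a placeholder):

```python
# Diagnostic sketch (not part of the node): check the embedding width of the
# CLIP Vision encoder you are loading. An SD 1.5 reference UNet has
# cross_attention_dim = 768, so the image embeddings fed to attn2 must be 768-d.
# "path/to/image_encoder" is a placeholder for wherever the encoder was downloaded.
from transformers import CLIPVisionConfig

cfg = CLIPVisionConfig.from_pretrained("path/to/image_encoder")
print("hidden_size:   ", cfg.hidden_size)     # 1024 for CLIP ViT-L/14
print("projection_dim:", cfg.projection_dim)  # 768 for CLIP ViT-L/14, 1024 for OpenCLIP ViT-H/14

# If the width that actually reaches the UNet is 1024 instead of 768, you get
# exactly "mat1 and mat2 shapes cannot be multiplied (2x1024 and 768x320)".
```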

zymox commented 9 months ago

Figured it out.

I was using a different CLIP Vision model, changed it to the one mentioned in the install instructions (https://huggingface.co/lambdalabs/sd-image-variations-diffusers/tree/main/image_encoder) and it works.
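
For anyone else hitting this, a rough sketch of fetching just the image_encoder subfolder of that repo with huggingface_hub; the local_dir below is only a placeholder, so follow this node's install instructions for the exact folder it should end up in:

```python
# Sketch, not an official install step: download only the image_encoder subfolder
# of lambdalabs/sd-image-variations-diffusers. The local_dir is a placeholder;
# check the ComfyUI-AnimateAnyone-Evolved README for the folder it actually expects.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="lambdalabs/sd-image-variations-diffusers",
    allow_patterns=["image_encoder/*"],  # skip the rest of the pipeline weights
    local_dir="path/to/ComfyUI/models/clip_vision/sd-image-variations",  # placeholder
)
```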

nothingness6 commented 8 months ago

> Figured it out.
>
> I was using a different CLIP Vision model, changed it to the one mentioned in the install instructions (https://huggingface.co/lambdalabs/sd-image-variations-diffusers/tree/main/image_encoder) and it works.

I also faced the same issue, but this doesn't solve it for me. Where should I put the model?

DougPP commented 8 months ago

> Figured it out.
>
> I was using a different CLIP Vision model, changed it to the one mentioned in the install instructions (https://huggingface.co/lambdalabs/sd-image-variations-diffusers/tree/main/image_encoder) and it works.

I am facing the same error, "mat1 and mat2 shapes cannot be multiplied (2x1024 and 768x320)". I followed your instructions and made sure the correct model was downloaded, but I still receive the same error message.

Anyone else able to resolve this?

Potts2k8 commented 5 months ago

> Figured it out.
>
> I was using a different CLIP Vision model, changed it to the one mentioned in the install instructions (https://huggingface.co/lambdalabs/sd-image-variations-diffusers/tree/main/image_encoder) and it works.

Noob here. Where do we even put this?

changer1666 commented 5 months ago

That happens because you're using an SDXL checkpoint, but the ControlNet uses SD 1.5.
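
One way to check which base model a single-file checkpoint was trained on is to look at the input width of its cross-attention key projections: 768 means SD 1.x, 1024 means SD 2.x, and 2048 means SDXL. A rough sketch, assuming a .safetensors checkpoint (the path is a placeholder):

```python
# Rough sketch: infer the base model family of a .safetensors checkpoint from the
# input width of its cross-attention to_k weights. The path is a placeholder.
from safetensors import safe_open

CKPT = "path/to/checkpoint.safetensors"

with safe_open(CKPT, framework="pt", device="cpu") as f:
    for key in f.keys():
        if key.startswith("model.diffusion_model.") and key.endswith("attn2.to_k.weight"):
            in_features = f.get_slice(key).get_shape()[1]
            family = {768: "SD 1.x", 1024: "SD 2.x", 2048: "SDXL"}.get(in_features, "unknown")
            print(f"{key}: in_features={in_features} -> {family}")
            break
```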