MrForExample / ComfyUI-AnimateAnyone-Evolved

Improved AnimateAnyone implementation that lets you use a pose image sequence and a reference image to generate stylized video
MIT License
499 stars 42 forks

mat1 and mat2 shapes cannot be multiplied (2x1024 and 768x320) #17

Closed: zymox closed this issue 9 months ago

zymox commented 9 months ago

Trying to run the example workflow (with the provided example video + image), I get an error with the sampler:

Error occurred when executing [ComfyUI-3D] Animate Anyone Sampler:

mat1 and mat2 shapes cannot be multiplied (2x1024 and 768x320)

File "E:\ComfyUI\execution.py", line 155, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "E:\ComfyUI\execution.py", line 85, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "E:\ComfyUI\execution.py", line 78, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "E:\ComfyUI\custom_nodes\ComfyUI-AnimateAnyone-Evolved\nodes.py", line 152, in animate_anyone
samples = diffuser(
File "C:\Dev\Python3.10\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\ComfyUI\custom_nodes\ComfyUI-AnimateAnyone-Evolved\src\models\main_diffuser.py", line 440, in __call__
latents = self.denoise_loop(
File "E:\ComfyUI\custom_nodes\ComfyUI-AnimateAnyone-Evolved\src\models\main_diffuser.py", line 315, in denoise_loop
self.reference_unet(
File "C:\Dev\Python3.10\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\ComfyUI\custom_nodes\ComfyUI-AnimateAnyone-Evolved\src\models\unet_2d_condition.py", line 1197, in forward
sample, res_samples = downsample_block(
File "C:\Dev\Python3.10\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\ComfyUI\custom_nodes\ComfyUI-AnimateAnyone-Evolved\src\models\unet_2d_blocks.py", line 657, in forward
hidden_states, ref_feature = attn(
File "C:\Dev\Python3.10\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\ComfyUI\custom_nodes\ComfyUI-AnimateAnyone-Evolved\src\models\transformer_2d.py", line 357, in forward
hidden_states = block(
File "C:\Dev\Python3.10\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\ComfyUI\custom_nodes\ComfyUI-AnimateAnyone-Evolved\src\models\mutual_self_attention.py", line 241, in hacked_basic_transformer_inner_forward
attn_output = self.attn2(
File "C:\Dev\Python3.10\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Dev\Python3.10\lib\site-packages\diffusers\models\attention_processor.py", line 527, in forward
return self.processor(
File "C:\Dev\Python3.10\lib\site-packages\diffusers\models\attention_processor.py", line 1246, in __call__
key = attn.to_k(encoder_hidden_states, *args)
File "C:\Dev\Python3.10\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Dev\Python3.10\lib\site-packages\diffusers\models\lora.py", line 430, in forward
out = super().forward(hidden_states)
File "C:\Dev\Python3.10\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)

No idea how to approach this ... any help?
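
For context on the numbers in that message: the 768x320 factor is the transposed to_k weight of a cross-attention layer in the SD 1.5 reference UNet (in_features = 768, out_features = 320), and the 2x1024 factor is the image embedding coming from the loaded CLIP Vision encoder. In other words, the encoder is producing 1024-dimensional embeddings where the UNet expects 768-dimensional ones. A minimal diagnostic sketch, assuming the encoder is stored as a transformers-style image_encoder folder (the path is a placeholder):

```python
# Diagnostic sketch (not part of the node): check the embedding width of the
# CLIP Vision encoder you are loading. An SD 1.5 reference UNet has
# cross_attention_dim = 768, so the image embeddings fed to attn2 must be 768-d.
# "path/to/image_encoder" is a placeholder for wherever the encoder was downloaded.
from transformers import CLIPVisionConfig

cfg = CLIPVisionConfig.from_pretrained("path/to/image_encoder")
print("hidden_size:   ", cfg.hidden_size)     # 1024 for CLIP ViT-L/14
print("projection_dim:", cfg.projection_dim)  # 768 for CLIP ViT-L/14, 1024 for OpenCLIP ViT-H/14

# If the width that actually reaches the UNet is 1024 instead of 768, you get
# exactly "mat1 and mat2 shapes cannot be multiplied (2x1024 and 768x320)".
```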

zymox commented 9 months ago

Figured it out.

I was using a different CLIP Vision model, changed it to the one mentioned in the install instructions (https://huggingface.co/lambdalabs/sd-image-variations-diffusers/tree/main/image_encoder) and it works.
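
For anyone else hitting this, a rough sketch of fetching just the image_encoder subfolder of that repo with huggingface_hub; the local_dir below is only a placeholder, so follow this node's install instructions for the exact folder it should end up in:

```python
# Sketch, not an official install step: download only the image_encoder subfolder
# of lambdalabs/sd-image-variations-diffusers. The local_dir is a placeholder;
# check the ComfyUI-AnimateAnyone-Evolved README for the folder it actually expects.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="lambdalabs/sd-image-variations-diffusers",
    allow_patterns=["image_encoder/*"],  # skip the rest of the pipeline weights
    local_dir="path/to/ComfyUI/models/clip_vision/sd-image-variations",  # placeholder
)
```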

nothingness6 commented 8 months ago

> Figured it out.
>
> I was using a different CLIP Vision model, changed it to the one mentioned in the install instructions (https://huggingface.co/lambdalabs/sd-image-variations-diffusers/tree/main/image_encoder) and it works.

I also faced the same issue, but this doesn't solve it for me. Where should I put the model?

DougPP commented 8 months ago

> Figured it out.
>
> I was using a different CLIP Vision model, changed it to the one mentioned in the install instructions (https://huggingface.co/lambdalabs/sd-image-variations-diffusers/tree/main/image_encoder) and it works.

I am facing the same error, "mat1 and mat2 shapes cannot be multiplied (2x1024 and 768x320)". I followed your instructions and made sure the correct model was downloaded, but I still receive the same error message.

Anyone else able to resolve this?

Potts2k8 commented 5 months ago

> Figured it out.
>
> I was using a different CLIP Vision model, changed it to the one mentioned in the install instructions (https://huggingface.co/lambdalabs/sd-image-variations-diffusers/tree/main/image_encoder) and it works.

Noob here. Where do we even put this?

changer1666 commented 5 months ago

That happens because you're using an SDXL checkpoint, but the ControlNet uses SD 1.5.
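
One way to check which base model a single-file checkpoint was trained on is to look at the input width of its cross-attention key projections: 768 means SD 1.x, 1024 means SD 2.x, and 2048 means SDXL. A rough sketch, assuming a .safetensors checkpoint (the path is a placeholder):

```python
# Rough sketch: infer the base model family of a .safetensors checkpoint from the
# input width of its cross-attention to_k weights. The path is a placeholder.
from safetensors import safe_open

CKPT = "path/to/checkpoint.safetensors"

with safe_open(CKPT, framework="pt", device="cpu") as f:
    for key in f.keys():
        if key.startswith("model.diffusion_model.") and key.endswith("attn2.to_k.weight"):
            in_features = f.get_slice(key).get_shape()[1]
            family = {768: "SD 1.x", 1024: "SD 2.x", 2048: "SDXL"}.get(in_features, "unknown")
            print(f"{key}: in_features={in_features} -> {family}")
            break
```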