frankchieng / ComfyUI_Aniportrait

Unofficial implementation of AniPortrait custom node in ComfyUI

CLIPVisionModelWithProjection Shape Size Error #6

Open · RobeSantoro opened this issue 2 months ago

RobeSantoro commented 2 months ago

Unfortunately, when running any of the example workflows I get the following error:

Error occurred when executing AniPortrait_Pose_Gen_Video:

Error(s) in loading state_dict for CLIPVisionModelWithProjection:
size mismatch for vision_model.embeddings.class_embedding: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for vision_model.embeddings.patch_embedding.weight: copying a param with shape torch.Size([1024, 3, 14, 14]) from checkpoint, the shape in current model is torch.Size([768, 3, 32, 32]).
size mismatch for vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([257, 1024]) from checkpoint, the shape in current model is torch.Size([50, 768]).
size mismatch for vision_model.pre_layrnorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for vision_model.pre_layrnorm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
[...]
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.

File "E:\COMFY\ComfyUI-robe\execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\COMFY\ComfyUI-robe\execution.py", line 81, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\COMFY\ComfyUI-robe\execution.py", line 74, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\COMFY\ComfyUI-robe\custom_nodes\ComfyUI_Aniportrait\nodes.py", line 169, in pose_generate_video
image_enc = CLIPVisionModelWithProjection.from_pretrained(image_encoder_path).to(dtype=weight_dtype, device=device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxx\.conda\envs\comfy\Lib\site-packages\transformers\modeling_utils.py", line 3677, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxx\.conda\envs\comfy\Lib\site-packages\transformers\modeling_utils.py", line 4155, in _load_pretrained_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")

Is this related to my torch or transformers version? I'm running transformers 4.40.2 and torch 2.3.0+cu118

Can you please help me to fix it?
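
For context, the two sets of shapes in the traceback correspond to two different CLIP vision encoders: the checkpoint's weights (1024-dim embeddings, 14x14 patches, 257 position embeddings) are sized like CLIP ViT-L/14, while the model being constructed (768-dim, 32x32 patches, 50 positions) matches the transformers default, ViT-B/32-sized CLIPVisionConfig. Since `from_pretrained` builds the model from the `config.json` in `image_encoder_path`, this suggests the config.json in that folder does not match its weight file, e.g., from a partial or mixed download. A minimal diagnostic sketch to check what the node actually sees (the path is an assumption taken from the traceback; point it at whatever your `image_encoder_path` resolves to):

```python
# Minimal diagnostic sketch; enc_dir is an assumption -- point it at the
# folder the workflow passes as image_encoder_path.
from transformers import CLIPVisionConfig

enc_dir = r"E:\COMFY\ComfyUI-robe\custom_nodes\ComfyUI_Aniportrait\pretrained_model\image_encoder"

# from_pretrained reads config.json from the folder; an encoder sized like
# the checkpoint weights in the traceback should report 1024 / 14.
cfg = CLIPVisionConfig.from_pretrained(enc_dir)
print("hidden_size:", cfg.hidden_size)  # 1024 expected; 768 is the ViT-B/32 default
print("patch_size:", cfg.patch_size)    # 14 expected; 32 is the ViT-B/32 default
```

If this prints 768 / 32 rather than 1024 / 14, the config.json in that folder is the wrong one for the weights sitting next to it.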

RobeSantoro commented 2 months ago

I can also add the console output printed before the error:

got prompt
[rgthree] Using rgthree's optimized recursive execution.
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
 ['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
loaded temporal unet's pretrained weights from E:\COMFY\ComfyUI-robe\custom_nodes\ComfyUI_Aniportrait\pretrained_model\stable-diffusion-v1-5\unet ...
Load motion module params from E:\COMFY\ComfyUI-robe\custom_nodes\ComfyUI_Aniportrait\pretrained_model\motion_module.pth
Loaded 453.20928M-parameter motion module

I should also mention that I'm running diffusers 0.26.2.

frankchieng commented 2 months ago

Make sure the pretrained_model download completed; otherwise you'd better re-download the models. Also, pay attention to the reference image and video sizes: they should all be square.
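
If the download may be incomplete, `huggingface_hub`'s `snapshot_download` can refresh the folder, re-fetching files that are missing or partial. A sketch, where the repo id and target folder are assumptions (substitute whatever this node's README points to):

```python
# Hedged re-download sketch: repo_id and local_dir are assumptions, not
# values confirmed in this issue -- use the ones from the README.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="ZJYang/AniPortrait",  # assumed upstream weights repo
    local_dir=r"E:\COMFY\ComfyUI-robe\custom_nodes\ComfyUI_Aniportrait\pretrained_model",
)
```

And for the square-shape requirement, a center-crop sketch with Pillow (file names are placeholders):

```python
from PIL import Image

img = Image.open("reference.png")  # placeholder file name
side = min(img.size)               # largest centered square that fits
left = (img.width - side) // 2
top = (img.height - side) // 2
img.crop((left, top, left + side, top + side)).save("reference_square.png")
```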