atosystem / SpeechCLIP

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022
https://atosystem.github.io/blogs/speechclip
BSD 3-Clause "New" or "Revised" License

Encountered a size mismatch problem while attempting to load a large model. #9

Closed marcos452 closed 5 months ago

marcos452 commented 5 months ago

Hello,

Thank you very much for your efforts! The repository and guidelines are concise and very effective.

I am trying to load the large model checkpoint `SpeechCLIP/slt_ckpts/SpeechCLIP/large/coco/parallel/epoch_14-step_33224-val_recall_mean_10_84.0128.ckpt` with `example.py` (loading the base model this way works without error). It fails with the following error:

```
Using cache found in /home/lan/.cache/torch/hub/s3prl_cache/4a54d64fa42b41e39db994c958d8107d5785a100f38c6eba680b6a3cc79babb3 for https://dl.fbaipublicfiles.com/hubert/hubert_large_ll60k.pt
ckpt: /home/lan/.cache/torch/hub/s3prl_cache/4a54d64fa42b41e39db994c958d8107d5785a100f38c6eba680b6a3cc79babb3
WARNING:avssl.module.clip_official:Reduce text embedding to size of 19787
Traceback (most recent call last):
  File "example.py", line 10, in <module>
    model = avssl.model.KWClip_GeneralTransformer.load_from_checkpoint(model_fp)
  File "/data/clusterfs/mld/users/lan/anaconda3/envs/emagepy38/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 156, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "/data/clusterfs/mld/users/lan/anaconda3/envs/emagepy38/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 204, in _load_model_state
    keys = model.load_state_dict(checkpoint["state_dict"], strict=strict)
  File "/data/clusterfs/mld/users/lan/anaconda3/envs/emagepy38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for KWClip_GeneralTransformer:
    size mismatch for criterion.eye_mat: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([256, 256]).
    size mismatch for criterion.neg_eye_mat: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([256, 256]).
    size mismatch for criterion.eye_mat_fl: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([256, 256]).
```
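For reference, the shapes stored in the checkpoint can be checked without constructing the model at all; a minimal sketch, assuming the checkpoint path above and the Lightning `state_dict` layout shown in the traceback:

```python
import torch

# Load only the raw checkpoint dict; no model is built, so no shape check fails.
ckpt_path = "slt_ckpts/SpeechCLIP/large/coco/parallel/epoch_14-step_33224-val_recall_mean_10_84.0128.ckpt"
ckpt = torch.load(ckpt_path, map_location="cpu")

# PyTorch Lightning checkpoints keep the model weights under "state_dict".
for name, tensor in ckpt["state_dict"].items():
    if name.startswith("criterion."):
        print(name, tuple(tensor.shape))
```

On the large checkpoint this should print `(1024, 1024)` for the three `criterion` buffers named in the error, whereas the model that `example.py` constructs allocates `(256, 256)` ones.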

Any insights or suggestions you can provide would be greatly appreciated.

Thank you!

atosystem commented 5 months ago

Hi @marcos452, thanks for using SpeechCLIP. Please refer to this issue: https://github.com/atosystem/SpeechCLIP/issues/7#issuecomment-2009757426
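For readers hitting the same error: the traceback shows the checkpoint storing 1024×1024 `criterion` buffers while the freshly constructed model allocates 256×256 ones, i.e. the model is presumably being built with base-model dimensions before the large checkpoint's weights are copied in. Note that passing `strict=False` to `load_from_checkpoint` does not bypass this, since PyTorch's `load_state_dict` raises on shape mismatches regardless of `strict`; see the comment linked above for the resolution.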