X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License

Error(s) in loading state_dict for MplugOwlForConditionalGeneration (video) #160

Open 2023luckyboy opened 1 year ago

2023luckyboy commented 1 year ago

from transformers import AutoTokenizer
from mplug_owl.modeling_mplug_owl import MplugOwlForConditionalGeneration
from mplug_owl.processing_mplug_owl import MplugOwlImageProcessor, MplugOwlProcessor
import torch

pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b-video'
model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    torch_dtype=torch.bfloat16,
    cache_dir='./',
)

Errors as follows:

Traceback (most recent call last):
  File "/remote-home/share/VideoBenchmark/Video_Benchmark/VLLM-3metrics/mPLUG-Owl/mplug-owl_infer.py", line 9, in <module>
    model = MplugOwlForConditionalGeneration.from_pretrained(
  File "/root/anaconda3/envs/mplug_owl/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2795, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/anaconda3/envs/mplug_owl/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3173, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for MplugOwlForConditionalGeneration:
    size mismatch for abstractor.encoder.layers.0.crossattention.output.mlp.w1.weight: copying a param with shape torch.Size([2816, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
    size mismatch for abstractor.encoder.layers.0.crossattention.output.mlp.w1.bias: copying a param with shape torch.Size([2816]) from checkpoint, the shape in current model is torch.Size([4096]).
    size mismatch for abstractor.encoder.layers.0.crossattention.output.mlp.w2.weight: copying a param with shape torch.Size([1024, 2816]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
    size mismatch for abstractor.encoder.layers.0.crossattention.output.mlp.w3.weight: copying a param with shape torch.Size([2816, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
    size mismatch for abstractor.encoder.layers.0.crossattention.output.mlp.w3.bias: copying a param with shape torch.Size([2816]) from checkpoint, the shape in current model is torch.Size([4096]).
    size mismatch for abstractor.encoder.layers.0.crossattention.output.mlp.ffn_ln.weight: copying a param with shape torch.Size([2816]) from checkpoint, the shape in current model is torch.Size([4096]).
    size mismatch for abstractor.encoder.layers.0.crossattention.output.mlp.ffn_ln.bias: copying a param with shape torch.Size([2816]) from checkpoint, the shape in current model is torch.Size([4096]).
    [the same seven size mismatches repeat verbatim for abstractor.encoder.layers.1 through abstractor.encoder.layers.5]
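Every mismatch above follows the same pattern: the video checkpoint stores visual-abstractor FFN tensors of width 2816, while the imported image-branch modeling code builds those layers with width 4096. You can confirm what the checkpoint actually contains without instantiating the model by inspecting its raw tensors. The sketch below is a hypothetical diagnostic; the shard filename is a guess, so check the model repo's file listing for the real one.

import torch
from huggingface_hub import hf_hub_download

# Download one checkpoint shard; this filename is an assumption.
ckpt_path = hf_hub_download(
    repo_id='MAGAer13/mplug-owl-llama-7b-video',
    filename='pytorch_model-00001-of-00002.bin',
)
state_dict = torch.load(ckpt_path, map_location='cpu')

# Print the abstractor FFN tensors and the shapes actually stored on disk.
for name, tensor in state_dict.items():
    if 'abstractor' in name and '.mlp.' in name:
        print(name, tuple(tensor.shape))
# Expect 2816-wide shapes here: the checkpoint side of the size mismatch.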

Shame-fight commented 1 year ago

> (quoting the original report and traceback above)

Have you solved this problem? I have the same question.

shaswati1 commented 11 months ago

@2023luckyboy and @Shame-fight, change

from mplug_owl.modeling_mplug_owl import MplugOwlForConditionalGeneration

to

from mplug_owl_video.modeling_mplug_owl import MplugOwlForConditionalGeneration
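Applied to the snippet in the original report, the corrected setup looks like the sketch below. The modeling import is exactly the fix quoted above; the parallel processing import path is an assumption based on the repo layout, and the script should be run from the repository root so the mplug_owl_video package is importable.

import torch
from transformers import AutoTokenizer
from mplug_owl_video.modeling_mplug_owl import MplugOwlForConditionalGeneration
# Assumed to mirror the modeling path; verify against the repo layout.
from mplug_owl_video.processing_mplug_owl import MplugOwlImageProcessor, MplugOwlProcessor

pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b-video'

# The video-branch code builds the 2816-wide abstractor FFN that matches
# the checkpoint, so from_pretrained no longer raises shape mismatches.
model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    torch_dtype=torch.bfloat16,
    cache_dir='./',
)
tokenizer = AutoTokenizer.from_pretrained(pretrained_ckpt)
image_processor = MplugOwlImageProcessor.from_pretrained(pretrained_ckpt)
processor = MplugOwlProcessor(image_processor, tokenizer)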