TsinghuaAI / CPM-1-Generate

Chinese Pre-Trained Language Models (CPM-LM) Version-I
MIT License

RuntimeError: Error(s) in loading state_dict for GPT2Model: #76

Open Afeihan opened 2 years ago

Afeihan commented 2 years ago

Running on CentOS, with dependencies such as apex and deepspeed installed. The working directory is the project root. The pretrained model checkpoint is stored at: 80000/80000/mp_rank_00_model_states.pt
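Before launching, it can help to confirm what is actually inside the checkpoint directory that --load points to. The sketch below is a hypothetical sanity check (the paths are the ones from this issue, and the "module" key matches what utils.py loads in the traceback below); it is not part of the repo.

```python
# Hypothetical sanity check: inspect the checkpoint directory passed to --load
# and a couple of tensor shapes, to see how many model-parallel shards it holds.
import os
import torch

ckpt_dir = "./80000/80000"  # path taken from this issue
print(sorted(os.listdir(ckpt_dir)))
# The 2-way model-parallel release of CPM-large should contain both
# mp_rank_00_model_states.pt and mp_rank_01_model_states.pt (assumption).

sd = torch.load(os.path.join(ckpt_dir, "mp_rank_00_model_states.pt"),
                map_location="cpu")["module"]
print(sd["word_embeddings.weight"].shape)
# torch.Size([15000, 2560]) here: half of the 30000-token vocabulary,
# i.e. this file is one shard of a 2-way model-parallel checkpoint.
```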

Run:

python generate_samples.py --model-parallel-size 2 --num-layers 32 --hidden-size 2560 --load ./80000 --num-attention-heads 32 --seq-length 1024 --max-position-embeddings 1024 --fp16 --cache-dir cache --out-seq-length 512 --temperature 0.9 --top_k 0 --top_p 0 --tokenizer-path bpe_3w_new/ --vocab-size 30000 --input-text example.txt

The error output is as follows:

Generate Samples
WARNING: No training data specified
using world size: 1 and model-parallel size: 1

using dynamic loss scaling
/home/troila/anaconda3/envs/test/lib/python3.7/site-packages/torch/cuda/__init__.py:146: UserWarning: NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37. If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
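As an aside, that UserWarning means the installed PyTorch wheel was not built with sm_86 support, so the RTX 3090 cannot actually run its CUDA kernels. A quick way to check (standard torch APIs, not taken from this thread):

```python
# Check whether the installed PyTorch build supports this GPU's compute capability.
import torch

print(torch.__version__, torch.version.cuda)   # PyTorch version and the CUDA version it was built against
print(torch.cuda.get_device_capability(0))     # (8, 6) for an RTX 3090
print(torch.cuda.get_arch_list())              # architectures compiled into this build
# If neither 'sm_86' nor 'sm_80'/'compute_80' appears in the list, install a wheel
# built against CUDA 11.1 or newer from https://pytorch.org/get-started/locally/
```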

initializing model parallel with size 1
initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
building CPM model ...
number of parameters on model parallel rank 0: 2597073920
global rank 0 is loading checkpoint ./80000/80000/mp_rank_00_model_states.pt
Traceback (most recent call last):
  File "generate_samples.py", line 384, in <module>
    main()
  File "generate_samples.py", line 374, in main
    model = setup_model(args)
  File "generate_samples.py", line 345, in setup_model
    args.iteration = load_checkpoint_model(model, args)
  File "/home/hanlifei/CPM-Generate/utils.py", line 290, in load_checkpoint_model
    model.load_state_dict(sd['module'])
  File "/home/hanlifei/CPM-Generate/model/distributed.py", line 90, in load_state_dict
    self.module.load_state_dict(state_dict, strict=strict)
  File "/home/hanlifei/CPM-Generate/fp16/fp16.py", line 71, in load_state_dict
    self.module.load_state_dict(state_dict, strict=strict)
  File "/home/troila/anaconda3/envs/test/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1605, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for GPT2Model:
size mismatch for word_embeddings.weight: copying a param with shape torch.Size([15000, 2560]) from checkpoint, the shape in current model is torch.Size([30000, 2560]).
size mismatch for transformer.layers.0.attention.query_key_value.weight: copying a param with shape torch.Size([3840, 2560]) from checkpoint, the shape in current model is torch.Size([7680, 2560]).
size mismatch for transformer.layers.0.attention.query_key_value.bias: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([7680]).
size mismatch for transformer.layers.0.attention.dense.weight: copying a param with shape torch.Size([2560, 1280]) from checkpoint, the shape in current model is torch.Size([2560, 2560]).
size mismatch for transformer.layers.0.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([10240, 2560]).
size mismatch for transformer.layers.0.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]).
size mismatch for transformer.layers.0.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([2560, 10240]).
[... the same six size mismatches, with identical shapes, are reported for every remaining layer, transformer.layers.1 through transformer.layers.31 ...]
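Two things in the log explain the mismatch. First, every checkpoint tensor is exactly half the size the model expects (15000 vs. 30000 vocabulary rows, 3840 vs. 7680 query_key_value rows, 1280 vs. 2560 input columns in attention.dense), which is what a single shard of a 2-way model-parallel checkpoint looks like. Second, although --model-parallel-size 2 was passed, the run actually started with "world size: 1 and model-parallel size: 1", because the script was launched as a plain python process rather than through a 2-process distributed launch, so an unsharded GPT2Model was built and then fed only mp_rank_00_model_states.pt. The cleanest fixes are therefore either to launch generation with two processes on two GPUs (the repo's generation shell script is meant for this), or to convert the 2-way checkpoint into a single mp=1 checkpoint before loading; if your copy of the repo includes a model-parallel conversion script (e.g. change_mp.py), prefer that. Purely as an illustration of what such a conversion does, here is a rough, unofficial sketch that merges the two shards, assuming the Megatron-style partitioning the shapes above imply (column-parallel tensors split along dim 0, row-parallel weights along dim 1, and the per-rank query_key_value output laid out as [Q, K, V]); verify these assumptions against the model code before trusting the result.

```python
# Rough, unofficial sketch: merge a 2-way model-parallel CPM checkpoint into a
# single mp=1 checkpoint. Partitioning rules are inferred from the shapes in the
# error above, and the [Q, K, V] per-rank layout of query_key_value is an assumption.
import os
import torch

sd0 = torch.load("80000/80000/mp_rank_00_model_states.pt", map_location="cpu")
sd1 = torch.load("80000/80000/mp_rank_01_model_states.pt", map_location="cpu")
m0, m1 = sd0["module"], sd1["module"]

merged = {}
for name, p0 in m0.items():
    p1 = m1[name]
    if "query_key_value" in name:
        # Column-parallel, but each rank holds [Q_r, K_r, V_r]; re-interleave so the
        # merged tensor keeps the [Q, K, V] layout an mp=1 model expects.
        q0, k0, v0 = torch.chunk(p0, 3, dim=0)
        q1, k1, v1 = torch.chunk(p1, 3, dim=0)
        merged[name] = torch.cat([q0, q1, k0, k1, v0, v1], dim=0)
    elif "word_embeddings" in name or "dense_h_to_4h" in name:
        # Vocab-parallel embedding and column-parallel MLP input: split along dim 0.
        merged[name] = torch.cat([p0, p1], dim=0)
    elif name.endswith("attention.dense.weight") or name.endswith("dense_4h_to_h.weight"):
        # Row-parallel weights: split along the input dimension (dim 1).
        merged[name] = torch.cat([p0, p1], dim=1)
    else:
        # Layer norms, position embeddings and row-parallel biases are replicated.
        merged[name] = p0

out_dir = "80000-mp1/80000"  # hypothetical output layout mirroring the input
os.makedirs(out_dir, exist_ok=True)
sd0["module"] = merged
torch.save(sd0, os.path.join(out_dir, "mp_rank_00_model_states.pt"))
```

With the merged file in place, the same generate_samples.py command would then be run with --model-parallel-size 1 and --load pointing at the new directory, keeping the rest of the original checkpoint directory layout (e.g. any iteration-tracking file) unchanged.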

judynlp commented 1 year ago

Have you solved this problem? I am running into the same issue.