Missing keys in state_dict when running with the medium model

`psoni@blr2-lnxwk-071:~/DialoGPT$ sudo python3 interact.py --model_name_or_path ./models/medium --load_checkpoint ./models/medium/medium_ft.pkl --top_k 0
Found existing ./models folder, skip creating a new one!
03/06/2020 19:51:21 - INFO - __main__ - Downloading models...
03/06/2020 19:51:21 - INFO - demo_utils - ./models/medium/config.json exists, return!
03/06/2020 19:51:21 - INFO - demo_utils - ./models/medium/vocab.json exists, return!
03/06/2020 19:51:21 - INFO - demo_utils - ./models/medium/merges.txt exists, return!
03/06/2020 19:51:21 - INFO - demo_utils - ./models/medium/pytorch_model.bin exists, return!
03/06/2020 19:51:21 - INFO - demo_utils - ./models/medium/medium_ft.pkl exists, return!
03/06/2020 19:51:21 - INFO - __main__ - Done!

03/06/2020 19:51:21 - INFO - pytorch_pretrained_bert.tokenization_gpt2 - loading vocabulary file ./models/medium/vocab.json
03/06/2020 19:51:21 - INFO - pytorch_pretrained_bert.tokenization_gpt2 - loading merges file ./models/medium/merges.txt
03/06/2020 19:51:22 - INFO - gpt2_training.train_utils - loading finetuned model from ./models/medium/medium_ft.pkl
Traceback (most recent call last):
  File "interact.py", line 203, in <module>
    run_model()
  File "interact.py", line 147, in run_model
    model = load_model(GPT2LMHeadModel(config), args.load_checkpoint, args, verbose=True)
  File "/home/psoni/DialoGPT/gpt2_training/train_utils.py", line 39, in load_model
    start_model.load_state_dict(model_state_dict)
  File "/home/psoni/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
Missing key(s) in state_dict: "transformer.h.12.ln_1.weight", "transformer.h.12.ln_1.bias", "transformer.h.12.attn.bias", "transformer.h.12.attn.c_attn.weight", "transformer.h.12.attn.c_attn.bias", "transformer.h.12.attn.c_proj.weight", "transformer.h.12.attn.c_proj.bias", "transformer.h.12.ln_2.weight", "transformer.h.12.ln_2.bias", "transformer.h.12.mlp.c_fc.weight", "transformer.h.12.mlp.c_fc.bias", "transformer.h.12.mlp.c_proj.weight", "transformer.h.12.mlp.c_proj.bias", "transformer.h.13.ln_1.weight", "transformer.h.13.ln_1.bias", "transformer.h.13.attn.bias", "transformer.h.13.attn.c_attn.weight", "transformer.h.13.attn.c_attn.bias", "transformer.h.13.attn.c_proj.weight", "transformer.h.13.attn.c_proj.bias", "transformer.h.13.ln_2.weight", "transformer.h.13.ln_2.bias", "transformer.h.13.mlp.c_fc.weight", "transformer.h.13.mlp.c_fc.bias", "transformer.h.13.mlp.c_proj.weight", "transformer.h.13.mlp.c_proj.bias", "transformer.h.14.ln_1.weight", "transformer.h.14.ln_1.bias", "transformer.h.14.attn.bias", "transformer.h.14.attn.c_attn.weight", 
"transformer.h.14.attn.c_attn.bias", "transformer.h.14.attn.c_proj.weight", "transformer.h.14.attn.c_proj.bias", "transformer.h.14.ln_2.weight", "transformer.h.14.ln_2.bias", "transformer.h.14.mlp.c_fc.weight", "transformer.h.14.mlp.c_fc.bias", "transformer.h.14.mlp.c_proj.weight", "transformer.h.14.mlp.c_proj.bias", "transformer.h.15.ln_1.weight", "transformer.h.15.ln_1.bias", "transformer.h.15.attn.bias", "transformer.h.15.attn.c_attn.weight", "transformer.h.15.attn.c_attn.bias", "transformer.h.15.attn.c_proj.weight", "transformer.h.15.attn.c_proj.bias", "transformer.h.15.ln_2.weight", "transformer.h.15.ln_2.bias", "transformer.h.15.mlp.c_fc.weight", "transformer.h.15.mlp.c_fc.bias", "transformer.h.15.mlp.c_proj.weight", "transformer.h.15.mlp.c_proj.bias", "transformer.h.16.ln_1.weight", "transformer.h.16.ln_1.bias", "transformer.h.16.attn.bias", "transformer.h.16.attn.c_attn.weight", "transformer.h.16.attn.c_attn.bias", "transformer.h.16.attn.c_proj.weight", "transformer.h.16.attn.c_proj.bias", "transformer.h.16.ln_2.weight", "transformer.h.16.ln_2.bias", "transformer.h.16.mlp.c_fc.weight", "transformer.h.16.mlp.c_fc.bias", "transformer.h.16.mlp.c_proj.weight", "transformer.h.16.mlp.c_proj.bias", "transformer.h.17.ln_1.weight", "transformer.h.17.ln_1.bias", "transformer.h.17.attn.bias", "transformer.h.17.attn.c_attn.weight", "transformer.h.17.attn.c_attn.bias", "transformer.h.17.attn.c_proj.weight", "transformer.h.17.attn.c_proj.bias", "transformer.h.17.ln_2.weight", "transformer.h.17.ln_2.bias", "transformer.h.17.mlp.c_fc.weight", "transformer.h.17.mlp.c_fc.bias", "transformer.h.17.mlp.c_proj.weight", "transformer.h.17.mlp.c_proj.bias", "transformer.h.18.ln_1.weight", "transformer.h.18.ln_1.bias", "transformer.h.18.attn.bias", "transformer.h.18.attn.c_attn.weight", "transformer.h.18.attn.c_attn.bias", "transformer.h.18.attn.c_proj.weight", "transformer.h.18.attn.c_proj.bias", "transformer.h.18.ln_2.weight", "transformer.h.18.ln_2.bias", 
"transformer.h.18.mlp.c_fc.weight", "transformer.h.18.mlp.c_fc.bias", "transformer.h.18.mlp.c_proj.weight", "transformer.h.18.mlp.c_proj.bias", "transformer.h.19.ln_1.weight", "transformer.h.19.ln_1.bias", "transformer.h.19.attn.bias", "transformer.h.19.attn.c_attn.weight", "transformer.h.19.attn.c_attn.bias", "transformer.h.19.attn.c_proj.weight", "transformer.h.19.attn.c_proj.bias", "transformer.h.19.ln_2.weight", "transformer.h.19.ln_2.bias", "transformer.h.19.mlp.c_fc.weight", "transformer.h.19.mlp.c_fc.bias", "transformer.h.19.mlp.c_proj.weight", "transformer.h.19.mlp.c_proj.bias", "transformer.h.20.ln_1.weight", "transformer.h.20.ln_1.bias", "transformer.h.20.attn.bias", "transformer.h.20.attn.c_attn.weight", "transformer.h.20.attn.c_attn.bias", "transformer.h.20.attn.c_proj.weight", "transformer.h.20.attn.c_proj.bias", "transformer.h.20.ln_2.weight", "transformer.h.20.ln_2.bias", "transformer.h.20.mlp.c_fc.weight", "transformer.h.20.mlp.c_fc.bias", "transformer.h.20.mlp.c_proj.weight", "transformer.h.20.mlp.c_proj.bias", "transformer.h.21.ln_1.weight", "transformer.h.21.ln_1.bias", "transformer.h.21.attn.bias", "transformer.h.21.attn.c_attn.weight", "transformer.h.21.attn.c_attn.bias", "transformer.h.21.attn.c_proj.weight", "transformer.h.21.attn.c_proj.bias", "transformer.h.21.ln_2.weight", "transformer.h.21.ln_2.bias", "transformer.h.21.mlp.c_fc.weight", "transformer.h.21.mlp.c_fc.bias", "transformer.h.21.mlp.c_proj.weight", "transformer.h.21.mlp.c_proj.bias", "transformer.h.22.ln_1.weight", "transformer.h.22.ln_1.bias", "transformer.h.22.attn.bias", "transformer.h.22.attn.c_attn.weight", "transformer.h.22.attn.c_attn.bias", "transformer.h.22.attn.c_proj.weight", "transformer.h.22.attn.c_proj.bias", "transformer.h.22.ln_2.weight", "transformer.h.22.ln_2.bias", "transformer.h.22.mlp.c_fc.weight", "transformer.h.22.mlp.c_fc.bias", "transformer.h.22.mlp.c_proj.weight", "transformer.h.22.mlp.c_proj.bias", "transformer.h.23.ln_1.weight", 
"transformer.h.23.ln_1.bias", "transformer.h.23.attn.bias", "transformer.h.23.attn.c_attn.weight", "transformer.h.23.attn.c_attn.bias", "transformer.h.23.attn.c_proj.weight", "transformer.h.23.attn.c_proj.bias", "transformer.h.23.ln_2.weight", "transformer.h.23.ln_2.bias", "transformer.h.23.mlp.c_fc.weight", "transformer.h.23.mlp.c_fc.bias", "transformer.h.23.mlp.c_proj.weight", "transformer.h.23.mlp.c_proj.bias". size mismatch for transformer.wte.weight: copying a param with shape torch.Size([50257, 768]) from checkpoint, the shape in current model is torch.Size([50257, 1024]). size mismatch for transformer.wpe.weight: copying a param with shape torch.Size([1024, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for transformer.h.0.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.0.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.0.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]). size mismatch for transformer.h.0.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for transformer.h.0.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for transformer.h.0.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.0.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). 
size mismatch for transformer.h.0.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.0.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for transformer.h.0.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for transformer.h.0.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for transformer.h.0.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.1.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.1.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.1.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]). size mismatch for transformer.h.1.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for transformer.h.1.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for transformer.h.1.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.1.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). 
size mismatch for transformer.h.1.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.1.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for transformer.h.1.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for transformer.h.1.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for transformer.h.1.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.2.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.2.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.2.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]). size mismatch for transformer.h.2.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for transformer.h.2.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for transformer.h.2.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.2.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). 
size mismatch for transformer.h.2.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.2.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for transformer.h.2.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for transformer.h.2.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for transformer.h.2.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.3.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.3.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.3.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]). size mismatch for transformer.h.3.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for transformer.h.3.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for transformer.h.3.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.3.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). 
size mismatch for transformer.h.3.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.3.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for transformer.h.3.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for transformer.h.3.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for transformer.h.3.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.4.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.4.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.4.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]). size mismatch for transformer.h.4.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for transformer.h.4.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for transformer.h.4.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.4.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). 
size mismatch for transformer.h.4.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.4.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for transformer.h.4.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for transformer.h.4.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for transformer.h.4.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.5.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.5.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.5.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]). size mismatch for transformer.h.5.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for transformer.h.5.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for transformer.h.5.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.5.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). 
size mismatch for transformer.h.5.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.5.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for transformer.h.5.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for transformer.h.5.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for transformer.h.5.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.6.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.6.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.6.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]). size mismatch for transformer.h.6.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for transformer.h.6.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for transformer.h.6.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.6.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). 
size mismatch for transformer.h.6.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.6.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for transformer.h.6.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for transformer.h.6.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for transformer.h.6.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.7.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.7.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.7.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]). size mismatch for transformer.h.7.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for transformer.h.7.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for transformer.h.7.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.7.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). 
size mismatch for transformer.h.7.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.7.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for transformer.h.7.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for transformer.h.7.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for transformer.h.7.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.8.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.8.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.8.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]). size mismatch for transformer.h.8.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for transformer.h.8.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for transformer.h.8.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.8.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). 
size mismatch for transformer.h.8.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.8.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for transformer.h.8.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for transformer.h.8.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for transformer.h.8.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.9.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.9.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.9.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]). size mismatch for transformer.h.9.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for transformer.h.9.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for transformer.h.9.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.9.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). 
size mismatch for transformer.h.9.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.9.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for transformer.h.9.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for transformer.h.9.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for transformer.h.9.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.10.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.10.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.10.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]). size mismatch for transformer.h.10.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for transformer.h.10.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for transformer.h.10.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.10.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). 
size mismatch for transformer.h.10.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.10.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for transformer.h.10.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for transformer.h.10.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for transformer.h.10.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.11.ln_1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.11.ln_1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for transformer.h.11.attn.c_attn.weight: copying a param with shape torch.Size([768, 2304]) from checkpoint, the shape in current model is torch.Size([1024, 3072]). size mismatch for transformer.h.11.attn.c_attn.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for transformer.h.11.attn.c_proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for transformer.h.11.attn.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). 
size mismatch for transformer.h.11.ln_2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.11.ln_2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.h.11.mlp.c_fc.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for transformer.h.11.mlp.c_fc.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.11.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for transformer.h.11.mlp.c_proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.ln_f.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.ln_f.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for lm_head.decoder.weight: copying a param with shape torch.Size([50257, 768]) from checkpoint, the shape in current model is torch.Size([50257, 1024]).
psoni@blr2-lnxwk-071:~/DialoGPT$ `
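
Reading the log, the missing keys cover blocks `transformer.h.12` through `transformer.h.23`, and every size mismatch is 768 in the checkpoint against 1024 in the current model. That pattern would be consistent with the checkpoint holding 12 blocks of width 768 while the model built from the config expects 24 blocks of width 1024. A minimal sketch for checking this directly from the parameter names and shapes follows. `infer_gpt2_size` is a hypothetical helper written for this illustration (it is not part of DialoGPT or PyTorch), and the commented `torch.load` lines assume the `.pkl` file is a plain state_dict.

```python
import re
from typing import Dict, Mapping, Tuple


def infer_gpt2_size(shapes: Mapping[str, Tuple[int, ...]]) -> Dict[str, int]:
    """Guess n_layer and n_embd from GPT-2 style parameter names and shapes.

    Hypothetical helper for illustration only.
    """
    layer_ids = set()
    n_embd = None
    for key, shape in shapes.items():
        # Block parameters look like "transformer.h.<index>.<rest>".
        match = re.search(r"transformer\.h\.(\d+)\.", key)
        if match:
            layer_ids.add(int(match.group(1)))
        # The token embedding has shape (vocab_size, n_embd).
        if key.endswith("transformer.wte.weight"):
            n_embd = shape[1]
    if n_embd is None or not layer_ids:
        raise ValueError("no GPT-2 style keys found")
    return {"n_layer": max(layer_ids) + 1, "n_embd": n_embd}


# With PyTorch installed, the shapes could be collected like this
# (assumption: the file is a plain state_dict saved with torch.save):
#   import torch
#   state = torch.load("./models/medium/medium_ft.pkl", map_location="cpu")
#   shapes = {name: tuple(t.shape) for name, t in state.items()}

# Tiny stand-in using two checkpoint shapes reported in the error log.
example = {
    "transformer.wte.weight": (50257, 768),
    "transformer.h.11.ln_1.weight": (768,),
}
print(infer_gpt2_size(example))  # {'n_layer': 12, 'n_embd': 768}
```

If the real checkpoint reports 12 layers and width 768 while `./models/medium/config.json` describes 24 layers and width 1024, the file and the config do not describe the same model size, and one of the two would need to be replaced.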

andreamad8 / DialoGPT2-Interact

While running with medium model, missing some keys in state_dict #1