Hxyz-123 / Font-diff

112 stars 16 forks source link

size mismatch in test #7

Closed yyongjae closed 1 year ago

yyongjae commented 1 year ago

When I run sample.py with test_cfg(I modified some config for korean), there is an error about UNet Size mismatch. Is there something should I modify code for Korean? Or Should I run this file after all 800,000 iteration training? I met this error after 200,000 iter.

Traceback (most recent call last):
  File "/home/elicer/Own-My-Geul/Font-diff/sample.py", line 218, in <module>
    main()
  File "/home/elicer/Own-My-Geul/Font-diff/sample.py", line 65, in main
    model.load_state_dict(
  File "/home/elicer/Own-My-Geul/Font-diff/omg/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNetWithStyEncoderModel:
        Missing key(s) in state_dict: "input_blocks.3.0.op.weight", "input_blocks.3.0.op.bias", "input_blocks.4.0.in_layers.0.weight", "input_blocks.4.0.in_layers.0.bias", "input_blocks.4.0.in_layers.2.weight", "input_blocks.4.0.in_layers.2.bias", "input_blocks.4.0.emb_layers.1.weight", "input_blocks.4.0.emb_layers.1.bias", "input_blocks.4.0.out_layers.0.weight", "input_blocks.4.0.out_layers.0.bias", "input_blocks.4.0.out_layers.3.weight", "input_blocks.4.0.out_layers.3.bias", "input_blocks.4.0.skip_connection.weight", "input_blocks.4.0.skip_connection.bias", "input_blocks.6.0.op.weight", "input_blocks.6.0.op.bias", "input_blocks.7.0.skip_connection.weight", "input_blocks.7.0.skip_connection.bias", "input_blocks.8.0.in_layers.0.weight", "input_blocks.8.0.in_layers.0.bias", "input_blocks.8.0.in_layers.2.weight", "input_blocks.8.0.in_layers.2.bias", "input_blocks.8.0.emb_layers.1.weight", "input_blocks.8.0.emb_layers.1.bias", "input_blocks.8.0.out_layers.0.weight", "input_blocks.8.0.out_layers.0.bias", "input_blocks.8.0.out_layers.3.weight", "input_blocks.8.0.out_layers.3.bias", "input_blocks.9.0.op.weight", "input_blocks.9.0.op.bias", "input_blocks.10.0.skip_connection.weight", "input_blocks.10.0.skip_connection.bias", "output_blocks.2.1.conv.weight", "output_blocks.2.1.conv.bias", "output_blocks.5.1.conv.weight", "output_blocks.5.1.conv.bias", "output_blocks.8.1.conv.weight", "output_blocks.8.1.conv.bias". 
        Unexpected key(s) in state_dict: "input_blocks.12.0.op.weight", "input_blocks.12.0.op.bias", "input_blocks.13.0.in_layers.0.weight", "input_blocks.13.0.in_layers.0.bias", "input_blocks.13.0.in_layers.2.weight", "input_blocks.13.0.in_layers.2.bias", "input_blocks.13.0.emb_layers.1.weight", "input_blocks.13.0.emb_layers.1.bias", "input_blocks.13.0.out_layers.0.weight", "input_blocks.13.0.out_layers.0.bias", "input_blocks.13.0.out_layers.3.weight", "input_blocks.13.0.out_layers.3.bias", "input_blocks.13.0.skip_connection.weight", "input_blocks.13.0.skip_connection.bias", "input_blocks.13.1.norm.weight", "input_blocks.13.1.norm.bias", "input_blocks.13.1.qkv.weight", "input_blocks.13.1.qkv.bias", "input_blocks.13.1.proj_out.weight", "input_blocks.13.1.proj_out.bias", "input_blocks.14.0.in_layers.0.weight", "input_blocks.14.0.in_layers.0.bias", "input_blocks.14.0.in_layers.2.weight", "input_blocks.14.0.in_layers.2.bias", "input_blocks.14.0.emb_layers.1.weight", "input_blocks.14.0.emb_layers.1.bias", "input_blocks.14.0.out_layers.0.weight", "input_blocks.14.0.out_layers.0.bias", "input_blocks.14.0.out_layers.3.weight", "input_blocks.14.0.out_layers.3.bias", "input_blocks.14.1.norm.weight", "input_blocks.14.1.norm.bias", "input_blocks.14.1.qkv.weight", "input_blocks.14.1.qkv.bias", "input_blocks.14.1.proj_out.weight", "input_blocks.14.1.proj_out.bias", "input_blocks.15.0.in_layers.0.weight", "input_blocks.15.0.in_layers.0.bias", "input_blocks.15.0.in_layers.2.weight", "input_blocks.15.0.in_layers.2.bias", "input_blocks.15.0.emb_layers.1.weight", "input_blocks.15.0.emb_layers.1.bias", "input_blocks.15.0.out_layers.0.weight", "input_blocks.15.0.out_layers.0.bias", "input_blocks.15.0.out_layers.3.weight", "input_blocks.15.0.out_layers.3.bias", "input_blocks.15.1.norm.weight", "input_blocks.15.1.norm.bias", "input_blocks.15.1.qkv.weight", "input_blocks.15.1.qkv.bias", "input_blocks.15.1.proj_out.weight", "input_blocks.15.1.proj_out.bias", "input_blocks.3.0.in_layers.0.weight", "input_blocks.3.0.in_layers.0.bias", "input_blocks.3.0.in_layers.2.weight", "input_blocks.3.0.in_layers.2.bias", "input_blocks.3.0.emb_layers.1.weight", "input_blocks.3.0.emb_layers.1.bias", "input_blocks.3.0.out_layers.0.weight", "input_blocks.3.0.out_layers.0.bias", "input_blocks.3.0.out_layers.3.weight", "input_blocks.3.0.out_layers.3.bias", "input_blocks.4.0.op.weight", "input_blocks.4.0.op.bias", "input_blocks.5.1.norm.weight", "input_blocks.5.1.norm.bias", "input_blocks.5.1.qkv.weight", "input_blocks.5.1.qkv.bias", "input_blocks.5.1.proj_out.weight", "input_blocks.5.1.proj_out.bias", "input_blocks.5.0.skip_connection.weight", "input_blocks.5.0.skip_connection.bias", "input_blocks.6.1.norm.weight", "input_blocks.6.1.norm.bias", "input_blocks.6.1.qkv.weight", "input_blocks.6.1.qkv.bias", "input_blocks.6.1.proj_out.weight", "input_blocks.6.1.proj_out.bias", "input_blocks.6.0.in_layers.0.weight", "input_blocks.6.0.in_layers.0.bias", "input_blocks.6.0.in_layers.2.weight", "input_blocks.6.0.in_layers.2.bias", "input_blocks.6.0.emb_layers.1.weight", "input_blocks.6.0.emb_layers.1.bias", "input_blocks.6.0.out_layers.0.weight", "input_blocks.6.0.out_layers.0.bias", "input_blocks.6.0.out_layers.3.weight", "input_blocks.6.0.out_layers.3.bias", "input_blocks.7.1.norm.weight", "input_blocks.7.1.norm.bias", "input_blocks.7.1.qkv.weight", "input_blocks.7.1.qkv.bias", "input_blocks.7.1.proj_out.weight", "input_blocks.7.1.proj_out.bias", "input_blocks.8.0.op.weight", "input_blocks.8.0.op.bias", "input_blocks.9.1.norm.weight", "input_blocks.9.1.norm.bias", "input_blocks.9.1.qkv.weight", "input_blocks.9.1.qkv.bias", "input_blocks.9.1.proj_out.weight", "input_blocks.9.1.proj_out.bias", "input_blocks.9.0.in_layers.0.weight", "input_blocks.9.0.in_layers.0.bias", "input_blocks.9.0.in_layers.2.weight", "input_blocks.9.0.in_layers.2.bias", "input_blocks.9.0.emb_layers.1.weight", "input_blocks.9.0.emb_layers.1.bias", "input_blocks.9.0.out_layers.0.weight", "input_blocks.9.0.out_layers.0.bias", "input_blocks.9.0.out_layers.3.weight", "input_blocks.9.0.out_layers.3.bias", "input_blocks.9.0.skip_connection.weight", "input_blocks.9.0.skip_connection.bias", "input_blocks.10.1.norm.weight", "input_blocks.10.1.norm.bias", "input_blocks.10.1.qkv.weight", "input_blocks.10.1.qkv.bias", "input_blocks.10.1.proj_out.weight", "input_blocks.10.1.proj_out.bias", "input_blocks.11.1.norm.weight", "input_blocks.11.1.norm.bias", "input_blocks.11.1.qkv.weight", "input_blocks.11.1.qkv.bias", "input_blocks.11.1.proj_out.weight", "input_blocks.11.1.proj_out.bias", "output_blocks.12.0.in_layers.0.weight", "output_blocks.12.0.in_layers.0.bias", "output_blocks.12.0.in_layers.2.weight", "output_blocks.12.0.in_layers.2.bias", "output_blocks.12.0.emb_layers.1.weight", "output_blocks.12.0.emb_layers.1.bias", "output_blocks.12.0.out_layers.0.weight", "output_blocks.12.0.out_layers.0.bias", "output_blocks.12.0.out_layers.3.weight", "output_blocks.12.0.out_layers.3.bias", "output_blocks.12.0.skip_connection.weight", "output_blocks.12.0.skip_connection.bias", "output_blocks.13.0.in_layers.0.weight", "output_blocks.13.0.in_layers.0.bias", "output_blocks.13.0.in_layers.2.weight", "output_blocks.13.0.in_layers.2.bias", "output_blocks.13.0.emb_layers.1.weight", "output_blocks.13.0.emb_layers.1.bias", "output_blocks.13.0.out_layers.0.weight", "output_blocks.13.0.out_layers.0.bias", "output_blocks.13.0.out_layers.3.weight", "output_blocks.13.0.out_layers.3.bias", "output_blocks.13.0.skip_connection.weight", "output_blocks.13.0.skip_connection.bias", "output_blocks.14.0.in_layers.0.weight", "output_blocks.14.0.in_layers.0.bias", "output_blocks.14.0.in_layers.2.weight", "output_blocks.14.0.in_layers.2.bias", "output_blocks.14.0.emb_layers.1.weight", "output_blocks.14.0.emb_layers.1.bias", "output_blocks.14.0.out_layers.0.weight", "output_blocks.14.0.out_layers.0.bias", "output_blocks.14.0.out_layers.3.weight", "output_blocks.14.0.out_layers.3.bias", "output_blocks.14.0.skip_connection.weight", "output_blocks.14.0.skip_connection.bias", "output_blocks.15.0.in_layers.0.weight", "output_blocks.15.0.in_layers.0.bias", "output_blocks.15.0.in_layers.2.weight", "output_blocks.15.0.in_layers.2.bias", "output_blocks.15.0.emb_layers.1.weight", "output_blocks.15.0.emb_layers.1.bias", "output_blocks.15.0.out_layers.0.weight", "output_blocks.15.0.out_layers.0.bias", "output_blocks.15.0.out_layers.3.weight", "output_blocks.15.0.out_layers.3.bias", "output_blocks.15.0.skip_connection.weight", "output_blocks.15.0.skip_connection.bias", "output_blocks.0.1.norm.weight", "output_blocks.0.1.norm.bias", "output_blocks.0.1.qkv.weight", "output_blocks.0.1.qkv.bias", "output_blocks.0.1.proj_out.weight", "output_blocks.0.1.proj_out.bias", "output_blocks.1.1.norm.weight", "output_blocks.1.1.norm.bias", "output_blocks.1.1.qkv.weight", "output_blocks.1.1.qkv.bias", "output_blocks.1.1.proj_out.weight", "output_blocks.1.1.proj_out.bias", "output_blocks.2.1.norm.weight", "output_blocks.2.1.norm.bias", "output_blocks.2.1.qkv.weight", "output_blocks.2.1.qkv.bias", "output_blocks.2.1.proj_out.weight", "output_blocks.2.1.proj_out.bias", "output_blocks.3.1.norm.weight", "output_blocks.3.1.norm.bias", "output_blocks.3.1.qkv.weight", "output_blocks.3.1.qkv.bias", "output_blocks.3.1.proj_out.weight", "output_blocks.3.1.proj_out.bias", "output_blocks.3.2.conv.weight", "output_blocks.3.2.conv.bias", "output_blocks.4.1.norm.weight", "output_blocks.4.1.norm.bias", "output_blocks.4.1.qkv.weight", "output_blocks.4.1.qkv.bias", "output_blocks.4.1.proj_out.weight", "output_blocks.4.1.proj_out.bias", "output_blocks.5.1.norm.weight", "output_blocks.5.1.norm.bias", "output_blocks.5.1.qkv.weight", "output_blocks.5.1.qkv.bias", "output_blocks.5.1.proj_out.weight", "output_blocks.5.1.proj_out.bias", "output_blocks.6.1.norm.weight", "output_blocks.6.1.norm.bias", "output_blocks.6.1.qkv.weight", "output_blocks.6.1.qkv.bias", "output_blocks.6.1.proj_out.weight", "output_blocks.6.1.proj_out.bias", "output_blocks.7.1.norm.weight", "output_blocks.7.1.norm.bias", "output_blocks.7.1.qkv.weight", "output_blocks.7.1.qkv.bias", "output_blocks.7.1.proj_out.weight", "output_blocks.7.1.proj_out.bias", "output_blocks.7.2.conv.weight", "output_blocks.7.2.conv.bias", "output_blocks.8.1.norm.weight", "output_blocks.8.1.norm.bias", "output_blocks.8.1.qkv.weight", "output_blocks.8.1.qkv.bias", "output_blocks.8.1.proj_out.weight", "output_blocks.8.1.proj_out.bias", "output_blocks.9.1.norm.weight", "output_blocks.9.1.norm.bias", "output_blocks.9.1.qkv.weight", "output_blocks.9.1.qkv.bias", "output_blocks.9.1.proj_out.weight", "output_blocks.9.1.proj_out.bias", "output_blocks.10.1.norm.weight", "output_blocks.10.1.norm.bias", "output_blocks.10.1.qkv.weight", "output_blocks.10.1.qkv.bias", "output_blocks.10.1.proj_out.weight", "output_blocks.10.1.proj_out.bias", "output_blocks.11.1.norm.weight", "output_blocks.11.1.norm.bias", "output_blocks.11.1.qkv.weight", "output_blocks.11.1.qkv.bias", "output_blocks.11.1.proj_out.weight", "output_blocks.11.1.proj_out.bias", "output_blocks.11.2.conv.weight", "output_blocks.11.2.conv.bias". 
        size mismatch for input_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 256, 3, 3]).
        size mismatch for input_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for input_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 512]).
        size mismatch for input_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for input_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for input_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for input_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]).
        size mismatch for input_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for input_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([384, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 384, 3, 3]).
        size mismatch for input_blocks.10.0.in_layers.2.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([768, 512]) from checkpoint, the shape in current model is torch.Size([1024, 512]).
        size mismatch for input_blocks.10.0.emb_layers.1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.10.0.out_layers.0.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.out_layers.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.out_layers.3.weight: copying a param with shape torch.Size([384, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.10.0.out_layers.3.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.in_layers.0.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.in_layers.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.in_layers.2.weight: copying a param with shape torch.Size([384, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.11.0.in_layers.2.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([768, 512]) from checkpoint, the shape in current model is torch.Size([1024, 512]).
        size mismatch for input_blocks.11.0.emb_layers.1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.11.0.out_layers.0.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.out_layers.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.out_layers.3.weight: copying a param with shape torch.Size([384, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.11.0.out_layers.3.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([896]).
        size mismatch for output_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([896]).
        size mismatch for output_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 896, 3, 3]).
        size mismatch for output_blocks.2.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 896, 1, 1]).
        size mismatch for output_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([512, 896, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 896, 3, 3]).
        size mismatch for output_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for output_blocks.3.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([768, 512]).
        size mismatch for output_blocks.3.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for output_blocks.3.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for output_blocks.3.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for output_blocks.3.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]).
        size mismatch for output_blocks.3.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for output_blocks.3.0.skip_connection.weight: copying a param with shape torch.Size([512, 896, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 896, 1, 1]).
        size mismatch for output_blocks.3.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for output_blocks.4.0.in_layers.0.weight: copying a param with shape torch.Size([896]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for output_blocks.4.0.in_layers.0.bias: copying a param with shape torch.Size([896]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for output_blocks.4.0.in_layers.2.weight: copying a param with shape torch.Size([384, 896, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 768, 3, 3]).
        size mismatch for output_blocks.4.0.skip_connection.weight: copying a param with shape torch.Size([384, 896, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 768, 1, 1]).
        size mismatch for output_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for output_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for output_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([384, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 640, 3, 3]).
        size mismatch for output_blocks.5.0.skip_connection.weight: copying a param with shape torch.Size([384, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 640, 1, 1]).
        size mismatch for output_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for output_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([640]).
        size mismatch for output_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([384, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 640, 3, 3]).
        size mismatch for output_blocks.6.0.in_layers.2.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.6.0.emb_layers.1.weight: copying a param with shape torch.Size([768, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for output_blocks.6.0.emb_layers.1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.6.0.out_layers.0.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.6.0.out_layers.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.6.0.out_layers.3.weight: copying a param with shape torch.Size([384, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for output_blocks.6.0.out_layers.3.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.6.0.skip_connection.weight: copying a param with shape torch.Size([384, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 640, 1, 1]).
        size mismatch for output_blocks.6.0.skip_connection.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([384, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
        size mismatch for output_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([768, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for output_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([384, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for output_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.7.0.skip_connection.weight: copying a param with shape torch.Size([384, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
        size mismatch for output_blocks.7.0.skip_connection.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.8.0.in_layers.0.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for output_blocks.8.0.in_layers.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for output_blocks.8.0.in_layers.2.weight: copying a param with shape torch.Size([256, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 384, 3, 3]).
        size mismatch for output_blocks.8.0.skip_connection.weight: copying a param with shape torch.Size([256, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 384, 1, 1]).
        size mismatch for output_blocks.9.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for output_blocks.9.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for output_blocks.9.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 384, 3, 3]).
        size mismatch for output_blocks.9.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for output_blocks.9.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 512]).
        size mismatch for output_blocks.9.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.9.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for output_blocks.9.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for output_blocks.9.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
        size mismatch for output_blocks.9.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for output_blocks.9.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 384, 1, 1]).
        size mismatch for output_blocks.9.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for output_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 3, 3]).
        size mismatch for output_blocks.10.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for output_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 512]).
        size mismatch for output_blocks.10.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.10.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for output_blocks.10.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for output_blocks.10.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
        size mismatch for output_blocks.10.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for output_blocks.10.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1]).
        size mismatch for output_blocks.10.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for output_blocks.11.0.in_layers.0.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.11.0.in_layers.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.11.0.in_layers.2.weight: copying a param with shape torch.Size([256, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 3, 3]).
        size mismatch for output_blocks.11.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for output_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 512]).
        size mismatch for output_blocks.11.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.11.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for output_blocks.11.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for output_blocks.11.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
        size mismatch for output_blocks.11.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for output_blocks.11.0.skip_connection.weight: copying a param with shape torch.Size([256, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1]).
        size mismatch for output_blocks.11.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).