facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

ViTDet huge checkpoint warns about incompatible shapes & missing layers in the backbone #4566

Open kretes opened 1 year ago

kretes commented 1 year ago

Instructions To Reproduce the Issue:

  1. Full runnable code or full changes you made:
    no changes
  2. What exact command you run: tools/lazyconfig_train_net.py --config-file projects/ViTDet/configs/COCO/mask_rcnn_vitdet_h_75ep.py "dataloader.train.total_batch_size=1"
  3. Full logs or other relevant observations:
    [09/23 17:55:36 fvcore.common.checkpoint]: [Checkpointer] Loading from detectron2://ImageNetPretrained/MAE/mae_pretrain_vit_huge_p14to16.pth ...
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of backbone.simfp_2.4.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of backbone.simfp_2.4.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of backbone.simfp_2.5.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of backbone.simfp_2.5.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of backbone.simfp_3.1.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of backbone.simfp_3.1.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of backbone.simfp_3.2.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of backbone.simfp_3.2.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of backbone.simfp_4.0.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of backbone.simfp_4.0.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of backbone.simfp_4.1.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of backbone.simfp_4.1.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of backbone.simfp_5.1.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of backbone.simfp_5.1.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of backbone.simfp_5.2.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of backbone.simfp_5.2.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of roi_heads.box_head.conv1.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of roi_heads.box_head.conv1.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of roi_heads.box_head.conv2.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of roi_heads.box_head.conv2.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of roi_heads.box_head.conv3.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of roi_heads.box_head.conv3.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of roi_heads.box_head.conv4.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of roi_heads.box_head.conv4.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of roi_heads.mask_head.mask_fcn1.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of roi_heads.mask_head.mask_fcn1.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of roi_heads.mask_head.mask_fcn2.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of roi_heads.mask_head.mask_fcn2.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of roi_heads.mask_head.mask_fcn3.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of roi_heads.mask_head.mask_fcn3.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.bias in checkpoint is torch.Size([1280]), while shape of roi_heads.mask_head.mask_fcn4.norm.bias in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.bias will not be loaded. Please double check and see if this is desired.
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Shape of norm.weight in checkpoint is torch.Size([1280]), while shape of roi_heads.mask_head.mask_fcn4.norm.weight in model is torch.Size([256]).
    WARNING [09/23 17:55:37 d2.checkpoint.c2_model_loading]: norm.weight will not be loaded. Please double check and see if this is desired.
    [09/23 17:55:37 d2.checkpoint.c2_model_loading]: Following weights matched with submodule backbone.net:
    | Names in Model        | Names in Checkpoint               | Shapes                 |
    |:----------------------|:----------------------------------|:-----------------------|
    | blocks.0.attn.proj.*  | blocks.0.attn.proj.{bias,weight}  | (1280,) (1280,1280)    |
    | blocks.0.attn.qkv.*   | blocks.0.attn.qkv.{bias,weight}   | (3840,) (3840,1280)    |
    | blocks.0.mlp.fc1.*    | blocks.0.mlp.fc1.{bias,weight}    | (5120,) (5120,1280)    |
    | blocks.0.mlp.fc2.*    | blocks.0.mlp.fc2.{bias,weight}    | (1280,) (1280,5120)    |
    | blocks.0.norm1.*      | blocks.0.norm1.{bias,weight}      | (1280,) (1280,)        |
    | blocks.0.norm2.*      | blocks.0.norm2.{bias,weight}      | (1280,) (1280,)        |
    | blocks.1.attn.proj.*  | blocks.1.attn.proj.{bias,weight}  | (1280,) (1280,1280)    |
    | blocks.1.attn.qkv.*   | blocks.1.attn.qkv.{bias,weight}   | (3840,) (3840,1280)    |
    | blocks.1.mlp.fc1.*    | blocks.1.mlp.fc1.{bias,weight}    | (5120,) (5120,1280)    |
    | blocks.1.mlp.fc2.*    | blocks.1.mlp.fc2.{bias,weight}    | (1280,) (1280,5120)    |
    | blocks.1.norm1.*      | blocks.1.norm1.{bias,weight}      | (1280,) (1280,)        |
    | blocks.1.norm2.*      | blocks.1.norm2.{bias,weight}      | (1280,) (1280,)        |
    | blocks.10.attn.proj.* | blocks.10.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.10.attn.qkv.*  | blocks.10.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.10.mlp.fc1.*   | blocks.10.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.10.mlp.fc2.*   | blocks.10.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.10.norm1.*     | blocks.10.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.10.norm2.*     | blocks.10.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.11.attn.proj.* | blocks.11.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.11.attn.qkv.*  | blocks.11.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.11.mlp.fc1.*   | blocks.11.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.11.mlp.fc2.*   | blocks.11.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.11.norm1.*     | blocks.11.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.11.norm2.*     | blocks.11.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.12.attn.proj.* | blocks.12.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.12.attn.qkv.*  | blocks.12.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.12.mlp.fc1.*   | blocks.12.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.12.mlp.fc2.*   | blocks.12.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.12.norm1.*     | blocks.12.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.12.norm2.*     | blocks.12.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.13.attn.proj.* | blocks.13.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.13.attn.qkv.*  | blocks.13.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.13.mlp.fc1.*   | blocks.13.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.13.mlp.fc2.*   | blocks.13.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.13.norm1.*     | blocks.13.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.13.norm2.*     | blocks.13.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.14.attn.proj.* | blocks.14.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.14.attn.qkv.*  | blocks.14.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.14.mlp.fc1.*   | blocks.14.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.14.mlp.fc2.*   | blocks.14.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.14.norm1.*     | blocks.14.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.14.norm2.*     | blocks.14.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.15.attn.proj.* | blocks.15.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.15.attn.qkv.*  | blocks.15.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.15.mlp.fc1.*   | blocks.15.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.15.mlp.fc2.*   | blocks.15.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.15.norm1.*     | blocks.15.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.15.norm2.*     | blocks.15.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.16.attn.proj.* | blocks.16.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.16.attn.qkv.*  | blocks.16.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.16.mlp.fc1.*   | blocks.16.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.16.mlp.fc2.*   | blocks.16.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.16.norm1.*     | blocks.16.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.16.norm2.*     | blocks.16.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.17.attn.proj.* | blocks.17.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.17.attn.qkv.*  | blocks.17.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.17.mlp.fc1.*   | blocks.17.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.17.mlp.fc2.*   | blocks.17.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.17.norm1.*     | blocks.17.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.17.norm2.*     | blocks.17.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.18.attn.proj.* | blocks.18.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.18.attn.qkv.*  | blocks.18.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.18.mlp.fc1.*   | blocks.18.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.18.mlp.fc2.*   | blocks.18.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.18.norm1.*     | blocks.18.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.18.norm2.*     | blocks.18.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.19.attn.proj.* | blocks.19.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.19.attn.qkv.*  | blocks.19.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.19.mlp.fc1.*   | blocks.19.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.19.mlp.fc2.*   | blocks.19.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.19.norm1.*     | blocks.19.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.19.norm2.*     | blocks.19.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.2.attn.proj.*  | blocks.2.attn.proj.{bias,weight}  | (1280,) (1280,1280)    |
    | blocks.2.attn.qkv.*   | blocks.2.attn.qkv.{bias,weight}   | (3840,) (3840,1280)    |
    | blocks.2.mlp.fc1.*    | blocks.2.mlp.fc1.{bias,weight}    | (5120,) (5120,1280)    |
    | blocks.2.mlp.fc2.*    | blocks.2.mlp.fc2.{bias,weight}    | (1280,) (1280,5120)    |
    | blocks.2.norm1.*      | blocks.2.norm1.{bias,weight}      | (1280,) (1280,)        |
    | blocks.2.norm2.*      | blocks.2.norm2.{bias,weight}      | (1280,) (1280,)        |
    | blocks.20.attn.proj.* | blocks.20.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.20.attn.qkv.*  | blocks.20.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.20.mlp.fc1.*   | blocks.20.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.20.mlp.fc2.*   | blocks.20.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.20.norm1.*     | blocks.20.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.20.norm2.*     | blocks.20.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.21.attn.proj.* | blocks.21.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.21.attn.qkv.*  | blocks.21.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.21.mlp.fc1.*   | blocks.21.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.21.mlp.fc2.*   | blocks.21.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.21.norm1.*     | blocks.21.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.21.norm2.*     | blocks.21.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.22.attn.proj.* | blocks.22.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.22.attn.qkv.*  | blocks.22.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.22.mlp.fc1.*   | blocks.22.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.22.mlp.fc2.*   | blocks.22.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.22.norm1.*     | blocks.22.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.22.norm2.*     | blocks.22.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.23.attn.proj.* | blocks.23.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.23.attn.qkv.*  | blocks.23.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.23.mlp.fc1.*   | blocks.23.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.23.mlp.fc2.*   | blocks.23.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.23.norm1.*     | blocks.23.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.23.norm2.*     | blocks.23.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.24.attn.proj.* | blocks.24.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.24.attn.qkv.*  | blocks.24.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.24.mlp.fc1.*   | blocks.24.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.24.mlp.fc2.*   | blocks.24.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.24.norm1.*     | blocks.24.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.24.norm2.*     | blocks.24.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.25.attn.proj.* | blocks.25.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.25.attn.qkv.*  | blocks.25.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.25.mlp.fc1.*   | blocks.25.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.25.mlp.fc2.*   | blocks.25.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.25.norm1.*     | blocks.25.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.25.norm2.*     | blocks.25.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.26.attn.proj.* | blocks.26.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.26.attn.qkv.*  | blocks.26.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.26.mlp.fc1.*   | blocks.26.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.26.mlp.fc2.*   | blocks.26.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.26.norm1.*     | blocks.26.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.26.norm2.*     | blocks.26.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.27.attn.proj.* | blocks.27.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.27.attn.qkv.*  | blocks.27.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.27.mlp.fc1.*   | blocks.27.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.27.mlp.fc2.*   | blocks.27.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.27.norm1.*     | blocks.27.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.27.norm2.*     | blocks.27.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.28.attn.proj.* | blocks.28.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.28.attn.qkv.*  | blocks.28.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.28.mlp.fc1.*   | blocks.28.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.28.mlp.fc2.*   | blocks.28.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.28.norm1.*     | blocks.28.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.28.norm2.*     | blocks.28.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.29.attn.proj.* | blocks.29.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.29.attn.qkv.*  | blocks.29.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.29.mlp.fc1.*   | blocks.29.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.29.mlp.fc2.*   | blocks.29.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.29.norm1.*     | blocks.29.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.29.norm2.*     | blocks.29.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.3.attn.proj.*  | blocks.3.attn.proj.{bias,weight}  | (1280,) (1280,1280)    |
    | blocks.3.attn.qkv.*   | blocks.3.attn.qkv.{bias,weight}   | (3840,) (3840,1280)    |
    | blocks.3.mlp.fc1.*    | blocks.3.mlp.fc1.{bias,weight}    | (5120,) (5120,1280)    |
    | blocks.3.mlp.fc2.*    | blocks.3.mlp.fc2.{bias,weight}    | (1280,) (1280,5120)    |
    | blocks.3.norm1.*      | blocks.3.norm1.{bias,weight}      | (1280,) (1280,)        |
    | blocks.3.norm2.*      | blocks.3.norm2.{bias,weight}      | (1280,) (1280,)        |
    | blocks.30.attn.proj.* | blocks.30.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.30.attn.qkv.*  | blocks.30.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.30.mlp.fc1.*   | blocks.30.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.30.mlp.fc2.*   | blocks.30.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.30.norm1.*     | blocks.30.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.30.norm2.*     | blocks.30.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.31.attn.proj.* | blocks.31.attn.proj.{bias,weight} | (1280,) (1280,1280)    |
    | blocks.31.attn.qkv.*  | blocks.31.attn.qkv.{bias,weight}  | (3840,) (3840,1280)    |
    | blocks.31.mlp.fc1.*   | blocks.31.mlp.fc1.{bias,weight}   | (5120,) (5120,1280)    |
    | blocks.31.mlp.fc2.*   | blocks.31.mlp.fc2.{bias,weight}   | (1280,) (1280,5120)    |
    | blocks.31.norm1.*     | blocks.31.norm1.{bias,weight}     | (1280,) (1280,)        |
    | blocks.31.norm2.*     | blocks.31.norm2.{bias,weight}     | (1280,) (1280,)        |
    | blocks.4.attn.proj.*  | blocks.4.attn.proj.{bias,weight}  | (1280,) (1280,1280)    |
    | blocks.4.attn.qkv.*   | blocks.4.attn.qkv.{bias,weight}   | (3840,) (3840,1280)    |
    | blocks.4.mlp.fc1.*    | blocks.4.mlp.fc1.{bias,weight}    | (5120,) (5120,1280)    |
    | blocks.4.mlp.fc2.*    | blocks.4.mlp.fc2.{bias,weight}    | (1280,) (1280,5120)    |
    | blocks.4.norm1.*      | blocks.4.norm1.{bias,weight}      | (1280,) (1280,)        |
    | blocks.4.norm2.*      | blocks.4.norm2.{bias,weight}      | (1280,) (1280,)        |
    | blocks.5.attn.proj.*  | blocks.5.attn.proj.{bias,weight}  | (1280,) (1280,1280)    |
    | blocks.5.attn.qkv.*   | blocks.5.attn.qkv.{bias,weight}   | (3840,) (3840,1280)    |
    | blocks.5.mlp.fc1.*    | blocks.5.mlp.fc1.{bias,weight}    | (5120,) (5120,1280)    |
    | blocks.5.mlp.fc2.*    | blocks.5.mlp.fc2.{bias,weight}    | (1280,) (1280,5120)    |
    | blocks.5.norm1.*      | blocks.5.norm1.{bias,weight}      | (1280,) (1280,)        |
    | blocks.5.norm2.*      | blocks.5.norm2.{bias,weight}      | (1280,) (1280,)        |
    | blocks.6.attn.proj.*  | blocks.6.attn.proj.{bias,weight}  | (1280,) (1280,1280)    |
    | blocks.6.attn.qkv.*   | blocks.6.attn.qkv.{bias,weight}   | (3840,) (3840,1280)    |
    | blocks.6.mlp.fc1.*    | blocks.6.mlp.fc1.{bias,weight}    | (5120,) (5120,1280)    |
    | blocks.6.mlp.fc2.*    | blocks.6.mlp.fc2.{bias,weight}    | (1280,) (1280,5120)    |
    | blocks.6.norm1.*      | blocks.6.norm1.{bias,weight}      | (1280,) (1280,)        |
    | blocks.6.norm2.*      | blocks.6.norm2.{bias,weight}      | (1280,) (1280,)        |
    | blocks.7.attn.proj.*  | blocks.7.attn.proj.{bias,weight}  | (1280,) (1280,1280)    |
    | blocks.7.attn.qkv.*   | blocks.7.attn.qkv.{bias,weight}   | (3840,) (3840,1280)    |
    | blocks.7.mlp.fc1.*    | blocks.7.mlp.fc1.{bias,weight}    | (5120,) (5120,1280)    |
    | blocks.7.mlp.fc2.*    | blocks.7.mlp.fc2.{bias,weight}    | (1280,) (1280,5120)    |
    | blocks.7.norm1.*      | blocks.7.norm1.{bias,weight}      | (1280,) (1280,)        |
    | blocks.7.norm2.*      | blocks.7.norm2.{bias,weight}      | (1280,) (1280,)        |
    | blocks.8.attn.proj.*  | blocks.8.attn.proj.{bias,weight}  | (1280,) (1280,1280)    |
    | blocks.8.attn.qkv.*   | blocks.8.attn.qkv.{bias,weight}   | (3840,) (3840,1280)    |
    | blocks.8.mlp.fc1.*    | blocks.8.mlp.fc1.{bias,weight}    | (5120,) (5120,1280)    |
    | blocks.8.mlp.fc2.*    | blocks.8.mlp.fc2.{bias,weight}    | (1280,) (1280,5120)    |
    | blocks.8.norm1.*      | blocks.8.norm1.{bias,weight}      | (1280,) (1280,)        |
    | blocks.8.norm2.*      | blocks.8.norm2.{bias,weight}      | (1280,) (1280,)        |
    | blocks.9.attn.proj.*  | blocks.9.attn.proj.{bias,weight}  | (1280,) (1280,1280)    |
    | blocks.9.attn.qkv.*   | blocks.9.attn.qkv.{bias,weight}   | (3840,) (3840,1280)    |
    | blocks.9.mlp.fc1.*    | blocks.9.mlp.fc1.{bias,weight}    | (5120,) (5120,1280)    |
    | blocks.9.mlp.fc2.*    | blocks.9.mlp.fc2.{bias,weight}    | (1280,) (1280,5120)    |
    | blocks.9.norm1.*      | blocks.9.norm1.{bias,weight}      | (1280,) (1280,)        |
    | blocks.9.norm2.*      | blocks.9.norm2.{bias,weight}      | (1280,) (1280,)        |
    | patch_embed.proj.*    | patch_embed.proj.{bias,weight}    | (1280,) (1280,3,16,16) |
    | pos_embed             | pos_embed                         | (1, 197, 1280)         |
    WARNING [09/23 17:55:37 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
    backbone.net.blocks.0.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.1.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.10.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.11.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.12.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.13.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.14.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.15.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.16.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.17.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.18.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.19.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.2.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.20.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.21.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.22.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.23.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.24.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.25.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.26.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.27.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.28.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.29.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.3.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.30.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.31.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.4.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.5.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.6.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.7.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.8.attn.{rel_pos_h, rel_pos_w}
    backbone.net.blocks.9.attn.{rel_pos_h, rel_pos_w}
    backbone.simfp_2.0.{bias, weight}
    backbone.simfp_2.1.{bias, weight}
    backbone.simfp_2.3.{bias, weight}
    backbone.simfp_2.4.norm.{bias, weight}
    backbone.simfp_2.4.weight
    backbone.simfp_2.5.norm.{bias, weight}
    backbone.simfp_2.5.weight
    backbone.simfp_3.0.{bias, weight}
    backbone.simfp_3.1.norm.{bias, weight}
    backbone.simfp_3.1.weight
    backbone.simfp_3.2.norm.{bias, weight}
    backbone.simfp_3.2.weight
    backbone.simfp_4.0.norm.{bias, weight}
    backbone.simfp_4.0.weight
    backbone.simfp_4.1.norm.{bias, weight}
    backbone.simfp_4.1.weight
    backbone.simfp_5.1.norm.{bias, weight}
    backbone.simfp_5.1.weight
    backbone.simfp_5.2.norm.{bias, weight}
    backbone.simfp_5.2.weight
    proposal_generator.rpn_head.anchor_deltas.{bias, weight}
    proposal_generator.rpn_head.conv.conv0.{bias, weight}
    proposal_generator.rpn_head.conv.conv1.{bias, weight}
    proposal_generator.rpn_head.objectness_logits.{bias, weight}
    roi_heads.box_head.conv1.norm.{bias, weight}
    roi_heads.box_head.conv1.weight
    roi_heads.box_head.conv2.norm.{bias, weight}
    roi_heads.box_head.conv2.weight
    roi_heads.box_head.conv3.norm.{bias, weight}
    roi_heads.box_head.conv3.weight
    roi_heads.box_head.conv4.norm.{bias, weight}
    roi_heads.box_head.conv4.weight
    roi_heads.box_head.fc1.{bias, weight}
    roi_heads.box_predictor.bbox_pred.{bias, weight}
    roi_heads.box_predictor.cls_score.{bias, weight}
    roi_heads.mask_head.deconv.{bias, weight}
    roi_heads.mask_head.mask_fcn1.norm.{bias, weight}
    roi_heads.mask_head.mask_fcn1.weight
    roi_heads.mask_head.mask_fcn2.norm.{bias, weight}
    roi_heads.mask_head.mask_fcn2.weight
    roi_heads.mask_head.mask_fcn3.norm.{bias, weight}
    roi_heads.mask_head.mask_fcn3.weight
    roi_heads.mask_head.mask_fcn4.norm.{bias, weight}
    roi_heads.mask_head.mask_fcn4.weight
    roi_heads.mask_head.predictor.{bias, weight}
    WARNING [09/23 17:55:37 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
    cls_token
    norm.{bias, weight}
    checkpointer resumed

Expected behavior:

The logs above are from loading the MAE ImageNet-pretrained checkpoint into ViTDet. The messages about incompatible shapes and missing weights in the backbone are unexpected and lead me to believe this is the wrong checkpoint for the model. I think it boils down to missing weights matching the patterns:

- backbone.net.blocks.*.attn.rel_pos*
- backbone.simfp_*.*

In case those were ignored on purpose when exporting the checkpoint, I think it would be best to document which missing weights are expected (e.g. the rpn and roi_heads are naturally not part of this checkpoint). If not, it might be a good idea to print the expected outcome before loading the checkpoint, or to add a comment in the configuration file in places like https://github.com/facebookresearch/detectron2/blob/main/projects/ViTDet/configs/COCO/mask_rcnn_vitdet_h_75ep.py#L12
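For what it's worth, a small inspection script along these lines would make the expectation explicit. This is only a sketch, assuming the MAE checkpoint has been downloaded locally and stores its weights under a top-level `model` key (common for MAE releases):

```python
# Minimal sketch (not an official workflow): compare the MAE checkpoint keys with
# the detection model's state_dict to see which weights can only come from random
# init. Assumes the checkpoint file is available locally under the name below.
import torch
from detectron2.config import LazyConfig, instantiate

cfg = LazyConfig.load("projects/ViTDet/configs/COCO/mask_rcnn_vitdet_h_75ep.py")
model = instantiate(cfg.model)

ckpt = torch.load("mae_pretrain_vit_huge_p14to16.pth", map_location="cpu")
ckpt_keys = set(ckpt.get("model", ckpt).keys())  # MAE usually nests weights under "model"

model_keys = set(model.state_dict().keys())
# The plain ViT lives under "backbone.net." in the detection model.
backbone_keys = {k[len("backbone.net."):] for k in model_keys if k.startswith("backbone.net.")}

print("backbone params not in checkpoint:", sorted(backbone_keys - ckpt_keys))
print("checkpoint params unused by backbone:", sorted(ckpt_keys - backbone_keys))
print("params outside backbone.net (trained from scratch anyway):",
      sorted(k for k in model_keys if not k.startswith("backbone.net.")))
```

On the huge config I would expect this to report only the `rel_pos_h`/`rel_pos_w` buffers as missing inside the backbone, `cls_token` and the final `norm` as unused, and everything under `backbone.simfp_*`, the RPN and the ROI heads as detection-specific parameters that the MAE checkpoint cannot provide.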

Environment:

Paste the output of the following command:

wget -nc -nv https://github.com/facebookresearch/detectron2/raw/main/detectron2/utils/collect_env.py && python collect_env.py
----------------------  --------------------------------------------------------------------------
sys.platform            linux
Python                  3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0]
numpy                   1.23.3
detectron2              0.6 @/home/appuser/focal-detectron2/detectron2
Compiler                GCC 9.4
CUDA compiler           CUDA 11.1
detectron2 arch flags   3.5, 3.7, 5.0, 5.2, 5.3, 6.0, 6.1, 7.0, 7.5
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.10.0+cu111 @/home/appuser/.local/lib/python3.8/site-packages/torch
PyTorch debug build     False
GPU available           Yes
GPU 0                   NVIDIA A100-SXM4-40GB (arch=8.0)
Driver version          510.47.03
CUDA_HOME               /usr/local/cuda
TORCH_CUDA_ARCH_LIST    Kepler;Kepler+Tesla;Maxwell;Maxwell+Tegra;Pascal;Volta;Turing
Pillow                  8.1.0
torchvision             0.11.1+cu111 @/home/appuser/.local/lib/python3.8/site-packages/torchvision
torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                  0.1.5
iopath                  0.1.9
cv2                     4.6.0
----------------------  --------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
FedericoVasile1 commented 1 year ago

Hi, any news on this? Are the weights ignored on purpose or not?

gugibugy commented 1 month ago

+1, have same question!

amalshehan commented 1 month ago

Since the weights are missing from the checkpoint but expected by the model, I am assuming they will be randomly initialized and trained on the downstream task. Any other insights or workarounds?
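That is also how fvcore's checkpointer behaves: keys not found in the checkpoint simply keep whatever the model was initialized with. A quick sanity check, sketched below under the assumption that `train.init_checkpoint` in the lazy config points at the MAE checkpoint (as in the config linked above):

```python
# Hedged sanity check: a parameter absent from the checkpoint (here one of the
# relative position embeddings) should be bit-identical before and after
# DetectionCheckpointer.load(), i.e. it stays at its random init and is learned
# during detection training.
import torch
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import LazyConfig, instantiate

cfg = LazyConfig.load("projects/ViTDet/configs/COCO/mask_rcnn_vitdet_h_75ep.py")
model = instantiate(cfg.model)

key = "backbone.net.blocks.0.attn.rel_pos_h"  # reported above as not found in the checkpoint
before = model.state_dict()[key].clone()

DetectionCheckpointer(model).load(cfg.train.init_checkpoint)
after = model.state_dict()[key]

print("kept its fresh initialization:", torch.equal(before, after))  # expected: True
```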