Size mismatch for ssd_heads when using the pretrained model

QiqLiang commented 1 year ago

Hi, I ran into an issue when training on my dataset, which has 102 classes. It looks like a size mismatch problem when loading the pretrained model.

Here is my code:

    PYTHONWARNINGS="ignore" cvnets-train \
      --common.config-file config/detection/ssd_coco/coco-ssd-mobilevitv2-1.75.yaml \
      --common.results-loc exp/exp1 \
      --common.override-kwargs model.detection.pretrained="ckpt/coco-ssd-mobilevitv2-1.75.pt" model.detection.n-classes=103

and this is the error:

    2022-12-12 21:09:24 - ERROR - Unable to load pretrained weights from ckpt/coco-ssd-mobilevitv2-1.75.pt. Error: Error(s) in loading state_dict for SingleShotMaskDetector:
    size mismatch for ssd_heads.0.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 512, 1, 1]).
    size mismatch for ssd_heads.0.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
    size mismatch for ssd_heads.1.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 256, 1, 1]).
    size mismatch for ssd_heads.1.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
    size mismatch for ssd_heads.2.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 256, 1, 1]).
    size mismatch for ssd_heads.2.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
    size mismatch for ssd_heads.3.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 128, 1, 1]).
    size mismatch for ssd_heads.3.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
    size mismatch for ssd_heads.4.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 128, 1, 1]).
    size mismatch for ssd_heads.4.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
    size mismatch for ssd_heads.5.loc_cls_layer.block.conv.weight: copying a param with shape torch.Size([340, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([428, 64, 1, 1]).
    size mismatch for ssd_heads.5.loc_cls_layer.block.conv.bias: copying a param with shape torch.Size([340]) from checkpoint, the shape in current model is torch.Size([428]).

How can I use the pretrained model when training on my own dataset, which has a different number of classes than MS-COCO?
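The channel counts in the error can be checked with a little arithmetic. The sketch below assumes each SSD head emits anchors × (n_classes + 4 box coordinates) channels; that factorization is inferred from the numbers in the log, not taken from the cvnets source, and the function name and the 81-class COCO count (80 categories plus background) are assumptions made for illustration.

```python
# Sketch: reproduce the channel counts reported in the size-mismatch error.
# Assumption: out_channels = n_anchors * (n_classes + 4 box coordinates).
# This is inferred from the log, not confirmed against the cvnets code.

def head_out_channels(n_anchors: int, n_classes: int, n_coords: int = 4) -> int:
    """Return the combined localization + classification channel count."""
    return n_anchors * (n_classes + n_coords)


COCO_CLASSES = 81     # assumed: 80 COCO categories + background
CUSTOM_CLASSES = 103  # value passed via model.detection.n_classes

for n_anchors in (6, 4):
    from_checkpoint = head_out_channels(n_anchors, COCO_CLASSES)
    in_current_model = head_out_channels(n_anchors, CUSTOM_CLASSES)
    print(f"anchors={n_anchors}: checkpoint={from_checkpoint}, current={in_current_model}")
```

Under that assumption, 6 anchors give 510 versus 642 channels and 4 anchors give 340 versus 428, which matches the log. It suggests the current model was already built with 103 classes, so only the class-dependent head layers differ from the checkpoint.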

sacmehta commented 1 year ago

For overriding kwargs, we need keys to have _ instead of -

Change model.detection.n-classes=103 to model.detection.n_classes=103

sacmehta commented 1 year ago

Also, we recommend using YAML files over command-line arguments.

QiqLiang commented 1 year ago

> For overriding kwargs, we need keys to have _ instead of -
>
> Change model.detection.n-classes=103 to model.detection.n_classes=103

I used this code:

    PYTHONWARNINGS="ignore" cvnets-train \
      --common.config-file config/detection/ssd_coco/coco-ssd-mobilevitv2-1.75.yaml \
      --common.results-loc exp/exp1 \
      --common.override-kwargs model.detection.pretrained="ckpt/coco-ssd-mobilevitv2-1.75.pt" model.detection.n_classes=103

but still got the same error.

sacmehta commented 1 year ago

Are these weights pretrained on MS-COCO?

QiqLiang commented 1 year ago

> Are these weights pretrained on MS-COCO?

Yes. I downloaded them from the model zoo (MS-COCO). The link is: https://docs-assets.developer.apple.com/ml-research/models/cvnets-v2/detection/mobilevitv2/coco-ssd-mobilevitv2-1.75.pt

sacmehta commented 1 year ago

In our recent release, we added support for excluding keys from a checkpoint. Could you please try it?

You need to pass the names of the keys whose pretrained weight shapes differ, as a list, using the --model.resume-exclude-scopes argument.
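To illustrate the idea of excluding keys, here is a minimal sketch in plain Python. It is not the cvnets implementation: the thread does not say how --model.resume-exclude-scopes matches keys, so the regex matching, the `exclude_scopes` helper, and the backbone key name are all hypothetical. Shapes are stored as tuples so the example runs without PyTorch.

```python
import re
from typing import Dict, Iterable, Tuple

Shape = Tuple[int, ...]


def exclude_scopes(state: Dict[str, Shape], scopes: Iterable[str]) -> Dict[str, Shape]:
    """Drop every entry whose key matches one of the regex scopes.

    Hypothetical helper; regex matching is an assumption, not cvnets behavior.
    """
    patterns = [re.compile(scope) for scope in scopes]
    return {
        key: shape
        for key, shape in state.items()
        if not any(pattern.search(key) for pattern in patterns)
    }


# Stand-in for a checkpoint: key names mapped to parameter shapes.
checkpoint = {
    "encoder.conv_1.block.conv.weight": (32, 3, 3, 3),  # hypothetical backbone key
    "ssd_heads.0.loc_cls_layer.pw_conv.block.conv.weight": (510, 512, 1, 1),
    "ssd_heads.0.loc_cls_layer.pw_conv.block.conv.bias": (510,),
    "ssd_heads.5.loc_cls_layer.block.conv.weight": (340, 64, 1, 1),
}

# Exclude the class-dependent head layers named in the error message.
kept = exclude_scopes(checkpoint, [r"ssd_heads\.\d+\.loc_cls_layer"])
print(sorted(kept))
```

With a real PyTorch model, the remaining entries could then be loaded with `load_state_dict(..., strict=False)`, leaving the excluded head layers at their fresh initialization so they can be trained for the new number of classes.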

apple / ml-cvnets

Size mismatch for ssd_heads when using the pretrained model #62