apple / ml-cvnets

CVNets: A library for training computer vision networks
https://apple.github.io/ml-cvnets

Segmentation model conversion size mismatch #86

Closed darwinharianto closed 1 year ago

darwinharianto commented 1 year ago

Training a segmentation model with a width multiplier other than 1 causes a size-mismatch problem:

  1. Train a mobilevit_v2 model with the width multiplier set to 0.5
  2. Convert the trained model to an mlmodel

This causes a size mismatch on the classification backbone:

...
        size mismatch for layer_5.1.conv_proj.block.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for layer_5.1.conv_proj.block.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for layer_5.1.conv_proj.block.norm.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for layer_5.1.conv_proj.block.norm.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for classifier.1.weight: copying a param with shape torch.Size([1000, 512]) from checkpoint, the shape in current model is torch.Size([1000, 256]).. Exiting!!!
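For reference, this is the generic PyTorch failure mode rather than anything CVNets-specific: load_state_dict() raises as soon as a checkpoint tensor's shape differs from the corresponding tensor in the freshly built model. A minimal sketch, assuming nothing about CVNets internals:

        import torch
        import torch.nn as nn

        # Stand-ins for the classifier head at width 1.0 (checkpoint)
        # and width 0.5 (current model)
        ckpt_model = nn.Linear(512, 1000)
        torch.save(ckpt_model.state_dict(), "ckpt.pt")

        cur_model = nn.Linear(256, 1000)
        state = torch.load("ckpt.pt")
        # RuntimeError: size mismatch for weight: copying a param with shape
        # torch.Size([1000, 512]) from checkpoint, the shape in current model
        # is torch.Size([1000, 256]).
        cur_model.load_state_dict(state)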

If I change the width multiplier setting to 1, it throws a different error:

...
        size mismatch for seg_head.psp_layer.psp_branches.3.1.block.norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for seg_head.psp_layer.psp_branches.3.1.block.norm.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for seg_head.psp_layer.psp_branches.3.1.block.norm.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for seg_head.psp_layer.fusion.0.block.conv.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]).. Exiting!!!

Why does the trained model have shapes of (1000, 512) instead of (1000, 256)?
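(For context: 512 is exactly what a width multiplier of 1.0 would produce here, which suggests the checkpoint being loaded was built at width 1.0. A minimal sketch of MobileNet-style channel scaling, assuming CVNets rounds widths with a similar make_divisible helper, shows why 0.5 yields 256:)

        def make_divisible(v, divisor=8, min_value=None):
            # Standard MobileNet-style rounding helper; that CVNets scales
            # channel widths the same way is an assumption, not verified here.
            if min_value is None:
                min_value = divisor
            new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
            if new_v < 0.9 * v:  # never round down by more than 10%
                new_v += divisor
            return new_v

        base = 512                          # projection width at multiplier 1.0
        print(make_divisible(base * 1.0))   # 512 -> matches the checkpoint shapes
        print(make_divisible(base * 0.5))   # 256 -> matches the freshly built 0.5 model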

darwinharianto commented 1 year ago

I found the problem.

I was overriding model.classification.pretrained on the command line, which is fine for the training session, but for conversion I have to set it directly in the config file.
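A quick way to catch this before running the conversion is to diff the checkpoint's tensor shapes against the freshly instantiated model. check_shapes below is a hypothetical helper sketched for illustration (not part of CVNets), and it assumes the checkpoint is a flat state dict; unwrap it first if your checkpoint nests the weights:

        import torch

        def check_shapes(model, ckpt_path):
            # Hypothetical helper: report every tensor whose shape in the
            # checkpoint differs from the corresponding tensor in the model.
            state = torch.load(ckpt_path, map_location="cpu")
            for name, param in model.state_dict().items():
                if name in state and state[name].shape != param.shape:
                    print(f"{name}: checkpoint {tuple(state[name].shape)} "
                          f"!= model {tuple(param.shape)}")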