CSAILVision / semantic-segmentation-pytorch

Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset
http://sceneparsing.csail.mit.edu/
BSD 3-Clause "New" or "Revised" License

Size mismatch error when loading weights from pretrained weights (hrnetv2-c1) #230

Closed adityashrm21 closed 4 years ago

adityashrm21 commented 4 years ago

I am training on my custom dataset using transfer learning, but I get a size mismatch error because my dataset has a different number of classes (binary) than the pretrained weights (trained on 150 classes). Do I need to unfreeze the last layers, or is there some other hack to get this to work? Thanks!


[2020-06-08 04:52:04,191 INFO train.py line 248 4485] Running with config:
DATASET:
  imgMaxSize: 1000
  imgSizes: (300, 375, 450, 525, 600)
  list_train: ./data/stock_training.odgt
  list_val: ./data/stock_validation.odgt
  num_class: 2
  padding_constant: 32
  random_flip: True
  root_dataset: .
  segm_downsampling_rate: 4
DIR: ckpt/ade20k-hrnetv2-c1
MODEL:
  arch_decoder: c1
  arch_encoder: hrnetv2
  fc_dim: 720
  weights_decoder: 
  weights_encoder: 
TEST:
  batch_size: 1
  checkpoint: epoch_9.pth
  result: ./inference_output
TRAIN:
  batch_size_per_gpu: 2
  beta1: 0.9
  deep_sup_scale: 0.4
  disp_iter: 20
  epoch_iters: 5000
  fix_bn: True
  lr_decoder: 1e-06
  lr_encoder: 1e-06
  lr_pow: 0.9
  num_epoch: 35
  optim: SGD
  seed: 304
  start_epoch: 30
  weight_decay: 0.0001
  workers: 16
VAL:
  batch_size: 1
  checkpoint: epoch_30.pth
  visualize: False
[2020-06-08 04:52:04,191 INFO train.py line 253 4485] Outputing checkpoints to: ckpt/ade20k-hrnetv2-c1
Using weights: ckpt/ade20k-hrnetv2-c1/encoder_epoch_30.pth ckpt/ade20k-hrnetv2-c1/decoder_epoch_30.pth
Loading weights for net_encoder
Loading weights for net_decoder
Traceback (most recent call last):
  File "train.py", line 282, in <module>
    main(cfg, gpus)
  File "train.py", line 158, in main
    weights=cfg.MODEL.weights_decoder)
  File "/home/ubuntu/autonopia/semantic-segmentation-pytorch/models/models.py", line 166, in build_decoder
    torch.load(weights, map_location=lambda storage, loc: storage), strict=False)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for C1:
        size mismatch for conv_last.weight: copying a param with shape torch.Size([150, 180, 1, 1]) from checkpoint, the shape in current model is torch.Size([2, 180, 1, 1]).
        size mismatch for conv_last.bias: copying a param with shape torch.Size([150]) from checkpoint, the shape in current model is torch.Size([2]).

adityashrm21 commented 4 years ago

I just had to keep num_class = 150 in the config, and training continued.
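
For anyone who actually needs a head with a different class count (rather than keeping num_class = 150), the usual workaround is to filter the checkpoint before calling load_state_dict. Note that strict=False only tolerates missing or unexpected keys; it does not skip parameters whose shapes differ, which is why the error above still occurs. The sketch below is not the repo's code: TinyDecoder and load_matching_weights are hypothetical stand-ins that mimic the C1 decoder's conv_last head, shown only to illustrate the shape-filtering technique.

```python
# Sketch (hypothetical helper, not from this repo): load a 150-class
# checkpoint into a model whose final conv has a different class count.
# Shape-mismatched entries (the classifier head) are dropped, so the new
# head trains from random init while the rest keeps pretrained weights.
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Stand-in for the C1 decoder: a body conv plus a classifier head."""
    def __init__(self, num_class):
        super().__init__()
        self.body = nn.Conv2d(180, 180, 3, padding=1)
        self.conv_last = nn.Conv2d(180, num_class, 1)

def load_matching_weights(model, state_dict):
    """Copy only parameters whose names AND shapes match the model."""
    own = model.state_dict()
    filtered = {k: v for k, v in state_dict.items()
                if k in own and v.shape == own[k].shape}
    dropped = [k for k in state_dict if k not in filtered]
    model.load_state_dict(filtered, strict=False)
    return dropped

# Simulate a checkpoint trained with 150 classes, then load a 2-class model.
ckpt = TinyDecoder(num_class=150).state_dict()
model = TinyDecoder(num_class=2)
dropped = load_matching_weights(model, ckpt)
print(dropped)  # the head params that were skipped
```

The same idea applies to the real checkpoints: run torch.load on decoder_epoch_30.pth, drop the conv_last.* entries, and load the rest; only the final 1x1 conv then needs to be trained for the binary task.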