I am training on my custom dataset using transfer learning, but I am getting a size mismatch error because my dataset has a different number of classes (2, binary) than the pretrained weights (trained on 150 ADE20K classes). Do I need to unfreeze the last layers, or is there some other trick to get this to work? Thanks!
```
[2020-06-08 04:52:04,191 INFO train.py line 248 4485] Running with config:
DATASET:
  imgMaxSize: 1000
  imgSizes: (300, 375, 450, 525, 600)
  list_train: ./data/stock_training.odgt
  list_val: ./data/stock_validation.odgt
  num_class: 2
  padding_constant: 32
  random_flip: True
  root_dataset: .
  segm_downsampling_rate: 4
DIR: ckpt/ade20k-hrnetv2-c1
MODEL:
  arch_decoder: c1
  arch_encoder: hrnetv2
  fc_dim: 720
  weights_decoder:
  weights_encoder:
TEST:
  batch_size: 1
  checkpoint: epoch_9.pth
  result: ./inference_output
TRAIN:
  batch_size_per_gpu: 2
  beta1: 0.9
  deep_sup_scale: 0.4
  disp_iter: 20
  epoch_iters: 5000
  fix_bn: True
  lr_decoder: 1e-06
  lr_encoder: 1e-06
  lr_pow: 0.9
  num_epoch: 35
  optim: SGD
  seed: 304
  start_epoch: 30
  weight_decay: 0.0001
  workers: 16
VAL:
  batch_size: 1
  checkpoint: epoch_30.pth
  visualize: False
[2020-06-08 04:52:04,191 INFO train.py line 253 4485] Outputing checkpoints to: ckpt/ade20k-hrnetv2-c1
Using weights: ckpt/ade20k-hrnetv2-c1/encoder_epoch_30.pth ckpt/ade20k-hrnetv2-c1/decoder_epoch_30.pth
Loading weights for net_encoder
Loading weights for net_decoder
Traceback (most recent call last):
  File "train.py", line 282, in <module>
    main(cfg, gpus)
  File "train.py", line 158, in main
    weights=cfg.MODEL.weights_decoder)
  File "/home/ubuntu/autonopia/semantic-segmentation-pytorch/models/models.py", line 166, in build_decoder
    torch.load(weights, map_location=lambda storage, loc: storage), strict=False)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for C1:
	size mismatch for conv_last.weight: copying a param with shape torch.Size([150, 180, 1, 1]) from checkpoint, the shape in current model is torch.Size([2, 180, 1, 1]).
	size mismatch for conv_last.bias: copying a param with shape torch.Size([150]) from checkpoint, the shape in current model is torch.Size([2]).
```
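
For reference, the kind of "hack" I was asking about would look something like the minimal sketch below: strip the mismatched classifier keys (`conv_last.weight` / `conv_last.bias`) from the decoder checkpoint before loading, since `strict=False` alone does not help (PyTorch still raises on shape mismatches for keys present in both state dicts). This assumes the checkpoint file stores the decoder's `state_dict` directly; the output filename is just a placeholder.

```python
import torch

# Sketch of a possible workaround (not an official API of this repo):
# remove the final classifier weights from the pretrained decoder
# checkpoint so the freshly initialized 2-class conv_last is left alone.
ckpt_in = "ckpt/ade20k-hrnetv2-c1/decoder_epoch_30.pth"
ckpt_out = "ckpt/ade20k-hrnetv2-c1/decoder_epoch_30_stripped.pth"  # hypothetical name

state_dict = torch.load(ckpt_in, map_location="cpu")

# Drop the keys whose shape depends on num_class (150 for ADE20K vs. 2 here).
for key in ("conv_last.weight", "conv_last.bias"):
    state_dict.pop(key, None)

torch.save(state_dict, ckpt_out)
# Point MODEL.weights_decoder at ckpt_out and train as usual; only the
# conv_last layer then starts from random initialization.
```

Would something like this be the recommended approach, or is there a built-in way to handle it?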