I have downloaded both the Sim10k and Cityscapes datasets and tried to reproduce the training process. But when I run the following command:

```
python3 -m torch.distributed.launch --nproc_per_node=4 --master_port=$((RANDOM + 10000)) tools/train_net_da.py --config-file ./configs/da_ga_sim10k_VGG_16_FPN_4x.yaml
```
I get the following error:

```
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DistributedDataParallel:
        size mismatch for module.fpn.fpn_inner3.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
        size mismatch for module.fpn.fpn_inner4.weight: copying a param with shape torch.Size([256, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
```
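The checkpoint shapes (1024 and 2048 input channels) look like ResNet stage widths, while the 256 and 512 channels the current model expects match VGG-16, so I suspect the checkpoint being loaded was saved from a ResNet-backbone model rather than the VGG-16 backbone this config builds. To check, the offending shapes can be printed straight from the checkpoint file (a minimal sketch; the path is a placeholder for whatever weight file the config actually points to):

```python
import torch

# The path is a placeholder; point it at the checkpoint the config loads.
ckpt = torch.load("path/to/checkpoint.pth", map_location="cpu")

# Checkpoints from maskrcnn-benchmark-style codebases often nest the
# weights under a "model" key; fall back to the raw dict otherwise.
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

# Print the shapes of the FPN lateral convs that fail to load.
for name, tensor in state_dict.items():
    if "fpn_inner" in name and name.endswith(".weight"):
        print(name, tuple(tensor.shape))
```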
Can anyone give me a clue as to why this happens?
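For now I can at least get training to start by dropping the mismatched tensors before loading (a sketch, assuming `model` is the already-built network and `state_dict` holds the checkpoint weights as above), but then those FPN layers train from random initialization, so this only sidesteps the problem:

```python
# Keep only checkpoint tensors whose shape matches the current model,
# then load non-strictly; the skipped FPN convs keep their random init.
# Key prefixes such as "module." may need stripping if only one side
# is wrapped in DistributedDataParallel.
model_state = model.state_dict()
filtered = {
    k: v for k, v in state_dict.items()
    if k in model_state and v.shape == model_state[k].shape
}
model.load_state_dict(filtered, strict=False)
print("skipped keys:", sorted(set(state_dict) - set(filtered)))
```

A pointer to the checkpoint that actually matches the VGG-16 FPN config would of course be the proper fix.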