cleinc / bts

From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation

RuntimeError: Error(s) in loading state_dict for DataParallel #50

Closed · kHarshit closed this issue 4 years ago

kHarshit commented 4 years ago

I trained bts (PyTorch) on NYU for a few epochs, but during testing I'm getting the following error while loading the model:

Traceback (most recent call last):
  File "bts_test.py", line 221, in <module>
    test(args)
  File "bts_test.py", line 94, in test
    model.load_state_dict(checkpoint['model'])
  File "/nfs/interns/kharshit/miniconda3/envs/pylatest/lib/python3.7/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
    Missing key(s) in state_dict: "module.encoder.base_model.denseblock3.denselayer25.norm1.weight", "module.encoder.base_model.denseblock3.denselayer25.norm1.bias", "module.encoder.base_model.denseblock3.denselayer25.norm1.running_mean", "module.encoder.base_model.denseblock3.denselayer25.norm1.running_var", "module.encoder.base_model.denseblock3.denselayer25.conv1.weight", "module.encoder.base_model.denseblock3.denselayer25.norm2.weight", "module.encoder.base_model.denseblock3.denselayer25.norm2.bias",
...
    size mismatch for module.encoder.base_model.conv0.weight: copying a param with shape torch.Size([64, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([96, 3, 7, 7]).
    size mismatch for module.encoder.base_model.norm0.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]).
    size mismatch for module.encoder.base_model.norm0.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([96]).
...

The only thing I changed was removing --multiprocessing_distributed from the arguments_train_nyu.txt file, adding --gpu 0, and running the script with CUDA_VISIBLE_DEVICES=0 to train on a single GPU.
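For anyone debugging a similar error, here is a minimal diagnostic sketch (not from the repo; the checkpoint path and the stand-in DataParallel model are placeholders) that lists missing/unexpected keys and shape mismatches before load_state_dict() raises:

```python
import torch
import torchvision

# Stand-in model: in bts_test.py this would be the DataParallel-wrapped BTS
# model built from the --encoder argument. densenet161 is only an example here.
model = torch.nn.DataParallel(torchvision.models.densenet161())

# "model_checkpoint" is a placeholder path; the BTS checkpoint stores the
# weights under the 'model' key (see checkpoint['model'] in the traceback).
checkpoint = torch.load("model_checkpoint", map_location="cpu")
ckpt_state = checkpoint["model"] if "model" in checkpoint else checkpoint
model_state = model.state_dict()

# Compare the two state_dicts key by key instead of letting load_state_dict fail.
missing = [k for k in model_state if k not in ckpt_state]
unexpected = [k for k in ckpt_state if k not in model_state]
mismatched = [(k, tuple(ckpt_state[k].shape), tuple(model_state[k].shape))
              for k in ckpt_state
              if k in model_state and ckpt_state[k].shape != model_state[k].shape]

print(f"missing: {len(missing)}  unexpected: {len(unexpected)}  size mismatches: {len(mismatched)}")
for k, ckpt_shape, model_shape in mismatched[:5]:
    print(f"  {k}: checkpoint {ckpt_shape} vs current model {model_shape}")
```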

kHarshit commented 4 years ago

Solved it! It was a stupid mistake: I trained using densenet121, but the testing arguments file still contained densenet161.
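For anyone hitting the same thing: the backbone a checkpoint was trained with can be read off the first conv layer's width (densenet121 starts with 64 feature maps, densenet161 with 96), which is exactly what the size-mismatch message above is reporting. A quick hedged check, assuming the checkpoint layout from the traceback and a placeholder path:

```python
import torch

# "model_checkpoint" is a placeholder path. The key below is taken from the
# size-mismatch message above; the number of conv0 output channels tells you
# which DenseNet backbone the checkpoint was trained with.
checkpoint = torch.load("model_checkpoint", map_location="cpu")
conv0_out = checkpoint["model"]["module.encoder.base_model.conv0.weight"].shape[0]

backbone = {64: "densenet121", 96: "densenet161"}.get(conv0_out, "unknown")
print(f"conv0 has {conv0_out} output channels -> checkpoint was trained with {backbone}")
```

If the printed backbone doesn't match the encoder named in the testing arguments file, that mismatch is what produces the missing keys and size mismatches above.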