diviswen / PMP-Net

size mismatch when loading a saved checkpoint model #6

Closed Karbo123 closed 2 years ago

Karbo123 commented 2 years ago

I recently came across PMP-Net and find it an interesting, pioneering approach to progressive point cloud generation for the completion task. I'd like to run a comparison on another dataset, so I rewrote the dataset code; I don't think the rest of the code needs to change much.

However, I encountered a problem when trying to load a saved checkpoint for inference. My training script is based on PMP-Net/core/train_pcn.py and my inference script on PMP-Net/core/inference_pcn.py.

To reproduce the error, I saved the model immediately before entering the training loop. When I then ran the script with `python main_pcn.py --inference`, it could read the checkpoint file but failed to load the state_dict into the model. The error messages are as follows.

RuntimeError: Error(s) in loading state_dict for DataParallel:
        size mismatch for module.step_1.sa_module_1.mlp_conv.0.conv.weight: copying a param with shape torch.Size([64, 6, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 9, 1, 1]).
        size mismatch for module.step_1.fp_module_1.mlp_conv.0.conv.weight: copying a param with shape torch.Size([128, 134, 1]) from checkpoint, the shape in current model is torch.Size([128, 137, 1]).
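
For context, errors like this mean the checkpoint and the freshly built model disagree on layer shapes. A minimal PyTorch sketch (toy layers, not the PMP-Net code) that reproduces the same class of failure:

```python
import torch
import torch.nn as nn

# "Training-time" model: first conv expects 6 input channels.
train_model = nn.Sequential(nn.Conv2d(6, 64, kernel_size=1))
torch.save(train_model.state_dict(), "ckpt.pth")

# "Inference-time" model: first conv expects 9 input channels.
infer_model = nn.Sequential(nn.Conv2d(9, 64, kernel_size=1))

# RuntimeError: size mismatch for 0.weight: copying a param with shape
# torch.Size([64, 6, 1, 1]) from checkpoint, the shape in current model
# is torch.Size([64, 9, 1, 1]).
infer_model.load_state_dict(torch.load("ckpt.pth"))
```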

Any suggestions for this? Many thanks in advance!

Karbo123 commented 2 years ago

I found that PMP-Net/core/train_pcn.py imports the model with `from models.model import Model`, but PMP-Net/core/inference_pcn.py uses `from models.model import ModelNoise as Model`. Why do training and inference load different models? Shouldn't they be the same?
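
For reference, the two imports side by side (as quoted above):

```python
# core/train_pcn.py
from models.model import Model

# core/inference_pcn.py
from models.model import ModelNoise as Model
```

Judging from the error above, both mismatched layers are exactly 3 channels wider in the inference-time model (6 vs. 9 and 134 vs. 137), so ModelNoise presumably concatenates a 3-D noise vector to each point's input features; that would explain why its parameters cannot be filled from a checkpoint trained with Model.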

Karbo123 commented 2 years ago

After I changed the import to `from models.model import Model`, it works fine.
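
For anyone else hitting this, a minimal sketch of the corrected loading path; the constructor arguments and the checkpoint dictionary key are assumptions, so check your own scripts:

```python
import torch
from models.model import Model  # same class train_pcn.py used, not ModelNoise

# The trace says "loading state_dict for DataParallel", so wrap the model
# the same way it was wrapped during training.
model = torch.nn.DataParallel(Model()).cuda()

checkpoint = torch.load("path/to/ckpt.pth")
model.load_state_dict(checkpoint["model"])  # the "model" key is an assumption
```

Note that `load_state_dict(..., strict=False)` would not have hidden this: `strict=False` only relaxes missing/unexpected keys, while shape mismatches still raise a RuntimeError.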

xiaolongTang163 commented 1 year ago

> After I changed the import to `from models.model import Model`, it works fine.

Thanks for generously sharing the fix; it resolves a problem that had confused me for a long time.