Model shape does not match with checkpoint

Zero-Yi commented 4 months ago

Hi,

Thank you for the great work. I am very interested in your model and would like to run some experiments with it.

As a first step, I want to reproduce your results. However, after setting up the environment and downloading the provided checkpoint, I launched your script and immediately got this error:

Global seed set to 42
Traceback (most recent call last):
  File "scripts/evaluate_model.py", line 67, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "scripts/evaluate_model.py", line 42, in main
    model.load_state_dict(w["state_dict"])
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Mask4D:
    size mismatch for backbone.unet.conv.2.weight: copying a param with shape torch.Size([2, 2, 2, 64, 32]) from checkpoint, the shape in current model is torch.Size([64, 2, 2, 2, 32]).
    size mismatch for backbone.unet.u.conv.2.weight: copying a param with shape torch.Size([2, 2, 2, 128, 64]) from checkpoint, the shape in current model is torch.Size([128, 2, 2, 2, 64]).
    size mismatch for backbone.unet.u.u.conv.2.weight: copying a param with shape torch.Size([2, 2, 2, 256, 128]) from checkpoint, the shape in current model is torch.Size([256, 2, 2, 2, 128]).
    size mismatch for backbone.unet.u.u.u.conv.2.weight: copying a param with shape torch.Size([2, 2, 2, 256, 256]) from checkpoint, the shape in current model is torch.Size([256, 2, 2, 2, 256]).
    size mismatch for backbone.unet.u.u.u.deconv.2.weight: copying a param with shape torch.Size([2, 2, 2, 256, 256]) from checkpoint, the shape in current model is torch.Size([256, 2, 2, 2, 256]).
    size mismatch for backbone.unet.u.u.deconv.2.weight: copying a param with shape torch.Size([2, 2, 2, 128, 256]) from checkpoint, the shape in current model is torch.Size([128, 2, 2, 2, 256]).
    size mismatch for backbone.unet.u.deconv.2.weight: copying a param with shape torch.Size([2, 2, 2, 64, 128]) from checkpoint, the shape in current model is torch.Size([64, 2, 2, 2, 128]).
    size mismatch for backbone.unet.deconv.2.weight: copying a param with shape torch.Size([2, 2, 2, 32, 64]) from checkpoint, the shape in current model is torch.Size([32, 2, 2, 2, 64]).

Note that I did not change anything in the configuration other than the dataset path. It looks like a mismatch between the defined model and the provided checkpoint "mask4d.ckpt".

Could you look into this and make the model definition and the checkpoint consistent?
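To narrow down this kind of error, it can help to compare parameter shapes directly. The sketch below is only an illustration: `diff_shapes` is a hypothetical helper, not part of the Mask4D code, and the two example entries are copied from the error message above. It reports each mismatched parameter and whether both shapes contain the same dimensions in a different order, which would point to a weight layout difference rather than a different architecture.

```python
def diff_shapes(ckpt_shapes, model_shapes):
    """Return (name, ckpt_shape, model_shape, same_dims_reordered) per mismatch.

    Both arguments map parameter names to shape tuples. `diff_shapes` is a
    hypothetical helper for illustration, not part of the Mask4D repository.
    """
    report = []
    for name in sorted(set(ckpt_shapes) & set(model_shapes)):
        ckpt, model = tuple(ckpt_shapes[name]), tuple(model_shapes[name])
        if ckpt != model:
            # Same multiset of dimensions means only the axis order differs.
            report.append((name, ckpt, model, sorted(ckpt) == sorted(model)))
    return report


# Two entries copied from the error message in this thread.
ckpt_shapes = {
    "backbone.unet.conv.2.weight": (2, 2, 2, 64, 32),
    "backbone.unet.u.conv.2.weight": (2, 2, 2, 128, 64),
}
model_shapes = {
    "backbone.unet.conv.2.weight": (64, 2, 2, 2, 32),
    "backbone.unet.u.conv.2.weight": (128, 2, 2, 2, 64),
}

report = diff_shapes(ckpt_shapes, model_shapes)
for name, ckpt, model, reordered in report:
    print(name, ckpt, "->", model, "reordered" if reordered else "different sizes")
```

With real objects, the two dictionaries could be built as `{k: tuple(v.shape) for k, v in w["state_dict"].items()}` and the same expression over `model.state_dict()`. For the shapes in the traceback, each pair holds the same dimensions in a different order.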

G12311231 commented 3 months ago

I ran into this problem too. It is a torch version issue: you need to install the same torch version as in the author's README, i.e. torch==1.12.0.

rmarcuzzi commented 2 months ago

Hi! Sorry for the delay. Installing the same version of torch as in the README should avoid any problems when loading the checkpoint weights.
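A quick way to confirm the environment before loading the checkpoint is to compare the installed torch version with the pinned one. This is a minimal sketch, assuming the pinned version is 1.12.0 as stated in this thread; `matches_pinned` and `installed_torch_version` are hypothetical helper names, not part of the repository.

```python
from importlib import metadata

PINNED_TORCH = "1.12.0"  # version named in this thread; check the README to confirm


def matches_pinned(installed, pinned=PINNED_TORCH):
    """Compare version strings, ignoring a local build suffix such as '+cu113'."""
    return installed.split("+", 1)[0] == pinned


def installed_torch_version():
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_version()
    if found is None:
        print("torch is not installed")
    elif matches_pinned(found):
        print(f"torch {found} matches the pinned version {PINNED_TORCH}")
    else:
        print(f"torch {found} differs from the pinned version {PINNED_TORCH}")
```

If the versions differ, reinstalling with `pip install torch==1.12.0` (choosing the build that fits the local CUDA setup) is the fix suggested in this thread.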
