Closed: sguinard closed this issue 1 year ago.
Hi Stéphane,
Thanks for your interest in the project and for catching this error!
I had already encountered this issue and thought I had completely fixed it, but it came creeping back. If you are interested, the problem is simply that the model saves in the state_dict some attributes of the criterion (i.e., the semantic losses in our case) which it cannot properly reload. Here, the problematic attribute is weight, which is used to weight down the importance of each class in the semantic segmentation losses.
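For illustration only, here is a minimal, hypothetical PyTorch sketch (not the repository's actual code) of how a weighted loss stored as a submodule leaks its class-weight buffer into the state_dict, and how such criterion keys can be stripped when loading:

import torch
import torch.nn as nn

class TinySegModule(nn.Module):
    def __init__(self, num_classes=3, class_weights=None):
        super().__init__()
        self.net = nn.Linear(8, num_classes)
        # CrossEntropyLoss registers its per-class 'weight' as a buffer,
        # so it shows up in this module's state_dict when weights are set.
        self.criterion = nn.CrossEntropyLoss(weight=class_weights)

    def load_state_dict(self, state_dict, strict=True):
        # Drop criterion buffers saved by a previous run; the criterion is
        # rebuilt from the dataset configuration rather than restored.
        state_dict = {k: v for k, v in state_dict.items()
                      if not k.startswith('criterion.')}
        return super().load_state_dict(state_dict, strict=strict)

ckpt = TinySegModule(class_weights=torch.tensor([1.0, 0.5, 2.0])).state_dict()
fresh = TinySegModule()      # criterion built without class weights
fresh.load_state_dict(ckpt)  # would fail on 'criterion.weight' without the filter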
Long story short, I just pushed a new commit which should fix this. Would you mind testing it on your end and letting me know if it solves your issue?
Best,
Damien
Hi Damien,
Thanks for your quick reply! I'm testing this and will let you know whether it works as soon as possible.
Best, Stephane
Hi @drprojects
Thank you for sharing your great work and code.
I trained on my server machine with an NVIDIA A100 GPU (80 GB VRAM) and then ran evaluation. Training works without any problem, but I run into an issue during evaluation.
Specifically, your code works well on the S3DIS and DALES datasets, but KITTI-360 hits the same issue @sguinard mentioned before.
When I run the eval script on KITTI-360, I get 'state_dict' and 'size mismatch' errors as follows:
[2023-07-23 15:59:37,100][src.utils.utils][ERROR] -
Traceback (most recent call last):
File "/home/hyunkoo/DATA/ssd1/Codes/SemanticSeg3D/superpoint_transformer/src/utils/utils.py", line 45, in wrap
metric_dict, object_dict = task_func(cfg=cfg)
File "src/eval.py", line 105, in evaluate
trainer.test(model=model, datamodule=datamodule, ckpt_path=cfg.ckpt_path)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in test
return call._call_and_handle_interrupt(
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 778, in _test_impl
results = self._run(model, ckpt_path=ckpt_path)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 939, in _run
self._checkpoint_connector._restore_modules_and_callbacks(ckpt_path)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 396, in _restore_modules_and_callbacks
self.restore_model()
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 278, in restore_model
trainer.strategy.load_model_state_dict(self._loaded_checkpoint)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 352, in load_model_state_dict
self.lightning_module.load_state_dict(checkpoint["state_dict"])
File "/home/hyunkoo/DATA/ssd1/Codes/SemanticSeg3D/superpoint_transformer/src/models/segmentation.py", line 560, in load_state_dict
super().load_state_dict(state_dict, strict=strict)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PointSegmentationModule:
Missing key(s) in state_dict: "net.down_stages.0.transformer_blocks.0.ffn_norm.weight", "net.down_stages.0.transformer_blocks.0.ffn_norm.bias", "net.down_stages.0.transformer_blocks.0.ffn_norm.mean_scale", "net.down_stages.0.transformer_blocks.0.ffn.mlp.0.weight", "net.down_stages.0.transformer_blocks.0.ffn.mlp.0.bias", "net.down_stages.0.transformer_blocks.0.ffn.mlp.2.weight", "net.down_stages.0.transformer_blocks.0.ffn.mlp.2.bias", "net.down_stages.0.transformer_blocks.1.ffn_norm.weight", "net.down_stages.0.transformer_blocks.1.ffn_norm.bias", "net.down_stages.0.transformer_blocks.1.ffn_norm.mean_scale", "net.down_stages.0.transformer_blocks.1.ffn.mlp.0.weight", "net.down_stages.0.transformer_blocks.1.ffn.mlp.0.bias", "net.down_stages.0.transformer_blocks.1.ffn.mlp.2.weight", "net.down_stages.0.transformer_blocks.1.ffn.mlp.2.bias", "net.down_stages.0.transformer_blocks.2.ffn_norm.weight", "net.down_stages.0.transformer_blocks.2.ffn_norm.bias", "net.down_stages.0.transformer_blocks.2.ffn_norm.mean_scale", "net.down_stages.0.transformer_blocks.2.ffn.mlp.0.weight", "net.down_stages.0.transformer_blocks.2.ffn.mlp.0.bias", "net.down_stages.0.transformer_blocks.2.ffn.mlp.2.weight", "net.down_stages.0.transformer_blocks.2.ffn.mlp.2.bias", "net.down_stages.1.transformer_blocks.0.ffn_norm.weight", "net.down_stages.1.transformer_blocks.0.ffn_norm.bias", "net.down_stages.1.transformer_blocks.0.ffn_norm.mean_scale", "net.down_stages.1.transformer_blocks.0.ffn.mlp.0.weight", "net.down_stages.1.transformer_blocks.0.ffn.mlp.0.bias", "net.down_stages.1.transformer_blocks.0.ffn.mlp.2.weight", "net.down_stages.1.transformer_blocks.0.ffn.mlp.2.bias", "net.down_stages.1.transformer_blocks.1.ffn_norm.weight", "net.down_stages.1.transformer_blocks.1.ffn_norm.bias", "net.down_stages.1.transformer_blocks.1.ffn_norm.mean_scale", "net.down_stages.1.transformer_blocks.1.ffn.mlp.0.weight", "net.down_stages.1.transformer_blocks.1.ffn.mlp.0.bias", "net.down_stages.1.transformer_blocks.1.ffn.mlp.2.weight", "net.down_stages.1.transformer_blocks.1.ffn.mlp.2.bias", "net.down_stages.1.transformer_blocks.2.ffn_norm.weight", "net.down_stages.1.transformer_blocks.2.ffn_norm.bias", "net.down_stages.1.transformer_blocks.2.ffn_norm.mean_scale", "net.down_stages.1.transformer_blocks.2.ffn.mlp.0.weight", "net.down_stages.1.transformer_blocks.2.ffn.mlp.0.bias", "net.down_stages.1.transformer_blocks.2.ffn.mlp.2.weight", "net.down_stages.1.transformer_blocks.2.ffn.mlp.2.bias", "net.up_stages.0.transformer_blocks.0.ffn_norm.weight", "net.up_stages.0.transformer_blocks.0.ffn_norm.bias", "net.up_stages.0.transformer_blocks.0.ffn_norm.mean_scale", "net.up_stages.0.transformer_blocks.0.ffn.mlp.0.weight", "net.up_stages.0.transformer_blocks.0.ffn.mlp.0.bias", "net.up_stages.0.transformer_blocks.0.ffn.mlp.2.weight", "net.up_stages.0.transformer_blocks.0.ffn.mlp.2.bias".
size mismatch for net.down_stages.0.in_mlp.mlp.0.weight: copying a param with shape torch.Size([64, 132]) from checkpoint, the shape in current model is torch.Size([128, 132]).
size mismatch for net.down_stages.0.in_mlp.mlp.1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.in_mlp.mlp.1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.in_mlp.mlp.1.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.in_mlp.mlp.3.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.0.in_mlp.mlp.4.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.in_mlp.mlp.4.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.in_mlp.mlp.4.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa_norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa_norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa_norm.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([256, 128]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa.v_rpe.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([128, 32]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa.v_rpe.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa_norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa_norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa_norm.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([256, 128]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa.v_rpe.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([128, 32]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa.v_rpe.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa_norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa_norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa_norm.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([256, 128]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa.v_rpe.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([128, 32]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa.v_rpe.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.in_mlp.mlp.0.weight: copying a param with shape torch.Size([64, 68]) from checkpoint, the shape in current model is torch.Size([128, 132]).
size mismatch for net.down_stages.1.in_mlp.mlp.1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.in_mlp.mlp.1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.in_mlp.mlp.1.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.in_mlp.mlp.3.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.1.in_mlp.mlp.4.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.in_mlp.mlp.4.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.in_mlp.mlp.4.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa_norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa_norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa_norm.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([256, 128]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa.v_rpe.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([128, 32]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa.v_rpe.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa_norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa_norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa_norm.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([256, 128]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa.v_rpe.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([128, 32]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa.v_rpe.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa_norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa_norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa_norm.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([256, 128]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa.v_rpe.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([128, 32]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa.v_rpe.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.in_mlp.mlp.0.weight: copying a param with shape torch.Size([64, 132]) from checkpoint, the shape in current model is torch.Size([128, 260]).
size mismatch for net.up_stages.0.in_mlp.mlp.1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.in_mlp.mlp.1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.in_mlp.mlp.1.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.in_mlp.mlp.3.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.up_stages.0.in_mlp.mlp.4.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.in_mlp.mlp.4.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.in_mlp.mlp.4.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa_norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa_norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa_norm.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([256, 128]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa.v_rpe.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([128, 32]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa.v_rpe.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for head.0.classifier.weight: copying a param with shape torch.Size([13, 64]) from checkpoint, the shape in current model is torch.Size([15, 128]).
size mismatch for head.0.classifier.bias: copying a param with shape torch.Size([13]) from checkpoint, the shape in current model is torch.Size([15]).
size mismatch for head.1.classifier.weight: copying a param with shape torch.Size([13, 64]) from checkpoint, the shape in current model is torch.Size([15, 128]).
size mismatch for head.1.classifier.bias: copying a param with shape torch.Size([13]) from checkpoint, the shape in current model is torch.Size([15]).
[2023-07-23 15:59:37,102][src.utils.utils][INFO] - Closing loggers...
Error executing job with overrides: ['experiment=kitti360', 'ckpt_path=/home/hyunkoo/DATA/ssd1/Codes/SemanticSeg3D/superpoint_transformer/logs/train/runs/2023-07-23_04-52-26/checkpoints/epoch_1419.ckpt']
Traceback (most recent call last):
File "src/eval.py", line 117, in main
evaluate(cfg)
File "/home/hyunkoo/DATA/ssd1/Codes/SemanticSeg3D/superpoint_transformer/src/utils/utils.py", line 48, in wrap
raise ex
File "/home/hyunkoo/DATA/ssd1/Codes/SemanticSeg3D/superpoint_transformer/src/utils/utils.py", line 45, in wrap
metric_dict, object_dict = task_func(cfg=cfg)
File "src/eval.py", line 105, in evaluate
trainer.test(model=model, datamodule=datamodule, ckpt_path=cfg.ckpt_path)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in test
return call._call_and_handle_interrupt(
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 778, in _test_impl
results = self._run(model, ckpt_path=ckpt_path)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 939, in _run
self._checkpoint_connector._restore_modules_and_callbacks(ckpt_path)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 396, in _restore_modules_and_callbacks
self.restore_model()
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 278, in restore_model
trainer.strategy.load_model_state_dict(self._loaded_checkpoint)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 352, in load_model_state_dict
self.lightning_module.load_state_dict(checkpoint["state_dict"])
File "/home/hyunkoo/DATA/ssd1/Codes/SemanticSeg3D/superpoint_transformer/src/models/segmentation.py", line 560, in load_state_dict
super().load_state_dict(state_dict, strict=strict)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PointSegmentationModule:
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Best regards, Hyunkoo
Hi @hyunkoome, thanks for your interest and feedback.
The error you encountered seems different: I think you may be using an S3DIS checkpoint for the KITTI-360 dataset. This would explain all the feature dimension mismatches, as well as the final classifier size mismatch. It seems you are using a checkpoint from a training you launched on your machine:
/home/hyunkoo/DATA/ssd1/Codes/SemanticSeg3D/superpoint_transformer/logs/train/runs/2023-07-23_04-52-26/checkpoints/epoch_1419.ckpt
Are you certain you are using a KITTI-360 checkpoint?
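As a quick sanity check (illustrative snippet, not from the repository; the path is the one in the error message), you can inspect the classifier shape stored in the checkpoint: 13 output classes matches S3DIS, while the KITTI-360 config in the log above expects 15.

import torch

# Load the checkpoint on CPU and look at the classifier output dimension,
# which is the number of semantic classes the model was trained for.
ckpt = torch.load(
    "logs/train/runs/2023-07-23_04-52-26/checkpoints/epoch_1419.ckpt",
    map_location="cpu")
for key, value in ckpt["state_dict"].items():
    if key.endswith("classifier.weight"):
        print(key, tuple(value.shape))  # e.g. ('head.0.classifier.weight', (13, 64))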
I am still having this issue @drprojects
@jaswanthbjk, are you encountering the exact same error as @sguinard? That is:
Traceback (most recent call last):
File "/app/superpoint_transformer/src/models/segmentation.py", line 545, in load_state_dict
super().load_state_dict(state_dict, strict=strict)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PointSegmentationModule:
Unexpected key(s) in state_dict: "criterion.criteria.0.weight", "criterion.criteria.1.weight".
Are you using the latest commit 6b9ac9aa0c96d843af7f50448a3fbf968263d56a?
My issue is also the same:
File "/home/jba/learnings/superpoint_transformer/src/models/segmentation.py", line 560, in load_state_dict
super().load_state_dict(state_dict, strict=strict)
File "/home/jba/miniconda3/envs/spt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PointSegmentationModule:
Missing key(s) in state_dict: "criterion.criteria.0.weight", "criterion.criteria.1.weight".
I cloned the repo yesterday, so I assumed it would include the updated commit.
Hi @jaswanthbjk, that is strange; I can successfully run:
# Evaluate SPT on S3DIS Fold 5
python src/eval.py experiment=s3dis datamodule.fold=5 ckpt_path=/path/to/your/checkpoint.ckpt
# Evaluate SPT on KITTI-360 Val
python src/eval.py experiment=kitti360 ckpt_path=/path/to/your/checkpoint.ckpt
# Evaluate SPT on DALES
python src/eval.py experiment=dales ckpt_path=/path/to/your/checkpoint.ckpt
Are you using one of the .ckpt files we provide here, or a .ckpt from your own pretraining?
Just to be 100% safe, please make sure you git pull the latest version.
I trained a model on S3DIS myself.
Yes, I am running the eval script as you mentioned.
I was only successful when I changed strict=False, but that gives a wrong loss value during evaluation.
Yes, setting strict=False would bypass this issue, but I agree it is not a satisfying fix, especially if we are running load_state_dict to resume training or fine-tune.
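For reference, here is a minimal, self-contained illustration of the standard torch.nn.Module behaviour: with strict=False, load_state_dict returns the skipped keys instead of raising, so mismatched parameters silently keep their fresh initialization, which is consistent with the wrong loss values observed during evaluation.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4))
state = {"0.weight": torch.zeros(4, 4)}  # '0.bias' is deliberately missing

# strict=False does not raise; it only reports what was skipped. The bias keeps
# its random initialization, which can silently degrade evaluation metrics.
result = model.load_state_dict(state, strict=False)
print("missing keys:", result.missing_keys)        # ['0.bias']
print("unexpected keys:", result.unexpected_keys)  # []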
Have you made any modifications to the code other than that?
Can you please share your pretrained S3DIS .ckpt (and specify the related datamodule.fold), so I can try loading it on my end?
Hi @drprojects,
Just wanted to confirm that your latest commit solved my ckpt reading issue.
Thanks a lot!
Thanks for the feedback @sguinard! I will wait a bit for @jaswanthbjk's .ckpt to make sure things are in order before closing the issue.
Hey @drprojects,
It worked. Sorry, I hadn't loaded the checkpoint properly. You can close the issue now.
Hi Damien,
Thanks for the great work and the code!
I'm currently performing some experiments with a Dockerized SPT based on nvidia/cuda:11.8.0-devel-ubuntu22.04, on KITTI-360.
I have no problem running the training script (both the standard and 11g configs run smoothly); however, the evaluation script fails when reading the saved checkpoints with the following error:
The same error happens with epoch-XXX.ckpt, latest.ckpt, or the pretrained weights downloaded from your git.
A quick Google search suggests the loading code may be trying to read a model different from the one that was stored, but since I didn't modify the config files except for adding the path to the data, this seems weird.
Any hints regarding this error?
Best regards, Stephane