Closed: sguinard closed this issue 1 year ago.
Hi Stéphane,
Thanks for your interest in the project and for catching this error!
I had already encountered this issue and thought I had completely fixed it, but it came creeping back. If you are interested, the problem is simply that the model saves in the state_dict some attributes of the criterion (i.e., the semantic losses in our case) which it cannot properly reload. Here, the problematic attribute is weight, which is used to weight down the importance of each class in the semantic segmentation losses.
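For illustration only, here is a minimal, hypothetical PyTorch sketch (not the repository's actual code) of how a weighted loss stored as a submodule leaks its class-weight buffer into the state_dict, and how such criterion keys can be stripped when loading:

import torch
import torch.nn as nn

class TinySegModule(nn.Module):
    def __init__(self, num_classes=3, class_weights=None):
        super().__init__()
        self.net = nn.Linear(8, num_classes)
        # CrossEntropyLoss registers its per-class 'weight' as a buffer,
        # so it shows up in this module's state_dict when weights are set.
        self.criterion = nn.CrossEntropyLoss(weight=class_weights)

    def load_state_dict(self, state_dict, strict=True):
        # Drop criterion buffers saved by a previous run; the criterion is
        # rebuilt from the dataset configuration rather than restored.
        state_dict = {k: v for k, v in state_dict.items()
                      if not k.startswith('criterion.')}
        return super().load_state_dict(state_dict, strict=strict)

ckpt = TinySegModule(class_weights=torch.tensor([1.0, 0.5, 2.0])).state_dict()
fresh = TinySegModule()      # criterion built without class weights
fresh.load_state_dict(ckpt)  # would fail on 'criterion.weight' without the filter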
Long story short, I just pushed a new commit which should fix this. Would you mind testing it on your end and letting me know if it solves your issue?
Best,
Damien
Hi Damien,
Thanks for your quick reply! I'm testing this and will let you know whether it works as soon as possible.
Best, Stephane
Hi @drprojects
Thank you for sharing your great work and code.
I trained on my server machine with an NVIDIA A100 GPU (80 GB VRAM) and then ran evaluation. Training works without any problem, but I run into an issue during evaluation.
Specifically, your code works well on the S3DIS and DALES datasets, but KITTI-360 hits the same issue @sguinard mentioned before.
When I run the eval script on KITTI-360, I get 'state_dict' and 'size mismatch' errors as follows:
[2023-07-23 15:59:37,100][src.utils.utils][ERROR] -
Traceback (most recent call last):
File "/home/hyunkoo/DATA/ssd1/Codes/SemanticSeg3D/superpoint_transformer/src/utils/utils.py", line 45, in wrap
metric_dict, object_dict = task_func(cfg=cfg)
File "src/eval.py", line 105, in evaluate
trainer.test(model=model, datamodule=datamodule, ckpt_path=cfg.ckpt_path)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in test
return call._call_and_handle_interrupt(
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 778, in _test_impl
results = self._run(model, ckpt_path=ckpt_path)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 939, in _run
self._checkpoint_connector._restore_modules_and_callbacks(ckpt_path)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 396, in _restore_modules_and_callbacks
self.restore_model()
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 278, in restore_model
trainer.strategy.load_model_state_dict(self._loaded_checkpoint)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 352, in load_model_state_dict
self.lightning_module.load_state_dict(checkpoint["state_dict"])
File "/home/hyunkoo/DATA/ssd1/Codes/SemanticSeg3D/superpoint_transformer/src/models/segmentation.py", line 560, in load_state_dict
super().load_state_dict(state_dict, strict=strict)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PointSegmentationModule:
Missing key(s) in state_dict: "net.down_stages.0.transformer_blocks.0.ffn_norm.weight", "net.down_stages.0.transformer_blocks.0.ffn_norm.bias", "net.down_stages.0.transformer_blocks.0.ffn_norm.mean_scale", "net.down_stages.0.transformer_blocks.0.ffn.mlp.0.weight", "net.down_stages.0.transformer_blocks.0.ffn.mlp.0.bias", "net.down_stages.0.transformer_blocks.0.ffn.mlp.2.weight", "net.down_stages.0.transformer_blocks.0.ffn.mlp.2.bias", "net.down_stages.0.transformer_blocks.1.ffn_norm.weight", "net.down_stages.0.transformer_blocks.1.ffn_norm.bias", "net.down_stages.0.transformer_blocks.1.ffn_norm.mean_scale", "net.down_stages.0.transformer_blocks.1.ffn.mlp.0.weight", "net.down_stages.0.transformer_blocks.1.ffn.mlp.0.bias", "net.down_stages.0.transformer_blocks.1.ffn.mlp.2.weight", "net.down_stages.0.transformer_blocks.1.ffn.mlp.2.bias", "net.down_stages.0.transformer_blocks.2.ffn_norm.weight", "net.down_stages.0.transformer_blocks.2.ffn_norm.bias", "net.down_stages.0.transformer_blocks.2.ffn_norm.mean_scale", "net.down_stages.0.transformer_blocks.2.ffn.mlp.0.weight", "net.down_stages.0.transformer_blocks.2.ffn.mlp.0.bias", "net.down_stages.0.transformer_blocks.2.ffn.mlp.2.weight", "net.down_stages.0.transformer_blocks.2.ffn.mlp.2.bias", "net.down_stages.1.transformer_blocks.0.ffn_norm.weight", "net.down_stages.1.transformer_blocks.0.ffn_norm.bias", "net.down_stages.1.transformer_blocks.0.ffn_norm.mean_scale", "net.down_stages.1.transformer_blocks.0.ffn.mlp.0.weight", "net.down_stages.1.transformer_blocks.0.ffn.mlp.0.bias", "net.down_stages.1.transformer_blocks.0.ffn.mlp.2.weight", "net.down_stages.1.transformer_blocks.0.ffn.mlp.2.bias", "net.down_stages.1.transformer_blocks.1.ffn_norm.weight", "net.down_stages.1.transformer_blocks.1.ffn_norm.bias", "net.down_stages.1.transformer_blocks.1.ffn_norm.mean_scale", "net.down_stages.1.transformer_blocks.1.ffn.mlp.0.weight", "net.down_stages.1.transformer_blocks.1.ffn.mlp.0.bias", "net.down_stages.1.transformer_blocks.1.ffn.mlp.2.weight", "net.down_stages.1.transformer_blocks.1.ffn.mlp.2.bias", "net.down_stages.1.transformer_blocks.2.ffn_norm.weight", "net.down_stages.1.transformer_blocks.2.ffn_norm.bias", "net.down_stages.1.transformer_blocks.2.ffn_norm.mean_scale", "net.down_stages.1.transformer_blocks.2.ffn.mlp.0.weight", "net.down_stages.1.transformer_blocks.2.ffn.mlp.0.bias", "net.down_stages.1.transformer_blocks.2.ffn.mlp.2.weight", "net.down_stages.1.transformer_blocks.2.ffn.mlp.2.bias", "net.up_stages.0.transformer_blocks.0.ffn_norm.weight", "net.up_stages.0.transformer_blocks.0.ffn_norm.bias", "net.up_stages.0.transformer_blocks.0.ffn_norm.mean_scale", "net.up_stages.0.transformer_blocks.0.ffn.mlp.0.weight", "net.up_stages.0.transformer_blocks.0.ffn.mlp.0.bias", "net.up_stages.0.transformer_blocks.0.ffn.mlp.2.weight", "net.up_stages.0.transformer_blocks.0.ffn.mlp.2.bias".
size mismatch for net.down_stages.0.in_mlp.mlp.0.weight: copying a param with shape torch.Size([64, 132]) from checkpoint, the shape in current model is torch.Size([128, 132]).
size mismatch for net.down_stages.0.in_mlp.mlp.1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.in_mlp.mlp.1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.in_mlp.mlp.1.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.in_mlp.mlp.3.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.0.in_mlp.mlp.4.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.in_mlp.mlp.4.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.in_mlp.mlp.4.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa_norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa_norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa_norm.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([256, 128]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa.v_rpe.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([128, 32]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa.v_rpe.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.0.transformer_blocks.0.sa.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa_norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa_norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa_norm.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([256, 128]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa.v_rpe.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([128, 32]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa.v_rpe.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.0.transformer_blocks.1.sa.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa_norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa_norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa_norm.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([256, 128]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa.v_rpe.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([128, 32]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa.v_rpe.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.0.transformer_blocks.2.sa.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.in_mlp.mlp.0.weight: copying a param with shape torch.Size([64, 68]) from checkpoint, the shape in current model is torch.Size([128, 132]).
size mismatch for net.down_stages.1.in_mlp.mlp.1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.in_mlp.mlp.1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.in_mlp.mlp.1.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.in_mlp.mlp.3.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.1.in_mlp.mlp.4.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.in_mlp.mlp.4.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.in_mlp.mlp.4.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa_norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa_norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa_norm.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([256, 128]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa.v_rpe.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([128, 32]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa.v_rpe.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.1.transformer_blocks.0.sa.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa_norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa_norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa_norm.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([256, 128]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa.v_rpe.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([128, 32]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa.v_rpe.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.1.transformer_blocks.1.sa.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa_norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa_norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa_norm.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([256, 128]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa.v_rpe.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([128, 32]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa.v_rpe.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.down_stages.1.transformer_blocks.2.sa.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.in_mlp.mlp.0.weight: copying a param with shape torch.Size([64, 132]) from checkpoint, the shape in current model is torch.Size([128, 260]).
size mismatch for net.up_stages.0.in_mlp.mlp.1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.in_mlp.mlp.1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.in_mlp.mlp.1.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.in_mlp.mlp.3.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.up_stages.0.in_mlp.mlp.4.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.in_mlp.mlp.4.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.in_mlp.mlp.4.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa_norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa_norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa_norm.mean_scale: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([256, 128]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa.v_rpe.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([128, 32]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa.v_rpe.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([128, 128]).
size mismatch for net.up_stages.0.transformer_blocks.0.sa.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for head.0.classifier.weight: copying a param with shape torch.Size([13, 64]) from checkpoint, the shape in current model is torch.Size([15, 128]).
size mismatch for head.0.classifier.bias: copying a param with shape torch.Size([13]) from checkpoint, the shape in current model is torch.Size([15]).
size mismatch for head.1.classifier.weight: copying a param with shape torch.Size([13, 64]) from checkpoint, the shape in current model is torch.Size([15, 128]).
size mismatch for head.1.classifier.bias: copying a param with shape torch.Size([13]) from checkpoint, the shape in current model is torch.Size([15]).
[2023-07-23 15:59:37,102][src.utils.utils][INFO] - Closing loggers...
Error executing job with overrides: ['experiment=kitti360', 'ckpt_path=/home/hyunkoo/DATA/ssd1/Codes/SemanticSeg3D/superpoint_transformer/logs/train/runs/2023-07-23_04-52-26/checkpoints/epoch_1419.ckpt']
Traceback (most recent call last):
File "src/eval.py", line 117, in main
evaluate(cfg)
File "/home/hyunkoo/DATA/ssd1/Codes/SemanticSeg3D/superpoint_transformer/src/utils/utils.py", line 48, in wrap
raise ex
File "/home/hyunkoo/DATA/ssd1/Codes/SemanticSeg3D/superpoint_transformer/src/utils/utils.py", line 45, in wrap
metric_dict, object_dict = task_func(cfg=cfg)
File "src/eval.py", line 105, in evaluate
trainer.test(model=model, datamodule=datamodule, ckpt_path=cfg.ckpt_path)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in test
return call._call_and_handle_interrupt(
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 778, in _test_impl
results = self._run(model, ckpt_path=ckpt_path)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 939, in _run
self._checkpoint_connector._restore_modules_and_callbacks(ckpt_path)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 396, in _restore_modules_and_callbacks
self.restore_model()
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 278, in restore_model
trainer.strategy.load_model_state_dict(self._loaded_checkpoint)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 352, in load_model_state_dict
self.lightning_module.load_state_dict(checkpoint["state_dict"])
File "/home/hyunkoo/DATA/ssd1/Codes/SemanticSeg3D/superpoint_transformer/src/models/segmentation.py", line 560, in load_state_dict
super().load_state_dict(state_dict, strict=strict)
File "/home/hyunkoo/anaconda3/envs/spt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PointSegmentationModule:
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Best regards, Hyunkoo
Hi @hyunkoome, thanks for your interest and feedback.
The error you encountered seems different: I think you may be using an S3DIS checkpoint for the KITTI-360 dataset. This would explain all the feature dimension mismatches, as well as the final classifier size mismatch. It seems you are using a checkpoint from a training you launched on your machine:
/home/hyunkoo/DATA/ssd1/Codes/SemanticSeg3D/superpoint_transformer/logs/train/runs/2023-07-23_04-52-26/checkpoints/epoch_1419.ckpt
Are you certain you are using a KITTI-360 checkpoint?
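As a quick sanity check (illustrative snippet, not from the repository; the path is the one in the error message), you can inspect the classifier shape stored in the checkpoint: 13 output classes matches S3DIS, while the KITTI-360 config in the log above expects 15.

import torch

# Load the checkpoint on CPU and look at the classifier output dimension,
# which is the number of semantic classes the model was trained for.
ckpt = torch.load(
    "logs/train/runs/2023-07-23_04-52-26/checkpoints/epoch_1419.ckpt",
    map_location="cpu")
for key, value in ckpt["state_dict"].items():
    if key.endswith("classifier.weight"):
        print(key, tuple(value.shape))  # e.g. ('head.0.classifier.weight', (13, 64))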
I am still having this issue @drprojects
@jaswanthbjk, are you encountering the exact same error as @sguinard? That is:
Traceback (most recent call last):
File "/app/superpoint_transformer/src/models/segmentation.py", line 545, in load_state_dict
super().load_state_dict(state_dict, strict=strict)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PointSegmentationModule:
Unexpected key(s) in state_dict: "criterion.criteria.0.weight", "criterion.criteria.1.weight".
Are you using the latest commit 6b9ac9aa0c96d843af7f50448a3fbf968263d56a?
My issue is also the same:
File "/home/jba/learnings/superpoint_transformer/src/models/segmentation.py", line 560, in load_state_dict
super().load_state_dict(state_dict, strict=strict)
File "/home/jba/miniconda3/envs/spt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PointSegmentationModule:
Missing key(s) in state_dict: "criterion.criteria.0.weight", "criterion.criteria.1.weight".
I cloned the repo yesterday, so I assumed it would include the updated commit.
Hi @jaswanthbjk, that is strange; I can successfully run:
# Evaluate SPT on S3DIS Fold 5
python src/eval.py experiment=s3dis datamodule.fold=5 ckpt_path=/path/to/your/checkpoint.ckpt
# Evaluate SPT on KITTI-360 Val
python src/eval.py experiment=kitti360 ckpt_path=/path/to/your/checkpoint.ckpt
# Evaluate SPT on DALES
python src/eval.py experiment=dales ckpt_path=/path/to/your/checkpoint.ckpt
Are you using one of the .ckpt files we provide here, or a .ckpt from your own pretraining?
Just to be 100% safe, please make sure you git pull the latest version.
I trained a model on S3DIS myself.
Yes, I am running the eval script as you mentioned.
I was only successful when I changed strict=False, but that gives a wrong loss value during evaluation.
Yes, setting strict=False would bypass this issue, but I agree it is not a satisfying fix, especially if we are running load_state_dict to resume training or fine-tune.
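For reference, here is a minimal, self-contained illustration of the standard torch.nn.Module behaviour: with strict=False, load_state_dict returns the skipped keys instead of raising, so mismatched parameters silently keep their fresh initialization, which is consistent with the wrong loss values observed during evaluation.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4))
state = {"0.weight": torch.zeros(4, 4)}  # '0.bias' is deliberately missing

# strict=False does not raise; it only reports what was skipped. The bias keeps
# its random initialization, which can silently degrade evaluation metrics.
result = model.load_state_dict(state, strict=False)
print("missing keys:", result.missing_keys)        # ['0.bias']
print("unexpected keys:", result.unexpected_keys)  # []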
Have you made any modifications to the code other than that?
Can you please share your pretrained S3DIS .ckpt (and specify the related datamodule.fold), so I can try loading it on my end?
Hi @drprojects,
Just wanted to confirm that your latest commit solved my ckpt reading issue.
Thanks a lot!
Thanks for the feedback @sguinard! I will wait a bit for @jaswanthbjk's .ckpt to make sure things are in order before closing the issue.
Hey @drprojects,
It worked. Sorry, I hadn't loaded the checkpoint properly. You can close the issue now.
Hi Damien,
Thanks for the great work and the code!
I'm currently performing some experiments with a Dockerized SPT based on nvidia/cuda:11.8.0-devel-ubuntu22.04, on KITTI-360.
I have no problem running the training script (both the standard and 11g configs run smoothly); however, the evaluation script fails when reading the saved checkpoints with the following error:
The same error happens with epoch-XXX.ckpt, latest.ckpt, or the pretrained weights downloaded from your git.
A quick Google search suggests the loading code may be trying to read a model different from the one that was stored, but since I didn't modify the config files except for adding the path to the data, this seems weird.
Any hints regarding this error?
Best regards, Stephane