JonasSchult / Mask3D

Mask3D predicts accurate 3D semantic instances achieving state-of-the-art on ScanNet, ScanNet200, S3DIS and STPLS3D.
MIT License
525 stars, 103 forks

RuntimeError: selected index k out of range #123

Open bh-cai opened 1 year ago

bh-cai commented 1 year ago

First of all, thank you very much for releasing the model and for your continued help. I trained with the following parameters:

general.experiment_name="train1_scannet200_val"
general.project_name="scannet200"
data/datasets=scannet200
general.num_targets=201
data.num_labels=200
general.eval_on_segments=true
general.train_on_segments=true

and obtained a new checkpoint, but when I use it to run the test, I get the following error:

/home/mylabs/Mask3D/datasets/semseg.py:730: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  file = yaml.load(f)
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [6]
Testing DataLoader 0:   0%|                             | 0/100 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/home/mylabs/Mask3D/main_instance_segmentation.py", line 114, in <module>
    main()
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
    run_and_report(
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 107, in run
    return run_job(
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/core/utils.py", line 128, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/mylabs/Mask3D/main_instance_segmentation.py", line 110, in main
    test(cfg)
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/hydra/main.py", line 27, in decorated_main
    return task_function(cfg_passthrough)
  File "/home/mylabs/Mask3D/main_instance_segmentation.py", line 100, in test
    runner.test(model)
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 864, in test
    return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 911, in _test_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1168, in _run
    results = self._run_stage()
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1251, in _run_stage
    return self._run_evaluate()
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1297, in _run_evaluate
    eval_loop_results = self._evaluation_loop.run()
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 143, in advance
    output = self._evaluation_step(**kwargs)
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 240, in _evaluation_step
    output = self.trainer._call_strategy_hook(hook_name, *kwargs.values())
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1706, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/root/anaconda3/envs/mask3d_cuda113/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 379, in test_step
    return self.model.test_step(*args, **kwargs)
  File "/home/mylabs/Mask3D/trainer/trainer.py", line 560, in test_step
    return self.eval_step(batch, batch_idx)
  File "/home/mylabs/Mask3D/trainer/trainer.py", line 536, in eval_step
    self.eval_instance_step(
  File "/home/mylabs/Mask3D/trainer/trainer.py", line 713, in eval_instance_step
    scores, masks, classes, heatmap = self.get_mask_and_scores(
  File "/home/mylabs/Mask3D/trainer/trainer.py", line 591, in get_mask_and_scores
    scores_per_query, topk_indices = mask_cls.flatten(0, 1).topk(
RuntimeError: selected index k out of range

I checked the checkpoints and found a difference: the one you provide, scannet200_val.ckpt, is 151.51 MB, whereas the ones I obtained from training are much larger, at 908.21 MB each. Training produced several files, listed here with their sizes:

epoch=34-val_mean_ap_50=0.001.ckpt == 908.21 MB
epoch=99-val_mean_ap_50=0.000.ckpt == 908.21 MB
last-epoch.ckpt == 908.21 MB
last-v1.ckpt == 908.21 MB
last.ckpt == 908.21 MB

I used last.ckpt for testing, and it fails with the error above. Could you suggest a solution? I would really appreciate it, and I look forward to your reply!

SnowCastle123 commented 10 months ago

I've also run into this issue when use_dbscan: false is set in the config. What I don't understand is why disabling DBSCAN for validation produces this error.
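
For context on the error message itself: the last frame of the traceback is a topk call on the flattened mask_cls tensor, and PyTorch raises "selected index k out of range" when the requested k is larger than the number of elements along the dimension being searched. The thread does not establish why k ends up too large here, so the following is only a minimal sketch of the bound involved, written in plain Python. The function name clamp_topk and the example sizes are hypothetical and are not taken from the Mask3D code.

```python
def clamp_topk(requested_k: int, num_queries: int, num_classes: int) -> int:
    """Return a k that is valid for a top-k over a flattened score matrix.

    Hypothetical helper, not part of Mask3D. It assumes the scores have
    shape (num_queries, num_classes) and are flattened to a vector of
    num_queries * num_classes entries before the top-k is taken.
    """
    if requested_k < 0:
        raise ValueError("requested_k must be non-negative")
    total = num_queries * num_classes  # number of entries after flattening
    # A top-k cannot return more entries than exist, so cap k at total.
    return min(requested_k, total)


# Illustrative sizes only (assumptions, not values from this issue):
# plenty of candidates, so the requested k is kept unchanged
k_ok = clamp_topk(requested_k=750, num_queries=100, num_classes=200)
# too few candidates, so k is reduced to the number available
k_capped = clamp_topk(requested_k=750, num_queries=100, num_classes=1)
```

Clamping like this only avoids the exception. If k exceeds the number of entries, that usually points to a mismatch worth investigating first, for example by printing the shape of mask_cls just before the topk call and comparing it with the k configured for testing.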