drprojects / superpoint_transformer

Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"

Example training crashes with apparent integer overflow #101

Closed gspr-sintef closed 2 months ago

gspr-sintef commented 2 months ago

Attempting to train the SPT semantic segmentation model on KITTI-360, as shown in the README, fails after processing about a third of the dataset. At least superficially, the crash looks like an integer overflow. I'm using the code at commit 53f94f6ab77190963932f70068b1969dfb9451ff.

Traceback:

[2024-04-29 08:51:39,602][__main__][INFO] - Starting training!
Processing...
 36%|█████████████████████████████████▍                                                           | 86/239 [25:01<44:31, 17.46s/it]
[2024-04-29 09:16:41,589][src.utils.utils][ERROR] - 
Traceback (most recent call last):
  File "/home/somebody/third-party/superpoint-transformer/src/utils/utils.py", line 45, in wrap
    metric_dict, object_dict = task_func(cfg=cfg)
                               ^^^^^^^^^^^^^^^^^^
  File "/home/somebody/third-party/superpoint-transformer/src/train.py", line 114, in train
    trainer.fit(model=model, datamodule=datamodule, ckpt_path=cfg.get("ckpt_path"))
  File "/home/somebody/.local/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/somebody/.local/lib/python3.11/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/somebody/.local/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/somebody/.local/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 947, in _run
    self._data_connector.prepare_data()
  File "/home/somebody/.local/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 94, in prepare_data
    call._call_lightning_datamodule_hook(trainer, "prepare_data")
  File "/home/somebody/.local/lib/python3.11/site-packages/pytorch_lightning/trainer/call.py", line 179, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/somebody/third-party/superpoint-transformer/src/datamodules/base.py", line 144, in prepare_data
    self.dataset_class(
  File "/home/somebody/third-party/superpoint-transformer/src/datasets/base.py", line 223, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
  File "/home/somebody/.local/lib/python3.11/site-packages/torch_geometric/data/in_memory_dataset.py", line 57, in __init__
    super().__init__(root, transform, pre_transform, pre_filter, log)
  File "/home/somebody/.local/lib/python3.11/site-packages/torch_geometric/data/dataset.py", line 97, in __init__
    self._process()
  File "/home/somebody/third-party/superpoint-transformer/src/datasets/base.py", line 647, in _process
    self.process()
  File "/home/somebody/third-party/superpoint-transformer/src/datasets/base.py", line 682, in process
    self._process_single_cloud(p)
  File "/home/somebody/third-party/superpoint-transformer/src/datasets/base.py", line 710, in _process_single_cloud
    nag = self.pre_transform(data)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/somebody/.local/lib/python3.11/site-packages/torch_geometric/transforms/compose.py", line 24, in __call__
    data = transform(data)
           ^^^^^^^^^^^^^^^
  File "/home/somebody/third-party/superpoint-transformer/src/transforms/transforms.py", line 23, in __call__
    return self._process(x)
           ^^^^^^^^^^^^^^^^
  File "/home/somebody/third-party/superpoint-transformer/src/transforms/partition.py", line 82, in _process
    data = data.to_trimmed()
           ^^^^^^^^^^^^^^^^^
  File "/home/somebody/third-party/superpoint-transformer/src/data/data.py", line 523, in to_trimmed
    edge_index, edge_attr = to_trimmed(
                            ^^^^^^^^^^^
  File "/home/somebody/third-party/superpoint-transformer/src/utils/graph.py", line 415, in to_trimmed
    edge_index, edge_attr = coalesce(
                            ^^^^^^^^^
  File "/home/somebody/.local/lib/python3.11/site-packages/torch_geometric/utils/coalesce.py", line 102, in coalesce
    idx[1:], perm = index_sort(idx[1:], max_value=num_nodes * num_nodes)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/somebody/.local/lib/python3.11/site-packages/torch_geometric/utils/sort.py", line 27, in index_sort
    return pyg_lib.ops.index_sort(inputs, max_value=max_value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/somebody/.local/lib/python3.11/site-packages/pyg_lib/ops/__init__.py", line 331, in index_sort
    return torch.ops.pyg.index_sort(inputs, max_value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/somebody/.local/lib/python3.11/site-packages/torch/_ops.py", line 755, in __call__
    return self._op(*args, **(kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: pyg::index_sort() Expected a value of type 'Optional[int]' for argument 'max' but instead found type 'int'.
Position: 1
Value: 8449095720757080435264
Declaration: pyg::index_sort(Tensor indices, int? max=None) -> (Tensor, Tensor)
Cast error details: Unable to cast Python instance of type <class 'int'> to C++ type '?' (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)
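
For context on why this looks like an overflow: as the traceback shows, torch_geometric's coalesce passes max_value=num_nodes * num_nodes to pyg_lib's index_sort, whose max argument is declared as int?, i.e. an optional C++ int64. A Python int above 2**63 - 1 cannot be cast by pybind11, which is exactly the error reported above. A minimal sketch (plain Python arithmetic, not the project's code) using the value from the traceback:

import math

INT64_MAX = 2**63 - 1                 # 9223372036854775807, the largest value pybind11 can cast to int64
max_value = 8449095720757080435264    # the num_nodes * num_nodes value reported in the traceback

print(max_value > INT64_MAX)          # True: the cast to 'int?' (optional int64) must fail
print(math.isqrt(max_value))          # ~9.2e10: the implied num_nodes, itself suspiciously large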
drprojects commented 2 months ago

Hi @gspr-sintef, thanks for your interest in our project.

This is the first time I have seen this error. Have you made any changes to the project, even in the configs?

Just in case, can you please update to the latest version of the code and try again?

PS: Also, if you ❤️ or use this project, don't forget to give it a ⭐; it means a lot to us!

gspr-sintef commented 2 months ago

I see now that I was accidentally working in my alternative environment (where I am attempting to get SPT to work with Python versions newer than 3.8). You may wish to disregard this bug report as coming from an unsupported Python version, although the bug itself still seems a bit worrisome.

Sorry for the noise.

gvoysey commented 2 months ago

I'm running 3.10.13 without issue. I think 3.11+ may be more of a stretch, though.
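
For reference, a quick way to confirm which interpreter and wheel versions an environment is actually using (a minimal sketch; it assumes pyg_lib exposes __version__ the way torch and torch_geometric do):

import sys
import torch
import torch_geometric
import pyg_lib  # assumed to expose __version__; prebuilt wheels are tied to the Python/torch version

print(sys.version)                  # e.g. 3.10.13 vs. 3.11.x
print(torch.__version__)
print(torch_geometric.__version__)
print(pyg_lib.__version__)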


drprojects commented 2 months ago

Thank you both for your feedback. Closing this now, then.