drprojects / superpoint_transformer

Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"

Custom Dataset - AssertionError: Cloud ids must be unique across all the 'val' and 'test' stages, unless `test_mixed_in_val=True` #48

Closed StevenZhangzhexu closed 8 months ago

StevenZhangzhexu commented 9 months ago

Hi Damien,

I tried to train the model on a custom dataset but encountered the error below. (I have followed the docs to set up the environment, dataset, and configs.)

Traceback (most recent call last):
  File "/home/steven/Desktop/git/superpoint_transformer/src/utils/utils.py", line 45, in wrap
    metric_dict, object_dict = task_func(cfg=cfg)
  File "src/train.py", line 114, in train
    trainer.fit(model=model, datamodule=datamodule, ckpt_path=cfg.get("ckpt_path"))
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1033, in _run_stage
    self._run_sanity_check()
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1062, in _run_sanity_check
    val_loop.run()
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 134, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 365, in _evaluation_step
    batch = call._call_strategy_hook(trainer, "batch_to_device", batch, dataloader_idx=dataloader_idx)
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 269, in batch_to_device
    return model._apply_batch_transfer_handler(batch, device=device, dataloader_idx=dataloader_idx)
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/core/module.py", line 334, in _apply_batch_transfer_handler
    batch = self._call_batch_hook("on_after_batch_transfer", batch, dataloader_idx)
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/core/module.py", line 322, in _call_batch_hook
    return trainer_method(trainer, hook_name, *args)
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 179, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/steven/Desktop/git/superpoint_transformer/src/datamodules/base.py", line 336, in on_after_batch_transfer
    return on_device_transform(nag)
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/torch_geometric/transforms/compose.py", line 24, in __call__
    data = transform(data)
  File "/home/steven/Desktop/git/superpoint_transformer/src/transforms/transforms.py", line 23, in __call__
    return self._process(x)
  File "/home/steven/Desktop/git/superpoint_transformer/src/transforms/data.py", line 347, in _process
    raise ValueError(
ValueError: Input NAG does not have `is_val` attribute at level `0`

My data directory is structured like this:

/data/DATASET/
        └── raw/
            └── {train, test}/
                └── {file_name}.laz

and I want to train on the train files and test on the test files. In this case there is no validation dataset; is this considered test_mixed_in_val? Could you please help with this? Thanks!

drprojects commented 9 months ago

Hi @StevenZhangzhexu

Could you try setting:

trainval: False
val_on_test: True

in your dataset config file?
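
For reference, a minimal sketch of how these flags might sit in your dataset YAML config (the file path and surrounding keys here are illustrative placeholders, not the repo's exact schema):

# configs/datamodule/your_dataset.yaml (hypothetical path)
data_dir: ${paths.data_dir}   # illustrative placeholder
trainval: False               # do not merge train and val for training
val_on_test: True             # reuse the test set for the val stage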

PS: if you are interested in this project, don't forget to give the repo a ⭐, it matters to us!

StevenZhangzhexu commented 8 months ago

Hi @drprojects, I have set

trainval: False
val_on_test: True

in the config but still got the same error.

StevenZhangzhexu commented 8 months ago

I set strict=False to bypass the assertion error. But I am not sure if this will affect the training result?

drprojects commented 8 months ago

Hey! Sorry, I re-read your previous messages and realized I did not reply regarding test_mixed_in_val. If I understand your dataset correctly, you should have val_mixed_in_train=False and test_mixed_in_val=False. These settings are only useful in very specific conditions: when the preprocessed validation set is intertwined with the train or test sets. In those cases, the validation points are stored at the end of preprocessing with an additional is_val attribute, which allows identifying validation points among the train or test points when loading data during training/inference.
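
Conceptually, a simplified sketch of what the is_val mask enables (this is illustrative, not the repo's actual loading code; all names are made up):

import torch

def split_val_points(points, is_val):
    # Boolean mask indexing separates the mixed-in validation points
    # from the other points of the same preprocessed cloud.
    return points[~is_val], points[is_val]

points = torch.rand(100, 3)         # dummy level-0 point coordinates
is_val = torch.rand(100) < 0.2      # ~20% of points flagged as validation
train_points, val_points = split_val_points(points, is_val)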

All in all, this is quite specific and I assume you don't need this functionality. My guess is that you are seeing this error because your implementation has val_mixed_in_train=True or test_mixed_in_val=True somewhere. For instance, if you implemented your dataset by copying the S3DIS dataset implementation verbatim, you might have left val_mixed_in_train=True here.

In any case, since you do not have a validation set per se, you still want to keep the above-suggested:

trainval: False
val_on_test: True

The program will use your test set as a validation set during training. Bear in mind that this is not recommended machine learning practice, but it lets you keep track of your model's performance as training goes. If you plan on running several experiments and modifying some parameters to select the best hyperparameters for your specific dataset, I would recommend you further split your current train set into train/validation.

drprojects commented 8 months ago

I set strict=False to bypass the assertion error. But I am not sure if this will affect the training result?

I am not sure where you set that, but I would not recommend it. Try to fix the error by first making sure you have not set val_mixed_in_train=True or test_mixed_in_val=True somewhere.

StevenZhangzhexu commented 8 months ago

Hi @drprojects, thanks for your reply. The reason I set test_mixed_in_val=True is to bypass the error below:

Traceback (most recent call last):
  File "src/train.py", line 139, in main
    metric_dict, _ = train(cfg)
  File "/home/steven/Desktop/git/superpoint_transformer/src/utils/utils.py", line 48, in wrap
    raise ex
  File "/home/steven/Desktop/git/superpoint_transformer/src/utils/utils.py", line 45, in wrap
    metric_dict, object_dict = task_func(cfg=cfg)
  File "src/train.py", line 114, in train
    trainer.fit(model=model, datamodule=datamodule, ckpt_path=cfg.get("ckpt_path"))
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 941, in _run
    self._data_connector.prepare_data()
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 94, in prepare_data
    call._call_lightning_datamodule_hook(trainer, "prepare_data")
  File "/home/steven/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 179, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
  File "/home/steven/Desktop/git/superpoint_transformer/src/datamodules/base.py", line 144, in prepare_data
    self.dataset_class(
  File "/home/steven/Desktop/git/superpoint_transformer/src/datasets/base.py", line 189, in __init__
    self.check_cloud_ids()
  File "/home/steven/Desktop/git/superpoint_transformer/src/datasets/base.py", line 337, in check_cloud_ids
    assert len(val.intersection(test)) == 0 or self.test_mixed_in_val, \
AssertionError: Cloud ids must be unique across all the 'val' and 'test' stages, unless `test_mixed_in_val=True

This is because I have the same dataset for val and test.

drprojects commented 8 months ago

This is not what test_mixed_in_val=True is for. See the above explanation and the comments in the code.

val_on_test is what you want here.

Please try:

test_mixed_in_val: False
val_mixed_in_test: False
trainval: False
val_on_test: True
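
To summarize the distinction: val_mixed_in_train and test_mixed_in_val describe how the preprocessed clouds are stored (validation points physically mixed into another stage's clouds), while trainval and val_on_test describe which stages reuse which sets at training time.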

wy9933 commented 8 months ago

Hello, I have the same problem: my val and test are the same dataset. If I use

test_mixed_in_val: False
val_mixed_in_test: False
trainval: False
val_on_test: True

The code will raise

AssertionError: Cloud ids must be unique across all the 'val' and 'test' stages, unless `test_mixed_in_val=True

If I set test_mixed_in_val: True, the code will raise

ValueError: Input NAG does not have `is_val` attribute at level `0`

drprojects commented 8 months ago

Hi @StevenZhangzhexu and @wy9933. I confirm that you should not set test_mixed_in_val=True in your case. The real error we need to investigate is:

AssertionError: Cloud ids must be unique across all the 'val' and 'test' stages, unless `test_mixed_in_val=True

Can you please show me how you implemented your dataset's all_base_cloud_ids()?

def all_base_cloud_ids(self):
    ...

wy9933 commented 8 months ago

My dataset's all_base_cloud_ids() is

def all_base_cloud_ids(self):
    return {
        'train': [SCENES[i] for i in range(7) if i != self.test_idx],
        'val': [SCENES[self.test_idx]],
        'test': [SCENES[self.test_idx]]
    }

And SCENES is a list of all point cloud scene names, so val and test are the same dataset. If I set test_mixed_in_val=False, the assert in check_cloud_ids in src/datasets/base.py raises an error:

assert len(val.intersection(test)) == 0 or self.test_mixed_in_val, \
            "Cloud ids must be unique across all the 'val' and 'test' " \
            "stages, unless `test_mixed_in_val=True`"

drprojects commented 8 months ago

OK, this is probably where the error comes from. You should set:

def all_base_cloud_ids(self):
    return {
        'train': [SCENES[i] for i in range(7) if i != self.test_idx],  # list of clouds in your train set
        'val': [],  # empty because you have no validation clouds
        'test': [SCENES[self.test_idx]]  # list of clouds in your test set
    }

You do not have validation clouds per se, so you should set an empty list there. Setting val_on_test=True lets the program know that it should use your test set for the val stage too.

Ideally, if you intend to run multiple experiments and tune some hyperparameters to suit your dataset, you should split your train clouds in this function into train and val so that you have a proper validation set.
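
For instance, a hypothetical split reusing SCENES and self.test_idx from the snippet above (which scene to hold out for validation is an arbitrary choice):

def all_base_cloud_ids(self):
    remaining = [SCENES[i] for i in range(7) if i != self.test_idx]
    return {
        'train': remaining[:-1],         # most scenes for training
        'val': [remaining[-1]],          # hold one scene out for validation
        'test': [SCENES[self.test_idx]]
    }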

wy9933 commented 8 months ago

OK! It's working! Thanks for your answer!

drprojects commented 8 months ago

Cool! Thanks for the feedback :wink: