drprojects / superpoint_transformer

Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"
MIT License
560 stars 72 forks

Implementing SensatUrban #35

Closed JeppeVHolm closed 10 months ago

JeppeVHolm commented 11 months ago

Hi Damian,

Thanks for SPT! We are three university students trying to use SPT to classify point clouds.

We have been looking into incorporating the SensatUrban dataset into SPT. It seems the data is read correctly, as shown in the screenshot below.

(screenshot: 2023-10-19 140859)

We tried looking into issue #15 to resolve it, but that did not work.

(screenshot: 2023-10-19 141249)

Would you happen to have any suggestions on how to resolve this issue? We have been through the guide for setting up our own dataset several times and have drawn a lot of inspiration from how DALES is set up.

Looking forward to hearing from you. Best regards Jeppe

drprojects commented 11 months ago

Hi, I am super happy someone is trying to integrate SensatUrban into the framework, I was hoping to do it at some point. I would gladly welcome a PR in the end once you have things working!

Normally, if you followed the existing code's style, you should only have to create the following:

Is that what you have? Have you made any other modifications to the project?

In particular, can you please share the code for:

Since you checked #15, I am assuming you verified that your semantic labels are dense in [0, N[ and that you only use the N label for unclassified/unlabeled/ignored points?
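As a quick way to verify this on raw labels, something like the following can help (a minimal sketch in plain Python; `check_labels_dense` is not part of SPT, just an illustration):

```python
# Minimal sanity check (plain Python, independent of the SPT codebase):
# labels must be dense in [0, N[, with N reserved for ignored points.
def check_labels_dense(labels, num_classes):
    """Return True if labels lie in [0, num_classes] and every class in
    [0, num_classes - 1] appears at least once."""
    in_range = all(0 <= label <= num_classes for label in labels)
    dense = set(range(num_classes)).issubset(labels)
    return in_range and dense

# Toy example with 3 classes, label 3 reserved for 'ignored'
print(check_labels_dense([0, 1, 2, 2, 3], num_classes=3))  # → True
print(check_labels_dense([0, 2, 2], num_classes=3))        # → False: class 1 never appears
```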

cell1604 commented 10 months ago

Hi, another student from the same project group here.

Thank you so much for your response!

We didn't use your base files as a starting point, as we are still newbies with regard to coding. Instead, we copied each of the files you mentioned from DALES and adjusted them to SensatUrban.

Besides the files you mention, we also added a line to `src/datasets/__init__.py`, as you can see:

(screenshot)

We have pushed our work in its current state to the following repository, where you can check our files: https://github.com/cell1604/superpoint_transformer_P9/tree/spt_sensaturban

With regard to #15: we went through all the data files from SensatUrban to make sure they are annotated correctly. In the process we found errors in 3 files that prevented us from opening them, so we tried removing these from the training data (this didn't work). Furthermore, we discovered that the data in "test" is not annotated and hence has no semantic labels (we assume the test data should have labels too?), so we also tried replacing the test data with annotated data; this didn't resolve the issue either.

As you may know, SensatUrban has 13 classes ranging over [0, 12], but does not have a class specifically for "unknown" points. We have experimented back and forth with adding and removing such a class, trying to figure out whether SPT requires one, but without any luck regardless of what we tried. Maybe you can clarify how to configure this correctly?

(screenshot)

In #15, you suggested adding the following line in data/data.py, in order to check for empty tensors, as far as we understand:

    assert a.sum(dim=1).gt(0).all(), f"Some points in the label histogram 'self.y' do not have any labels."

However, we are not quite sure how to use it, as it seems to require "a" to be an already-defined tensor of some sort?

Looking forward to hearing back from you! Regards, Cellina

drprojects commented 10 months ago

we are still newbies with regard to coding.

I must warn you that this project is not the easiest to get started with. Making modifications to the project as a whole requires that you are proficient in machine learning in general and 3D deep learning in particular, and at ease with the following: Python, PyTorch, PyTorch Lightning, PyTorch Geometric, and Hydra. I will try to give you pointers to help you set up SensatUrban, but I won't be able to provide detailed support for things that I did not code and release myself.

we copied each of the files you mentioned from DALES and adjusted them to SensatUrban.

That is fine, it is a good starting point for SensatUrban.

we found errors in 3 files that prevented us from opening them, so we tried removing these from the training data (this didn't work).

This is quite strange for an officially-released dataset like SensatUrban. Have you checked that the files are not corrupt (weird characters, etc.)? Maybe the official GitHub repo, or the repos of other people using this dataset, can tell you whether someone else has encountered this issue.

Furthermore, we discovered that the data in "test" is not annotated and hence has no semantic labels (we assume the test data should have labels too?), so we also tried replacing the test data with annotated data; this didn't resolve the issue either.

This is normal: SensatUrban uses the test files to evaluate methods through an official benchmarking server. The test labels are held out; you must define a validation set for your own experiments and only use the test set for submitting your predictions to the SensatUrban server. See the KITTI-360 dataset to get an idea of how to deal with train/val/test splits when test labels are held out.
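In practice, that amounts to declaring which tiles belong to which split, e.g. (a hypothetical sketch; the tile names and the `TILES` constant are placeholders, loosely mirroring how other SPT datasets declare their splits):

```python
# Hypothetical split declaration for a SensatUrban dataset class.
# Tile names are placeholders — use the actual SensatUrban file names.
TILES = {
    'train': ['tile_a', 'tile_b', 'tile_c'],
    'val': ['tile_d'],    # carved out of the labeled tiles for your own experiments
    'test': ['tile_e']}   # labels held out, for benchmark submission only

# The splits must not overlap
assert all(
    set(TILES[a]).isdisjoint(TILES[b])
    for a in TILES for b in TILES if a != b)
```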

As you may know, SensatUrban has 13 classes ranging over [0, 12], but does not have a class specifically for "unknown" points. We have experimented back and forth with adding and removing such a class, trying to figure out whether SPT requires one, but without any luck regardless of what we tried. Maybe you can clarify how to configure this correctly?

Your settings seem fine in the screenshot. The "unknown" class is not necessary unless some points in SensatUrban have labels outside of [0, 12] (I haven't checked myself). Some datasets use this type of extra class to indicate that the loss and metrics should not be computed on these "unlabeled"/"ignored"/"unknown" points. Just to be safe, you can keep SENSAT_NUM_CLASSES=13, append 'ignored' to CLASS_NAMES, and append [0, 0, 0] to CLASS_COLORS.
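Concretely, the constants could look like this (a sketch following the dales.py naming convention; the class names and the placeholder colors are illustrative and should be double-checked against the official SensatUrban release):

```python
SENSAT_NUM_CLASSES = 13

# 13 SensatUrban classes + an extra 'ignored' entry at index 13.
# Names are illustrative — verify them against the official dataset.
CLASS_NAMES = [
    'ground', 'vegetation', 'building', 'wall', 'bridge', 'parking', 'rail',
    'traffic road', 'street furniture', 'car', 'footpath', 'bike', 'water',
    'ignored']

# Placeholder RGB palette; the final [0, 0, 0] is the 'ignored' color
CLASS_COLORS = [
    [(17 * i) % 256, (59 * i) % 256, (101 * i) % 256]
    for i in range(SENSAT_NUM_CLASSES)] + [[0, 0, 0]]
```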

In https://github.com/drprojects/superpoint_transformer/issues/15, you suggested adding the following line in data/data.py, in order to check for empty tensors, as far as we understand: assert a.sum(dim=1).gt(0).all(), f"Some points in the label histogram 'self.y' do not have any labels." However, we are not quite sure how to use it, as it seems to require "a" to be an already-defined tensor of some sort?

Yes, you just need to adapt the line to the context:

            elif k == 'y' and val.dim() > 1 and y_to_csr:
                assert val.sum(dim=1).gt(0).all(), \
                    "Some points in the label histogram 'self.y' do not have any labels."
                sg = f.create_group(osp.join(f.name, '_csr_', k))
                save_dense_to_csr(val, sg, fp_dtype=fp_dtype)

At this point in the code, we are saving preprocessed Data objects to disk. The 'y' attribute of a Data object holds the semantic labels. These labels can be stored either as a simple 1D Tensor or as a 2D Tensor, in which case each row is the histogram of labels for a voxel or a superpoint (we keep track of all the labels of the raw points inside said voxel or superpoint).

In the snippet above, I suggest you temporarily add a line to check whether there are any issues with the computed label histograms. Specifically, it will throw an error if one of the voxels/superpoints has an empty histogram. This should normally never happen; if it does, there is probably an upstream error related to the labels. For instance, some points in your raw data may have labels outside [0, 12]. Given what you mentioned above, depending on how you read the test files that have no semantic labels, you might have caused some downstream errors too. In particular, when reading tiles for the test set, make sure the output of your raw data reader function read_dales_tile() (or read_sensaturban_tile(), however you called it) is a Data object with no 'y' attribute. Again, you can take inspiration from the KITTI-360 dataset for how to handle test data.
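To see why an empty histogram signals an upstream label problem, here is a toy illustration (plain Python, independent of the actual SPT code): raw labels outside the valid range never land in any bin, leaving an all-zero row for that voxel/superpoint, which is exactly what the assert catches.

```python
def label_histogram(point_labels, num_classes):
    """Histogram of the raw point labels inside one voxel/superpoint."""
    hist = [0] * num_classes
    for label in point_labels:
        if 0 <= label < num_classes:  # out-of-range labels are silently dropped
            hist[label] += 1
    return hist

# Points with valid labels produce a non-empty histogram ...
print(label_histogram([0, 0, 5], num_classes=13))   # bins 0 and 5 are filled
# ... but a voxel whose points all carry an out-of-range label (e.g. 255)
# ends up with an all-zero histogram, and the assert fires.
print(label_histogram([255, 255], num_classes=13))  # → all zeros
```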

Good luck and happy coding

PS: if you are using and like the project, don't forget to give us a :star:, it matters to us!

drprojects commented 10 months ago

Without further reply from you, I consider this issue solved and am closing it.

PS: if you are using and like the project, don't forget to give us a ⭐, it matters to us!

biophase commented 3 months ago

Not sure if this helps anyone, but I noticed the mentioned error also occurs when the conditions mentioned by @drprojects are met (labels in range [0, C]; all labels present at least once) but the 1D labels are passed to the Data object as a 2D array of shape [N, 1]. This is what you get if you, e.g., read a PLY file into a pandas DataFrame and pass a 'labels' column. I was able to fix this by calling .reshape(-1) on the labels.
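To illustrate the fix (a minimal NumPy sketch; the pandas reading step is assumed, not shown):

```python
import numpy as np

# Selecting a 'labels' column from a DataFrame with double brackets,
# e.g. df[['labels']].values, yields a (N, 1) array — which Data then
# mistakes for a 2D label histogram.
labels_2d = np.array([[0], [3], [2]])    # shape (3, 1): triggers the error
labels_1d = labels_2d.reshape(-1)        # shape (3,): a proper 1D label vector
print(labels_2d.shape, labels_1d.shape)  # → (3, 1) (3,)
```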