HuguesTHOMAS / KPConv-PyTorch

Kernel Point Convolution implemented in PyTorch

Using KPConv with partially subsampled S3DIS dataset for detecting indoor objects #147

Closed G-Anjanappa closed 1 year ago

G-Anjanappa commented 2 years ago

Hello Thomas,

Thank you for open-sourcing this code. I am using your network to model safety-related assets such as lights, exit signs, and fire alarms in indoor scenes. For this application, I intend to conditionally subsample the S3DIS data before writing it into .ply files in the prepare_S3DIS_ply() function. For example, I am subsampling ceiling, floor, and a few other classes using grid_subsampling() with sampleDl = 0.02 and keeping the original point density for some classes to provide the network more information about these classes (as exit signs and fire alarms are very few in number). This will result in varying point densities throughout the area.
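For clarity, the conditional subsampling I added in prepare_S3DIS_ply() looks roughly like the sketch below (the helper and the dense/rare class split are only illustrative, not code from the repository; grid_subsampling is the wrapper from datasets/common.py):

    import numpy as np
    from datasets.common import grid_subsampling

    def conditional_subsampling(points, colors, labels, dense_classes, sampleDl=0.02):
        """Grid-subsample only the dominant classes and keep the rare ones at full
        density. points/colors are (N, 3) float32 arrays, labels is (N,) int32.
        Illustrative sketch: the dense/rare split is an example."""

        dense_mask = np.isin(labels, dense_classes)

        # Subsample ceiling, floor, etc. with a 2 cm grid
        sub_pts, sub_cols, sub_lbls = grid_subsampling(points[dense_mask],
                                                       features=colors[dense_mask],
                                                       labels=labels[dense_mask].reshape(-1, 1),
                                                       sampleDl=sampleDl)

        # Keep exit signs, fire alarms, etc. at their original density
        out_pts = np.vstack((sub_pts, points[~dense_mask]))
        out_cols = np.vstack((sub_cols, colors[~dense_mask]))
        out_lbls = np.hstack((sub_lbls.ravel(), labels[~dense_mask]))
        return out_pts, out_cols, out_lbls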

My questions now are:

  1. I do not want to subsample again while loading the .ply files before starting the training, as I have partly subsampled the data before. Is that okay?

  2. If I implement case 1, first_subsampling_dl would technically be 0. But I notice that the calibration() method in the S3DIS.py file uses the first_subsampling_dl variable to assess neighbor limits. My question here is: can I use an arbitrary value, say first_subsampling_dl = 0.01, here? I don't clearly understand its effect on the network's performance.

I have implemented these modifications already and tested them with the below parameters.

Input parameters:

    num_classes = 13
    in_radius = 1.200000
    num_kernel_points = 15
    deform_radius = 4.000000
    KP_extent = 1.200000
    segloss_balance = class

I have gotten pretty decent results for all 13 classes, but I wanted to be sure that I am proceeding correctly. Could you please help me with this?

Thank you in advance.

Regards, Geethanjali

HuguesTHOMAS commented 2 years ago

Hi @G-Anjanappa,

The first thing I have to say is: there is something wrong with what you are doing from a scientific viewpoint.

For example, I am subsampling ceiling, floor, and a few other classes using grid_subsampling() with sampleDl = 0.02 and keeping the original point density for some classes to provide the network more information about these classes

This is wrong: you cannot use ground-truth information to alter the data, because when you test the network on real test data (without ground truth), you will not be able to do this partial subsampling. Your goal is to predict the classes, yet you use these very classes to prepare your data, so that is not possible.

Now, to answer your questions, here is what you can do:

  1. Never use ground-truth information in your data processing.
  2. Choose between the two following strategies:

    a) You choose a small value for first_subsampling_dl, like 0.01, to keep the details of your small objects. As a consequence, you need to reduce in_radius accordingly, otherwise your network will demand a huge amount of memory and time. This is not a problem in your case: since you are focusing on small objects, the network does not need large input spheres, because it does not care about large objects like chairs or tables. Perform some tests to find the best values of first_subsampling_dl and in_radius for your own case. My advice is usually to keep the ratio between the two values below 50, to have a reasonable network size.

    b) You do not subsample the input point clouds. You can modify the code here to do that: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/73e444d486cd6cb56122c3dd410e51c734064cfe/datasets/S3DIS.py#L738-L749 In that case, you still need to choose the first_subsampling_dl value, as it controls the convolution size, the subsampling size of the next layers, etc. The same goes for in_radius and the ratio between the two, as I said in the previous point.

I advise that you choose option a), because if the input data is not subsampled, you have no control over the number of neighbor points in the convolutions and can end up with OOM errors. Furthermore, it is very unlikely that adding more points to the convolutions will help: the ratio of 2.5 that we use is already big enough to gather information with 15 kernel points.
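For example, option a) could look like this in your S3DISConfig (just a sketch: only the two coupled parameters are shown, and the exact values are yours to tune on your data):

    from utils.config import Config

    class S3DISConfig(Config):

        # Fine subsampling grid to keep the details of small objects (exit signs, fire alarms)
        first_subsampling_dl = 0.01

        # Smaller input spheres so that in_radius / first_subsampling_dl stays below ~50
        # (here 0.4 / 0.01 = 40), otherwise memory consumption and runtime explode
        in_radius = 0.4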

G-Anjanappa commented 2 years ago

Hello Thomas,

Thank you for your quick response and suggestions. I am testing the network with your suggested parameters.

Meanwhile, there are a few things that are not yet clear to me:

  1. I performed the subsampling based on labels for the S3DIS dataset as a data-preparation step before training or testing. This was mainly to balance the amount of data for each class and allow the network to learn; otherwise, classes like walls and floor have much more support than exit signs and fire alarms. Even with a minimal sampling value like 0.01, the data for small objects is not very representative: as they are further downsampled, information is lost, while other classes like ceiling, floor, furniture, and walls still have a higher point density. Another possible idea was to add more of the objects of interest into the scene; this would again be manipulating the training data, but in order to make it more informative?

  2. There is a flag in the code, use_potentials, for class-imbalance problems in the dataset. Is this flag relevant in my case? (Please correct me if my understanding is wrong.)

  3. Of course, this kind of subsampling would not be possible for a new dataset. But that is the aim of the experiment: I want to see how the model trained on the dataset from (1) predicts on a new dataset without subsampling.

Thank you in advance for your time. All your suggestions are very much appreciated.

Regards, Geethanjali

HuguesTHOMAS commented 2 years ago

@G-Anjanappa ,

So, to answer your questions:

  1. I performed the subsampling based on labels for the S3DIS dataset as a data-preparation step before training or testing.

You can do whatever you want with the training data, as long as you don't use the labels of the test data (because you are not supposed to know them). In fact, in most online benchmark datasets (like ScanNet, for example), you don't have access to the test labels. So you can do this partial subsampling on the training data if you want, but you must not do it on the test data. And then it no longer makes sense to do it on the training data either, as the network would be trained on data that is different from your test data (I can tell you the network will be able to detect the change in density and use that as a feature to detect your objects, and it will then not work well on your test data).

Your second strategy is called data augmentation and is totally acceptable, because you only do it on the training data to help the network see more small objects; you do not alter the test data. In my opinion, it should in fact be a great strategy for your task.
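As a rough illustration of that kind of augmentation (a sketch only, not code from this repository, to be applied to the training clouds and never to the test data):

    import numpy as np

    def inject_rare_instances(points, colors, labels, rare_label, n_copies=3, rng=None):
        """Duplicate the points of one rare class a few times, each copy rotated
        around the vertical axis and shifted horizontally, so the network sees
        the class more often during training. The offsets here are arbitrary."""
        rng = np.random.default_rng() if rng is None else rng
        mask = labels == rare_label
        if not np.any(mask):
            return points, colors, labels

        new_pts, new_cols, new_lbls = [points], [colors], [labels]
        center = points[mask].mean(axis=0)
        for _ in range(n_copies):
            theta = rng.uniform(0, 2 * np.pi)
            R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                          [np.sin(theta),  np.cos(theta), 0.0],
                          [0.0,            0.0,           1.0]])
            offset = np.array([rng.uniform(-2, 2), rng.uniform(-2, 2), 0.0])
            new_pts.append((points[mask] - center) @ R.T + center + offset)
            new_cols.append(colors[mask])
            new_lbls.append(labels[mask])

        return np.vstack(new_pts), np.vstack(new_cols), np.hstack(new_lbls)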

  2. There is a flag in the code, use_potentials, for class-imbalance problems in the dataset. Is this flag relevant in my case?

It is indeed. When using use_potentials=True, the training takes input spheres regularly across the dataset; when using use_potentials=False, the training takes the same number (say N) of spheres centered on objects of each class.

In your case, use_potentials=False should be helpful, as it helps detect the classes that are in the minority in the data.
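Concretely, this is just the use_potentials argument when the datasets are created in train_S3DIS.py:

    # Class-balanced sphere centers for training, regular potential sampling for validation
    training_dataset = S3DISDataset(config, set='training', use_potentials=False)
    test_dataset = S3DISDataset(config, set='validation', use_potentials=True)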

  3. Of course, this kind of subsampling would not be possible for a new dataset. But that is the aim of the experiment: I want to see how the model trained on the dataset from (1) predicts on a new dataset without subsampling.

Well, if you want to train on a partially subsampled dataset and test on one that is not, my opinion is that you will not get the best results, as I told you earlier, because the network will surely use the difference in density to detect your objects. But you are totally free to test it: as long as you do not partially subsample the test set, it is a scientifically valid approach.

G-Anjanappa commented 2 years ago

Thank you for your answers, @HuguesTHOMAS .

Could you please elaborate on the network detecting the change in density and using that as a feature to detect the objects? It would be helpful and interesting to understand this.

Also, when I try to use use_potentials=False, I get a runtime error as mentioned in #98. As you asked in that issue, I do not use use_potentials=False for the validation set. However, I still have the issue. Could you please help me out?

HuguesTHOMAS commented 2 years ago

Could you please elaborate on the network detecting the change in density and using that as a feature to detect the objects? It would be helpful and interesting to understand this.

There is not much to say: the network creates features from the data, and it will probably end up with features that measure the density (a convolution with all weights equal to one does exactly that, for example).
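As a toy illustration (not code from the repository), a convolution whose weights are all equal to one simply counts the neighbors inside its radius, which is a local density measure:

    import numpy as np

    def density_feature(points, queries, radius):
        """Count the neighbors of each query point within `radius`: this is what a
        convolution with all weights equal to one computes, i.e. the local density."""
        dists = np.linalg.norm(queries[:, None, :] - points[None, :, :], axis=2)  # (M, N)
        return (dists < radius).sum(axis=1).astype(np.float32)                    # (M,)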

Yes, you should always set use_potentials=True for the validation set (for the same reason as before: you are not supposed to have the classes of the test data).

What is your exact error message?

G-Anjanappa commented 2 years ago

I am using the following code:

    # Initialize datasets
    training_dataset = S3DISDataset(config, set='training', use_potentials=False)
    test_dataset = S3DISDataset(config, set='validation', use_potentials=True)

I am getting the below error:

Traceback (most recent call last):
  File "/home/s2578956/Experiments/KPConv-PyTorch/train_S3DIS.py", line 293, in <module>
    training_sampler.calibration(training_loader, verbose=True)
  File "/home/s2578956/Experiments/KPConv-PyTorch/datasets/S3DIS.py", line 1408, in calibration
    for batch_i, batch in enumerate(dataloader):
  File "/home/s2578956/anaconda3/envs/KPConv_pytorch/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "/home/s2578956/anaconda3/envs/KPConv_pytorch/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/home/s2578956/anaconda3/envs/KPConv_pytorch/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 944, in __init__
    self._reset(loader, first_iter=True)
  File "/home/s2578956/anaconda3/envs/KPConv_pytorch/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 975, in _reset
    self._try_put_index()
  File "/home/s2578956/anaconda3/envs/KPConv_pytorch/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1209, in _try_put_index
    index = self._next_index()
  File "/home/s2578956/anaconda3/envs/KPConv_pytorch/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 512, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/home/s2578956/anaconda3/envs/KPConv_pytorch/lib/python3.9/site-packages/torch/utils/data/sampler.py", line 229, in __iter__
    for idx in self.sampler:
  File "/home/s2578956/Experiments/KPConv-PyTorch/datasets/S3DIS.py", line 1157, in __iter__
    self.dataset.epoch_inds += torch.from_numpy(all_epoch_inds[:, :num_centers])
RuntimeError: The size of tensor a (3000) must match the size of tensor b (2961) at non-singleton dimension 1

HuguesTHOMAS commented 2 years ago

I think the error comes from the fact that you did not have enough training points for one of your classes. It is a case I had never encountered before, but I made a correction for it, and it should now work with the new version of the code.
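In essence, the kind of correction needed is to pad the per-class list of sphere centers when a class has fewer candidate points than requested (a simplified sketch of the principle, not the exact patch):

    import numpy as np

    def pick_class_centers(class_point_inds, n_wanted, rng=None):
        """Pick n_wanted sphere-center indices for one class. If the class has fewer
        candidate points than requested, sample with replacement so that the epoch
        index array always has the expected size."""
        rng = np.random.default_rng() if rng is None else rng
        replace = len(class_point_inds) < n_wanted
        return rng.choice(class_point_inds, size=n_wanted, replace=replace)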

Can you test it and see if it works? I have not tested it on my computer yet.

G-Anjanappa commented 2 years ago

Hi @HuguesTHOMAS

With the changes you provided, I am not encountering the error anymore. Thank you very much for the quick response.

The problem cited in #2 looks pretty similar to what I intend to do. The user's proposal to combine both potentials and class balancing for training data selection seems interesting. Do you think that distributing potentials throughout the point cloud based on classes would improve the chances of predicting the minority classes? Could you please share your thoughts on this?

Also, you mentioned in the same discussion that the parameter class_w controls the weight of each class in the loss. How do we decide its value for each class?

Thank you in advance.

HuguesTHOMAS commented 2 years ago

Well, at this point, this is a valid research problem; I'll let you explore it and find the answers yourself.

class_w is used here: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/3d683b6bd6bf058135d3f9f155cd41595dc81c16/models/architectures.py#L306-L311

and you can set your own custom weights in the Config class of the training script, as I do for SemanticKitti: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/e600c1667d085aeb5cf89d8dbe5a97aad4270d88/train_SemanticKitti.py#L187-L188

It is just a list of weights (one per class). I did not experiment a lot with it, but I can tell you it will not magically solve your issues; other strategies like data augmentation may be more promising.
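For illustration, a common choice is inverse-frequency weights rescaled to a mean of 1 (the class counts below are made up; the list needs one entry per class, in the dataset's label order):

    import numpy as np
    from utils.config import Config

    # Hypothetical number of training points per class (one entry per class)
    class_counts = np.array([4.0e6, 3.8e6, 3.0e6, 2.0e5, 1.5e5, 3.0e5, 4.0e5,
                             5.0e5, 6.0e5, 4.5e5, 1.0e5, 2.0e5, 8.0e5])

    # Inverse-frequency weights, rescaled so that their mean is 1
    inv_freq = class_counts.sum() / (len(class_counts) * class_counts)

    class S3DISConfig(Config):
        # One weight per class, in dataset label order (illustrative values only)
        class_w = [float(w) for w in np.round(inv_freq, 3)]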

Good luck with your research.

G-Anjanappa commented 2 years ago

Hi @HuguesTHOMAS,

The predictions on the minority classes improved with the suggestions from the previous comments, though I still have to test data augmentation. Thank you for the support.

I have another question. Suppose the training data has 12 semantic classes but the test data only 11. Could you please help me understand how the network handles such cases? How are IoUs and other metrics calculated in this situation?

HuguesTHOMAS commented 2 years ago

Suppose the training data has 12 semantic classes but the test data only 11. Could you please help me understand how the network handles such cases? How are IoUs and other metrics calculated in this situation?

It is impossible to have a different number of classes during training and test: if the network is trained to predict 12 classes, it will predict 12 classes. One thing that can happen, though, is that some points have no class, or belong to classes that should be ignored by the network. Let's say we have 12 classes, including 1 that is not relevant; in that case, we train the network on only 11 classes and therefore test it on 11 classes.

This is the case, for example, in the SemanticKitti dataset, where we add the irrelevant class to the list of ignored classes:

https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/5b5641e02daac0043adfe97724de8c771dd4772f/datasets/SemanticKitti.py#L121-L122

Then, when we apply the network's loss to the semantic prediction of each point, we ignore the points of this class and only apply the loss to the relevant points. Therefore, for the network, only the relevant classes exist.
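In essence, the loss does something like this (a simplified sketch of the idea, not a verbatim copy of the repository code):

    import torch

    def segmentation_loss(logits, labels, valid_labels):
        """Cross-entropy over the valid classes only. Points whose label is not in
        valid_labels (i.e. ignored classes) are mapped to -1 and skipped by the loss.
        logits: (N, C) scores, labels: (N,) raw dataset labels."""
        target = -torch.ones_like(labels)
        for i, c in enumerate(valid_labels):
            target[labels == c] = i
        return torch.nn.functional.cross_entropy(logits, target, ignore_index=-1)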

G-Anjanappa commented 2 years ago

Hi @HuguesTHOMAS ,

Thank you for the response.

I understand the second case. But what happens if the test set doesn't have points belonging to a class that was trained on? We wouldn't know this beforehand in order to define the labels to ignore.

HuguesTHOMAS commented 2 years ago

But what happens if the test set doesn't have points belonging to a class that was trained on? We wouldn't know this beforehand in order to define the labels to ignore.

If you don't know that the class will be absent from the test set, you should just train your network on all classes. If the network performs well, you will not have many predictions of the class that is not present in the test set.

G-Anjanappa commented 2 years ago

Ideally, there shouldn't be any predictions of such classes; however, there could be false positives. In such cases, is it acceptable to exclude this class from the calculation of the overall mean IoU?

Also, when I define the ignored labels variable, for example self.ignored_labels = ['stairs'], I get the following error.

Model Preparation
*****************
Done in 3.9s

Start training
**************
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1640811806235/work/aten/src/THC/THCCachingHostAllocator.cpp line=280 error=710 : device-side assert triggered
Traceback (most recent call last):
  File "train_S3DIS.py", line 331, in <module>
    trainer.train(net, training_loader, test_loader, config)
  File "/home/s2578956/Experiments/KPConv-PyTorch_working/utils/trainer.py", line 189, in train
    loss = net.loss(outputs, batch.labels)
  File "/home/s2578956/Experiments/KPConv-PyTorch_working/models/architectures.py", line 368, in loss
    self.reg_loss = p2p_fitting_regularizer(self)
  File "/home/s2578956/Experiments/KPConv-PyTorch_working/models/architectures.py", line 50, in p2p_fitting_regularizer
    distances = torch.sqrt(torch.sum((other_KP - KP_locs[:, i:i + 1, :]) ** 2, dim=2))
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

HuguesTHOMAS commented 2 years ago

In such cases is it acceptable if this class is excluded in the calculation of overall mean Iou?

No, it is not, because you are not supposed to know that there will not be any elements of the said class in the test set. Imagine you work on a real application like autonomous driving: there could be moments when there are no pedestrians around, but that does not mean your network should not be trained to detect them; there could be some a moment later.

So, you have to evaluate it anyway. If the whole test set really does not contain any point of a certain class, then the dataset is not very well designed, but you have to adapt to it. You can, for example, add a small comment in your analysis of the results saying that this class is not relevant.

when I define the ignored labels variable, for example self.ignored_labels = ['stairs'], I get the following error.

self.ignored_labels should contain the indices of the classes to ignore, the ones defined here: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/7255680ff05bdce1ba29d15a3f5ab272cb7de18d/datasets/S3DIS.py#L67-L80

So, for example, if you want to ignore clutter, use self.ignored_labels = np.sort([12]).
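And if you prefer to refer to the class by its name instead of hard-coding the index, a small sketch (to be placed in S3DISDataset.__init__, after self.label_to_names is defined):

    # Invert the label_to_names dictionary to go from class names to indices
    name_to_label = {name: idx for idx, name in self.label_to_names.items()}
    self.ignored_labels = np.sort([name_to_label['clutter']])   # equivalent to np.sort([12])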