Hi @G-Anjanappa,
The first thing I have to say is: there is something wrong with what you are doing from a scientific viewpoint.
For example, I am subsampling ceiling, floor, and a few other classes using grid_subsampling() with sampleDl = 0.02 and keeping the original point density for some classes to provide the network more information about these classes
This is wrong: you cannot use ground-truth information to alter the data, because when you test the network on real test data (without ground truth), you will not be able to do this partial subsampling. Your goal is to predict the classes, yet you are using those very classes to prepare your data, which is circular.
Now to answer your question here is what you can do:
Choose between the following two strategies:
a) You choose a small value for first_subsampling_dl, like 0.01, to keep the details of your small objects. As a consequence, you need to reduce in_radius accordingly, otherwise your network will demand a huge amount of memory and time. This is not a problem in your case because you are focusing on small objects: the network does not need large input spheres, since it does not care about large objects like chairs or tables. Perform some tests to find the best values of first_subsampling_dl and in_radius for your own case. My advice is usually to keep the ratio between the two values below 50, to have a reasonable network size (see the sketch after these two options).
b) You do not subsample the input point clouds. You can modify the code here to do that:
https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/73e444d486cd6cb56122c3dd410e51c734064cfe/datasets/S3DIS.py#L738-L749
In that case, you still need to choose the first_subsampling_dl value, as it controls the convolution size, the subsampling size of the next layers, etc. The same goes for in_radius and the ratio between the two, as I said in the previous point.
I advise that you choose option a), because if the input data is not subsampled, you have no control over the number of neighbor points in the convolutions and can end up with out-of-memory (OOM) errors. Furthermore, it is very unlikely that adding more points to the convolutions will help: the ratio of 2.5 that we use is already big enough to gather information with 15 kernel points.
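As a minimal sketch of that rule of thumb (the values below are illustrative assumptions, not numbers from the repository), you can check the ratio directly when setting your configuration:

# Illustrative values only (assumed, not prescribed by the repository)
first_subsampling_dl = 0.01  # grid size (in meters) of the first subsampling
in_radius = 0.4              # radius (in meters) of the input spheres
# keep the ratio below ~50 so the network stays a reasonable size
assert in_radius / first_subsampling_dl < 50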
Hello Thomas,
Thank you for your quick response and suggestions. I am testing the network with your suggested parameters.
Meanwhile, there are a few things that are not very clear to me.
I performed the sub-sampling based on labels for the S3DIS dataset as a data preparation step before training or testing. This was mainly to balance the amount of data for each class so the network can learn; otherwise, classes like walls and floors have far more support than exit signs and fire alarms. Even with a minimal sampling value like 0.01, the data for small objects isn't very representative: as they are further downsampled, information is lost, and classes like ceiling, floor, furniture, and walls still have a higher point density. Another possible idea was to add more instances of the objects of interest into the scene; this would again be manipulating the training data, but in a way that makes it more informative?
There is a flag in the code, use_potentials, for class imbalance problems in the dataset. Is this flag relevant in my case? (Please correct me if my understanding is wrong.)
Of course, this kind of sub-sampling would not be possible for a new dataset. But that is the aim of the experiment. I want to see how the model trained on the dataset from (1) predicts on a new dataset without subsampling.
Thank you in advance for your time. All your suggestions are very much appreciated.
Regards, Geethanjali
@G-Anjanappa ,
So, to answer your questions:
- I performed the sub-sampling based on labels for the S3DIS dataset as a step for data preparation before the training or testing.
You can do whatever you want with the training data, as long as you don't use the labels of the test data (because you are not supposed to know them). In fact, in most online benchmark datasets (like ScanNet for example), you don't have access to the test labels. So you can do this partial subsampling on the training data if you want, but you must not do it on the test data. And therefore, it no longer makes sense to do it on the training data either, as the network would be trained on data that is different from your test data. (I can tell you the network will be able to detect the change in density, use it as a feature to detect your objects, and then not work well on your test data.)
Your second strategy is called data augmentation and is totally acceptable, because you only do it on the training data to help the network see more small objects; you do not alter the test data. In fact, I think it is a great strategy for your task.
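To make this concrete, here is a minimal sketch of such an augmentation (all names and values are made up for illustration, this is not code from the repository): a minority-class object is duplicated elsewhere in the training scene with a random offset.

import numpy as np

# Sketch of instance-level augmentation, applied to training data only
rng = np.random.default_rng(1)
points = rng.random((100000, 3)) * 10.0       # hypothetical scene coordinates
labels = rng.integers(0, 13, size=100000)     # hypothetical per-point labels
obj_mask = labels == 11                       # e.g. a rare 'exit sign' class
offset = rng.uniform(-2.0, 2.0, size=(1, 3))  # random placement of the copy
points = np.vstack([points, points[obj_mask] + offset])
labels = np.concatenate([labels, labels[obj_mask]])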
- There is a flag in the code, use_potentials, for class imbalance problems in the dataset. Is this flag relevant in my case?
It is indeed. When using use_potentials=True, the training takes input spheres regularly across the dataset. When using use_potentials=False, the training takes the same number of spheres (say N) centered on objects of each class.
In your case, use_potentials=False should be helpful, as it helps detect the classes that are in the minority in the data.
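To illustrate the idea behind use_potentials=False (a simplified sketch with made-up names, not the repository's actual sampler), the training can draw the same number of sphere centers for each class:

import numpy as np

# Toy class-balanced center sampling; `labels` is assumed to hold one class
# index per point, and N is the number of spheres drawn per class
rng = np.random.default_rng(0)
labels = rng.integers(0, 13, size=100000)  # hypothetical per-point labels
N = 1000
centers = []
for c in np.unique(labels):
    idx = np.flatnonzero(labels == c)
    # sample with replacement if a class has fewer than N points
    centers.append(rng.choice(idx, size=N, replace=len(idx) < N))
center_inds = np.concatenate(centers)  # balanced pool of sphere centers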
- Of course, this kind of sub-sampling would not be possible for a new dataset. But that is the aim of the experiment. I want to see how the model trained on the dataset from (1) predicts on a new dataset without subsampling.
Well, if you want to train on a partially subsampled dataset and test on one that is not, my opinion is that you will not get the best results, as I told you earlier, because the network will surely use the difference in density to detect your objects. But you are totally free to test it: as long as you do not partially subsample the test set, it is a scientifically valid approach.
Thank you for your answers, @HuguesTHOMAS .
Could you please elaborate on the network detecting the change in density and using that as a feature to detect the objects? It would be helpful and interesting to understand this.
Also, when I try to use use_potentials=False, I get a runtime error as mentioned in #98. As you asked in that issue, I did not use use_potentials=False for the validation set. However, I still have the issue. Could you please help me out?
Could you please elaborate on the network detecting the change in density and using that as a feature to detect the objects? It would be helpful and interesting to understand this.
There is not much to say: the network creates features from the data, and it will probably learn features that measure the density (a convolution with all weights equal to one does that, for example).
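As a toy illustration of that remark (not code from the repository), summing all-ones features over each point's neighbors simply counts them, which is exactly a density measurement:

import torch

# a "convolution" whose weights are all ones counts the neighbors (= density)
neighbor_feats = torch.ones(128, 30, 1)  # 128 points, 30 neighbors, 1 feature
weights = torch.ones(1, 1)               # all-ones kernel weights
density = (neighbor_feats @ weights).sum(dim=1)  # ~ neighbor count per point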
Yes, you should always set use_potentials=True for the validation set (for the same reason as before: you are not supposed to have the classes on test data).
What is your exact error message?
I am using the following code:
# Initialize datasets
training_dataset = S3DISDataset(config, set='training', use_potentials=False)
test_dataset = S3DISDataset(config, set='validation', use_potentials=True)
I am getting the below error:
Traceback (most recent call last):
File "/home/s2578956/Experiments/KPConv-PyTorch/train_S3DIS.py", line 293, in <module>
training_sampler.calibration(training_loader, verbose=True)
File "/home/s2578956/Experiments/KPConv-PyTorch/datasets/S3DIS.py", line 1408, in calibration
for batch_i, batch in enumerate(dataloader):
File "/home/s2578956/anaconda3/envs/KPConv_pytorch/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 359, in __iter__
return self._get_iterator()
File "/home/s2578956/anaconda3/envs/KPConv_pytorch/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/home/s2578956/anaconda3/envs/KPConv_pytorch/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 944, in __init__
self._reset(loader, first_iter=True)
File "/home/s2578956/anaconda3/envs/KPConv_pytorch/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 975, in _reset
self._try_put_index()
File "/home/s2578956/anaconda3/envs/KPConv_pytorch/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1209, in _try_put_index
index = self._next_index()
File "/home/s2578956/anaconda3/envs/KPConv_pytorch/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 512, in _next_index
return next(self._sampler_iter) # may raise StopIteration
File "/home/s2578956/anaconda3/envs/KPConv_pytorch/lib/python3.9/site-packages/torch/utils/data/sampler.py", line 229, in __iter__
for idx in self.sampler:
File "/home/s2578956/Experiments/KPConv-PyTorch/datasets/S3DIS.py", line 1157, in __iter__
self.dataset.epoch_inds += torch.from_numpy(all_epoch_inds[:, :num_centers])
RuntimeError: The size of tensor a (3000) must match the size of tensor b (2961) at non-singleton dimension 1
I think the error comes from the fact that you do not have enough training points for one of your classes. It is a case I had never encountered before, but I made a correction for it, and it should work now with the new version of the code.
Can you test it and see if it works? I have not tested it on my computer yet.
Hi @HuguesTHOMAS
With the changes you provided, I am not encountering the error anymore. Thank you very much for the quick response.
The problem cited in #2 looks pretty similar to what I intend to do. The user's proposition to combine potentials and class balancing for training data selection seems interesting. Do you think that distributing potentials throughout the point cloud based on classes would improve the chances of predicting the minority classes? Could you please share your thoughts on this?
Also, you mentioned in the same discussion that the parameter class_w controls the weight of each class in the loss. How do we decide its value for each class?
Thank you in advance.
Well at this point this is a valid research problem, I'll let you explore it and find the answers yourself.
class_w is used here: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/3d683b6bd6bf058135d3f9f155cd41595dc81c16/models/architectures.py#L306-L311
and you can set your own custom values in the Config class of the training script, as I do for SemanticKitti: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/e600c1667d085aeb5cf89d8dbe5a97aad4270d88/train_SemanticKitti.py#L187-L188
It is just a list of weights (one for each class). I did not experiment a lot with it, but I can tell you it will not magically solve your issues; other strategies like data augmentation may be more promising.
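One common heuristic for choosing the values (an assumption on my part, not a recommendation from the repository) is to weight each class by the inverse square root of its frequency in the training set:

import numpy as np

# hypothetical per-class point counts; rare classes get larger weights
class_counts = np.array([5.2e6, 4.8e6, 3.1e6, 1.0e4, 8.0e3])
freqs = class_counts / class_counts.sum()
class_w = 1.0 / np.sqrt(freqs)  # inverse-sqrt-frequency weighting
class_w /= class_w.mean()       # normalize around 1 to keep the loss scale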
Good luck with your research.
Hi @HuguesTHOMAS,
The predictions on the minority classes improved with the suggestions in the previous comments, though I have yet to test data augmentation. Thank you for the support.
I have another question. Suppose the training data has 12 semantic classes but the test data only 11. Could you please help me understand how the network handles such cases? How are IoUs and other metrics calculated?
Suppose the training data has 12 semantic classes but the test data only 11. Could you please help me understand how the network handles such cases? How are IoUs and other metrics calculated?
It is impossible to have a different number of classes during training and testing. If the network is trained to predict 12 classes, it will predict 12 classes. One thing that can happen is that some points have no class, or belong to classes that should be ignored by the network. Say we have 12 classes, including 1 that is not relevant; in that case, we train the network on only 11 classes and therefore test it on 11 classes.
This is the case, for example, in the SemanticKitti dataset, where we add the irrelevant class to the list of ignored classes.
Then, when we apply the loss of the network to the semantic prediction for each point, we ignore the points of this class and only apply the loss to the relevant points. Therefore, as far as the network is concerned, only the relevant classes exist.
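A minimal sketch of that masking (assuming the points of the ignored class are remapped to label -1 beforehand; this is not the repository's exact code):

import torch

logits = torch.randn(6, 11)                   # 6 points, 11 relevant classes
labels = torch.tensor([0, 3, -1, 5, -1, 10])  # -1 marks ignored points
criterion = torch.nn.CrossEntropyLoss(ignore_index=-1)
loss = criterion(logits, labels)              # only the 4 valid points count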
Hi @HuguesTHOMAS ,
Thank you for the response.
I understand the second case. But what happens if the test set doesn't contain points belonging to a class that was trained on? We wouldn't know this beforehand in order to define the labels to ignore.
But what happens if the test set doesn't contain points belonging to a class that was trained on? We wouldn't know this beforehand in order to define the labels to ignore.
If you don't know that the class will not be present in the test set, you should just train your network on all classes. If the network performs well, you will not have many predictions of the class that is absent from the test set.
Ideally there shouldn't be any predictions of such classes; however, there could be false positives in the predictions. In such cases, is it acceptable to exclude this class from the calculation of the overall mean IoU?
Also, when I define the ignored labels variable, for example self.ignored_labels = ['stairs'], I get the following error.
Model Preparation
*****************
Done in 3.9s
Start training
**************
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1640811806235/work/aten/src/THC/THCCachingHostAllocator.cpp line=280 error=710 : device-side assert triggered
Traceback (most recent call last):
File "train_S3DIS.py", line 331, in <module>
trainer.train(net, training_loader, test_loader, config)
File "/home/s2578956/Experiments/KPConv-PyTorch_working/utils/trainer.py", line 189, in train
loss = net.loss(outputs, batch.labels)
File "/home/s2578956/Experiments/KPConv-PyTorch_working/models/architectures.py", line 368, in loss
self.reg_loss = p2p_fitting_regularizer(self)
File "/home/s2578956/Experiments/KPConv-PyTorch_working/models/architectures.py", line 50, in p2p_fitting_regularizer
distances = torch.sqrt(torch.sum((other_KP - KP_locs[:, i:i + 1, :]) ** 2, dim=2))
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
In such cases, is it acceptable to exclude this class from the calculation of the overall mean IoU?
No, it is not, because you are not supposed to know that there will not be any elements of said class in the test set. Imagine you work on a real application like autonomous driving: there could be moments when no pedestrians are around, but that does not mean your network should not be trained to detect them; some could appear a moment later.
So, you have to evaluate it anyway. If the whole test set really does not contain any points of a certain class, then the dataset itself is not very well designed, but you have to adapt to it. You can, for example, add a small comment in your analysis of the results saying that this class is not relevant.
when I define the ignored labels variable, for example self.ignored_labels = ['stairs'], I get the following error.
self.ignored_labels should contain the indices of the classes, the ones defined here:
https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/7255680ff05bdce1ba29d15a3f5ab272cb7de18d/datasets/S3DIS.py#L67-L80
So, for example, if you want to ignore clutter, use self.ignored_labels = np.sort([12]).
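For illustration, the wrong and the right version side by side (assuming the S3DIS class list linked above, where clutter has index 12); passing a string likely produces invalid integer label comparisons downstream, which can surface as the device-side assert you saw:

import numpy as np

# inside the dataset class:
# wrong: a class name string, while the code expects integer class indices
# self.ignored_labels = ['stairs']
# right: the integer index from the dataset's label definition
self.ignored_labels = np.sort([12])  # e.g. ignore 'clutter' in S3DIS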
Hello Thomas,
Thank you for the open-source code. I am using your network to model safety-related assets like lights, exit signs, and fire alarms in an indoor scene. For this application, I intend to use the S3DIS data with conditional subsampling before writing it into .ply files in the prepare_S3DIS_ply() function. For example, I am subsampling ceiling, floor, and a few other classes using grid_subsampling() with sampleDl = 0.02 and keeping the original point density for some classes to provide the network with more information about them (as exit signs and fire alarms are very few in number). This will result in varying point densities throughout the area.
My questions now are:
1. I do not want to subsample again while loading the .ply files before starting the training, as I have already partly subsampled the data. Is that okay?
2. If I implement case 1, first_subsampling_dl will be 0 (technically). But I notice that the calibration() method in S3DIS.py uses the first_subsampling_dl variable to assess neighbor limits. My question here is: can I use an arbitrary value, say first_subsampling_dl = 0.01, here? I don't clearly understand its effect on the network's performance.
I have implemented these modifications already and tested them with the below parameters.
### Input parameters ###
num_classes = 13
in_radius = 1.200000
num_kernel_points = 15
deform_radius = 4.000000
KP_extent = 1.200000
segloss_balance = class
I got pretty decent results for all 13 classes, but I wanted to be sure that I am proceeding correctly. Could you please help me with this?
Thank you in advance.
Regards, Geethanjali