Closed fhartmann17 closed 4 years ago
Hi @fhartmann17,
Thanks for your interest in my code.
First there is indeed a typo with the [0], you were right to add it.
Then, yes, the code usually doesn't need to enter this loop. Let me explain why. We want to select, say, 1000 lidar scans for one epoch and we have 10 classes. For balanced training we want to pick scans that cover all classes, so here is the strategy:
In the preprocessing steps, we first create `self.dataset.class_frames[i]`, which contains the list of scans that have at least one point of class i.
To be sure we don't always pick the same scans, we also create `self.dataset.potentials`, which basically counts how many times we have picked each frame of the dataset.
This function wants to pick 100 scans per class while respecting the potentials. So for each class, we first get the potentials of the scans that contain it: `class_potentials = self.dataset.potentials[self.dataset.class_frames[i]]`. Then we just pick the 100 lowest potentials for this class, and if we have fewer than 100 scans, we enter this while loop, which stacks the same scans multiple times.
Now, when I coded this I did not put a safeguard here, and in your case I think what happens is that `class_potentials.shape[0] == 0`, which means you have one class that is not present in any of the scans. Just verify that by printing this shape.
If this is the case, then I suggest you check your data again. And if there is nothing you can do about it, just get around this while loop by not choosing any scans for a class that is not present.
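The per-class picking described above can be sketched roughly like this. This is a minimal NumPy stand-in, not the actual KPConv code: the function name is mine, and it already includes the skip-empty-class safeguard suggested above.

```python
import numpy as np

def pick_balanced_frames(class_frames, potentials, per_class=100):
    """Pick `per_class` frame indices per class, preferring rarely-used frames.

    class_frames: list of 1-D arrays; class_frames[i] holds the indices of
                  frames containing at least one point of class i.
    potentials:   1-D array counting how often each frame was already picked.
    """
    picked = []
    for frames in class_frames:
        if frames.size == 0:
            # Safeguard: a class absent from every frame contributes nothing,
            # instead of getting stuck in an infinite while loop.
            continue
        class_potentials = potentials[frames]
        order = np.argsort(class_potentials, kind='stable')
        if frames.size >= per_class:
            chosen = frames[order[:per_class]]
        else:
            # Fewer frames than needed: stack the same frames multiple times.
            reps = int(np.ceil(per_class / frames.size))
            chosen = np.tile(frames[order], reps)[:per_class]
        # Remember how often each frame was used (handles repeated indices).
        np.add.at(potentials, chosen, 1)
        picked.append(chosen)
    return np.concatenate(picked) if picked else np.array([], dtype=int)
```

For example, with 5 frames, a class present in all of them, a class present only in frame 2, and a class present nowhere, the empty class is simply skipped and the rare class is repeated.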
Thanks for the quick and detailed answer, @HuguesTHOMAS !!
You are right, `class_potentials.shape[0] == 0`.
The problem comes from the test_dataset, where I chose `balance_classes = True`.
But I still don't understand why this error appears. It says that `self.dataset.class_frames[2] = tensor([], dtype=torch.int64)` (my class 02 is motorcycles), but there are scans in the test and validation sets that contain motorcycles.
If your dataset is based on the implementation of the SemanticKitti dataset, the code does not load labels for the test set, because it is not supposed to know them.
You thus have two choices. You can either change your sets: use your current training + validation as the new training set and use the test set as validation. This is easy to do and makes sense if you have the labels of the test scenes.
Or you can search the code for every `if self.set == 'test'` statement and modify the code everywhere labels are involved.
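To illustrate the kind of statement to look for, here is a toy sketch. The class and method names are my assumptions, not the actual KPConv source; the point is only the `if self.set == 'test'` branch that skips real labels.

```python
import numpy as np

class FrameDataset:
    """Toy stand-in for a SemanticKITTI-style dataset split."""

    def __init__(self, split):
        self.set = split  # 'training', 'validation' or 'test'

    def load_labels(self, num_points):
        if self.set == 'test':
            # No ground truth for the test set: return dummy zero labels.
            # Any downstream code that needs real labels (class_frames,
            # balance_classes, ...) must be bypassed for this split.
            return np.zeros(num_points, dtype=np.int32)
        # Placeholder for reading real labels from disk.
        return np.ones(num_points, dtype=np.int32)
```

With such a pattern, a test split silently yields only zeros, which is exactly why `class_frames` ends up empty for every non-background class.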
Thanks, @HuguesTHOMAS. You are right about that. I will have a look at it.
Another error happens if I use `balance_classes=False` and enter the else condition at lines 817/818 (here: SemanticKitty.py):
`gen_indices = torch.randperm(self.dataset.potentials.shape[0])`
Then the tensor sizes of `gen_indices` and `self.dataset.epoch_inds` don't match at line 825:
`self.dataset.epoch_inds += gen_indices`
(the size of `self.dataset.epoch_inds` is larger than that of `gen_indices`). Do you know how to solve that?
Ok, the problem with `epoch_inds` is that it has to be accessible to all the threads of the dataloader, so we have to share this tensor, which is done here:
Once shared, you cannot change the tensor's size, only the data in it, which is not very convenient. This is why I do
`self.dataset.epoch_inds *= 0`
...
`self.dataset.epoch_inds += gen_indices`
instead of a simple
`self.dataset.epoch_inds = gen_indices`
Now you have many ways to solve this. A simple one is to add some random indices at the end of `gen_indices` to pad it so that it has the same size as `self.dataset.epoch_inds`.
You can also reduce the size of `self.dataset.epoch_inds`, which is controlled by the parameter `config.epoch_steps` for the training dataset and `config.validation_size` for the validation and test sets. See here:
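Put together, the in-place refill with random padding could be sketched like this (a NumPy stand-in for the shared torch tensor; the function name is mine, and the padding strategy is just the first of the options suggested above):

```python
import numpy as np

def refill_epoch_inds(epoch_inds, gen_indices, rng):
    """Refill a fixed-size shared index buffer in place.

    `epoch_inds` stands in for the shared tensor: its size cannot change,
    so we zero it and add into it instead of reassigning the variable.
    """
    n = epoch_inds.shape[0]
    if gen_indices.shape[0] < n:
        # Pad with extra random picks so both arrays have the same size.
        extra = rng.choice(gen_indices, size=n - gen_indices.shape[0])
        gen_indices = np.concatenate([gen_indices, extra])
    epoch_inds *= 0
    epoch_inds += gen_indices[:n]
    return epoch_inds
```

The two in-place operations at the end mirror the `*= 0` / `+= gen_indices` trick quoted above, so the buffer object itself is never replaced.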
Thanks for your help @HuguesTHOMAS !!
I will close this issue now. Stay safe and good luck to you.
Hi @HuguesTHOMAS,
thanks for making your code open-source. I am currently trying to train KPConv on my own dataset, which is in KITTI format.
But at the last step of each epoch I get stuck in the while loop in SemanticKitty.py, lines 772-774.
(I added the [0] in the while condition.)
Usually it shouldn't even enter this part at that stage, should it? Do you maybe know what my problem is and how to solve it?