HuguesTHOMAS / KPConv-PyTorch

Kernel Point Convolution implemented in PyTorch
MIT License
783 stars 155 forks

max_in limit dictionary issue #116

Open guragamb opened 3 years ago

guragamb commented 3 years ago

Hi, I'm having an issue during the preprocessing step with the max_in limit dictionary when training on SemanticKITTI (similar to #16 and #1).

Check max_in limit dictionary
"balanced_4.000_0.060": ?
Traceback (most recent call last):
  File "train_SemanticKitti.py", line 287, in <module>
    training_sampler.calib_max_in(config, training_loader, verbose=True)
  File "/host-machine/home/guragambhalla/semKITTI/KPConv-PyTorch/datasets/SemanticKitti.py", line 952, in calib_max_in
    for batch_i, batch in enumerate(dataloader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 746, in __init__
    self._try_put_index()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 861, in _try_put_index
    index = self._next_index()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 339, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/sampler.py", line 200, in __iter__
    for idx in self.sampler:
  File "/host-machine/home/guragambhalla/semKITTI/KPConv-PyTorch/datasets/SemanticKitti.py", line 817, in __iter__
    class_indices = torch.cat((class_indices, new_class_inds), dim=0)
RuntimeError: Expected object of scalar type Int but got scalar type Long for sequence element 1 in sequence argument at position #1 'tensors'

I know you pushed some updated code back then that seemed to resolve the situation, but the code I'm using was cloned from master recently, so it should be the updated version. Thanks!

HuguesTHOMAS commented 3 years ago

mmmmh, this might come from a behavior change in newer PyTorch versions...

Can you print the type of the tensor new_class_inds here:

https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/99bca4d43891f51c82c4704d040da12b5258bacf/datasets/SemanticKitti.py#L773-L774

It seems it is Long but it should be Int. I will push a correction shortly.
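For context, the mismatch can be reproduced in isolation. This is a minimal sketch (not the repository's exact code) of why the cast matters: torch.randperm returns Long (int64) by default, while the sampler accumulates into an Int (int32) buffer, and older PyTorch versions refuse to concatenate mismatched dtypes with torch.cat.

```python
import torch

# Empty accumulator with the dtype the sampler expects (Int / int32)
class_indices = torch.zeros((0,), dtype=torch.int32)

# torch.randperm returns Long (int64) by default; casting to int32
# makes the dtypes agree so torch.cat does not complain on older
# PyTorch versions that require matching scalar types.
new_class_inds = torch.randperm(10).type(torch.int32)

class_indices = torch.cat((class_indices, new_class_inds), dim=0)
print(class_indices.dtype)     # torch.int32
print(class_indices.shape[0])  # 10
```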

HuguesTHOMAS commented 3 years ago

Correction pushed. Does it solve the problem?

guragamb commented 3 years ago

That didn't resolve it, unfortunately. Before you pushed the correction, new_class_inds returned the type torch.LongTensor.

After your correction, the type (as expected) becomes torch.IntTensor, but the problem still occurs (it gets hung at "balanced_4.000_0.060": ?). You can see the output after Ctrl+C at the bottom.

Data Preparation
****************

Starting Calibration of max_in_points value (use verbose=True for more details)

Previous calibration found:
Check max_in limit dictionary
"balanced_4.000_0.060": ?
^CTraceback (most recent call last):
  File "train_SemanticKitti.py", line 287, in <module>
    training_sampler.calib_max_in(config, training_loader, verbose=True)
  File "/host-machine/home/guragambhalla/semKITTI/KPConv-PyTorch/datasets/SemanticKitti.py", line 953, in calib_max_in
    for batch_i, batch in enumerate(dataloader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 746, in __init__
    self._try_put_index()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 861, in _try_put_index
    index = self._next_index()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 339, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/sampler.py", line 200, in __iter__
    for idx in self.sampler:
  File "/host-machine/home/guragambhalla/semKITTI/KPConv-PyTorch/datasets/SemanticKitti.py", line 816, in __iter__
    new_class_inds = torch.randperm(class_potentials.shape[0]).type(torch.int32)
KeyboardInterrupt
guragamb commented 3 years ago

Also, I should add that I tried both PyTorch 1.4 (which was in the README) and 1.9 (latest), and the same issue occurs.

HuguesTHOMAS commented 3 years ago

Ok, this is quite strange. When you say it hangs, what do you mean exactly: there is no error, and it is just stuck as if in an infinite loop?

First two things to try:

  1. Can you verify that you have set verbose=True here:

https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/7fdbc57f9b56b6139865ae89c3a69d62c61449b3/train_SemanticKitti.py#L285-L291

  2. Can you try setting the number of threads to 0 to debug more easily:

https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/7fdbc57f9b56b6139865ae89c3a69d62c61449b3/train_SemanticKitti.py#L64-L65

Then, can you print class_potentials.shape[0] just before the line your Ctrl+C showed? It could be that the permutation is over a very large number and therefore very, very slow.
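As a side note on why setting the number of threads to 0 helps: with num_workers=0 the DataLoader runs entirely in the main process, so print statements and debugger breakpoints inside the sampler or dataset show up directly instead of being hidden in worker subprocesses. A minimal sketch (with a hypothetical toy dataset, not the repository's code):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class TinyDataset(Dataset):
    """Hypothetical toy dataset used only to illustrate num_workers=0."""
    def __len__(self):
        return 4

    def __getitem__(self, i):
        return torch.tensor(i)

# num_workers=0 keeps data loading in the main process, which makes
# prints and exceptions inside __getitem__ or the sampler visible.
loader = DataLoader(TinyDataset(), batch_size=2, num_workers=0)
batches = [b.tolist() for b in loader]
print(batches)  # [[0, 1], [2, 3]]
```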

guragamb commented 3 years ago

When I said it hangs, I mean it just gets stuck at "balanced_4.000_0.060": ? and I need to press Ctrl+C to get out.

  1. verbose=True is set
  2. set input_threads = 0
  3. When I print the shape of class_potentials, it gives 2264, 2264, 0 (so I'm guessing it only goes through the for loop on line 759 of datasets/SemanticKitti.py 3 times):
    class_potentials = self.dataset.potentials[self.dataset.class_frames[i]]
    print(class_potentials.shape[0]) # <---- this is the print statement I added 
Previous calibration found:
Check max_in limit dictionary
"balanced_4.000_0.060": ?
2264
2264
0
^CTraceback (most recent call last):
  File "train_SemanticKitti.py", line 287, in <module>
    training_sampler.calib_max_in(config, training_loader, verbose=True)
  File "/host-machine/home/guragambhalla/semKITTI/KPConv-PyTorch/datasets/SemanticKitti.py", line 952, in calib_max_in
    for batch_i, batch in enumerate(dataloader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 560, in _next_data
    index = self._next_index()  # may raise StopIteration
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 512, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/sampler.py", line 226, in __iter__
    for idx in self.sampler:
  File "/host-machine/home/guragambhalla/semKITTI/KPConv-PyTorch/datasets/SemanticKitti.py", line 816, in __iter__
    new_class_inds = torch.randperm(class_potentials.shape[0]).type(torch.int32)
KeyboardInterrupt
HuguesTHOMAS commented 3 years ago

Alright, I think I see where the problem comes from. If a class has no representative in the data, then the while loop:

https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/7fdbc57f9b56b6139865ae89c3a69d62c61449b3/datasets/SemanticKitti.py#L772-L774

is infinite because class_indices.shape[0] never grows. I pushed a correction; this should be okay now.
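The failure mode can be sketched in isolation. When class_potentials is empty, torch.randperm(0) returns an empty tensor every iteration, so the accumulated class_indices never reaches the target size. A minimal sketch of guarding against it (a hypothetical helper, not the repository's exact code):

```python
import torch

def balanced_indices(class_potentials, num_needed):
    """Draw num_needed indices for one class by repeated permutation.

    Skipping empty classes avoids an infinite loop: randperm(0) yields
    an empty tensor, so the accumulator would otherwise never grow.
    """
    class_indices = torch.zeros((0,), dtype=torch.int32)
    if class_potentials.shape[0] == 0:
        # No frame contains this class: nothing to sample, return early.
        return class_indices
    while class_indices.shape[0] < num_needed:
        new_class_inds = torch.randperm(class_potentials.shape[0]).type(torch.int32)
        class_indices = torch.cat((class_indices, new_class_inds), dim=0)
    return class_indices[:num_needed]

print(balanced_indices(torch.zeros(0), 5).shape[0])  # 0 (empty class, no hang)
print(balanced_indices(torch.rand(7), 5).shape[0])   # 5
```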

chaitjo commented 3 years ago

Hi @HuguesTHOMAS, do you think the new pushes impact any other parts of the pipeline, e.g. S3DIS segmentation? (Or is it something very specific to SemanticKITTI?)

HuguesTHOMAS commented 3 years ago

Hi @chaitjo,

Yesterday I pushed several corrections, including this one, but also fixes for bugs that occur in S3DIS.