HuguesTHOMAS / KPConv-PyTorch

Kernel Point Convolution implemented in PyTorch
MIT License

Vaihingen _3D data process problems #160

Open SC-shendazt opened 2 years ago

SC-shendazt commented 2 years ago

Hi Dr. HuguesTHOMAS,

The problem occurred after I changed very little code:

Traceback (most recent call last):
  File "/media/zt/D/PycharmProjects/KPConv-PyTorch-master/train_NPM3D.py", line 304, in <module>
    trainer.train(net, training_loader, test_loader, config)
  File "/media/zt/D/PycharmProjects/KPConv-PyTorch-master/utils/trainer.py", line 274, in train
    self.validation(net, val_loader, config)
  File "/media/zt/D/PycharmProjects/KPConv-PyTorch-master/utils/trainer.py", line 290, in validation
    self.cloud_segmentation_validation(net, val_loader, config)
  File "/media/zt/D/PycharmProjects/KPConv-PyTorch-master/utils/trainer.py", line 472, in cloud_segmentation_validation
    for i, batch in enumerate(val_loader):
  File "/home/zt/anaconda3/envs/torch37_2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/zt/anaconda3/envs/torch37_2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/zt/anaconda3/envs/torch37_2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/zt/anaconda3/envs/torch37_2/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/zt/anaconda3/envs/torch37_2/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/zt/anaconda3/envs/torch37_2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zt/anaconda3/envs/torch37_2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/media/zt/D/PycharmProjects/KPConv-PyTorch-master/datasets/NPM3D.py", line 257, in __getitem__
    return self.potential_item(batch_i)
  File "/media/zt/D/PycharmProjects/KPConv-PyTorch-master/datasets/NPM3D.py", line 318, in potential_item
    cloud_ind = int(torch.argmin(self.min_potentials))
RuntimeError: cannot perform reduction function argmin on a tensor with no elements because the operation does not have an identity


HuguesTHOMAS commented 2 years ago

It seems to me that your data is not loaded.

First, you need to verify the preparation function:

https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/3a774ff8d54a4d080fe65093b2299ede35d9735d/datasets/NPM3D.py#L650

It is a very simple function that loads the data, preprocesses it, and saves it in the right format for the rest of the pipeline. You should be able to read and understand it easily. Modify the line

https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/3a774ff8d54a4d080fe65093b2299ede35d9735d/datasets/NPM3D.py#L691

with the name of any point cloud that does not contain labels. For the ones that contain labels, verify that the right name is given to read them (here it is 'class', but it could be something else): https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/3a774ff8d54a4d080fe65093b2299ede35d9735d/datasets/NPM3D.py#L697

After this function runs, you should have two .ply files, Vaihingen3D_train.ply and Vaihingen3D_test.ply, in the ply_folder.

If everything works there and the error still happens, then you should verify the data loading function:

https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/3a774ff8d54a4d080fe65093b2299ede35d9735d/datasets/NPM3D.py#L708

In that case, tell me if you need help understanding it.
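
A quick way to check the first step is to confirm that both prepared files exist before launching training. The sketch below is only an illustration: the helper name `missing_prepared_files` is hypothetical and not part of the repository, and the folder path is whatever your own ply_folder is.

```python
from pathlib import Path


def missing_prepared_files(ply_folder, names=("Vaihingen3D_train", "Vaihingen3D_test")):
    """Return the expected .ply files that are absent or empty in ply_folder.

    Hypothetical helper for illustration; it only inspects the file system
    and does not read the point clouds themselves.
    """
    folder = Path(ply_folder)
    missing = []
    for name in names:
        path = folder / f"{name}.ply"
        # A zero-byte file is treated as missing in this sketch.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(path.name)
    return missing
```

An empty list means both files were found. It does not prove that the label field was read correctly, so the field name still has to be checked separately.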

SC-shendazt commented 2 years ago

Hello Dr. HuguesTHOMAS, I checked the prepare_NPM3D_ply(self) function as you suggested, and it generated two .ply files, Vaihingen3D_train.ply and Vaihingen3D_test.ply, in the train_folder. But an error occurred at the end of epoch 0.

HuguesTHOMAS commented 2 years ago

What is the error message?

SC-shendazt commented 2 years ago

The error message is the same.

HuguesTHOMAS commented 2 years ago

Does Vaihingen3D_test.ply have labels or is it just a test cloud with points but no labels?

HuguesTHOMAS commented 2 years ago

Also can you show me what files are in the folder input_0.200?

SC-shendazt commented 2 years ago

Yes, Vaihingen3D_test.ply has labels.

SC-shendazt commented 2 years ago

[Screenshot attached in reply to the question about the input_0.200 folder; image not preserved]

HuguesTHOMAS commented 2 years ago

OK, so the tensor self.min_potentials should be created. This is strange.

Try to print what happens around here where min_potentials is created: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/3a774ff8d54a4d080fe65093b2299ede35d9735d/datasets/NPM3D.py#L192-L201

I can't spend much time on this, so I am not going to trace back every error with you. I'll let you investigate the mistakes on your own: when an error occurs, try printing values to understand what is wrong, and trace back where the variables come from.
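
One way to follow this advice is to fail early with a descriptive message instead of letting the argmin call fail on empty input. The sketch below is a plain-Python stand-in, not the repository's code: in the repository self.min_potentials is a torch tensor, while here it is an ordinary sequence, and the helper name `argmin_with_check` is hypothetical.

```python
def argmin_with_check(min_potentials, set_name="validation"):
    """Return the index of the smallest potential.

    Raises a descriptive error when there are no potentials at all, which
    in this thread corresponds to a split for which no cloud was loaded.
    """
    values = list(min_potentials)
    if not values:
        raise ValueError(
            f"No clouds loaded for the '{set_name}' set: min_potentials is empty. "
            "Check which files were selected for this split."
        )
    # Plain-Python argmin: the index whose value is smallest.
    return min(range(len(values)), key=values.__getitem__)
```

With a check like this, the message points at the data selection step rather than at the reduction that happens to run first on the empty data.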

SC-shendazt commented 2 years ago

Thank you for your reply. I will try to trace the cause

SC-shendazt commented 2 years ago

When I printed self.min_potentials, I found that it was empty and that the validation set was also empty. Perhaps I have set self.validation_split incorrectly?

SC-shendazt commented 2 years ago

Hi Dr. HuguesTHOMAS, I have successfully run the Vaihingen 3D dataset. I set self.validation_split=1 and then added a '[ ]' at line 432 of utils/trainer.py.
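
Why a wrong validation_split can leave the validation set empty can be illustrated with a small sketch. It assumes that each cloud is given a split index and that the validation set keeps only the clouds whose index equals validation_split; the function `select_clouds`, the split list `[0, 1]`, and the mismatched value `2` are all hypothetical, and the actual logic in datasets/NPM3D.py may differ.

```python
def select_clouds(cloud_names, all_splits, validation_split, set_name):
    """Pick the clouds belonging to one set by comparing split indices."""
    if len(cloud_names) != len(all_splits):
        raise ValueError("cloud_names and all_splits must have the same length")
    pairs = zip(cloud_names, all_splits)
    if set_name == "training":
        return [name for name, split in pairs if split != validation_split]
    if set_name == "validation":
        return [name for name, split in pairs if split == validation_split]
    raise ValueError(f"unknown set: {set_name}")


clouds = ["Vaihingen3D_train", "Vaihingen3D_test"]
splits = [0, 1]  # assumed: one split index per cloud

print(select_clouds(clouds, splits, 1, "validation"))  # ['Vaihingen3D_test']
print(select_clouds(clouds, splits, 2, "validation"))  # []
```

Under these assumptions, a validation_split that matches no cloud gives an empty file list, so nothing is loaded for validation and the potentials stay empty.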

HuguesTHOMAS commented 2 years ago

Nice to hear that you solved the problem! :)

Are the results looking good?

SC-shendazt commented 2 years ago

Hi Dr. HuguesTHOMAS, unfortunately it doesn't seem to bring any performance improvement, so I added EdgeConv and self-attention.

HuguesTHOMAS commented 2 years ago

At this point, because you are designing your own blocks, I can only try to guess what happens. From what I see, I think your block can only work as a normal block and not as a strided block.

Strided blocks reduce the number of points. As you can see at the beginning of your code, in the if 'strided' ... portion, the q_pts (query points) are different from the s_pts (support points). Therefore, when you compute x1 with the first KPConv, it will have a smaller dimension than x, hence the error.

I see two solutions:

a. just use a normal block for strided blocks and only use your custom block in between

b. If you want your block to work even when strided, adapt the implementation for this specific case. For example, you can do the same as my ResNet block, where I have a shortcut. When strided, I use a max pooling to bring the shortcut to the right number of points before the addition: https://github.com/HuguesTHOMAS/KPConv-PyTorch/blob/3a774ff8d54a4d080fe65093b2299ede35d9735d/models/blocks.py#L641-L648
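
The idea behind option b can be sketched without any tensor library. The example below is a simplified, pure-Python illustration and not the repository's implementation, which works on tensors and is not reproduced here; the function name `max_pool_neighbors` and the toy values are hypothetical. Each query point takes the channel-wise maximum over the features of its support neighbors, so the shortcut ends up with as many rows as the strided output.

```python
def max_pool_neighbors(features, neighbor_indices):
    """Pool support-point features onto query points with an element-wise max.

    features:          one feature list per support point
    neighbor_indices:  for each query point, the indices of its support neighbors
    """
    pooled = []
    for neighbors in neighbor_indices:
        if not neighbors:
            raise ValueError("every query point needs at least one neighbor")
        rows = [features[j] for j in neighbors]
        # zip(*rows) groups the values channel by channel across the neighbors.
        pooled.append([max(channel) for channel in zip(*rows)])
    return pooled


# Four support points with two channels, reduced to two query points.
x = [[1.0, 0.0], [0.5, 2.0], [3.0, 1.0], [0.0, 4.0]]
neighbors = [[0, 1], [2, 3]]

shortcut = max_pool_neighbors(x, neighbors)  # 2 rows, same as the strided output
x1 = [[10.0, 10.0], [20.0, 20.0]]            # stand-in for the strided conv output
out = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x1, shortcut)]
print(out)  # [[11.0, 12.0], [23.0, 24.0]]
```

Adding x directly to x1 would not work here because x has four rows and x1 has two; pooling the shortcut first makes the two shapes agree.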

HuguesTHOMAS commented 2 years ago

I am sorry, I don't have much time to help you throughout the whole process of designing these new blocks. This is really a research subject and it goes beyond the scope of Github issues where I try to help people use the code as-is.

I can take a few minutes, as I did before, to answer specific questions, but this is not the right place to give me your code for review. The problem you are trying to solve is interesting, but I have my own projects. Try to test, debug, and answer the questions on your own: add plenty of print statements, verify the output of every operation, etc. This is how I ended up with this KPConv repo, and I am sure you will be able to create your own blocks successfully.

Best of luck,
Hugues

SC-shendazt commented 2 years ago

Dr. HuguesTHOMAS, thank you very much for your quick answer. I will follow your suggestions to improve this block.