PRBonn / segcontrast

MIT License

About MinkowskiEngine/utils/quantization.py #10

Closed volare1996 closed 2 years ago

volare1996 commented 2 years ago

Hello, when I used my own dataset to fine-tune the model, I got the following error.

File "downstream_train_modify2.py", line 97, in <module>
    trainer.fit(model_sem_kitti)
File "/home/segcontrast-main/data_utils/collations.py", line 163, in __call__
    coord, feats, label_ = point_set_to_coord_feats(points, label, self.resolution, self.num_points, True)
File "/home/segcontrast-main/data_utils/collations.py", line 75, in point_set_to_coord_feats
    _, mapping = ME.utils.sparse_quantize(coordinates=p_coord, return_index=True)
File "/home/segcontrast3.7/lib/python3.7/site-packages/MinkowskiEngine-0.5.4-py3.7-linux-x86_64.egg/MinkowskiEngine/utils/quantization.py", line 313, in sparse_quantize
    discrete_coordinates, tensor_stride, ""
RuntimeError: Expected contiguous tensor, but got non-contiguous tensor for 'coordinates' (while checking arguments for initialize)

in line 75: p_coord = (869016, 3)
[[1.2716826e+07 9.7547080e+07 6.1090000e+03]
 [1.2716785e+07 9.7547008e+07 6.0720000e+03]
 [1.2716807e+07 9.7547008e+07 6.0810000e+03]
 ...
 [1.2723444e+07 9.7553720e+07 6.4460000e+03]
 [1.2723452e+07 9.7553696e+07 6.3970000e+03]
 [1.2723446e+07 9.7553696e+07 6.3950000e+03]]

nuneslu commented 2 years ago

Hi, you can make p_coord contiguous right before the sparse_quantize function call, like this:

p_coord = p_coord.contiguous()
_, mapping = ME.utils.sparse_quantize(coordinates=p_coord, return_index=True)
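
A note on the suggestion above: `.contiguous()` is a PyTorch tensor method. If `p_coord` is still a NumPy array at that point (an assumption here, not confirmed in the thread), the NumPy counterpart is `np.ascontiguousarray`. The sketch below uses a made-up `point_set` array to show how slicing off the first three columns produces a non-contiguous view and how to fix it:

```python
import numpy as np

# Hypothetical (N, 4) point set: x, y, z, intensity.
point_set = np.arange(20, dtype=np.float32).reshape(5, 4)

# Slicing columns returns a view whose rows are no longer packed in memory.
p_coord = point_set[:, :3]
print(p_coord.flags["C_CONTIGUOUS"])  # False

# Make a packed copy before handing the array to code that requires it.
p_fixed = np.ascontiguousarray(p_coord)
print(p_fixed.flags["C_CONTIGUOUS"])  # True
```

For a `torch.Tensor`, `p_coord = p_coord.contiguous()` does the same job.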

volare1996 commented 2 years ago

Thanks for your reply! After I made the change, the following error appeared.

    labels = labels.contiguous()
    _, mapping = ME.utils.sparse_quantize(coordinates=p_coord, return_index=True)

data_utils/collations.py", line 12, in
    return [ torch.from_numpy(row).float() for row in batch_data ]
TypeError: expected np.ndarray (got Tensor)

nuneslu commented 2 years ago

Can you share p_coord.shape from case 1? Apparently you are trying to quantize the full batch at once. Instead, you should iterate over each point cloud in the batch, quantize each one individually, and then create a new batch from the quantized tensors (see here).
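
The per-cloud flow described above can be sketched roughly as follows. This is a NumPy-only stand-in, not the repository's code: `quantize_cloud` and `batch_clouds` are hypothetical helpers, and in the real pipeline `ME.utils.sparse_quantize` and `ME.utils.batched_coordinates` would take their place.

```python
import numpy as np

def quantize_cloud(points, resolution):
    """Voxelize one (N, 3) cloud; return unique voxel coords and kept point indices."""
    coords = np.floor(points / resolution).astype(np.int32)
    coords -= coords.min(axis=0, keepdims=True)  # shift so coordinates start at 0
    _, keep = np.unique(coords, axis=0, return_index=True)
    keep = np.sort(keep)  # preserve the original point order
    return coords[keep], keep

def batch_clouds(clouds, resolution):
    """Quantize each cloud on its own, then prepend a batch-index column."""
    batched = []
    for batch_idx, points in enumerate(clouds):
        coords, _ = quantize_cloud(points, resolution)
        batch_col = np.full((coords.shape[0], 1), batch_idx, dtype=np.int32)
        batched.append(np.hstack([batch_col, coords]))
    return np.vstack(batched)

clouds = [
    np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [3.0, 3.0, 3.0]]),
    np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]),
]
batch = batch_clouds(clouds, resolution=0.5)
print(batch.shape)  # one row per occupied voxel, columns: batch, x, y, z
```

Each cloud is quantized as a 2D `(N, 3)` array, so the quantizer never sees the whole batch at once.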

volare1996 commented 2 years ago

Thanks.

p_coord.shape = (909191, 3). I only have thirteen point clouds to fine-tune the model.

In case 1, the KITTI dataset works fine.

volare1996 commented 2 years ago

My data is divided into a training set and a test set. The training set contains 10 point clouds, and the test set contains 3 point clouds. So I changed the dataset class code.

class SemanticSPECTDataLoader(Dataset):
    xxx
nuneslu commented 2 years ago

Is this the shape right before the _, mapping = ME.utils.sparse_quantize(coordinates=p_coord, return_index=True) call? From the error, it seems that the coordinates have more than two dimensions.

nuneslu commented 2 years ago

After looking more closely at the error you showed, I figured it out:

File "/home/nudt/qianjia1126/20220407/segcontrast-main/trainer/semantic_kitti_trainer.py", line 96, in validation_step
x, y = numpy_to_sparse_tensor(x_coord, x_feats, x_label)
File "/home/nudt/qianjia1126/20220407/segcontrast-main/data_utils/collations.py", line 59, in numpy_to_sparse_tensor
p_label = ME.utils.batched_coordinates(array_to_torch_sequence(p_label), dtype=torch.float32)[:, 1:]
File "/home/nudt/anaconda3/envs/segcontrast3.7/lib/python3.7/site-packages/MinkowskiEngine-0.5.4-py3.7-linux-x86_64.egg/MinkowskiEngine/utils/collation.py", line 55, in batched_coordinates
).all(), "All coordinates must be in a 2D array."
AssertionError: All coordinates must be in a 2D array.

So the error is in p_label = ME.utils.batched_coordinates(array_to_torch_sequence(p_label), dtype=torch.float32)[:, 1:]. This means your p_label has shape (N,) (one dimension), while two dimensions, (N, 1), are expected. You should reshape p_label with something like p_label = p_label[:, None].
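
As a small illustration of that reshape, here is a sketch on a made-up label array. `as_column` is a hypothetical helper, not part of the repository:

```python
import numpy as np

def as_column(labels):
    """Return labels with shape (N, 1), leaving already-2D input untouched."""
    labels = np.asarray(labels)
    if labels.ndim == 1:
        return labels[:, None]  # same as labels.reshape(-1, 1)
    return labels

p_label = np.array([0, 2, 1, 2])   # shape (N,)
p_label_2d = as_column(p_label)    # shape (N, 1)
print(p_label.shape, p_label_2d.shape)
```

The data itself is unchanged; only a trailing axis of length one is added.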

volare1996 commented 2 years ago

Thanks. When I run the eval_train.sh script, I get the following error. But no error occurs after cutting the data, and I was able to pass the test with the KITTI dataset.

Traceback (most recent call last):
  File "inference_vis_modify2.py", line 169, in <module>
    run_inference(model, args)
xxx
...

RuntimeError: CUDA out of memory. Tried to allocate 418.00 MiB (GPU 0; 11.93 GiB total capacity; 10.75 GiB already allocated; 145.88 MiB free; 10.94 GiB reserved in total by PyTorch)

volare1996 commented 2 years ago

When I ran the training script, Ubuntu 16 did not support open3d==0.12.0. Is there an alternative? Thanks!

AttributeError: 'open3d.open3d.geometry.PointCloud' object has no attribute 'select_by_index'

nuneslu commented 2 years ago

Thanks. When I run the eval_train.sh script, I get the following error. But no error occurs after cutting the data, and I was able to pass the test with the KITTI dataset.

Traceback (most recent call last):
  File "inference_vis_modify2.py", line 169, in <module>
    run_inference(model, args)
  File "inference_vis_modify2.py", line 90, in run_inference
    model_acc, model_miou, model_class_iou = model_pipeline(model, val_loader, args)
  File "inference_vis_modify2.py", line 50, in model_pipeline
    h = model['model'](x)
...
  File "/home/nudt/qianjia1126/20220407/segcontrast-main/models/minkunet.py", line 174, in forward
    x4 = self.stage4(x3)
...
  File "/home/nudt/qianjia1126/20220407/segcontrast-main/models/minkunet.py", line 77, in forward
    out = self.relu(self.net(x) + self.downsample(x))
...

RuntimeError: CUDA out of memory. Tried to allocate 418.00 MiB (GPU 0; 11.93 GiB total capacity; 10.75 GiB already allocated; 145.88 MiB free; 10.94 GiB reserved in total by PyTorch)

This is because of the size of your GPU or your point clouds. From the shape you reported for your dataset, your point clouds have around 900000 points, while SemanticKITTI point clouds only have around 100000. So you should downsample your point clouds.
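
One simple way to do this is uniform random subsampling, sketched below. `random_downsample` and the point budget are illustrative assumptions; voxel-grid downsampling is another common option and may preserve geometry better.

```python
import numpy as np

def random_downsample(points, labels, max_points, seed=0):
    """Keep at most max_points points (and their labels), chosen uniformly at random."""
    n = points.shape[0]
    if n <= max_points:
        return points, labels
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=max_points, replace=False)
    idx.sort()  # keep the original ordering of the surviving points
    return points[idx], labels[idx]

# Toy stand-in for a large cloud: 9000 points reduced to 1000.
points = np.random.default_rng(1).random((9000, 3))
labels = np.zeros(9000, dtype=np.int64)
small_points, small_labels = random_downsample(points, labels, max_points=1000)
print(small_points.shape, small_labels.shape)
```

The same index array is applied to points and labels so they stay aligned.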

nuneslu commented 2 years ago

When I ran the training script, Ubuntu 16 did not support open3d==0.12.0. Is there an alternative? Thanks!

Traceback (most recent call last):
File "contrastive_train.py", line 77, in
    trainer.fit(model_sem_kitti)
...
File "/home/nudt/qianjia1126/20220407/segcontrast-main/data_utils/datasets/SemanticKITTIDataLoader.py", line 125, in _get_augmented_item
    points_set = clusterize_pcd(points_set, self.n_clusters)
File "/home/nudt/qianjia1126/20220407/segcontrast-main/pcd_utils/pcd_preprocess.py", line 63, in clusterize_pcd
    pcd = pcd.select_by_index(inliers, invert=True)
AttributeError: 'open3d.open3d.geometry.PointCloud' object has no attribute 'select_by_index'

You can try using an earlier version of Open3D; it should not be a problem.
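
If an older Open3D has to be used, one possible workaround is a small wrapper that falls back to the older method name. This rests on the assumption that Open3D releases before 0.10 exposed the same operation as `select_down_sample`; please verify against the installed version. `select_points` and the stand-in class are hypothetical and only there so the sketch runs without Open3D.

```python
def select_points(pcd, indices, invert=False):
    """Call select_by_index if available, else the assumed older select_down_sample."""
    if hasattr(pcd, "select_by_index"):
        return pcd.select_by_index(indices, invert=invert)
    if hasattr(pcd, "select_down_sample"):
        return pcd.select_down_sample(indices, invert=invert)
    raise AttributeError("point cloud object has no index-selection method")

class _LegacyCloud:
    """Stand-in mimicking an old-style point cloud object (not a real Open3D class)."""
    def select_down_sample(self, indices, invert=False):
        return ("legacy", list(indices), invert)

result = select_points(_LegacyCloud(), [0, 2], invert=True)
print(result)
```

In the traceback above, the failing call `pcd.select_by_index(inliers, invert=True)` would then become `select_points(pcd, inliers, invert=True)`.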

volare1996 commented 2 years ago

Thanks a lot for the explanation!

volare1996 commented 2 years ago

When I ran the pre-training script using my own data, I got the following error.

./Datasets_ply/SemanticKITTI/dataset/sequences/01/velodyne/000002.bin (2422164,) (605541, 4) (1836, 2)
./Datasets_ply/SemanticKITTI/dataset/sequences/01/velodyne/000005.bin (2620620,) (655155, 4) (667, 2)
Epoch 0: 0%| | 0/8 [00:06<?, ?it/s]

Traceback (most recent call last):
File "contrastive_train.py", line 77, in
    trainer.fit(model_sem_kitti)
File "/home/nudt/qianjia1126/20220407/segcontrast-main/models/moco.py", line 296, in forward
    h_qs = list_segments_points(h_q.C, h_q.F, segments[0])
File "/home/nudt/qianjia1126/20220407/segcontrast-main/data_utils/collations.py", line 39, in list_segments_points
    seg_coord = torch.vstack(c_coord)
RuntimeError: vstack expects a non-empty TensorList

Some of the data cannot be clustered.

nuneslu commented 2 years ago

Here you can uncomment a line to see the clusters generated on your dataset. You can also try tuning the clustering parameters here to better fit your dataset, using the visualizer to check what the clusters look like.

volare1996 commented 2 years ago

Thank you for your reply! Some data cannot be clustered if only the coordinates are considered, so additional attributes such as RGB are being considered.

volare1996 commented 2 years ago

When I ran the pre-training script on my own data, I got the following error in epoch 2.

Epoch 1: 12%|████████████▌ | 36/289 [02:55<20:36, 4.89s/it, loss=6.55, v_num=20]
121 ./dataset/sequences/02/velodyne/000010.bin
273 ./dataset/sequences/06/velodyne/000005.bin
Epoch 1: 13%|████████████▉ | 37/289 [03:00<20:31, 4.89s/it, loss=6.55, v_num=20]
Epoch 2: 90%|██████████████████████████████████████████████████████████████████████████████████████████▊ | 260/289 [19:47<02:12, 4.57s/it, loss=7.1, v_num=20]
332 ./dataset/sequences/07/velodyne/000022.bin
121 ./dataset/sequences/02/velodyne/000010.bin
Epoch 2: 90%|██████████████████████████████████████████████████████████████████████████████████████████▊ | 260/289 [19:51<02:12, 4.58s/it, loss=7.1, v_num=20]
File "/Segcontrast_modify/trainer/semantic_kitti_contrastive_trainer.py", line 67, in training_step
    return self.pre_training_segment_step(batch, batch_nb) if self.segment_contrast else self.pre_training_step(batch, batch_nb)
File "/Segcontrast_modify/trainer/semantic_kitti_contrastive_trainer.py", line 46, in pre_training_segment_step
    out_seg, tgt_seg = self.forward(xi, xj, [si, sj])
File "/trainer/semantic_kitti_contrastive_trainer.py", line 33, in forward
    return self.moco_model(xi, xj, s)
File "/models/moco.py", line 297, in forward
    h_qs = list_segments_points(h_q.C, h_q.F, segments[0])
File "/data_utils/collations.py", line 41, in list_segments_points
    seg_coord = torch.vstack(c_coord)
RuntimeError: vstack expects a non-empty TensorList

Why did this error occur in epoch 2? Thanks!

nuneslu commented 2 years ago

This seems to be the same problem as before. Apparently you have empty segments, so listing the points produces an empty list, which makes torch.vstack throw the error. I suggest you check the clustering parameters here and uncomment this line to visualize the clusters. I noticed that the visualize_pcd_clusters function was not working before, but I have fixed it, so you can git pull and the visualizer should work now. If you want, you can also share the point clouds after clustering here.
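
To make the failure mode concrete, here is a NumPy sketch of collecting per-segment features with a guard for the case where no segment survives. `stack_segment_features` is a hypothetical stand-in and not the repository's `list_segments_points`; the same emptiness check would apply before a `torch.vstack` call.

```python
import numpy as np

def stack_segment_features(feats, segments, ignore=-1):
    """Stack features of all points that belong to a real segment.

    Returns None when every point is unclustered (label == ignore), which is
    the situation where stacking an empty list would raise an error.
    """
    per_segment = []
    for seg_id in np.unique(segments):
        if seg_id == ignore:
            continue
        per_segment.append(feats[segments == seg_id])
    if not per_segment:
        return None  # caller can skip this sample or batch
    return np.vstack(per_segment)

feats = np.ones((5, 4), dtype=np.float32)
print(stack_segment_features(feats, np.array([-1, 3, 3, 7, -1])).shape)
print(stack_segment_features(feats, np.array([-1, -1, -1, -1, -1])))
```

Returning `None` lets the training step decide what to do, for example skipping the batch, instead of crashing.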

volare1996 commented 2 years ago

Thanks for your reply! I've already visualized the data by modifying the code. Should the data be re-clustered in each training epoch?

nuneslu commented 2 years ago

It should not be re-clustered. The code saves the clusters from the first epoch, so in the following epochs the clusters are all the same. My guess is that the augmentations are removing the clusters. Here the code checks which clusters are still present in both point clouds, and only those clusters are listed. You can check whether any cluster is still listed after calling overlap_clusters.
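
The overlap check described above can be illustrated with a NumPy sketch. `keep_shared_clusters` is a hypothetical reimplementation of the idea, not the repository's `overlap_clusters`, and the label arrays are made up:

```python
import numpy as np

def keep_shared_clusters(cluster_i, cluster_j, ignore=-1):
    """Relabel to `ignore` every cluster id that is not present in both views."""
    shared = np.intersect1d(cluster_i, cluster_j)
    shared = shared[shared != ignore]
    out_i = np.where(np.isin(cluster_i, shared), cluster_i, ignore)
    out_j = np.where(np.isin(cluster_j, shared), cluster_j, ignore)
    return out_i, out_j, shared

# Per-point cluster ids of two augmented views of the same scan.
cluster_pi = np.array([-1, 0, 2, 334, 334, 396])
cluster_pj = np.array([-1, 334, 425, 425])
out_i, out_j, shared = keep_shared_clusters(cluster_pi, cluster_pj)
print(np.unique(out_i), np.unique(out_j))
if shared.size == 0:
    print("no cluster survives in both views")
```

If `shared` comes back empty, the views have no segment in common, which is the case that later produces an empty list.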

volare1996 commented 2 years ago

print(np.unique(cluster_pi))
cluster_pi, cluster_pj = overlap_clusters(cluster_pi, cluster_pj)
print(np.unique(cluster_pi))

[-1. 0. 2. 334. 396. 424. 425.]
[-1. 334.]

Epoch 0: 00/000006.bin np.unique(cluster_pi) = [-1 4 6 23 54 67 98]
Epoch 10: 00/000006.bin np.unique(cluster_pi) = [-1 4 6 23 54 67 98]

The same problem arose when the model was pretrained to the fiftieth epoch.

Regarding the CUDA out-of-memory error from eval_train.sh: I sliced the data, but the test always runs out of memory on the second sample.

volare1996 commented 2 years ago

Epoch 7: 77%|██████████████████████████████████████████████████████████████████████████████▍ | 10/13 [01:36<00:28, 9.61s/it, loss=6.49, v_num=24]
25 ./data/AllSensatUrban/SemanticKITTI/dataset/sequences/00/velodyne/000028.bin
44 ./data/AllSensatUrban/SemanticKITTI/dataset/sequences/00/velodyne/000047.bin
37 ./data/AllSensatUrban/SemanticKITTI/dataset/sequences/00/velodyne/000040.bin
29 ./data/AllSensatUrban/SemanticKITTI/dataset/sequences/00/velodyne/000032.bin
[ -1. 1. 7. 12. 16. 18. 24. 317. 777. 788. 811.]
[-1. 7. 24.]
[ -1. 38. 74. 80. 113. 145. 149. 175. 182. 183. 190. 261. 264. 281. 542.]
[ -1. 145. 149. 175. 190. 261. 542.]
[ -1. 0. 19. 58. 64. 75. 89. 102. 106. 116. 144. 145. 153. 159. 1. 231.]
[ -1. 0. 19. 58. 64. 75. 116. 144. 145. 153.]
[ -1. 6. 95. 117. 155. 388. 417. 419. 442. 451. 454. 457. 470. 507. 940.]
[-1. 6.]
Epoch 7: 85%|██████████████████████████████████████████████████████████████████████████████████████▎ | 11/13 [01:46<00:19, 9.72s/it, loss=6.49, v_num=24]
3 ./data/AllSensatUrban/SemanticKITTI/dataset/sequences/00/velodyne/000004.bin
[ -1. 25. 30. 34. 35. 79. 100. 103. 104. 106. 138. 144. 150. 178. 2. 190.]
[-1.]
Epoch 7: 92%|███████████████████████████████████████████████████████████████████████████████████████████████ | 12/13 [01:51<00:09, 9.29s/it, loss=6.5, v_num=24]
Traceback (most recent call last):

In the log above, each pair of arrays is the output of this code:

print(np.unique(cluster_pi))
cluster_pi, cluster_pj = overlap_clusters(cluster_pi, cluster_pj)
print(np.unique(cluster_pi))

For the last scan, the two arrays are:

[ -1. 25. 30. 34. 35. 79. 100. 103. 104. 106. 138. 144. 150. 178. 1. 190.]
[-1.]

Why is only one value selected at epoch 7 (85%)? Thanks!

nuneslu commented 2 years ago

It is not a selection. What it does is compare both augmented point clouds and keep only the segments present in both of them. The augmentations are probably too severe: most of the segments are being removed from the point clouds, and only a few segments remain in both.

volare1996 commented 2 years ago

Yes, the data is clustered in large chunks. Can this be fixed by deleting some data or by setting batch_size?

nuneslu commented 2 years ago

If the point clouds are being clustered into big chunks, you should check the RANSAC and DBSCAN parameters. The parameters were chosen with the SemanticKITTI point cloud data in mind. The appearance of the point clouds varies quite a lot between datasets due to the differences between the LiDARs used to collect them.
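
When tuning those parameters, a quick numeric check can complement the visualizer. The sketch below summarizes a per-point cluster label array, assuming `-1` marks unclustered points as in the printed outputs earlier in this thread. `cluster_size_report` is a hypothetical helper, not part of the repository.

```python
import numpy as np

def cluster_size_report(labels, ignore=-1):
    """Summarize how points are distributed over clusters."""
    labels = np.asarray(labels)
    total = labels.size
    valid = labels[labels != ignore]
    if total == 0 or valid.size == 0:
        return {"n_clusters": 0, "largest_fraction": 0.0,
                "unclustered_fraction": 1.0 if total else 0.0}
    _, counts = np.unique(valid, return_counts=True)
    return {
        "n_clusters": int(counts.size),
        # Share of all points that fall into the single biggest cluster.
        "largest_fraction": counts.max() / total,
        "unclustered_fraction": (total - valid.size) / total,
    }

labels = np.array([0] * 8 + [1] + [-1])
report = cluster_size_report(labels)
print(report)
```

A `largest_fraction` close to 1 suggests that one cluster is swallowing most of the scan, which would match the "large chunks" symptom described above.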