drprojects / DeepViewAgg

[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"

Some issues with LazyTensor in pykeops.torch #4

Closed ruomingzhai closed 2 years ago

ruomingzhai commented 2 years ago

Dear Dr Robert:

Thank you for kindly sharing your code.

I encountered some issues in both the PCAComputePointwise function and the NeighborhoodBasedMappingFeatures function during the KNN search when running on the ScanNet dataset. For example, in PCAComputePointwise:

if xyz_search.shape[0] > 1.6e7:
    xyz_query_keops = LazyTensor(xyz_query[:, None, :].double())
    xyz_search_keops = LazyTensor(xyz_search[None, :, :].double())
else:
    xyz_query_keops = LazyTensor(xyz_query[:, None, :])
    xyz_search_keops = LazyTensor(xyz_search[None, :, :])
d_keops = ((xyz_query_keops - xyz_search_keops) ** 2).sum(dim=2)
neighbors = d_keops.argKmin(self.num_neighbors, dim=1)

The error message is "Arg at position 1 : is not contiguous".

So I revised xyz_query and xyz_search before the KNN search, and it works:

dtype = torch.cuda.FloatTensor if self.use_cuda else torch.FloatTensor
xyz_query = xyz_query.contiguous().type(dtype)
xyz_search = xyz_search.contiguous().type(dtype)

Have you ever encountered this problem? I don't know whether my revision is the right way to solve it. Is there an alternative solution?
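For readers unfamiliar with the error, here is a minimal sketch of what "not contiguous" means in PyTorch. It does not reproduce the exact tensors from this issue (the thread does not say why they were non-contiguous); it only assumes a transposed view as one common way to end up with such a tensor, and the variable names are illustrative.

```python
import torch

# A transposed view shares storage with the original tensor but reads it with
# swapped strides, so it is no longer laid out row by row in memory.
points = torch.arange(12, dtype=torch.float32).reshape(3, 4)
view = points.t()
print(view.is_contiguous())

# .contiguous() copies the data into a fresh row-major buffer. The values are
# unchanged; only the memory layout differs.
fixed = view.contiguous()
print(fixed.is_contiguous())
print(torch.equal(view, fixed))

# On a tensor that is already contiguous, .contiguous() keeps the same
# underlying storage, so calling it defensively does not cost an extra copy.
same_storage = points.contiguous().data_ptr() == points.data_ptr()
print(same_storage)
```

Because the call is cheap on tensors that are already contiguous, applying it unconditionally before building the LazyTensor objects is a reasonable defensive choice.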

drprojects commented 2 years ago

Hi @ruomingzhai, thanks for using this project !

I have not encountered the error you found but the error message and your solution seem sound.

However, I may be mistaken, but I think you do not need to check the tensors' device before calling .contiguous(). You could simply do:

# K-NN search with KeOps. If the number of points is greater
# than 16 million, KeOps requires double precision.
xyz_query = xyz_query.contiguous()
xyz_search = xyz_search.contiguous()
if xyz_search.shape[0] > 1.6e7:
    xyz_query_keops = LazyTensor(xyz_query[:, None, :].double())
    xyz_search_keops = LazyTensor(xyz_search[None, :, :].double())
else:
    xyz_query_keops = LazyTensor(xyz_query[:, None, :])
    xyz_search_keops = LazyTensor(xyz_search[None, :, :])
d_keops = ((xyz_query_keops - xyz_search_keops) ** 2).sum(dim=2)
neighbors = d_keops.argKmin(self.num_neighbors, dim=1)

Please let me know if that works for you. If so, I will update the released code with this change.
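As background on the 16 million threshold in the comment above: the thread does not explain it, but one plausible reading (an assumption, not something confirmed here) is that point indices pass through the floating-point type of the data, and float32 can only represent every integer exactly up to 2^24 = 16,777,216. The sketch below illustrates that limit using only the standard library; the helper name to_float32 is hypothetical.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", value))[0]


# 2**24 is the last point up to which every integer fits in float32 exactly.
largest_exact = 2 ** 24

# This index survives the round trip unchanged...
exact_ok = to_float32(float(largest_exact)) == largest_exact
# ...but the very next integer cannot be represented and gets rounded.
next_ok = to_float32(float(largest_exact + 1)) == largest_exact + 1

print(exact_ok, next_ok)
```

Under that assumption, switching to double precision above roughly 1.6e7 points keeps every index distinguishable, at the cost of more memory and compute.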

FYI, if you have trouble with the CPU-based nearest neighbor search using KeOps for PCAComputePointwise and NeighborhoodBasedMappingFeatures, you can try changing the dataset's config file to use FAISS for GPU-based computation, that is, the opposite of what I suggested here. To be honest, though, I have been having issues with FAISS for neighbor search on the GPU (which should be faster in theory): it works on some machines and not on others, and I have not had time to investigate this closely. So for now, I decided to default to KeOps.

ruomingzhai commented 2 years ago

Dear Dr. Robert:

Your suggested change works!

By the way, when I run the code on a CPU-only device, another error always pops up, raised by the torch.cuda.synchronize() calls in the MapImages class.

So I commented out those calls and it works. I hope my change is correct and that this is helpful to you.

Best regards,

drprojects commented 2 years ago

Glad to hear it worked ! I will update the code right away then.

Regarding the torch.cuda.synchronize() message, could you please be more specific? Can you share the full error/warning message and, if possible, where it happens in the code? Thanks in advance.

drprojects commented 2 years ago

@ruomingzhai you can now update your code with the latest commit to solve this contiguous issue 👍

ruomingzhai commented 2 years ago

This error is raised in the _process function of the MapImages class (torch_points3d/core/data_transform/multimodal/image.py). The full error message is:

File "/root/share/code/DeepViewAgg/torch_points3d/core/data_transform/multimodal/image.py", line 240, in _process
    torch.cuda.synchronize()
File "/root/.local/conda/envs/zrm/lib/python3.7/site-packages/torch/cuda/__init__.py", line 493, in synchronize
    _lazy_init()
File "/root/.local/conda/envs/zrm/lib/python3.7/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

drprojects commented 2 years ago

I think this is not related to DeepViewAgg but to PyTorch in general. Some people seem to have the same problem, see this issue for instance.
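For anyone hitting the same traceback on a CPU-only machine, a possible alternative to commenting out the calls is to guard them. This is only a sketch and not what the repository does: the helper name synchronize_if_cuda is hypothetical, and it relies solely on torch.cuda.is_available(), which returns False rather than raising when no NVIDIA driver is present.

```python
import torch


def synchronize_if_cuda() -> bool:
    """Wait for pending CUDA kernels, or do nothing on a CPU-only machine.

    Returns True if a synchronization was actually performed.
    """
    if torch.cuda.is_available():
        torch.cuda.synchronize()
        return True
    return False


# Safe to call on both GPU and CPU-only machines.
did_sync = synchronize_if_cuda()
print(did_sync)
```

Replacing each bare torch.cuda.synchronize() call with such a guard would keep the GPU behavior unchanged while letting the same code run without a driver.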

I am closing this issue since the contiguous error seems to be solved.