drprojects / DeepViewAgg

[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"

RuntimeError & Machine Type Inquiry #3

Closed Mollylulu closed 2 years ago

Mollylulu commented 2 years ago
Running pre-collate on 3D data...
Traceback (most recent call last):
  File "s3dis_vis.py", line 100, in <module>
    dataset = S3DISFusedDataset(cfg.data)
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 767, in __init__
    self.train_dataset = S3DISSphereMM(
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 596, in __init__
    super().__init__(root, *args, **kwargs)
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 178, in __init__
    super(S3DISOriginalFusedMM, self).__init__(
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/data/in_memory_dataset.py", line 56, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 87, in __init__
    self._process()
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 170, in _process
    self.process()
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 655, in process
    super().process()
  File "/xxx/torch_points3d/datasets/segmentation/multimodal/s3dis.py", line 418, in process
    data_list = self.pre_collate_transform(data_list)
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/transforms/compose.py", line 19, in __call__
    data = [transform(d) for d in data]
  File "/home/xxx/lib/python3.8/site-packages/torch_geometric/transforms/compose.py", line 19, in <listcomp>
    data = [transform(d) for d in data]
  File "/xxx/torch_points3d/core/data_transform/features.py", line 541, in __call__
    data = self._process(data)
  File "/xxx/torch_points3d/core/data_transform/features.py", line 500, in _process
    neighbors = nn_finder(xyz_search, xyz_query, None, None)
  File "/xxx/torch_points3d/core/spatial_ops/neighbour_finder.py", line 17, in __call__
    return self.find_neighbours(x, y, batch_x, batch_y)
  File "/xxx/torch_points3d/core/spatial_ops/neighbour_finder.py", line 263, in find_neighbours
    return torch.LongTensor(gpu_index_flat.search(y_np, k)[1]).to(x.device)
  File "/xxx/lib/python3.8/site-packages/faiss/__init__.py", line 322, in replacement_search
    self.search_c(n, swig_ptr(x), k, swig_ptr(D), swig_ptr(I))
  File "/xxx/lib/python3.8/site-packages/faiss/swigfaiss_avx2.py", line 9009, in search
    return _swigfaiss_avx2.GpuIndex_search(self, n, x, k, distances, labels)
RuntimeError: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at
/root/miniconda3/conda-bld/faiss-pkg_1639741185190/work/faiss/gpu/StandardGpuResources.cpp:452:
Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryOverflow dev 0
space Device stream 0x558ecfc66c70 size 22479120128 bytes (cudaMalloc error out of memory [2])

Hi, I ran the s3dis_visualization.ipynb notebook under notebooks/ on the S3DIS dataset. It seems to need a huge amount of memory on both the CPU and GPU, and I got the OOM error above, which suggests that more than 20 GB of GPU memory is required just to preprocess the data. 😢 Could you share the specs of the machine you used as a reference? Also, since this preprocessing does not look very memory-friendly, is there any way to work around the 20+ GB GPU memory requirement? Thanks, and looking forward to your help!
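
For reference, the failing call in the traceback, gpu_index_flat.search(y_np, k), runs the search over all query points in one shot, which is what drives the ~22 GB temporary allocation on the GPU. Below is a rough sketch of what splitting the queries into smaller batches could look like; it is purely illustrative (the array names, sizes and batch size are made up, and this is not the repo's actual code path):

    import numpy as np
    import faiss  # assumes a working faiss-gpu install

    d, k = 3, 50                                                  # xyz coordinates, 50 neighbours
    xyz_search = np.random.rand(1_000_000, d).astype("float32")   # dummy support points
    xyz_query = np.random.rand(2_000_000, d).astype("float32")    # dummy query points

    res = faiss.StandardGpuResources()
    index = faiss.GpuIndexFlatL2(res, d)
    index.add(xyz_search)

    # Searching all queries at once needs a large temporary GPU buffer;
    # splitting the queries into batches keeps each call's scratch space small.
    batch = 100_000  # illustrative value
    neighbours = np.concatenate([
        index.search(xyz_query[i:i + batch], k)[1]
        for i in range(0, len(xyz_query), batch)
    ])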

drprojects commented 2 years ago

Hi, thanks for using this repo and for the feedback !

Indeed, you seem to be encountering issues with the GPU-accelerated nearest-neighbor search using FAISS. It is a problem I have not solved yet, but in the meantime, you can try running this step on the CPU instead.

To this end, please set use_faiss: False in conf/data/segmentation/multimodal/s3disfused-sparse.yaml:

    - transform: PCAComputePointwise
      params:
            num_neighbors: 50  # heuristic: at least 30
            # r: 0.1  # heuristic: 2 * voxel - using r will force CPU computation
            # use_full_pos: True  # Possible if GridSampling3D.setattr_full_pos = True
            use_faiss: False

This will move the neighbor computation to the CPU using KeOps.
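
For reference, here is a minimal sketch of the kind of k-NN lookup KeOps performs in that case. It is illustrative only, with made-up tensor names and sizes, and is not the exact torch_points3d code path:

    import torch
    from pykeops.torch import LazyTensor  # assumes pykeops is installed

    k = 50
    xyz_search = torch.rand(1_000_000, 3)  # dummy support points, on CPU
    xyz_query = torch.rand(200_000, 3)     # dummy query points, on CPU

    # Symbolic pairwise squared distances: the (M, N) matrix is never materialised.
    x_i = LazyTensor(xyz_query[:, None, :])   # (M, 1, 3)
    y_j = LazyTensor(xyz_search[None, :, :])  # (1, N, 3)
    D_ij = ((x_i - y_j) ** 2).sum(-1)         # (M, N) symbolic distances

    # Indices of the k nearest support points for each query point.
    neighbours = D_ij.argKmin(k, dim=1)       # (M, k) LongTensor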

In any case, this preprocessing step will always be quite memory-hungry, even on the CPU. So I recommend you do not have any other important tasks running on your machine when you start preprocessing the datasets.

FYI, I have 64 GB of RAM and a 32 GB GPU on my machine and have not tested this project with less memory. If you do not have access to a 30+ GB GPU, you will still be able to run inference from pretrained models, but training large multimodal models may be tricky. If you run into this problem, please let me know in a separate issue; I may have some tricks to help.

Please let me know how that goes !

Mollylulu commented 2 years ago

well noted, thank you for your kind help 🌹

drprojects commented 2 years ago

Sure ! Please let me know if you managed to preprocess and train as you wanted :wink:

drprojects commented 2 years ago

Hello @Mollylulu, have you succeeded in running the preprocessing on S3DIS ?

drprojects commented 2 years ago

Closing this issue since I think the new default config with CPU preprocessing should solve this