NVIDIA / MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
https://nvidia.github.io/MinkowskiEngine

Empty Sparse Tensors Cause Runtime Errors on GPU #497

Open Magnusgaertner opened 2 years ago

Magnusgaertner commented 2 years ago

Describe the bug

Constructing a sparse tensor with 0 features on the GPU causes the first subsequent torch tensor creation on the GPU to fail with a RuntimeError. Retrying the same creation immediately afterwards succeeds. This behavior is observed only on the GPU and only for sparse tensors with 0 elements.


To Reproduce

import pytest
import torch

import MinkowskiEngine as ME

def test_weird_gpu_behavior():
    """
    Reproduces an unexpected RuntimeError observed only when running on the GPU.
    """
    # constants
    use_gpu = True  # fails only on the GPU
    if use_gpu:
        device = "cuda"
        coordinate_map_type = ME.CoordinateMapType.CUDA
        allocator_type = ME.GPUMemoryAllocatorType.CUDA
        ME.set_gpu_allocator(allocator_type)
    else:
        device = "cpu"
        coordinate_map_type = ME.CoordinateMapType.CPU
        allocator_type = None

    # set up MinkowskiEngine
    ME.set_sparse_tensor_operation_mode(ME.SparseTensorOperationMode.SHARE_COORDINATE_MANAGER)
    minkowski_algorithm = ME.MinkowskiAlgorithm.SPEED_OPTIMIZED
    num_threads = 1

    coordinate_manager = ME.CoordinateManager(
        D=3,
        num_threads=num_threads,
        coordinate_map_type=coordinate_map_type,
        minkowski_algorithm=minkowski_algorithm,
        allocator_type=allocator_type,
    )

    # empty features and coordinates;
    # num_points must be 0 to reproduce the bug
    num_points = 0
    coords = torch.zeros(size=[num_points, 3], dtype=torch.int32, device=device)
    features = torch.zeros(size=[num_points, 3], dtype=torch.float32, device=device)
    # prepend the batch index to the coordinates (shape becomes [num_points, 4])
    coords, feats = ME.utils.sparse_collate([coords], [features])

    # works
    some_tensor = torch.randn(size=[1, 4], dtype=torch.float32, device=device)

    # creating a sparse tensor without any features breaks the next tensor construction on the GPU
    voxel_tensor = ME.SparseTensor(
        features=feats,
        coordinates=coords,
        quantization_mode=ME.SparseTensorQuantizationMode.RANDOM_SUBSAMPLE,
        coordinate_manager=coordinate_manager,
        allocator_type=allocator_type,
    )

    with pytest.raises(RuntimeError):
        # fails with a RuntimeError; root cause unknown
        some_tensor = torch.randn(size=[1, 4], dtype=torch.float32, device=device)

    # works again
    some_tensor = torch.randn(size=[1, 4], dtype=torch.float32, device=device)

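Expected behavior

I would expect the code above not to raise an exception when creating some_tensor; in particular, the second creation (inside the pytest.raises block) should succeed just like the first and third.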
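Until the root cause is fixed, one possible workaround, assuming the surrounding code can treat an empty batch as "nothing to process", is to avoid constructing an ME.SparseTensor at all when the collated batch has zero points, since the failure is only observed for empty tensors. The helper below, build_if_nonempty, is hypothetical (it is not part of MinkowskiEngine); it takes the tensor constructor as a callable so the sketch can run without a GPU.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def build_if_nonempty(
    coordinates: Sequence,
    features: Sequence,
    build: Callable[[Sequence, Sequence], T],
) -> Optional[T]:
    """Call build(coordinates, features) unless the batch has no points.

    Hypothetical workaround helper: len() works on both Python lists and
    torch tensors (it returns the size of the first dimension), so the same
    check applies to the [N, 4] coordinates returned by ME.utils.sparse_collate.
    """
    if len(coordinates) != len(features):
        raise ValueError(
            f"got {len(coordinates)} coordinate rows but {len(features)} feature rows"
        )
    if len(coordinates) == 0:
        # Skip construction entirely: the GPU error is only observed for empty tensors.
        return None
    return build(coordinates, features)


if __name__ == "__main__":
    # Stand-in constructor so the sketch runs without MinkowskiEngine or a GPU.
    def fake_build(c, f):
        return ("sparse", len(c))

    print(build_if_nonempty([], [], fake_build))  # None
    print(build_if_nonempty([[0, 0, 0, 0]], [[1.0, 2.0, 3.0]], fake_build))  # ('sparse', 1)
```

In the reproduction, build would be a lambda such as lambda c, f: ME.SparseTensor(features=f, coordinates=c, ...) with the same keyword arguments as above, and downstream code would need to handle a None result. This only sidesteps the empty-tensor case shown here; it does not address whatever leaves the CUDA context in a failing state.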


Desktop

(minkowski_gpu) magnus@magnus-ThinkPad-P1-Gen-4i:~$ python -c "import MinkowskiEngine as ME; ME.print_diagnostics()"
==========System==========
Linux-5.15.0-48-generic-x86_64-with-glibc2.29
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"
3.8.10 (default, Jun 22 2022, 20:18:18) 
[GCC 9.4.0]
==========Pytorch==========
1.12.0+cu116
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 510.85.02
CUDA Version 11.6
VBIOS Version 94.04.51.00.53
Image Version G001.0000.03.03
GSP Firmware Version N/A
==========NVCC==========
sh: 1: nvcc: not found
==========CC==========
/usr/bin/c++
c++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11060
CUDART version MinkowskiEngine is compiled: 11060
Running the latest diagnostics.py from the master branch (wget -q https://raw.githubusercontent.com/NVIDIA/MinkowskiEngine/master/MinkowskiEngine/diagnostics.py ; python diagnostics.py) produced the same output.
