NVIDIA / MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
https://nvidia.github.io/MinkowskiEngine

Program terminated when trying to get a kernel_map #579

Open kirilllzaitsev opened 7 months ago

kirilllzaitsev commented 7 months ago

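Describe the bug

The program terminates with a segmentation fault when cm.kernel_map is called with the coordinate map key of a SparseTensor that MinkowskiPruning has left empty.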


To Reproduce

import MinkowskiEngine as ME
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

pruning = ME.MinkowskiPruning()
alpha = 1  # torch.rand samples from [0, 1), so y > alpha is never true and every point is pruned
x = ME.SparseTensor(
    features=torch.rand(10, 3),
    coordinates=torch.randint(0, 100, (10, 3)).int(),
    device=device,
)
y = torch.rand(10, 1)
keep = (y > alpha).squeeze().to(x.device)
out = pruning(x, keep)

cm = out.coordinate_manager

batched_target = ME.SparseTensor(
    features=torch.rand(4, 3),
    coordinates=torch.randint(0, 100, (4, 3)).int(),
    coordinate_manager=None,
    quantization_mode=ME.SparseTensorQuantizationMode.UNWEIGHTED_AVERAGE,
    device=device,
)
target_key, _ = cm.insert_and_map(batched_target.C, string_id="target")

strided_target_key = cm.stride(target_key, out.tensor_stride[0])
kernel_map = cm.kernel_map(
    out.coordinate_map_key,
    strided_target_key,
    kernel_size=1,
)

The output:

/opt/miniconda3/envs/tr/lib/python3.8/site-packages/MinkowskiEngine/__init__.py:36: UserWarning: The environment variable `OMP_NUM_THREADS` not set. MinkowskiEngine will automatically set `OMP_NUM_THREADS=16`. If you want to set `OMP_NUM_THREADS` manually, please export it on the command line before running a python script. e.g. `export OMP_NUM_THREADS=12; python your_program.py`. It is recommended to set it below 24.
  warnings.warn(
/tmp/pip-req-build-xctt5_hh/src/pruning_gpu.cu:132, (true) MinkowskiPruning: Generating an empty SparseTensor
[1]    17221 segmentation fault (core dumped)  python a.py
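
The `MinkowskiPruning: Generating an empty SparseTensor` line shows that the mask kept no points before the crash. If the segmentation fault is tied to that empty output (an assumption this report does not confirm), a guard around the later coordinate-manager calls would at least turn the crash into a skipped step. The sketch below is one way to write such a guard; `mask_is_empty` and `run_if_any_kept` are hypothetical helper names, not part of MinkowskiEngine.

```python
def mask_is_empty(keep):
    """Return True when no entry of the boolean mask is set.

    Works for objects exposing .any() (for example a torch.BoolTensor)
    and falls back to the built-in any() for plain sequences.
    """
    any_method = getattr(keep, "any", None)
    if callable(any_method):
        return not bool(any_method())
    return not any(keep)


def run_if_any_kept(keep, step):
    """Call step() only when the mask keeps at least one point.

    `step` is a hypothetical zero-argument callable that would wrap the
    pruning and kernel-map calls; None is returned when it is skipped.
    """
    if mask_is_empty(keep):
        return None
    return step()
```

In the reproduction script this would mean passing `keep` and a function wrapping everything from `pruning(x, keep)` to `cm.kernel_map(...)`, so that none of those calls run on an empty mask.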

Expected behavior

The program should have exited normally.


Desktop

==========System==========
Linux-5.15.0-89-generic-x86_64-with-glibc2.31
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.6 LTS"
3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
==========Pytorch==========
2.1.2+cu121
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 545.23.08
CUDA Version 12.3
VBIOS Version 95.06.25.00.56
Image Version G002.0000.00.03
GSP Firmware Version N/A
==========NVCC==========
/usr/local/cuda-12.3/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Nov__3_17:16:49_PDT_2023
Cuda compilation tools, release 12.3, V12.3.103
Build cuda_12.3.r12.3/compiler.33492891_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 12030
CUDART version MinkowskiEngine is compiled: 12030 

Additional context

My original problem also comes from the cm.kernel_map call, but in the training pipeline it produces the following output instead:

0:05,  1.54it/s]/tmp/pip-req-build-xctt5_hh/src/pruning_gpu.cu:132, (true) MinkowskiPruning: Generating an empty SparseTensor
/opt/miniconda3/envs/tr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

I don't know how this relates to the segmentation fault above; my guess is that both failures have the same cause. I'm happy to try reproducing the exact original error if needed.

The same problem happens in Python 3.8.15 as well.
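
To check whether the training-pipeline failure and the segmentation fault share a cause, it can help to run the suspect code in a child interpreter with the standard-library `faulthandler` enabled, so that a crash prints a Python traceback and does not take the parent process down. The following is a minimal sketch assuming a POSIX system, where a negative return code means the child was killed by a signal; `run_isolated` and `describe_exit` are hypothetical helper names.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=60.0):
    """Run `code` in a fresh interpreter and return (returncode, stderr).

    -X faulthandler makes the child dump a Python traceback to stderr
    on fatal signals such as SIGSEGV before it dies.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe_exit(returncode):
    """Translate a subprocess return code into a short description (POSIX)."""
    if returncode < 0:
        # On POSIX a negative code is the negated number of the killing signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"
```

Passing the contents of the reproduction script as `code` would be expected, on POSIX, to report `killed by SIGSEGV`, with the captured stderr showing which Python line was executing at the time of the crash.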