RuntimeError: nms is not compiled with GPU support

laijirong commented 2 weeks ago

Thank you for developing such an amazing tool. I followed the instructions and installed it on our lab's cluster. However, after running for about a day, it quit with this error: RuntimeError: nms is not compiled with GPU support. I am sure that I installed all the requirements (following the steps) and I searched for the error online, but the problem persists.

python mmdet/utils/collect_env.py
sys.platform: linux
Python: 3.9.16 | packaged by conda-forge | (main, Feb  1 2023, 21:39:03) [GCC 11.3.0]
CUDA available: True
GPU 0,1: NVIDIA A800 80GB PCIe
CUDA_HOME: /usr/local/cuda-11.8
NVCC: Build cuda_11.8.r11.8/compiler.31833905_0
GCC: gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10)
PyTorch: 1.10.1
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.11.2
OpenCV: 4.7.0
MMCV: 1.3.17
MMCV Compiler: GCC 8.5
MMCV CUDA Compiler: not available
MMDetection: 2.11.0+6d828f5
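
Two lines in the report above stand out: "MMCV CUDA Compiler: not available", which suggests the installed MMCV ops were built without CUDA, and a system NVCC release (11.8) that differs from the CUDA runtime PyTorch was built with (11.3). As a minimal sketch, the check below scans text in the collect_env.py format for those two conditions. The function name find_cuda_red_flags and the shortened sample string are illustrative assumptions, not part of MMDetection or AutoHiC.

```python
import re


def find_cuda_red_flags(report):
    """Return warnings found in collect_env.py-style text (illustrative helper)."""
    flags = []

    # MMCV prints "not available" here when its ops were compiled without CUDA.
    if re.search(r"^MMCV CUDA Compiler:\s*not available\s*$", report, re.M):
        flags.append("MMCV ops appear to be built without CUDA support")

    # Compare the system NVCC release with the CUDA runtime PyTorch was built with.
    nvcc = re.search(r"^NVCC:.*?cuda_(\d+\.\d+)", report, re.M)
    runtime = re.search(r"CUDA Runtime (\d+\.\d+)", report)
    if nvcc and runtime and nvcc.group(1) != runtime.group(1):
        flags.append(
            f"system NVCC is {nvcc.group(1)} but PyTorch was built "
            f"with CUDA {runtime.group(1)}"
        )
    return flags


# Shortened copy of the relevant lines from the report above.
sample = """\
CUDA_HOME: /usr/local/cuda-11.8
NVCC: Build cuda_11.8.r11.8/compiler.31833905_0
PyTorch: 1.10.1
  - CUDA Runtime 11.3
MMCV CUDA Compiler: not available
"""

for flag in find_cuda_red_flags(sample):
    print(flag)
```

Either flag on its own is only a hint. Whether it explains the nms error depends on how MMCV was installed in this environment.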

Part of the log is shown below:

tail -n 100 ../autohic.lg
│ │                │   │   [ 782.1371,  757.5547,  800.0000,  783.6221],     │ │
│ │                │   │   ...,                                              │ │
│ │                │   │   [3592.5667, 3727.0974, 4004.0000, 4004.0000],     │ │
│ │                │   │   [3710.6604, 3717.9309, 4004.0000, 4004.0000],     │ │
│ │                │   │   [3788.7009, 3578.1282, 4004.0000, 4004.0000]],    │ │
│ │                device='cuda:0'),                                         │ │
│ │                │   tensor([0.5039, 0.4382, 0.3605,  ..., 0.0061, 0.0042, │ │
│ │                0.0035], device='cuda:0')                                 │ │
│ │                )                                                         │ │
│ │    args_info = FullArgSpec(                                              │ │
│ │                │   args=[                                                │ │
│ │                │   │   'boxes',                                          │ │
│ │                │   │   'scores',                                         │ │
│ │                │   │   'iou_threshold',                                  │ │
│ │                │   │   'offset',                                         │ │
│ │                │   │   'score_threshold',                                │ │
│ │                │   │   'max_num'                                         │ │
│ │                │   ],                                                    │ │
│ │                │   varargs=None,                                         │ │
│ │                │   varkw=None,                                           │ │
│ │                │   defaults=(0, 0, -1),                                  │ │
│ │                │   kwonlyargs=[],                                        │ │
│ │                │   kwonlydefaults=None,                                  │ │
│ │                │   annotations={}                                        │ │
│ │                )                                                         │ │
│ │     cls_name = None                                                      │ │
│ │ dst_arg_name = 'iou_threshold'                                           │ │
│ │    func_name = 'nms'                                                     │ │
│ │       kwargs = {'iou_threshold': 0.7}                                    │ │
│ │    name_dict = {'iou_thr': 'iou_threshold'}                              │ │
│ │     old_func = <function nms at 0x154cd8fa3670>                          │ │
│ │ src_arg_name = 'iou_thr'                                                 │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /share/home/****/miniconda3/envs/autohic/lib/python3.9/site-packages/mmc │
│ v/ops/nms.py:171 in nms                                                      │
│                                                                              │
│   168 │   │   }                                                              │
│   169 │   │   inds = ext_module.nms(*indata_list, **indata_dict)             │
│   170 │   else:                                                              │
│ ❱ 171 │   │   inds = NMSop.apply(boxes, scores, iou_threshold, offset,       │
│   172 │   │   │   │   │   │      score_threshold, max_num)                   │
│   173 │   dets = torch.cat((boxes[inds], scores[inds].reshape(-1, 1)), dim=1 │
│   174 │   if is_numpy:                                                       │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │           boxes = tensor([[ 764.0522,  781.6668,  789.8285,  800.0000],  │ │
│ │                   │   │   [ 764.0328,  783.4024,  787.0239,  800.0000],  │ │
│ │                   │   │   [ 782.1371,  757.5547,  800.0000,  783.6221],  │ │
│ │                   │   │   ...,                                           │ │
│ │                   │   │   [3592.5667, 3727.0974, 4004.0000, 4004.0000],  │ │
│ │                   │   │   [3710.6604, 3717.9309, 4004.0000, 4004.0000],  │ │
│ │                   │   │   [3788.7009, 3578.1282, 4004.0000, 4004.0000]], │ │
│ │                   device='cuda:0')                                       │ │
│ │   iou_threshold = 0.7                                                    │ │
│ │        is_numpy = False                                                  │ │
│ │         max_num = -1                                                     │ │
│ │          offset = 0                                                      │ │
│ │ score_threshold = 0                                                      │ │
│ │          scores = tensor([0.5039, 0.4382, 0.3605,  ..., 0.0061, 0.0042,  │ │
│ │                   0.0035], device='cuda:0')                              │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /share/home/****/miniconda3/envs/autohic/lib/python3.9/site-packages/mmc │
│ v/ops/nms.py:26 in forward                                                   │
│                                                                              │
│    23 │   │   │   valid_inds = torch.nonzero(                                │
│    24 │   │   │   │   valid_mask, as_tuple=False).squeeze(dim=1)             │
│    25 │   │                                                                  │
│ ❱  26 │   │   inds = ext_module.nms(                                         │
│    27 │   │   │   bboxes, scores, iou_threshold=float(iou_threshold), offset │
│    28 │   │                                                                  │
│    29 │   │   if max_num > 0:                                                │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │                bboxes = tensor([[ 764.0522,  781.6668,  789.8285,        │ │
│ │                         800.0000],                                       │ │
│ │                         │   │   [ 764.0328,  783.4024,  787.0239,        │ │
│ │                         800.0000],                                       │ │
│ │                         │   │   [ 782.1371,  757.5547,  800.0000,        │ │
│ │                         783.6221],                                       │ │
│ │                         │   │   ...,                                     │ │
│ │                         │   │   [3592.5667, 3727.0974, 4004.0000,        │ │
│ │                         4004.0000],                                      │ │
│ │                         │   │   [3710.6604, 3717.9309, 4004.0000,        │ │
│ │                         4004.0000],                                      │ │
│ │                         │   │   [3788.7009, 3578.1282, 4004.0000,        │ │
│ │                         4004.0000]], device='cuda:0')                    │ │
│ │                   ctx = <torch.autograd.function.NMSopBackward object at │ │
│ │                         0x154cbb2ade40>                                  │ │
│ │         iou_threshold = 0.7                                              │ │
│ │ is_filtering_by_score = False                                            │ │
│ │               max_num = -1                                               │ │
│ │                offset = 0                                                │ │
│ │       score_threshold = 0                                                │ │
│ │                scores = tensor([0.5039, 0.4382, 0.3605,  ..., 0.0061,    │ │
│ │                         0.0042, 0.0035], device='cuda:0')                │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: nms is not compiled with GPU support

Looking forward to your reply.

Jwindler commented 2 weeks ago

Maybe you can refer to #9.

laijirong commented 2 weeks ago

> Maybe you can refer to #9.

Thanks for replying! I checked my environment and installed extra components, so nvcc version 11.3 is now installed. However, it still does not work.

>>> print(torch.__version__)
1.10.1
>>> print(torch.cuda.is_available())
True
>>> print(torch.version.cuda)
11.3
>>> print(torch.backends.cudnn.version())
8200

The CUDA components are shown below:

python -c "import torch.utils.cpp_extension;print(torch.utils.cpp_extension.CUDA_HOME)"
/share/home/***/miniconda3/envs/autohic

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0

conda list | grep cu
cuda                      11.3.0               h3b286be_0    nvidia
cuda-command-line-tools   11.3.0               h3b286be_0    nvidia
cuda-compiler             11.3.0               h3b286be_0    nvidia
cuda-cudart               11.3.58              hc1aae59_0    nvidia
cuda-cuobjdump            11.3.58              hc78e225_0    nvidia
cuda-cupti                11.3.58              h9a3dd33_0    nvidia
cuda-cuxxfilt             11.3.58              he670d9e_0    nvidia
cuda-gdb                  11.3.58              h531059a_0    nvidia
cuda-libraries            11.3.0               h3b286be_0    nvidia
cuda-libraries-dev        11.3.0               h3b286be_0    nvidia
cuda-memcheck             11.8.86                       0    nvidia
cuda-nvcc                 11.3.58              h2467b9f_0    nvidia
cuda-nvdisasm             11.3.58              hd2ea46e_0    nvidia
cuda-nvml-dev             12.4.127                      0    nvidia
cuda-nvprof               11.3.58              h860cd9e_0    nvidia
cuda-nvprune              11.3.58              hb917323_0    nvidia
cuda-nvrtc                11.3.58              he300756_0    nvidia
cuda-nvtx                 11.3.58              h3fa534a_0    nvidia
cuda-nvvp                 11.3.58              hd16380c_0    nvidia
cuda-runtime              11.3.0               h3b286be_0    nvidia
cuda-samples              11.6.101             h8efea70_0    nvidia
cuda-sanitizer-api        11.3.58              h58da6c8_0    nvidia
cuda-thrust               11.4.43              h00096a5_0    nvidia
cuda-toolkit              11.3.0               h3b286be_0    nvidia
cuda-tools                11.3.0               h3b286be_0    nvidia
cuda-visual-tools         11.3.0               h3b286be_0    nvidia
cudatoolkit               11.3.1              h9edb442_11    conda-forge
cudnn                     8.2.1                cuda11.3_0    defaults
libcublas                 12.4.5.8                      0    nvidia
libcufft                  11.2.1.3                      0    nvidia
libcurand                 10.3.5.147                    0    nvidia
libcusolver               11.6.1.9                      0    nvidia
libcusparse               12.3.1.170                    0    nvidia
ncurses                   6.4                  h6a678d5_0    defaults
pytorch                   1.10.1          py3.9_cuda11.3_cudnn8.2.0_0    pytorch
pytorch-mutex             1.0                        cuda    pytorch
torchaudio                0.10.1               py39_cu113    pytorch
torchvision               0.11.2               py39_cu113    pytorch
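
The listing above mixes toolkit versions: most cuda-* packages are on 11.3, but a few are not. As a rough sketch, the helper below picks out the cuda-* rows whose version does not start with the expected release. The name off_version_cuda_packages is hypothetical, and the sample is a subset of the rows above. The lib* packages are skipped on purpose, since their version numbers do not necessarily track the toolkit version. Whether any of these mismatches contributes to the nms error is not established in this thread.

```python
def off_version_cuda_packages(conda_list_text, expected="11.3"):
    """Return (name, version) for cuda-* packages not on the expected release."""
    mismatched = []
    for line in conda_list_text.splitlines():
        parts = line.split()
        # Only look at rows of the form "cuda-<component> <version> ...".
        if len(parts) < 2 or not parts[0].startswith("cuda-"):
            continue
        name, version = parts[0], parts[1]
        if not version.startswith(expected + "."):
            mismatched.append((name, version))
    return mismatched


# Subset of the `conda list | grep cu` output above.
sample = """\
cuda-cudart               11.3.58              hc1aae59_0    nvidia
cuda-memcheck             11.8.86                       0    nvidia
cuda-nvcc                 11.3.58              h2467b9f_0    nvidia
cuda-nvml-dev             12.4.127                      0    nvidia
cuda-samples              11.6.101             h8efea70_0    nvidia
cuda-thrust               11.4.43              h00096a5_0    nvidia
"""

for name, version in off_version_cuda_packages(sample):
    print(name, version)
```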

Btw, as you mentioned in #9, how can I switch to the CPU method? I didn't find instructions in the docs; please forgive my carelessness. Thank you so much!

Jwindler commented 2 weeks ago

If you have a GPU and want to use it, you must install CUDA-11.3 and cuDNN-8.2 beforehand. The easiest way to use the CPU is to install it on a machine without a GPU, or to use Docker.
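
A GPU-less machine can sometimes be approximated on a GPU node by hiding the devices from PyTorch. The sketch below sets the standard CUDA_VISIBLE_DEVICES variable to an empty string before PyTorch initialises CUDA. The helper name hide_gpus is hypothetical, and it is an assumption (not confirmed in this thread) that AutoHiC falls back to the CPU when no GPU is visible.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from libraries initialised after this call."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


# Call this before PyTorch has a chance to initialise CUDA.
hide_gpus()

try:
    import torch
except ImportError:
    torch = None  # PyTorch is not installed in this interpreter.

if torch is not None:
    # Reports whether PyTorch still sees a GPU after hiding the devices.
    print("CUDA available:", torch.cuda.is_available())
```

The same effect can be had by exporting the variable in the shell that launches the tool. If the code pins tensors to a device such as cuda:0 explicitly, hiding the GPUs will not be enough.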

laijirong commented 2 weeks ago

> If you have a GPU and want to use it, you must install CUDA-11.3 and cuDNN-8.2 beforehand. The easiest way to use the CPU is to install it on a machine without a GPU, or to use Docker.

Thanks for replying! I am sure that CUDA-11.3 and cuDNN-8.2 are installed, though they were installed with conda rather than with the package manager or the run package. Or do you mean that I have to install these two packages in an existing environment first, and then install the dependencies manually? 😂 I will try to install AutoHiC on another GPU-less machine. Thank you so much for helping! 🫡

Jwindler commented 2 weeks ago

Yes, you must install CUDA-11.3 and cuDNN-8.2 without conda. If using the CPU is acceptable to you, that is the recommended option.
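
Once a matching system toolkit is in place, the MMCV ops themselves still have to be CUDA-enabled builds. As a sketch under assumptions, the helper below assembles the wheel index URL in the pattern documented for MMCV 1.x (CUDA tag plus PyTorch version rounded to x.y.0). The function name mmcv_wheel_index is hypothetical. Whether a prebuilt mmcv-full 1.3.17 wheel exists for this combination, and whether reinstalling resolves the error, is not confirmed in this thread.

```python
def mmcv_wheel_index(cuda_version, torch_version):
    """Build the OpenMMLab wheel index URL for a CUDA/PyTorch pair (sketch)."""
    cu_tag = "cu" + cuda_version.replace(".", "")
    # MMCV 1.x wheels are indexed by PyTorch x.y.0, so drop the patch version.
    major, minor = torch_version.split(".")[:2]
    return (
        "https://download.openmmlab.com/mmcv/dist/"
        f"{cu_tag}/torch{major}.{minor}.0/index.html"
    )


# Versions taken from the environment reported above.
url = mmcv_wheel_index("11.3", "1.10.1")
print(f"pip install mmcv-full==1.3.17 -f {url}")
```

After reinstalling, rerunning python mmdet/utils/collect_env.py and checking the "MMCV CUDA Compiler" line is a quick way to see whether the ops now report a CUDA version.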

Jwindler / AutoHiC

RuntimeError: nms is not compiled with GPU support #40