pfcouto opened this issue 1 year ago
Hi @pfcouto
There seems to be an issue with your PyTorch installation. You can follow these steps to install detectron2 within your environment constraints (Python 3.8.10, CUDA 11.3, and a Linux platform). I hope this resolves your issue; if you face any further issues even after following these steps, let me know.
pyenv install 3.8.10
pyenv shell 3.8.10
python -m venv env && source ./env/bin/activate
python --version && which python
python -m pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
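Once the install finishes, a quick import check confirms the build succeeded (just a sanity check, not part of the steps above):
python -c "import detectron2; print(detectron2.__version__)"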
Hi @satishjasthi, if possible I would like to keep using conda. I am running Linux (Fedora), and the nvidia-smi command shows CUDA Version: 12.1, so I can use a higher CUDA version if the detectron2 installation allows it. I tried to replicate those steps with the following commands:
conda create -n detectron2 python=3.8.10
conda activate detectron2
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
And I still got the error building detectron2.
Hi @pfcouto
I see that the root cause is the mismatch between your CUDA version and the torch version you are trying to install. In your case you are trying to install torch built for CUDA 11.3 on a system with CUDA 12.1, which may not work, and you might end up getting the same error. Instead, if your environment supports it, try installing CUDA 11.7, as the latest PyTorch version supports it.
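As a quick sanity check, you can print the CUDA version your installed torch was built against and compare it with what nvidia-smi reports for the driver (the exact versions will differ on your machine):
python -c "import torch; print(torch.__version__, torch.version.cuda)"
nvidia-smi | head -n 4   # the header shows the driver-side CUDA version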
One way to get CUDA 11.7 is with Docker. If you have Docker installed, you can use a Docker image that includes CUDA 11.7. NVIDIA provides Docker images with different CUDA versions through the NVIDIA GPU Cloud. You can pull the CUDA 11.7 image with:
docker pull nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04
Then you can run your program inside a Docker container using this image. This has the advantage of not affecting your system's CUDA installation, but it requires you to have Docker installed and to be familiar with Docker usage.
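For example, you could start an interactive container from that image with GPU access (just a sketch; the base image ships the CUDA libraries but not Python or PyTorch, which you would still install inside the container):
docker run --gpus all -it --rm nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 bash
# inside the container, nvidia-smi should list the host GPU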
Hi @satishjasthi, following your idea I installed the current stable version of PyTorch (2.0.1), since it supports CUDA 11.7, and it worked. However, can you explain a bit more about the Docker option? Does it require NVIDIA-Docker (I have had problems with it before)? Would I have to copy my whole directory into it every time I want to run something, or copy it in once and then always edit it directly inside Docker? Also, I am working with an external camera; would Docker be able to access it?
conda create -n detectron2 python=3.8.10
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
Hi @pfcouto Yes, you need to use the NVIDIA Docker image because it comes with the desired CUDA version and can communicate with the underlying hardware. Moreover, it makes the whole development process simpler when dealing with projects that require different CUDA versions. You need not copy the entire project into Docker every time; that would be cumbersome. Instead you can use the Docker mount option, which mounts a directory on the host machine to a desired directory inside the container. You can read more about mounting here. And yes, using Docker you can access any camera connected to the host machine.
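A minimal sketch of what that could look like (~/myproject and /dev/video0 are placeholders, not paths from your setup):
docker run --gpus all -it --rm \
  -v ~/myproject:/workspace -w /workspace \
  --device /dev/video0:/dev/video0 \
  nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04 bash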
Hi @satishjasthi. I tried to install NVIDIA-Docker. However, I am facing an error.
Upon running the command docker run --privileged --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi, I get the error:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
I installed the NVIDIA drivers through the Fedora docs, not from NVIDIA, so for example nvcc --version outputs an error saying the nvcc command is not recognized, but on my host machine I can run nvidia-smi.
The commands I used to install the drivers are the following:
sudo dnf install akmod-nvidia
sudo dnf install xorg-x11-drv-nvidia-cuda
And as visible in the following image, I am able to run nvidia-smi on my host machine.
I followed this guide on how to install nvidia-docker and did the following:
curl -s -L https://nvidia.github.io/libnvidia-container/centos8/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
##############################
sudo dnf install nvidia-docker2
# Edit /etc/nvidia-container-runtime/config.toml and disable cgroups:
no-cgroups = true
sudo reboot
##############################
sudo systemctl start docker.service
##############################
docker run --privileged --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
and upon running this docker command I get the error shown above.
The thing is, I have the file it says is missing (see the following image), so maybe it is looking for it in a different directory?
uname -a:
Linux fedora 6.2.10-200.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 6 23:30:41 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
docker version
Client: Docker Engine - Community
Cloud integration: v1.0.31
Version: 23.0.3
API version: 1.41 (downgraded from 1.42)
Go version: go1.19.7
Git commit: 3e7cbfd
Built: Tue Apr 4 22:10:33 2023
OS/Arch: linux/amd64
Context: desktop-linux
Server: Docker Desktop 4.18.0 (104112)
Engine:
Version: 20.10.24
API version: 1.41 (minimum version 1.12)
Go version: go1.19.7
Git commit: 5d6db84
Built: Tue Apr 4 18:18:42 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.18
GitCommit: 2456e983eb9e37e47538f59ea18f2043c9a73640
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0
rpm -qa '*nvidia*'
nvidia-gpu-firmware-20230310-148.fc37.noarch
xorg-x11-drv-nvidia-kmodsrc-530.41.03-1.fc37.x86_64
xorg-x11-drv-nvidia-cuda-libs-530.41.03-1.fc37.x86_64
xorg-x11-drv-nvidia-libs-530.41.03-1.fc37.x86_64
nvidia-settings-530.41.03-1.fc37.x86_64
xorg-x11-drv-nvidia-power-530.41.03-1.fc37.x86_64
xorg-x11-drv-nvidia-530.41.03-1.fc37.x86_64
akmod-nvidia-530.41.03-1.fc37.x86_64
kmod-nvidia-6.2.9-200.fc37.x86_64-530.41.03-1.fc37.x86_64
nvidia-persistenced-530.41.03-1.fc37.x86_64
xorg-x11-drv-nvidia-cuda-530.41.03-1.fc37.x86_64
xorg-x11-drv-nvidia-libs-530.41.03-1.fc37.i686
xorg-x11-drv-nvidia-cuda-libs-530.41.03-1.fc37.i686
kmod-nvidia-6.2.10-200.fc37.x86_64-530.41.03-1.fc37.x86_64
nvidia-container-toolkit-base-1.13.0-1.x86_64
libnvidia-container1-1.13.0-1.x86_64
libnvidia-container-tools-1.13.0-1.x86_64
nvidia-container-toolkit-1.13.0-1.x86_64
nvidia-docker2-2.13.0-1.noarch
nvidia-container-cli -V
cli-version: 1.13.0
lib-version: 1.13.0
build date: 2023-03-31T13:12+00:00
build revision: 20823911e978a50b33823a5783f92b6e345b241a
build compiler: gcc 8.5.0 20210514 (Red Hat 8.5.0-18)
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
Thanks for your help!
The error message suggests that the NVIDIA Management Library (libnvidia-ml.so.1) cannot be found. This library is part of the NVIDIA driver and is required for the NVIDIA Docker runtime to function correctly.
Here are a few things you can try to resolve this issue:
Check Your NVIDIA Driver Installation: Make sure you have the NVIDIA drivers installed correctly on your host system. You can check this by running nvidia-smi on your host system (outside of Docker). If this command fails or if it doesn't show your GPU(s), you may need to reinstall your NVIDIA drivers.
Update Your NVIDIA Docker Runtime: The NVIDIA Docker runtime has gone through several versions, and older versions may not be compatible with newer NVIDIA drivers or Docker versions. You can update the NVIDIA Docker runtime by following the instructions on the NVIDIA Docker GitHub page.
Reinstall the NVIDIA Docker Runtime: If updating doesn't solve the problem, you might try uninstalling and then reinstalling the NVIDIA Docker runtime. This can help if the runtime was installed incorrectly or if its configuration has become corrupted.
Check the Docker Command: Make sure you're using the correct Docker command to run your container. The --gpus all option is only available in Docker 19.03 and later, and it requires the NVIDIA Docker runtime to be installed as the default runtime or to be specified with the --runtime nvidia option. If you're using an older version of Docker, you might need to use nvidia-docker run instead of docker run.
Remember to restart your Docker service after making changes to the NVIDIA Docker runtime or its configuration. You can do this with sudo systemctl restart docker or sudo service docker restart, depending on your system.
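For reference, on recent versions of the NVIDIA Container Toolkit you can let nvidia-ctk write Docker's runtime configuration and then restart Docker; the resulting /etc/docker/daemon.json should contain roughly the runtime entry shown in the comments below (a generic sketch, not a fix for the specific libnvidia-ml.so.1 error above):
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# /etc/docker/daemon.json should then contain roughly:
# {
#   "runtimes": {
#     "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] }
#   }
# }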
FYI, I tried many of the methods here and nothing worked. The only method that worked for me is here.
conda install -c conda-forge pycocotools
python -c "import torch; print(torch.version.cuda)"
python -c "import pycocotools; print('pycocotools is installed!')"
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
I am having the same issue on Mac. Regarding the step from @satishjasthi: 'Now install torch and torchvision using pip. I suggest using torch 1.12.0 and the respective torchvision unless you have a very specific requirement for torch 1.10.0, because you can still run detectron2 with torch 1.12.0 on CUDA 11.3'.
I get this error:
ERROR: Could not find a version that satisfies the requirement torch==1.12.0+cu113 (from versions: 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0)
ERROR: No matching distribution found for torch==1.12.0+cu113
I found it extremely tricky to get all the dependencies right, so I am leaving the instructions to set up the environment here.
# Create conda env
conda create --name detectron2 python==3.9 -y
conda activate detectron2
# Install torch
pip install torch torchvision
# Install gcc and g++ with conda
conda install -c conda-forge pybind11
conda install -c conda-forge gxx
conda install -c anaconda gcc_linux-64
conda upgrade -c conda-forge --all
# Install detectron2 (specific version)
pip install 'git+https://github.com/facebookresearch/detectron2.git@v0.6'
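If the detectron2 build still complains about the compiler after this, it can help to check which compiler the activated environment actually exposes; the conda compiler packages usually export CC and CXX on activation (a quick check, and the behaviour may vary with package versions):
echo "CC=$CC CXX=$CXX"
"${CC:-gcc}" --version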
Following the instructions above, I had to add a version to the gcc install, and used conda-forge:
conda install -c conda-forge gcc_linux-64=13.2.0
Instructions To Reproduce the 🐛 Bug:
Full logs or other relevant observations:
Environment:
Provide your environment information using the following command:
Hope someone can help me out. Thanks!