facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0
30.38k stars 7.47k forks

Can't install Detectron2 #5008

Open pfcouto opened 1 year ago

pfcouto commented 1 year ago

Instructions To Reproduce the 🐛 Bug:

  1. Full runnable code or full changes you made:
    conda create -n detectronTestNew python=3.8.10
    conda activate detectronTestNew
    conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
    python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
  2. Full logs or other relevant observations:

    (earlier download and copy steps are omitted)
      copying detectron2/model_zoo/configs/COCO-Detection/retinanet_R_50_FPN_1x.py -> build/lib.linux-x86_64-cpython-38/detectron2/model_zoo/configs/COCO-Detection
      copying detectron2/model_zoo/configs/COCO-Detection/fcos_R_50_FPN_1x.py -> build/lib.linux-x86_64-cpython-38/detectron2/model_zoo/configs/COCO-Detection
      running build_ext
      building 'detectron2._C' extension
      creating build/temp.linux-x86_64-cpython-38
      creating build/temp.linux-x86_64-cpython-38/tmp
      creating build/temp.linux-x86_64-cpython-38/tmp/pip-req-build-dlzbi57q
      creating build/temp.linux-x86_64-cpython-38/tmp/pip-req-build-dlzbi57q/detectron2
      creating build/temp.linux-x86_64-cpython-38/tmp/pip-req-build-dlzbi57q/detectron2/layers
      creating build/temp.linux-x86_64-cpython-38/tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc
      creating build/temp.linux-x86_64-cpython-38/tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc/ROIAlignRotated
      creating build/temp.linux-x86_64-cpython-38/tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc/box_iou_rotated
      creating build/temp.linux-x86_64-cpython-38/tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc/cocoeval
      creating build/temp.linux-x86_64-cpython-38/tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc/nms_rotated
      gcc -pthread -B /home/pedrobolsa/anaconda3/envs/detectronTestNew/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/TH -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/THC -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/include/python3.8 -c /tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc/ROIAlignRotated/ROIAlignRotated_cpu.cpp -o build/temp.linux-x86_64-cpython-38/tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc/ROIAlignRotated/ROIAlignRotated_cpu.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      cc1plus: warning: command-line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      gcc -pthread -B /home/pedrobolsa/anaconda3/envs/detectronTestNew/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/TH -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/THC -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/include/python3.8 -c /tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc/box_iou_rotated/box_iou_rotated_cpu.cpp -o build/temp.linux-x86_64-cpython-38/tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc/box_iou_rotated/box_iou_rotated_cpu.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      cc1plus: warning: command-line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      gcc -pthread -B /home/pedrobolsa/anaconda3/envs/detectronTestNew/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/TH -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/THC -I/home/pedrobolsa/anaconda3/envs/detectronTestNew/include/python3.8 -c /tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc/cocoeval/cocoeval.cpp -o build/temp.linux-x86_64-cpython-38/tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc/cocoeval/cocoeval.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      cc1plus: warning: command-line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      In file included from /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:45,
                       from /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/numpy.h:12,
                       from /tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc/cocoeval/cocoeval.h:4,
                       from /tmp/pip-req-build-dlzbi57q/detectron2/layers/csrc/cocoeval/cocoeval.cpp:2:
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/attr.h:199:10: error: ‘uint16_t’ in namespace ‘std’ does not name a type; did you mean ‘wint_t’?
        199 |     std::uint16_t nargs;
            |          ^~~~~~~~
            |          wint_t
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/attr.h:202:10: error: ‘uint16_t’ in namespace ‘std’ does not name a type; did you mean ‘wint_t’?
        202 |     std::uint16_t nargs_kw_only = 0;
            |          ^~~~~~~~
            |          wint_t
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/attr.h:205:10: error: ‘uint16_t’ in namespace ‘std’ does not name a type; did you mean ‘wint_t’?
        205 |     std::uint16_t nargs_pos_only = 0;
            |          ^~~~~~~~
            |          wint_t
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/attr.h: In constructor ‘pybind11::detail::function_call::function_call(const pybind11::detail::function_record&, pybind11::handle)’:
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/attr.h:310:20: error: ‘const struct pybind11::detail::function_record’ has no member named ‘nargs’; did you mean ‘args’?
        310 |     args.reserve(f.nargs);
            |                    ^~~~~
            |                    args
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/attr.h:311:28: error: ‘const struct pybind11::detail::function_record’ has no member named ‘nargs’; did you mean ‘args’?
        311 |     args_convert.reserve(f.nargs);
            |                            ^~~~~
            |                            args
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/attr.h: In function ‘void pybind11::detail::process_kw_only_arg(const pybind11::arg&, function_record*)’:
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/attr.h:382:10: error: ‘struct pybind11::detail::function_record’ has no member named ‘nargs_kw_only’
        382 |     ++r->nargs_kw_only;
            |          ^~~~~~~~~~~~~
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/attr.h: In static member function ‘static void pybind11::detail::process_attribute<pybind11::pos_only>::init(const pybind11::pos_only&, pybind11::detail::function_record*)’:
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/attr.h:439:12: error: ‘struct pybind11::detail::function_record’ has no member named ‘nargs_pos_only’
        439 |         r->nargs_pos_only = static_cast<std::uint16_t>(r->args.size());
            |            ^~~~~~~~~~~~~~
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/attr.h:439:46: error: ‘uint16_t’ in namespace ‘std’ does not name a type; did you mean ‘wint_t’?
        439 |         r->nargs_pos_only = static_cast<std::uint16_t>(r->args.size());
            |                                              ^~~~~~~~
            |                                              wint_t
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In member function ‘void pybind11::cpp_function::initialize_generic(unique_function_record&&, const char*, const std::type_info* const*, pybind11::size_t)’:
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:311:26: error: ‘struct pybind11::detail::function_record’ has no member named ‘nargs_kw_only’
        311 |                 if (rec->nargs_kw_only > 0 && arg_index + rec->nargs_kw_only == args)
            |                          ^~~~~~~~~~~~~
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:311:64: error: ‘struct pybind11::detail::function_record’ has no member named ‘nargs_kw_only’
        311 |                 if (rec->nargs_kw_only > 0 && arg_index + rec->nargs_kw_only == args)
            |                                                                ^~~~~~~~~~~~~
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:329:26: error: ‘struct pybind11::detail::function_record’ has no member named ‘nargs_pos_only’
        329 |                 if (rec->nargs_pos_only > 0 && (arg_index + 1) == rec->nargs_pos_only)
            |                          ^~~~~~~~~~~~~~
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:329:72: error: ‘struct pybind11::detail::function_record’ has no member named ‘nargs_pos_only’
        329 |                 if (rec->nargs_pos_only > 0 && (arg_index + 1) == rec->nargs_pos_only)
            |                                                                        ^~~~~~~~~~~~~~
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:371:14: error: ‘struct pybind11::detail::function_record’ has no member named ‘nargs’; did you mean ‘args’?
        371 |         rec->nargs = (std::uint16_t) args;
            |              ^~~~~
            |              args
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:371:28: error: ‘uint16_t’ is not a member of ‘std’; did you mean ‘wint_t’?
        371 |         rec->nargs = (std::uint16_t) args;
            |                            ^~~~~~~~
            |                            wint_t
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In static member function ‘static PyObject* pybind11::cpp_function::dispatcher(PyObject*, PyObject*, PyObject*)’:
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:604:40: error: ‘const struct pybind11::detail::function_record’ has no member named ‘nargs’; did you mean ‘args’?
        604 |                 size_t num_args = func.nargs;    // Number of positional arguments that we need
            |                                        ^~~~~
            |                                        args
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:607:51: error: ‘const struct pybind11::detail::function_record’ has no member named ‘nargs_kw_only’
        607 |                 size_t pos_args = num_args - func.nargs_kw_only;
            |                                                   ^~~~~~~~~~~~~
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:657:40: error: ‘const struct pybind11::detail::function_record’ has no member named ‘nargs_pos_only’
        657 |                 if (args_copied < func.nargs_pos_only) {
            |                                        ^~~~~~~~~~~~~~
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:658:47: error: ‘const struct pybind11::detail::function_record’ has no member named ‘nargs_pos_only’
        658 |                     for (; args_copied < func.nargs_pos_only; ++args_copied) {
            |                                               ^~~~~~~~~~~~~~
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:672:44: error: ‘const struct pybind11::detail::function_record’ has no member named ‘nargs_pos_only’
        672 |                     if (args_copied < func.nargs_pos_only)
            |                                            ^~~~~~~~~~~~~~
      /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:760:53: error: ‘const struct pybind11::detail::function_record’ has no member named ‘nargs’; did you mean ‘args’?
        760 |                     second_pass_convert.resize(func.nargs, false);
            |                                                     ^~~~~
            |                                                     args
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]
    
    note: This error originates from a subprocess, and is likely not a problem with pip.
    ERROR: Failed building wheel for detectron2
    Running setup.py clean for detectron2
    Failed to build detectron2
    ERROR: Could not build wheels for detectron2, which is required to install pyproject.toml-based projects
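Most of the log above repeats a handful of diagnostics. As a minimal sketch (not part of detectron2 or pip; `summarize_errors` and `ERROR_RE` are hypothetical names), the following Python groups GCC `error:` lines by header file and message. It assumes the log uses GCC's `file:line:col: error: message` layout. Run on the log above, it would show that every compiler error comes from the pybind11 headers bundled with torch, and that the first ones concern `std::uint16_t`, a type declared in `<cstdint>`.

```python
import re
from collections import Counter

# Matches GCC diagnostics of the form "<file>:<line>:<col>: error: <message>".
ERROR_RE = re.compile(
    r"^\s*(?P<file>\S+):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.*)$"
)


def summarize_errors(log_text):
    """Count GCC errors per (header file name, message) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_RE.match(line)
        if match:
            # Keep only the file name so long site-packages paths do not hide the pattern.
            header = match.group("file").rsplit("/", 1)[-1]
            counts[(header, match.group("msg"))] += 1
    return counts
```

The "has no member named" errors appear to follow from the earlier `uint16_t` failures, so the first group is probably the one worth investigating.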

Environment:

Provide your environment information using the following command:

-------------------------------  ------------------------------------------------------------------------------------------------
sys.platform                     linux
Python                           3.8.10 (default, Jun  4 2021, 15:09:15) [GCC 7.5.0]
numpy                            1.24.3
detectron2                       failed to import
detectron2._C                    not built correctly: No module named 'detectron2'
Compiler ($CXX)                  c++ (GCC) 13.1.1 20230614 (Red Hat 13.1.1-4)
DETECTRON2_ENV_MODULE            <not set>
PyTorch                          1.10.0 @/home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torch
PyTorch debug build              False
torch._C._GLIBCXX_USE_CXX11_ABI  False
GPU available                    Yes
GPU 0                            NVIDIA GeForce RTX 3080 Laptop GPU (arch=8.6)
Driver version                   530.41.03
CUDA_HOME                        None - invalid!
Pillow                           8.2.0
torchvision                      0.11.0 @/home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torchvision
torchvision arch flags           /home/pedrobolsa/anaconda3/envs/detectronTestNew/lib/python3.8/site-packages/torchvision/_C.so
cv2                              Not found
-------------------------------  ------------------------------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

Hope someone can help me out. Thanks!

satishjasthi commented 1 year ago

Hi @pfcouto

There seems to be an issue with your PyTorch installation. You can follow these steps to install detectron2 within your environment constraints (Python 3.8.10, CUDA 11.3, Linux). I hope this resolves your issue; if you run into further problems after following these steps, let me know.

pfcouto commented 1 year ago

Hi @satishjasthi, if possible I would like to keep using conda. I am running Linux (Fedora), and nvidia-smi reports CUDA Version: 12.1, so I can use a higher CUDA version if the detectron2 installation allows it. I tried to replicate those steps with the following commands:

conda create -n detectron2 python=3.8.10
conda activate detectron2
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

I still got the error when building detectron2.

satishjasthi commented 1 year ago

Hi @pfcouto

I see that the root cause is the mismatch between your CUDA version and the torch version you are trying to install. You are installing a torch build for CUDA 11.3 on a system with CUDA 12.1, which may not work, and you might end up with the same error. Instead, if your environment allows it, try installing CUDA 11.7, since the latest PyTorch version supports it.

You can do this using Docker. If you have Docker installed, you can use a Docker image that includes CUDA 11.7. NVIDIA provides Docker images with different CUDA versions through the NVIDIA GPU Cloud. You can pull the CUDA 11.7 image with:

   docker pull nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04

Then you can run your program inside a Docker container using this image. This has the advantage of not affecting your system's CUDA installation, but it requires you to have Docker installed and to be familiar with Docker usage.
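The version relationship discussed in this comment can be checked rather than guessed at. The sketch below is a hedged illustration, not an official tool: the function names are hypothetical, and it assumes that the `CUDA Version` field in the `nvidia-smi` header is the newest CUDA runtime the installed driver supports, so the check is simply "driver version >= runtime version".

```python
import re


def parse_smi_cuda_version(smi_output):
    """Return the (major, minor) CUDA version from the nvidia-smi header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_version(text):
    """Turn a string such as '11.3' into a tuple such as (11, 3)."""
    return tuple(int(part) for part in text.split(".")[:2])


def driver_covers_runtime(smi_output, runtime_version):
    """True when the driver-reported CUDA version is at least the runtime version."""
    driver = parse_smi_cuda_version(smi_output)
    if driver is None:
        raise ValueError("no 'CUDA Version:' field found in nvidia-smi output")
    return driver >= parse_version(runtime_version)
```

To use it, pass the text printed by `nvidia-smi` as `smi_output` and the value of `torch.version.cuda` (for example "11.3") as `runtime_version`.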

pfcouto commented 1 year ago

Hi @satishjasthi, following your idea I installed the current stable version of PyTorch (2.0.1), since it supports CUDA 11.7, and it worked. However, can you explain a bit more about the Docker option? Does it require NVIDIA-Docker? (I have had problems with it before.) Would I have to copy my whole directory into the container every time I want to run, or copy it in once and then always edit it directly inside Docker? Also, I am working with an external camera; would Docker be able to access it? These are the commands I used:

conda create -n detectron2 python=3.8.10
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

satishjasthi commented 1 year ago

Hi @pfcouto. Yes, you need to use an NVIDIA-Docker image, because it comes with the desired CUDA version, which can communicate with the underlying hardware. Moreover, it makes the whole development process simpler when dealing with projects that require different CUDA versions. You need not copy the entire project into Docker every time; that would be cumbersome. Instead, you can use Docker's mount option, which mounts a directory on the host machine at a chosen directory inside the container. You can read more about mounting here. And yes, using Docker you can access any camera connected to the host machine.
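As a rough sketch of the mount and camera options described above, the Python below assembles a `docker run` command line. The helper name `build_docker_run`, the host and container paths, and the `/dev/video0` device node are all assumptions made for illustration; only the image name comes from this thread. It uses the standard `docker run` flags `--gpus`, `-v`, `-w` and `--device`.

```python
import shlex


def build_docker_run(image, host_dir, container_dir, camera_device=None, command=()):
    """Assemble a `docker run` argument list with a bind mount and optional camera device."""
    args = [
        "docker", "run", "--rm", "-it",
        "--gpus", "all",                      # expose host GPUs (needs the NVIDIA container toolkit)
        "-v", f"{host_dir}:{container_dir}",  # bind-mount the project instead of copying it
        "-w", container_dir,                  # start in the mounted directory
    ]
    if camera_device:
        # Pass a host device node (for example a V4L2 webcam) through to the container.
        args += ["--device", f"{camera_device}:{camera_device}"]
    args.append(image)
    args.extend(command)
    return args


cmd = build_docker_run(
    image="nvcr.io/nvidia/cuda:11.7.0-base-ubuntu20.04",  # image named earlier in the thread
    host_dir="/home/user/project",                        # hypothetical host path
    container_dir="/workspace",                           # hypothetical container path
    camera_device="/dev/video0",                          # hypothetical camera device node
    command=["nvidia-smi"],
)
print(shlex.join(cmd))
```

Because the project directory is bind-mounted, files edited on the host are visible inside the container without copying.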

pfcouto commented 1 year ago

Hi @satishjasthi. I tried to install NVIDIA-Docker. However, I am facing an error.

1. Issue or feature description

Upon running the command docker run --privileged --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi I get the error:

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

2. Steps to reproduce the issue

I installed the NVIDIA driver following the Fedora docs rather than NVIDIA's instructions. So, for example, nvcc --version fails because the nvcc command is not recognized, but I can run nvidia-smi on my host machine.

The commands I used to install nvidia are the following:

sudo dnf install akmod-nvidia
sudo dnf install xorg-x11-drv-nvidia-cuda

As visible in the following image, I am able to run the command nvidia-smi on my host machine:

[image: nvidia-smi output on the host machine]

I followed this guide on how to install nvidia-docker and did the following:

curl -s -L https://nvidia.github.io/libnvidia-container/centos8/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
##############################
sudo dnf install nvidia-docker2
# Edit /etc/nvidia-container-runtime/config.toml and disable cgroups by setting:
#   no-cgroups = true

sudo reboot
##############################
sudo systemctl start docker.service
##############################
docker run --privileged --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

Upon running this docker command I get the error shown in section 1.

The thing is, I have the file that it says it is missing (check the following image), so maybe it is looking for it in a different directory?

[image: listing showing libnvidia-ml.so.1 on the host machine]
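One way to see where a library such as `libnvidia-ml.so.1` lives on the host is to search a few likely directories. This is only a sketch: `find_library` and `DEFAULT_DIRS` are hypothetical names, and the directory list is an assumption about common Linux layouts, not the search order used by `nvidia-container-cli`.

```python
import os
from pathlib import Path

# Directories that commonly hold shared libraries on Linux; this list is an assumption
# and not the search order used by nvidia-container-cli.
DEFAULT_DIRS = ["/usr/lib64", "/usr/lib", "/usr/lib/x86_64-linux-gnu", "/usr/local/lib"]


def find_library(name, search_dirs=None):
    """Return every existing path named `name` in the search dirs and LD_LIBRARY_PATH."""
    dirs = list(search_dirs if search_dirs is not None else DEFAULT_DIRS)
    dirs += [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
    found = []
    for directory in dirs:
        candidate = Path(directory) / name
        if candidate.exists() and str(candidate) not in found:
            found.append(str(candidate))
    return found


print(find_library("libnvidia-ml.so.1"))
```

An empty list on the host would point to a driver installation problem; a non-empty list suggests the file exists but is not visible from wherever the container runtime looks for it.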

3. Information to attach (optional if deemed irrelevant)

uname -a:

Linux fedora 6.2.10-200.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr  6 23:30:41 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

docker version

Client: Docker Engine - Community
 Cloud integration: v1.0.31
 Version:           23.0.3
 API version:       1.41 (downgraded from 1.42)
 Go version:        go1.19.7
 Git commit:        3e7cbfd
 Built:             Tue Apr  4 22:10:33 2023
 OS/Arch:           linux/amd64
 Context:           desktop-linux

Server: Docker Desktop 4.18.0 (104112)
 Engine:
  Version:          20.10.24
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.19.7
  Git commit:       5d6db84
  Built:            Tue Apr  4 18:18:42 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.18
  GitCommit:        2456e983eb9e37e47538f59ea18f2043c9a73640
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

rpm -qa '*nvidia*'

nvidia-gpu-firmware-20230310-148.fc37.noarch
xorg-x11-drv-nvidia-kmodsrc-530.41.03-1.fc37.x86_64
xorg-x11-drv-nvidia-cuda-libs-530.41.03-1.fc37.x86_64
xorg-x11-drv-nvidia-libs-530.41.03-1.fc37.x86_64
nvidia-settings-530.41.03-1.fc37.x86_64
xorg-x11-drv-nvidia-power-530.41.03-1.fc37.x86_64
xorg-x11-drv-nvidia-530.41.03-1.fc37.x86_64
akmod-nvidia-530.41.03-1.fc37.x86_64
kmod-nvidia-6.2.9-200.fc37.x86_64-530.41.03-1.fc37.x86_64
nvidia-persistenced-530.41.03-1.fc37.x86_64
xorg-x11-drv-nvidia-cuda-530.41.03-1.fc37.x86_64
xorg-x11-drv-nvidia-libs-530.41.03-1.fc37.i686
xorg-x11-drv-nvidia-cuda-libs-530.41.03-1.fc37.i686
kmod-nvidia-6.2.10-200.fc37.x86_64-530.41.03-1.fc37.x86_64
nvidia-container-toolkit-base-1.13.0-1.x86_64
libnvidia-container1-1.13.0-1.x86_64
libnvidia-container-tools-1.13.0-1.x86_64
nvidia-container-toolkit-1.13.0-1.x86_64
nvidia-docker2-2.13.0-1.noarch

nvidia-container-cli -V

cli-version: 1.13.0
lib-version: 1.13.0
build date: 2023-03-31T13:12+00:00
build revision: 20823911e978a50b33823a5783f92b6e345b241a
build compiler: gcc 8.5.0 20210514 (Red Hat 8.5.0-18)
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

Thanks for your help!

satishjasthi commented 1 year ago

The error message suggests that the NVIDIA Management Library (libnvidia-ml.so.1) cannot be found. This library is part of the NVIDIA driver and is required for the NVIDIA Docker runtime to function correctly.

Here are a few things you can try to resolve this issue:

  1. Check Your NVIDIA Driver Installation: Make sure you have the NVIDIA drivers installed correctly on your host system. You can check this by running nvidia-smi on your host system (outside of Docker). If this command fails or if it doesn't show your GPU(s), you may need to reinstall your NVIDIA drivers.

  2. Update Your NVIDIA Docker Runtime: The NVIDIA Docker runtime has gone through several versions, and older versions may not be compatible with newer NVIDIA drivers or Docker versions. You can update the NVIDIA Docker runtime by following the instructions on the NVIDIA Docker GitHub page.

  3. Reinstall the NVIDIA Docker Runtime: If updating doesn't solve the problem, you might try uninstalling and then reinstalling the NVIDIA Docker runtime. This can help if the runtime was installed incorrectly or if its configuration has become corrupted.

  4. Check the Docker Command: Make sure you're using the correct Docker command to run your container. The --gpus all option is only available in Docker 19.03 and later, and it requires the NVIDIA Docker runtime to be installed as the default runtime or to be specified with the --runtime nvidia option. If you're using an older version of Docker, you might need to use nvidia-docker run instead of docker run.

Remember to restart your Docker service after making changes to the NVIDIA Docker runtime or its configuration. You can do this with sudo systemctl restart docker or sudo service docker restart, depending on your system.

bring-nirachornkul commented 1 year ago

FYI

I tried many of the methods here and none worked. The only method that worked for me is this one:

  1. Install PyTorch and associated libraries: conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch
  2. Install PyCocoTools: conda install -c conda-forge pycocotools
  3. Check that PyTorch is installed correctly: python -c "import torch; print(torch.version.cuda)"
  4. Check that PyCocoTools is installed correctly: python -c "import pycocotools; print('pycocotools is installed!')"
  5. Install detectron2. I chose to install from a local clone, and everything worked like a charm:
    git clone https://github.com/facebookresearch/detectron2.git
    python -m pip install -e detectron2

Cite : https://github.com/markstrefford/running-detectron2-on-windows-wsl2-rtx30xx#windows-10-wsl2-ubuntu-2004-lts
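The two verification steps above can be folded into one script. The following is a small sketch with hypothetical helper names (`check_modules`, `torch_cuda_version`); it only reports whether each package can be located in the active environment and, when torch is present, which CUDA version it was built for.

```python
import importlib
import importlib.util


def check_modules(names):
    """Map each module name to True when it can be located in the current environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def torch_cuda_version():
    """Return torch.version.cuda, or None if torch is missing or is a CPU-only build."""
    if importlib.util.find_spec("torch") is None:
        return None
    torch = importlib.import_module("torch")
    return torch.version.cuda


for name, ok in check_modules(["torch", "torchvision", "pycocotools", "detectron2"]).items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
print("torch built for CUDA:", torch_cuda_version())
```

Running it before and after the detectron2 install step shows at a glance which dependency, if any, is still missing.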

valentinamisi commented 8 months ago

I am having the same issue on Mac. Regarding the step from @satishjasthi: 'Now install torch and torch vision using pip. I suggest using torch 1.12.0 and respective torchvision unless you have a very specific requirement for torch 1.10.0. Because you can still run detectron2 with torch 1.12.0 on CUDA 11.3'

I get this error:

ERROR: Could not find a version that satisfies the requirement torch==1.12.0+cu113 (from versions: 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0)
ERROR: No matching distribution found for torch==1.12.0+cu113
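The pip message above already lists what is installable for this platform and Python version. As a hedged sketch (the helper names are hypothetical), the Python below extracts that list and splits the requested pin into its public version and its local label. The `+cu113` suffix is a local version label marking a CUDA 11.3 build; its absence from the list would be consistent with CUDA-tagged builds not being offered for this platform, though that is an assumption the thread does not confirm.

```python
import re


def available_versions(pip_error):
    """Extract the list after 'from versions:' in a pip 'Could not find a version' error."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if not match:
        return []
    versions = [v.strip() for v in match.group(1).split(",")]
    return [v for v in versions if v and v != "none"]


def split_local_label(requirement):
    """Split 'torch==1.12.0+cu113' into ('torch', '1.12.0', 'cu113'); the label may be None."""
    name, _, version = requirement.partition("==")
    public, _, local = version.partition("+")
    return name, public, local or None
```

Comparing the public version from `split_local_label` against `available_versions` shows whether the pin fails because of the CUDA label alone or because that release is not offered at all.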

miquel-espinosa commented 6 months ago

I found it extremely tricky to get all the dependencies right, so I am leaving the instructions to set up the environment here.

# Create conda env
conda create --name detectron2 python==3.9 -y
conda activate detectron2

# Install torch
pip install torch torchvision

# Install pybind11, gcc and g++ with conda
conda install -c conda-forge pybind11
conda install -c conda-forge gxx
conda install -c anaconda gcc_linux-64
conda upgrade -c conda-forge --all

# Install detectron2 (specific version)
pip install 'git+https://github.com/facebookresearch/detectron2.git@v0.6'

cwood1967 commented 1 month ago

I followed the instructions from @miquel-espinosa above. I had to add a version to the gcc install, and used conda-forge: conda install -c conda-forge gcc_linux-64=13.2.0
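Since the compiler version matters here, it can help to print which compiler a build would pick up before installing. The sketch below uses hypothetical helper names and assumes the compiler prints a dotted `major.minor.patch` version on the first line of `--version`, as GCC does in the environment report at the top of this issue.

```python
import re
import shutil
import subprocess


def parse_compiler_version(version_line):
    """Return (major, minor, patch) from the first line of `c++ --version`, or None."""
    match = re.search(r"\b(\d+)\.(\d+)\.(\d+)\b", version_line)
    return tuple(int(g) for g in match.groups()) if match else None


def active_compiler_version(executable="c++"):
    """Run `<executable> --version` and parse its first line; None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    first_line = result.stdout.splitlines()[0] if result.stdout else ""
    return parse_compiler_version(first_line)


print(active_compiler_version())
```

Running it inside the activated conda environment shows whether the conda-installed compiler or the system compiler comes first on the PATH.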