AssertionError: cuda is not available after installation

tengerye commented 4 years ago

What did I do?

I followed the instructions to install detectron2 inside a Docker container. The installation completed without errors, but when I run the training script below, it raises `AssertionError: cuda is not available. Please check your installation.`

What command did I run? `python tools/train_net.py --config-file configs/FCOS-Detection/Base-FCOS.yaml --num-gpus 2`

What did I observe? The logs are as follows:


Command Line Args: Namespace(config_file='configs/FCOS-Detection/Base-FCOS.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False)
Traceback (most recent call last):
File "tools/train_net.py", line 235, in <module>
args=(args,),
File "/opt/tiger/conda/lib/python3.7/site-packages/detectron2/engine/launch.py", line 54, in launch
daemon=False,
File "/opt/tiger/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/opt/tiger/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/opt/tiger/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/opt/tiger/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/opt/tiger/conda/lib/python3.7/site-packages/detectron2/engine/launch.py", line 63, in _distributed_worker
assert torch.cuda.is_available(), "cuda is not available. Please check your installation."
AssertionError: cuda is not available. Please check your installation.
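
The inner traceback shows where it fails: `launch` spawns one worker process per GPU (`--num-gpus 2` here), and each worker runs `assert torch.cuda.is_available()` in `_distributed_worker`. Below is a minimal sketch of the same check done once in the parent process, before anything is spawned, so the error also says how many GPUs were requested and found. `preflight_error` is a hypothetical helper, not part of detectron2; it takes the CUDA status as plain arguments so it can be exercised on a machine without a GPU.

```python
from typing import Optional


def preflight_error(num_gpus: int, cuda_available: bool, device_count: int) -> Optional[str]:
    """Return an error message if a multi-GPU launch cannot work, else None.

    Hypothetical helper: mirrors the assertion detectron2's launcher makes in
    each spawned worker, but runs once before any process is started.
    """
    if num_gpus < 1:
        return f"--num-gpus must be >= 1, got {num_gpus}"
    if not cuda_available:
        # Same condition that trips `assert torch.cuda.is_available()` in each worker.
        return f"{num_gpus} GPU(s) requested but torch.cuda.is_available() is False"
    if device_count < num_gpus:
        return f"{num_gpus} GPU(s) requested but only {device_count} visible"
    return None


if __name__ == "__main__":
    # Values matching this report: 2 GPUs requested, CUDA not available.
    print(preflight_error(num_gpus=2, cuda_available=False, device_count=0))
```

In a launcher script one could call it as `preflight_error(args.num_gpus, torch.cuda.is_available(), torch.cuda.device_count())` before `launch(...)` and exit with the message whenever it is not `None`.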



The code I ran is from [AdelaiDet](https://github.com/aim-uofa/AdelaiDet).

## Expected behavior:

The command should run without errors.

CUDA is definitely there. When I ran `nvcc --version`, I got:
`nvcc: NVIDIA (R) Cuda compiler driver`
`Copyright (c) 2005-2019 NVIDIA Corporation`
`Built on Wed_Oct_23_19:24:38_PDT_2019`
`Cuda compilation tools, release 10.2, V10.2.89`
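
One caveat on that check: `nvcc` only shows that the CUDA toolkit (the compiler) is installed in the image. `torch.cuda.is_available()` instead depends on how PyTorch was built, on the NVIDIA driver, and on whether a GPU is actually visible to the process, so the two can disagree. A quick first step is to compare the toolkit release with `torch.version.cuda`, the CUDA version PyTorch was built against. The sketch below only parses the `nvcc --version` text; `nvcc_release` and `local_nvcc_release` are hypothetical helper names.

```python
import re
import subprocess
from typing import Optional, Tuple


def nvcc_release(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def local_nvcc_release() -> Optional[Tuple[int, int]]:
    """Run `nvcc --version` if it can be executed; return None otherwise."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # nvcc missing from PATH, not executable, or exited with an error.
        return None
    return nvcc_release(result.stdout)
```

For the output quoted above, `nvcc_release` returns `(10, 2)`; that can then be compared with `torch.version.cuda`, which is a string such as `"10.2"`, or `None` for a CPU-only PyTorch build.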

## Environment:

No CUDA runtime is found, using CUDA_HOME='/opt/tiger/cuda'
------------------------  -------------------------------------------------------------------------
sys.platform              linux
Python                    3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
numpy                     1.18.1
detectron2                0.1.2 @/opt/tiger/conda/lib/python3.7/site-packages/detectron2
detectron2 compiler       GCC 8.3
detectron2 CUDA compiler  not available
DETECTRON2_ENV_MODULE     <not set>
PyTorch                   1.5.0 @/opt/tiger/conda/lib/python3.7/site-packages/torch
PyTorch debug build       False
CUDA available            False
Pillow                    7.0.0
torchvision               0.6.0a0+82fd1c8 @/opt/tiger/conda/lib/python3.7/site-packages/torchvision
fvcore                    0.1
------------------------  -------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
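
The rows that matter in this table are `CUDA available  False` and `detectron2 CUDA compiler  not available`, along with the `No CUDA runtime is found` line: PyTorch cannot see a GPU, and detectron2's C++ extension appears to have been compiled without CUDA support. When comparing such reports across machines or images, a small parser can pull those rows out. This is a sketch that assumes the layout shown above (key and value separated by two or more spaces); `parse_env_table` and `cuda_problems` are hypothetical helper names, not part of detectron2.

```python
import re
from typing import Dict, List


def parse_env_table(text: str) -> Dict[str, str]:
    """Parse the key/value rows of detectron2's environment table.

    Assumes each row is `<key><2+ spaces><value>`; blank lines, dashed
    separator rows, and lines without such a gap are skipped.
    """
    rows: Dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or set(stripped) <= {"-", " "}:
            continue  # blank line or a dashed separator row
        parts = re.split(r"\s{2,}", stripped, maxsplit=1)
        if len(parts) == 2:
            rows[parts[0]] = parts[1]
    return rows


def cuda_problems(rows: Dict[str, str]) -> List[str]:
    """List the rows that indicate a CPU-only setup."""
    problems: List[str] = []
    if rows.get("CUDA available") == "False":
        problems.append("PyTorch cannot see a GPU (CUDA available = False)")
    if rows.get("detectron2 CUDA compiler") == "not available":
        problems.append("detectron2 ops were built without CUDA")
    return problems
```

Because detectron2's `setup.py` only builds the CUDA ops when it can see a GPU at build time (or when `FORCE_CUDA=1` is set, as the official Dockerfile does), an install done in a container without GPU access may need to be rebuilt even after GPU access is fixed.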

ppwwyyxx commented 4 years ago

The Dockerfile is meant to be used like this.

tengerye commented 4 years ago

The reason I can't use your Dockerfile directly is that I have to build on a private Docker image. So I have to install detectron2 inside a running container and then export that container as an image.

ppwwyyxx commented 4 years ago

Thanks for clarifying. I thought you were using the Dockerfile since you mentioned a Docker container.

You need to install PyTorch and its other dependencies correctly so that `torch.cuda.is_available()` returns `True`. Since this is a PyTorch function, it has nothing to do with detectron2.
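
Following that advice, the first thing to check inside the container is what PyTorch itself reports. A minimal diagnostic sketch, using only documented `torch` calls (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`) plus a few container-level hints. It assumes GPUs are passed into the container through the NVIDIA container runtime, in which case device nodes such as `/dev/nvidiactl` and the `nvidia-smi` binary are typically present; if they are missing, the container was probably started without GPU access, and reinstalling packages alone will not change `is_available()`. `cuda_diagnostics` is a hypothetical helper name.

```python
import os
import shutil
from typing import Any, Dict


def cuda_diagnostics() -> Dict[str, Any]:
    """Collect facts that help explain why torch.cuda.is_available() is False."""
    info: Dict[str, Any] = {
        # Unset means all GPUs are visible; an empty string hides every GPU.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        # Assumption: present when the container runs with the NVIDIA runtime.
        "dev_nvidiactl": os.path.exists("/dev/nvidiactl"),
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }
    try:
        import torch  # imported lazily so the script also runs where torch is absent
    except ImportError:
        info["torch_installed"] = False
        return info
    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    info["torch_built_with_cuda"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    info["device_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key:24} {value}")
```

If `torch_built_with_cuda` is `None`, the installed PyTorch is a CPU-only build; if it is set but `cuda_available` is `False` and `dev_nvidiactl` is `False`, the container itself most likely has no GPU access.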

byronyi commented 4 years ago

Fixed internally. Kinda embarrassing...

aarbelle commented 4 years ago

@byronyi Can you say what you did to fix it? I have the same issue.

facebookresearch / detectron2

AssertionError: cuda is not available after installation #1391