facebookarchive / caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework.
https://caffe2.ai
Apache License 2.0

CUDA error: no kernel image is available for execution on the device Error from operator: type: "SpatialNarrowAsGradient" #2196

Open FduJyy opened 6 years ago

FduJyy commented 6 years ago

If this is a build issue, please fill out the template below.

System information

I installed Caffe2 from the pre-built binaries with conda install -c caffe2 caffe2-cuda9.0-cudnn7 and ran into a problem: a file called "libnccl.so.2" appears to be missing. I cloned the NCCL library and compiled it, but the build did not produce any file named "libnccl.so.2". The problem remains unsolved.

Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from caffe2.python import workspace
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: libnccl.so.2: cannot open shared object file: No such file or directory
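
For readers hitting the same message, a quick way to check whether the dynamic loader can actually find libnccl.so.2 (a generic diagnostic sketch, not from this thread; the export path is only an example):

# List any NCCL libraries known to the dynamic loader.
ldconfig -p | grep nccl
# Search the filesystem for NCCL builds that are not on the loader path.
find / -name 'libnccl*' 2>/dev/null
# If libnccl.so.2 exists but lives outside the loader path, expose it
# (the directory below is an assumption, not from this thread):
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH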
pjh5 commented 6 years ago

Which NCCL library did you clone? This is the script we use to install the NCCL that we build against: https://github.com/caffe2/caffe2/blob/master/docker/jenkins/common/install_nccl.sh. This should be the library that you need: http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvidia-machine-learning-repo-ubuntu1404_4.0-2_amd64.deb. If you call that script with UBUNTU_VERSION=16.04 and CUDA_VERSION=9.0, it should install correctly.
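
Spelled out, the invocation pjh5 describes would look roughly like this (a sketch; the assumption, implied by his comment, is that the script reads both values from the environment):

UBUNTU_VERSION=16.04 CUDA_VERSION=9.0 bash ./install_nccl.sh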

FduJyy commented 6 years ago

@pjh5 Thanks for your help! Now I can run from caffe2.python import workspace without errors. Next I tried to set up the Detectron platform. However, after installing the dependencies and running the SpatialNarrowAsOp test, I hit another problem: Encountered CUDA error: no kernel image is available for execution on the device Error from operator: input: "A" input: "B" input: "C_grad" output: "A_grad" name: "" type: "SpatialNarrowAsGradient" device_option { device_type: 1 cuda_gpu_id: 0 } is_gradient_op: true. Do you know what could cause this?

(caffe) jyy@jyy-OptiPlex-9020:~/Detectron$ python ./tests/test_spatial_narrow_as_op.py
E0309 14:17:00.375676  3086 init_intrinsics_check.cc:59] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0309 14:17:00.375697  3086 init_intrinsics_check.cc:59] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0309 14:17:00.375700  3086 init_intrinsics_check.cc:59] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Found Detectron ops lib: /home/jyy/anaconda3/envs/caffe/lib/libcaffe2_detectron_ops_gpu.so
F.E
======================================================================
ERROR: test_small_forward_and_gradient (__main__.SpatialNarrowAsOpTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests/test_spatial_narrow_as_op.py", line 59, in test_small_forward_and_gradient
    self._run_test(A, B, check_grad=True)
  File "./tests/test_spatial_narrow_as_op.py", line 49, in _run_test
    res, grad, grad_estimated = gc.CheckSimple(op, [A, B], 0, [0])
  File "/home/jyy/anaconda3/envs/caffe/lib/python2.7/site-packages/caffe2/python/gradient_checker.py", line 284, in CheckSimple
    outputs_with_grads
  File "/home/jyy/anaconda3/envs/caffe/lib/python2.7/site-packages/caffe2/python/gradient_checker.py", line 201, in GetLossAndGrad
    workspace.RunOperatorsOnce(grad_ops)
  File "/home/jyy/anaconda3/envs/caffe/lib/python2.7/site-packages/caffe2/python/workspace.py", line 184, in RunOperatorsOnce
    success = RunOperatorOnce(op)
  File "/home/jyy/anaconda3/envs/caffe/lib/python2.7/site-packages/caffe2/python/workspace.py", line 179, in RunOperatorOnce
    return C.run_operator_once(StringifyProto(operator))
RuntimeError: [enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator: 
input: "A" input: "B" input: "C_grad" output: "A_grad" name: "" type: "SpatialNarrowAsGradient" device_option { device_type: 1 cuda_gpu_id: 0 } is_gradient_op: true

======================================================================
FAIL: test_large_forward (__main__.SpatialNarrowAsOpTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests/test_spatial_narrow_as_op.py", line 68, in test_large_forward
    self._run_test(A, B)
  File "./tests/test_spatial_narrow_as_op.py", line 54, in _run_test
    np.testing.assert_allclose(C, C_ref, rtol=1e-5, atol=1e-08)
  File "/home/jyy/anaconda3/envs/caffe/lib/python2.7/site-packages/numpy/testing/nose_tools/utils.py", line 1396, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/home/jyy/anaconda3/envs/caffe/lib/python2.7/site-packages/numpy/testing/nose_tools/utils.py", line 779, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-05, atol=1e-08

(mismatch 100.0%)
 x: array([[[[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],...
 y: array([[[[ 1.707480e+00,  1.710607e+00,  1.279160e+00, ...,
          -9.014695e-01, -1.781531e+00,  4.036736e-01],
         [ 1.895508e+00, -3.324545e-01,  3.578335e-01, ...,...

----------------------------------------------------------------------
Ran 3 tests in 0.557s

FAILED (failures=1, errors=1)
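
For context: this CUDA error generally means the binary contains no kernels compiled for the GPU's compute capability, which is why a prebuilt package built for a different architecture list fails exactly like this. One way to check the card's capability is the deviceQuery sample (a sketch; the samples path is an assumption based on the standard CUDA 9.0 layout shown later in this thread):

# Build and run the deviceQuery sample to read the compute capability
# (copy the sample to a writable directory if /usr/local needs root).
cd /usr/local/cuda-9.0/samples/1_Utilities/deviceQuery
make
./deviceQuery | grep 'CUDA Capability'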
pjh5 commented 6 years ago

@FduJyy can you try running this on CUDA 8?

@orionr should this work in CUDA 9 right now?
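
Switching to the CUDA 8 build would mean installing the matching conda package; the package name below is the one BanuSelinTosun mentions later in this thread, so treat it as a sketch:

conda install -c caffe2 caffe2-cuda8.0-cudnn7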

NovenBae commented 6 years ago

I also encountered this problem, but after compiling Caffe2 from source, the problem went away.

pjh5 commented 6 years ago

What does your CUDA installation look like? Can you ls -lah the folder where CUDA is installed? You can probably find it with find / -name libcuda*
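
Spelled out as commands (the cuda-9.0 prefix matches the listing FduJyy posts below):

# Inspect the CUDA install and locate the driver libraries.
ls -lah /usr/local/cuda-9.0
find / -name 'libcuda*' 2>/dev/null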

FduJyy commented 6 years ago

@pjh5 You mean my CUDA installation? Here it is:

jyy@jyy:/usr/local/cuda-9.0$ ls -lah
drwxr-xr-x 18 root root 4.0K 3月   8 22:05 .
drwxr-xr-x 13 root root 4.0K 3月   8 22:02 ..
drwxr-xr-x  3 root root 4.0K 3月   8 22:02 bin
drwxr-xr-x  5 root root 4.0K 3月   8 22:02 doc
drwxr-xr-x  5 root root 4.0K 3月   8 22:02 extras
drwxr-xr-x  5 root root 4.0K 3月   9 21:00 include
drwxr-xr-x  5 root root 4.0K 3月   8 22:02 jre
drwxr-xr-x  3 root root 4.0K 3月   9 22:59 lib64
drwxr-xr-x  8 root root 4.0K 3月   8 22:02 libnsight
drwxr-xr-x  7 root root 4.0K 3月   8 22:02 libnvvp
drwxr-xr-x  2 root root 4.0K 3月   8 22:02 nsightee_plugins
-r--r--r--  1 root root  39K 3月   8 22:59 NVIDIA_SLA_cuDNN_Support.txt
drwxr-xr-x  3 root root 4.0K 3月   8 22:02 nvml
drwxr-xr-x  7 root root 4.0K 3月   8 22:02 nvvm
drwxr-xr-x  2 root root 4.0K 3月   8 22:02 pkgconfig
drwxr-xr-x 11 root root 4.0K 3月   8 22:02 samples
drwxr-xr-x  3 root root 4.0K 3月   8 22:02 share
drwxr-xr-x  2 root root 4.0K 3月   8 22:02 src
drwxr-xr-x  2 root root 4.0K 3月   8 22:02 tools
-rw-r--r--  1 root root   21 3月   8 22:02 version.txt
jyy@jyy:/usr/local/cuda-9.0$ ls -lah lib64
drwxr-xr-x  3 root root  4.0K 3月   9 22:59 .
drwxr-xr-x 18 root root  4.0K 3月   8 22:05 ..
lrwxrwxrwx  1 root root    18 3月   8 22:02 libaccinj64.so -> libaccinj64.so.9.0
lrwxrwxrwx  1 root root    22 3月   8 22:02 libaccinj64.so.9.0 -> libaccinj64.so.9.0.176
-rwxr-xr-x  1 root root  6.6M 3月   8 22:02 libaccinj64.so.9.0.176
-rw-r--r--  1 root root   67M 3月   8 22:02 libcublas_device.a
lrwxrwxrwx  1 root root    16 3月   8 22:02 libcublas.so -> libcublas.so.9.0
lrwxrwxrwx  1 root root    20 3月   8 22:02 libcublas.so.9.0 -> libcublas.so.9.0.176
-rwxr-xr-x  1 root root   51M 3月   8 22:02 libcublas.so.9.0.176
-rw-r--r--  1 root root   57M 3月   8 22:02 libcublas_static.a
-rw-r--r--  1 root root  624K 3月   8 22:02 libcudadevrt.a
lrwxrwxrwx  1 root root    16 3月   8 22:02 libcudart.so -> libcudart.so.9.0
lrwxrwxrwx  1 root root    20 3月   8 22:02 libcudart.so.9.0 -> libcudart.so.9.0.176
-rwxr-xr-x  1 root root  433K 3月   8 22:02 libcudart.so.9.0.176
-rw-r--r--  1 root root  812K 3月   8 22:02 libcudart_static.a
-rwxr-xr-x  1 root root  306M 3月   9 22:59 libcudnn.so
-rwxr-xr-x  1 root root  306M 3月   9 22:59 libcudnn.so.7
-rwxr-xr-x  1 root root  275M 3月   9 21:00 libcudnn.so.7.0.5
-rwxr-xr-x  1 root root  306M 3月   9 22:59 libcudnn.so.7.1.1
-rw-r--r--  1 root root  302M 3月   9 23:00 libcudnn_static.a
lrwxrwxrwx  1 root root    15 3月   8 22:02 libcufft.so -> libcufft.so.9.0
lrwxrwxrwx  1 root root    19 3月   8 22:02 libcufft.so.9.0 -> libcufft.so.9.0.176
-rwxr-xr-x  1 root root  127M 3月   8 22:02 libcufft.so.9.0.176
-rw-r--r--  1 root root  131M 3月   8 22:02 libcufft_static.a
lrwxrwxrwx  1 root root    16 3月   8 22:02 libcufftw.so -> libcufftw.so.9.0
lrwxrwxrwx  1 root root    20 3月   8 22:02 libcufftw.so.9.0 -> libcufftw.so.9.0.176
-rwxr-xr-x  1 root root  496K 3月   8 22:02 libcufftw.so.9.0.176
-rw-r--r--  1 root root   41K 3月   8 22:02 libcufftw_static.a
lrwxrwxrwx  1 root root    17 3月   8 22:02 libcuinj64.so -> libcuinj64.so.9.0
lrwxrwxrwx  1 root root    21 3月   8 22:02 libcuinj64.so.9.0 -> libcuinj64.so.9.0.176
-rwxr-xr-x  1 root root  6.9M 3月   8 22:02 libcuinj64.so.9.0.176
-rw-r--r--  1 root root  1.6M 3月   8 22:02 libculibos.a
lrwxrwxrwx  1 root root    16 3月   8 22:02 libcurand.so -> libcurand.so.9.0
lrwxrwxrwx  1 root root    20 3月   8 22:02 libcurand.so.9.0 -> libcurand.so.9.0.176
-rwxr-xr-x  1 root root   57M 3月   8 22:02 libcurand.so.9.0.176
-rw-r--r--  1 root root   57M 3月   8 22:02 libcurand_static.a
lrwxrwxrwx  1 root root    18 3月   8 22:02 libcusolver.so -> libcusolver.so.9.0
lrwxrwxrwx  1 root root    22 3月   8 22:02 libcusolver.so.9.0 -> libcusolver.so.9.0.176
-rwxr-xr-x  1 root root   74M 3月   8 22:02 libcusolver.so.9.0.176
-rw-r--r--  1 root root   34M 3月   8 22:02 libcusolver_static.a
lrwxrwxrwx  1 root root    18 3月   8 22:02 libcusparse.so -> libcusparse.so.9.0
lrwxrwxrwx  1 root root    22 3月   8 22:02 libcusparse.so.9.0 -> libcusparse.so.9.0.176
-rwxr-xr-x  1 root root   54M 3月   8 22:02 libcusparse.so.9.0.176
-rw-r--r--  1 root root   62M 3月   8 22:02 libcusparse_static.a
lrwxrwxrwx  1 root root    14 3月   8 22:02 libnppc.so -> libnppc.so.9.0
lrwxrwxrwx  1 root root    18 3月   8 22:02 libnppc.so.9.0 -> libnppc.so.9.0.176
-rwxr-xr-x  1 root root  478K 3月   8 22:02 libnppc.so.9.0.176
-rw-r--r--  1 root root   24K 3月   8 22:02 libnppc_static.a
lrwxrwxrwx  1 root root    16 3月   8 22:02 libnppial.so -> libnppial.so.9.0
lrwxrwxrwx  1 root root    20 3月   8 22:02 libnppial.so.9.0 -> libnppial.so.9.0.176
-rwxr-xr-x  1 root root   11M 3月   8 22:02 libnppial.so.9.0.176
-rw-r--r--  1 root root   16M 3月   8 22:02 libnppial_static.a
lrwxrwxrwx  1 root root    16 3月   8 22:02 libnppicc.so -> libnppicc.so.9.0
lrwxrwxrwx  1 root root    20 3月   8 22:02 libnppicc.so.9.0 -> libnppicc.so.9.0.176
-rwxr-xr-x  1 root root  4.1M 3月   8 22:02 libnppicc.so.9.0.176
-rw-r--r--  1 root root  4.8M 3月   8 22:02 libnppicc_static.a
lrwxrwxrwx  1 root root    17 3月   8 22:02 libnppicom.so -> libnppicom.so.9.0
lrwxrwxrwx  1 root root    21 3月   8 22:02 libnppicom.so.9.0 -> libnppicom.so.9.0.176
-rwxr-xr-x  1 root root  1.3M 3月   8 22:02 libnppicom.so.9.0.176
-rw-r--r--  1 root root 1011K 3月   8 22:02 libnppicom_static.a
lrwxrwxrwx  1 root root    17 3月   8 22:02 libnppidei.so -> libnppidei.so.9.0
lrwxrwxrwx  1 root root    21 3月   8 22:02 libnppidei.so.9.0 -> libnppidei.so.9.0.176
-rwxr-xr-x  1 root root  7.5M 3月   8 22:02 libnppidei.so.9.0.176
-rw-r--r--  1 root root   11M 3月   8 22:02 libnppidei_static.a
lrwxrwxrwx  1 root root    15 3月   8 22:02 libnppif.so -> libnppif.so.9.0
lrwxrwxrwx  1 root root    19 3月   8 22:02 libnppif.so.9.0 -> libnppif.so.9.0.176
-rwxr-xr-x  1 root root   55M 3月   8 22:02 libnppif.so.9.0.176
-rw-r--r--  1 root root   60M 3月   8 22:02 libnppif_static.a
lrwxrwxrwx  1 root root    15 3月   8 22:02 libnppig.so -> libnppig.so.9.0
lrwxrwxrwx  1 root root    19 3月   8 22:02 libnppig.so.9.0 -> libnppig.so.9.0.176
-rwxr-xr-x  1 root root   27M 3月   8 22:02 libnppig.so.9.0.176
-rw-r--r--  1 root root   30M 3月   8 22:02 libnppig_static.a
lrwxrwxrwx  1 root root    15 3月   8 22:02 libnppim.so -> libnppim.so.9.0
lrwxrwxrwx  1 root root    19 3月   8 22:02 libnppim.so.9.0 -> libnppim.so.9.0.176
-rwxr-xr-x  1 root root  4.9M 3月   8 22:02 libnppim.so.9.0.176
-rw-r--r--  1 root root  4.9M 3月   8 22:02 libnppim_static.a
lrwxrwxrwx  1 root root    16 3月   8 22:02 libnppist.so -> libnppist.so.9.0
lrwxrwxrwx  1 root root    20 3月   8 22:02 libnppist.so.9.0 -> libnppist.so.9.0.176
-rwxr-xr-x  1 root root   15M 3月   8 22:02 libnppist.so.9.0.176
-rw-r--r--  1 root root   20M 3月   8 22:02 libnppist_static.a
lrwxrwxrwx  1 root root    16 3月   8 22:02 libnppisu.so -> libnppisu.so.9.0
lrwxrwxrwx  1 root root    20 3月   8 22:02 libnppisu.so.9.0 -> libnppisu.so.9.0.176
-rwxr-xr-x  1 root root  467K 3月   8 22:02 libnppisu.so.9.0.176
-rw-r--r--  1 root root   11K 3月   8 22:02 libnppisu_static.a
lrwxrwxrwx  1 root root    16 3月   8 22:02 libnppitc.so -> libnppitc.so.9.0
lrwxrwxrwx  1 root root    20 3月   8 22:02 libnppitc.so.9.0 -> libnppitc.so.9.0.176
-rwxr-xr-x  1 root root  2.9M 3月   8 22:02 libnppitc.so.9.0.176
-rw-r--r--  1 root root  3.9M 3月   8 22:02 libnppitc_static.a
lrwxrwxrwx  1 root root    14 3月   8 22:02 libnpps.so -> libnpps.so.9.0
lrwxrwxrwx  1 root root    18 3月   8 22:02 libnpps.so.9.0 -> libnpps.so.9.0.176
-rwxr-xr-x  1 root root  8.9M 3月   8 22:02 libnpps.so.9.0.176
-rw-r--r--  1 root root   12M 3月   8 22:02 libnpps_static.a
lrwxrwxrwx  1 root root    16 3月   8 22:02 libnvblas.so -> libnvblas.so.9.0
lrwxrwxrwx  1 root root    20 3月   8 22:02 libnvblas.so.9.0 -> libnvblas.so.9.0.176
-rwxr-xr-x  1 root root  519K 3月   8 22:02 libnvblas.so.9.0.176
lrwxrwxrwx  1 root root    17 3月   8 22:02 libnvgraph.so -> libnvgraph.so.9.0
lrwxrwxrwx  1 root root    21 3月   8 22:02 libnvgraph.so.9.0 -> libnvgraph.so.9.0.176
-rwxr-xr-x  1 root root   23M 3月   8 22:02 libnvgraph.so.9.0.176
-rw-r--r--  1 root root   53M 3月   8 22:02 libnvgraph_static.a
lrwxrwxrwx  1 root root    24 3月   8 22:02 libnvrtc-builtins.so -> libnvrtc-builtins.so.9.0
lrwxrwxrwx  1 root root    28 3月   8 22:02 libnvrtc-builtins.so.9.0 -> libnvrtc-builtins.so.9.0.176
-rwxr-xr-x  1 root root  3.2M 3月   8 22:02 libnvrtc-builtins.so.9.0.176
lrwxrwxrwx  1 root root    15 3月   8 22:02 libnvrtc.so -> libnvrtc.so.9.0
lrwxrwxrwx  1 root root    19 3月   8 22:02 libnvrtc.so.9.0 -> libnvrtc.so.9.0.176
-rwxr-xr-x  1 root root   22M 3月   8 22:02 libnvrtc.so.9.0.176
lrwxrwxrwx  1 root root    18 3月   8 22:02 libnvToolsExt.so -> libnvToolsExt.so.1
lrwxrwxrwx  1 root root    22 3月   8 22:02 libnvToolsExt.so.1 -> libnvToolsExt.so.1.0.0
-rwxr-xr-x  1 root root   37K 3月   8 22:02 libnvToolsExt.so.1.0.0
lrwxrwxrwx  1 root root    14 3月   8 22:02 libOpenCL.so -> libOpenCL.so.1
lrwxrwxrwx  1 root root    16 3月   8 22:02 libOpenCL.so.1 -> libOpenCL.so.1.0
lrwxrwxrwx  1 root root    18 3月   8 22:02 libOpenCL.so.1.0 -> libOpenCL.so.1.0.0
-rw-r--r--  1 root root   26K 3月   8 22:02 libOpenCL.so.1.0.0
drwxr-xr-x  2 root root  4.0K 3月   8 22:02 stubs
pjh5 commented 6 years ago

Do these run if you use

# Set the variable on its own line first; with a one-line VAR=... python prefix,
# the shell would expand $CAFFE2_PYPATH in the --ignore arguments before it was defined.
CAFFE2_PYPATH=/home/jyy/anaconda3/envs/caffe/lib/python2.7/site-packages/caffe2/python
python -m pytest \
  -x \
  -v \
  -s \
  --ignore "$CAFFE2_PYPATH/test/executor_test.py" \
  --ignore "$CAFFE2_PYPATH/operator_test/matmul_op_test.py" \
  --ignore "$CAFFE2_PYPATH/operator_test/pack_ops_test.py" \
  --ignore "$CAFFE2_PYPATH/mkl/mkl_sbn_speed_test.py" \
  "$CAFFE2_PYPATH"
joaofayad commented 6 years ago

I have the same problem as @FduJyy on Detectron.

When I run your suggested tests, I get this:

============================= test session starts =============================
platform linux2 -- Python 2.7.14, pytest-3.5.0, py-1.5.3, pluggy-0.6.0 -- /home/joaofayad/anaconda3/envs/detectron/bin/python
cachedir: .pytest_cache
rootdir: /home/joaofayad/detectron, inifile:
collected 34 items

lib/core/test_engine.py::test_net_on_dataset ERROR [ 2%]

=================================== ERRORS ====================================
____ ERROR at setup of test_net_on_dataset ____
file /home/joaofayad/detectron/lib/core/test_engine.py, line 126
  def test_net_on_dataset(
E       fixture 'weights_file' not found
>       available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, monkeypatch, pytestconfig, record_property, record_xml_attribute, record_xml_property, recwarn, tmpdir, tmpdir_factory
>       use 'pytest --fixtures [testpath]' for help on them.
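
For what it's worth, lib/core/test_engine.py is Detectron's inference driver rather than a unit-test module, so pytest collects test_net_on_dataset and then cannot supply its arguments (such as weights_file) as fixtures. Ignoring that file, in the same spirit as pjh5's command above, is a plausible workaround (a sketch, not a confirmed fix):

# Skip Detectron's inference driver so pytest only collects real unit tests.
python -m pytest -x -v -s --ignore lib/core/test_engine.py lib/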

FduJyy commented 6 years ago

@pjh5 @joaofayad Sorry for the late reply. Thanks to @NovenBae's advice, I solved this problem by compiling Caffe2 from source with conda build (following the official website). Now everything works and Detectron runs well.
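
For anyone retracing this, the source build FduJyy describes would look roughly like the following (a sketch: the recipe directory and the local-channel install follow the official instructions of the time and are assumptions):

# Clone with submodules and build the conda package locally
# (the recipe directory name "conda" is an assumption).
git clone --recursive https://github.com/caffe2/caffe2.git
cd caffe2
conda build conda
# Install the freshly built package from the local channel.
conda install --use-local caffe2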

BanuSelinTosun commented 6 years ago

@pjh5 @joaofayad @FduJyy I am having the same problem. I installed Caffe2 from the pre-built binaries; everything works, and the GPU test returns 1. But when I run the Detectron installation test, I hit the same FAILED (failures=1, errors=1) error. I used CUDA 8 and cuDNN 7 for the installation. When I run pjh5's tests, they fail with an ImportError. I am using an Azure DSVM, and the X2Go interface does not let me copy-paste, so I attached a screenshot instead. Now I am going to try reinstalling Caffe2 from the main website (build from source). @FduJyy is that what you meant? Without using conda install -c caffe2 caffe2-cuda8.0-cudnn7, and instead using a list of pip install commands? Thank you!

zwangab91 commented 6 years ago

@FduJyy I ran into exactly the same problem and was able to solve it with your method. Thanks!