Failure of `make ops`: cudnn.h: No such file or directory

II-Matto commented 6 years ago

Expected results

After successfully building Caffe2, I wanted to compile the custom operators in Detectron, so I ran make ops.

FYI, I have already passed the following tests:

Several basic import tests.
pytorch/build/caffe2/python/operator_test/relu_op_test.py
Detectron/detectron/tests/test_spatial_narrow_as_op.py

Actual results

The build failed with the errors below, reporting that cudnn.h could not be found. The cause seems obvious, but the usual fixes did not work: I tried adding -I<cudnn_include_dir> in various places and setting CPPFLAGS, and neither helped.

[ 40%] Linking CXX shared library libcaffe2_detectron_custom_ops.so
[ 40%] Built target caffe2_detectron_custom_ops
[ 60%] Building NVCC (Device) object CMakeFiles/caffe2_detectron_custom_ops_gpu.dir/detectron/ops/caffe2_detectron_custom_ops_gpu_generated_zero_even_op.cu.o
In file included from /home/<user_name>/lib/pytorch/caffe2/include/caffe2/core/context_gpu.h:7:0,
                 from /home/<user_name>/repo/detectron/detectron/ops/zero_even_op.cu:17:
/home/<user_name>/lib/pytorch/caffe2/include/caffe2/core/common_cudnn.h:7:19: fatal error: cudnn.h: No such file or directory
 #include <cudnn.h>
                   ^
compilation terminated.
CMake Error at caffe2_detectron_custom_ops_gpu_generated_zero_even_op.cu.o.cmake:215 (message):
  Error generating
  /home/<user_name>/repo/detectron/build/CMakeFiles/caffe2_detectron_custom_ops_gpu.dir/detectron/ops/./caffe2_detectron_custom_ops_gpu_generated_zero_even_op.cu.o

make[2]: *** [CMakeFiles/caffe2_detectron_custom_ops_gpu.dir/detectron/ops/caffe2_detectron_custom_ops_gpu_generated_zero_even_op.cu.o] Error 1
make[1]: *** [CMakeFiles/caffe2_detectron_custom_ops_gpu.dir/all] Error 2
make: *** [all] Error 2

Detailed steps to reproduce

Instead of directly running make ops in the Detectron root directory, I ran the following commands so that the cuDNN path could be set manually.

mkdir -p build
cd build
Caffe2_DIR=/home/<user_name>/lib/pytorch/caffe2 cmake -DCUDNN_INCLUDE_DIR=/home/<user_name>/lib/cudnn-8.0-linux-x64-v7/include -DCUDNN_LIBRARY=/home/<user_name>/lib/cudnn-8.0-linux-x64-v7/lib64/libcudnn.so ..
make

System information

Operating system: Ubuntu 14.04
Compiler version: gcc 4.8.4
CUDA version: 8.0
cuDNN version: cudnn-8.0-linux-x64-v7
NVIDIA driver version: 384.81
GPU models (for all devices if they are not all the same): (GTX Titan)
PYTHONPATH environment variable: (pytorch/build directory)
python --version output: Python 2.7.13 :: Anaconda custom (64-bit)
Anything else that seems relevant: Latest PyTorch/Caffe2 & Detectron clone

The CMake outputs are given below.

-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Caffe2: Cannot find gflags automatically. Using legacy find.
-- Found gflags: /usr/include  
-- Caffe2: Found gflags  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libgflags.so)
-- Caffe2: Cannot find glog automatically. Using legacy find.
-- Found glog: /usr/include  
-- Caffe2: Found glog (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libglog.so)
CMake Warning at /home/<user_name>/lib/cmake-3.10.2-Linux-x86_64/share/cmake-3.10/Modules/FindProtobuf.cmake:455 (message):
  Protobuf compiler version 3.2.0 doesn't match library version 2.5.0
Call Stack (most recent call first):
  /home/<user_name>/lib/pytorch/caffe2/share/cmake/Caffe2/public/protobuf.cmake:6 (find_package)
  /home/<user_name>/lib/pytorch/caffe2/share/cmake/Caffe2/Caffe2Config.cmake:48 (include)
  CMakeLists.txt:8 (find_package)

-- Caffe2: Found protobuf with new-style protobuf targets.
-- Caffe2: Protobuf version 2.5.0
-- Found CUDA: /home/<user_name>/lib/cuda-8.0 (found suitable version "8.0", minimum required is "7.0") 
-- Found CUDNN: /home/<user_name>/lib/cudnn-8.0-linux-x64-v7/include  
-- Caffe2: CUDA detected: 8.0
-- Found cuDNN: v7.0.3  (include: /home/<user_name>/lib/cudnn-8.0-linux-x64-v7/include, library: /home/<user_name>/lib/cudnn-8.0-linux-x64-v7/lib64/libcudnn.so)
-- Set CUDA arch from CUDA_ARCH_NAME: Auto
-- Automatic GPU detection returned 5.2 5.2 5.2 5.2.
-- Added CUDA NVCC flags for: sm_52
-- Summary:
--   CMake version        : 3.10.2
--   CMake command        : /home/<user_name>/lib/cmake-3.10.2-Linux-x86_64/bin/cmake
--   System name          : Linux
--   C++ compiler         : /usr/bin/c++
--   C++ compiler version : 4.8.4
--   CXX flags            :  -std=c++11 -O2 -fPIC -Wno-narrowing
--   Caffe2 version       : 0.8.2
--   Caffe2 include path  : /home/<user_name>/lib/pytorch/caffe2/include
--   Caffe2 found CUDA    : TRUE
--     CUDA version       : 8.0
--     CuDNN version      : 7.0.3
-- Configuring done
CMake Warning (dev) at /home/<user_name>/lib/cmake-3.10.2-Linux-x86_64/share/cmake-3.10/Modules/FindCUDA.cmake:1801 (add_library):
  Policy CMP0028 is not set: Double colon in target name means ALIAS or
  IMPORTED target.  Run "cmake --help-policy CMP0028" for policy details.
  Use the cmake_policy command to set the policy and suppress this warning.

  Target "caffe2_detectron_custom_ops_gpu" links to target "caffe2::cudnn"
  but the target was not found.  Perhaps a find_package() call is missing for
  an IMPORTED target, or an ALIAS target is missing?
Call Stack (most recent call first):
  CMakeLists.txt:45 (CUDA_ADD_LIBRARY)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Generating done
-- Build files have been written to: /home/<user_name>/repo/detectron/build

gadcam commented 6 years ago

To me, the interesting part is here:

CMake Warning at /home/<user_name>/lib/cmake-3.10.2-Linux-x86_64/share/cmake-3.10/Modules/FindProtobuf.cmake:455 (message): Protobuf compiler version 3.2.0 doesn't match library version 2.5.0

Can you check whether this solution works for you? https://github.com/caffe2/caffe2/issues/1684#issuecomment-356028126

II-Matto commented 6 years ago

@gadcam Hi, protobuf is fine on my side. The protobuf versions used in my Anaconda environment and for building Caffe2 are both 3.5.0, so they are consistent. With inconsistent protobuf versions, I get the following error when importing caffe2_pb:

__init__() got an unexpected keyword argument 'file'

What is weird is that after I upgraded protobuf in Anaconda using conda install <download_tar_file>, importing caffe2_pb still failed. The error was resolved after I tried again using pip install <download_whl_file>.

My problem in make ops is now solved by modifying the CMakeLists.txt in the Detectron root. I changed the following line:

list(APPEND CUDA_INCLUDE_DIRS -I${CAFFE2_INCLUDE_DIRS})

Remove -I.
Add ${CUDNN_INCLUDE_DIR}.

By printing CUDA_INCLUDE_DIRS with message(), I found that with -I one gets something like:

"<dir_1>:-I<dir_2>"

I guess this results in an invalid include directory configuration. Since my problem was clearly caused by the cuDNN header file not being found, I also added CUDNN_INCLUDE_DIR, which is passed in on the cmake command line.
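The two changes described above, together with the message() check, might look like this in the Detectron CMakeLists.txt. This is a minimal sketch based only on the line quoted in this comment; the rest of the file is not shown here, and the message() label text is arbitrary.

```cmake
# Before (as quoted above): the -I prefix becomes part of the list entry.
# list(APPEND CUDA_INCLUDE_DIRS -I${CAFFE2_INCLUDE_DIRS})

# After: plain directory entries, plus the cuDNN header directory that is
# passed on the command line via -DCUDNN_INCLUDE_DIR=...
list(APPEND CUDA_INCLUDE_DIRS ${CAFFE2_INCLUDE_DIRS} ${CUDNN_INCLUDE_DIR})

# Print the resulting list at configure time to check that no entry
# carries a stray -I prefix.
message(STATUS "CUDA_INCLUDE_DIRS = ${CUDA_INCLUDE_DIRS}")
```

Re-running cmake in a fresh build directory should then show the cuDNN include path among the printed entries.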

With the cuDNN header file problem solved, I hit another problem: the linker could not find -lcaffe2::cudnn, as described in issue https://github.com/facebookresearch/Detectron/issues/456. The solution proposed there is to simply remove this option from the corresponding file, and it worked for me. I also tried changing -lcaffe2::cudnn to -lcaffe2_gpu -L<path_to_caffe2_install_dir>/lib, which also solves the problem. I am not sure whether it is necessary to link the caffe2_gpu library.

Now make ops succeeds and produces the library files, i.e. libcaffe2_detectron_custom_ops_gpu.so and libcaffe2_detectron_custom_ops.so. I tried loading them with the following code:

from caffe2.python import dyndep, workspace

# Load only one of the following two libraries, or it may report "Offending key".
# dyndep.InitOpsLibrary('/path/to/libcaffe2_detectron_custom_ops.so')
dyndep.InitOpsLibrary('/path/to/libcaffe2_detectron_custom_ops_gpu.so')

print 'ZeroEven' in workspace.RegisteredOperators()

The output is True. This is just a simple test. Maybe more tests are needed to see if this operator can work as expected.

II-Matto commented 6 years ago

It seems that the -lcaffe2::cudnn problem comes from the caffe2::cudnn target, which Detectron links against but which is not defined in the Detectron build. In Caffe2, the caffe2::cudnn target is defined in cmake/public/cuda.cmake, added in cmake/Dependencies.cmake, and "linked" in caffe2/CMakeLists.txt.

So I tried adding the following code (i.e. the definition of the caffe2::cudnn target) to the Detectron CMakeLists.txt when ${CAFFE2_FOUND_CUDA} is true.

# cudnn
add_library(caffe2::cudnn UNKNOWN IMPORTED)
set_property(
    TARGET caffe2::cudnn PROPERTY IMPORTED_LOCATION
    ${CUDNN_LIBRARY})
set_property(
    TARGET caffe2::cudnn PROPERTY INTERFACE_INCLUDE_DIRECTORIES
    ${CUDNN_INCLUDE_DIR})

Then my cmake and make commands work without errors.

Caffe2_DIR=/caffe2/install/dir cmake -DCUDNN_INCLUDE_DIR=/cudnn/include/dir -DCUDNN_LIBRARY=/cudnn/lib/file.so ..
make

BTW, I also added cmake_policy(SET CMP0028 NEW) to CMakeLists.txt to suppress some warnings.
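For anyone applying the same workaround, a slightly more defensive variant of the target definition and the policy setting could look like the sketch below. The guards (if(POLICY ...) and NOT TARGET) are additions that are not part of the code in this comment; they assume that the policy should be skipped on CMake releases that do not know it, and that a Caffe2 install might already export caffe2::cudnn.

```cmake
# Place after cmake_minimum_required(), which resets policy settings.
# With CMP0028 set to NEW, linking to an undefined "::" target is an
# error at generate time instead of a developer warning.
if(POLICY CMP0028)
  cmake_policy(SET CMP0028 NEW)
endif()

# Define the imported cuDNN target only when CUDA was found and the
# target has not already been defined elsewhere.
if(CAFFE2_FOUND_CUDA AND NOT TARGET caffe2::cudnn)
  add_library(caffe2::cudnn UNKNOWN IMPORTED)
  set_target_properties(caffe2::cudnn PROPERTIES
    IMPORTED_LOCATION "${CUDNN_LIBRARY}"
    INTERFACE_INCLUDE_DIRECTORIES "${CUDNN_INCLUDE_DIR}")
endif()
```

CUDNN_LIBRARY and CUDNN_INCLUDE_DIR are the values passed with -D on the cmake command line shown above.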

According to the comments in the Detectron CMakeLists.txt, the additional -I prefix in list(APPEND CUDA_INCLUDE_DIRS -I<dir_1> -I<dir_2>) is required for CMake versions < 3.7 (my CMake version is 3.10.2). So I guess a better solution is to check the CMake version and add the -I prefix only when necessary.
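A version check along these lines is one way to sketch that idea. It takes the 3.7 threshold from the CMakeLists.txt comments mentioned above without verifying it independently, and EXTRA_INCLUDE_DIRS is a hypothetical helper variable introduced only for this example.

```cmake
# Hypothetical helper variable: the directories to add for the CUDA build.
set(EXTRA_INCLUDE_DIRS ${CAFFE2_INCLUDE_DIRS} ${CUDNN_INCLUDE_DIR})

if(CMAKE_VERSION VERSION_LESS 3.7)
  # Older CMake: prefix every entry with -I, one directory at a time,
  # so that a multi-entry list does not get only its first entry prefixed.
  foreach(dir ${EXTRA_INCLUDE_DIRS})
    list(APPEND CUDA_INCLUDE_DIRS -I${dir})
  endforeach()
else()
  # CMake 3.7 and newer: append the plain directories.
  list(APPEND CUDA_INCLUDE_DIRS ${EXTRA_INCLUDE_DIRS})
endif()
```

Whether the prefixed form behaves correctly on CMake versions below 3.7 has not been tested in this thread; only the unprefixed form was confirmed with CMake 3.10.2.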

Hi, @ir413. The Detectron CMakeLists.txt probably needs a fix.

ir413 commented 6 years ago

Thanks for the provided information. Good point about CMake versions. Will look into this.
