facebookarchive / caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework.
https://caffe2.ai
Apache License 2.0

Debug message about libcurand.so.9.0 #2212

Closed xiashh closed 6 years ago

xiashh commented 6 years ago

**** Summary ****

After installing caffe2 with GPU support from the pre-built binaries, I wanted to test whether it works, so I ran the following command:

python -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'

It printed these warnings and reported 0 devices:

WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: libcurand.so.9.0: cannot open shared object file: No such file or directory
0
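
The debug message names a shared library that the dynamic loader could not open. A minimal sketch for reproducing that check outside Caffe2 is shown here; it assumes the warning text comes straight from the loader, and `try_load` is a hypothetical helper, not part of Caffe2.

```python
import ctypes


def try_load(soname):
    """Try to open a shared library by soname; return (ok, message)."""
    try:
        ctypes.CDLL(soname)
        return True, "loaded"
    except OSError as exc:
        # On Linux the loader error typically reads
        # "<soname>: cannot open shared object file: ...".
        return False, str(exc)


if __name__ == "__main__":
    # Sonames taken from the messages in this thread.
    for name in ("libcurand.so.9.0", "libcublas.so.9.0", "libnvrtc.so.9.0"):
        ok, message = try_load(name)
        print(name, "->", "OK" if ok else message)
```

If a library fails to load here as well, the problem is with the CUDA libraries visible to the loader rather than with the Python side of Caffe2.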

pjh5 commented 6 years ago

Can you post your cmake summary output?

There should be a libcaffe2.so somewhere on your machine. Can you run ldd on it?

Please also give the output of find /usr -name libcurand.so and run ls on the directory that it's in.
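
For readers without shell access to `find`, the same search can be sketched in Python. `find_libs` is a hypothetical helper; the default pattern adds a trailing `*` so that versioned files such as libcurand.so.9.1 are listed too, which the `find` command above would not match.

```python
import fnmatch
import os


def find_libs(root, pattern="libcurand.so*"):
    """Return sorted paths under root whose file name matches pattern."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Symlinks to files are reported in filenames as well.
        for name in fnmatch.filter(filenames, pattern):
            hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    for path in find_libs("/usr"):
        print(path)
```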

akshay-raj-dhamija commented 6 years ago

Hi, I am facing the same problem.

adhamija@r2-d8:~$ find /usr -name libcurand.so
/usr/local/cuda-9.1/lib64/stubs/libcurand.so
/usr/local/cuda-9.1/lib64/libcurand.so
adhamija@r2-d8:~$ ls /usr/local/cuda-9.1/lib64/stubs/
libcublas.so  libcufft.so   libcurand.so    libcusparse.so  libnppial.so  libnppicom.so  libnppif.so  libnppim.so   libnppisu.so  libnpps.so     libnvidia-ml.so
libcuda.so    libcufftw.so  libcusolver.so  libnppc.so      libnppicc.so  libnppidei.so  libnppig.so  libnppist.so  libnppitc.so  libnvgraph.so  libnvrtc.so
adhamija@r2-d8:~$ ls /usr/local/cuda-9.1/lib64/
libaccinj64.so         libcufft_static.a      libcusolver_static.a   libnppicc_static.a    libnppig_static.a    libnppitc_static.a           libnvrtc.so.9.1
libaccinj64.so.9.1     libcufftw.so           libcusparse.so         libnppicom.so         libnppim.so          libnpps.so                   libnvrtc.so.9.1.85
libaccinj64.so.9.1.85  libcufftw.so.9.1       libcusparse.so.9.1     libnppicom.so.9.1     libnppim.so.9.1      libnpps.so.9.1               libnvToolsExt.so
libcublas_device.a     libcufftw.so.9.1.85    libcusparse.so.9.1.85  libnppicom.so.9.1.85  libnppim.so.9.1.85   libnpps.so.9.1.85            libnvToolsExt.so.1
libcublas.so           libcufftw_static.a     libcusparse_static.a   libnppicom_static.a   libnppim_static.a    libnpps_static.a             libnvToolsExt.so.1.0.0
libcublas.so.9.1       libcuinj64.so          libnppc.so             libnppidei.so         libnppist.so         libnvblas.so                 libOpenCL.so
libcublas.so.9.1.85    libcuinj64.so.9.1      libnppc.so.9.1         libnppidei.so.9.1     libnppist.so.9.1     libnvblas.so.9.1             libOpenCL.so.1
libcublas_static.a     libcuinj64.so.9.1.85   libnppc.so.9.1.85      libnppidei.so.9.1.85  libnppist.so.9.1.85  libnvblas.so.9.1.85          libOpenCL.so.1.0
libcudadevrt.a         libculibos.a           libnppc_static.a       libnppidei_static.a   libnppist_static.a   libnvgraph.so                libOpenCL.so.1.0.0
libcudart.so           libcurand.so           libnppial.so           libnppif.so           libnppisu.so         libnvgraph.so.9.1            stubs
libcudart.so.9.1       libcurand.so.9.1       libnppial.so.9.1       libnppif.so.9.1       libnppisu.so.9.1     libnvgraph.so.9.1.85
libcudart.so.9.1.85    libcurand.so.9.1.85    libnppial.so.9.1.85    libnppif.so.9.1.85    libnppisu.so.9.1.85  libnvgraph_static.a
libcudart_static.a     libcurand_static.a     libnppial_static.a     libnppif_static.a     libnppisu_static.a   libnvrtc-builtins.so
libcufft.so            libcusolver.so         libnppicc.so           libnppig.so           libnppitc.so         libnvrtc-builtins.so.9.1
libcufft.so.9.1        libcusolver.so.9.1     libnppicc.so.9.1       libnppig.so.9.1       libnppitc.so.9.1     libnvrtc-builtins.so.9.1.85
libcufft.so.9.1.85     libcusolver.so.9.1.85  libnppicc.so.9.1.85    libnppig.so.9.1.85    libnppitc.so.9.1.85  libnvrtc.so
adhamija@r2-d8:~$ ldd /usr/local/cuda-9.1/lib64/libcurand.so
    linux-vdso.so.1 =>  (0x00007ffe9c1f6000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa4ed806000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa4ed5e9000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa4ed3e5000)
    libstdc++.so.6 => /home/adhamija/anaconda2/lib/libstdc++.so.6 (0x00007fa4ed0ab000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa4ecda2000)
    libgcc_s.so.1 => /home/adhamija/anaconda2/lib/libgcc_s.so.1 (0x00007fa4ecb90000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa4ec7c6000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa4f1991000)
adhamija@r2-d8:~$ ldd /net/home/store/home/adhamija/anaconda2/lib/libcaffe2.so
    linux-vdso.so.1 =>  (0x00007ffc14925000)
    libprotobuf.so.14 => /net/home/store/home/adhamija/anaconda2/lib/./libprotobuf.so.14 (0x00007fcb0a67a000)
    libgflags.so.2.2 => /net/home/store/home/adhamija/anaconda2/lib/./libgflags.so.2.2 (0x00007fcb0a455000)
    libglog.so.0 => /net/home/store/home/adhamija/anaconda2/lib/./libglog.so.0 (0x00007fcb0a224000)
    liblmdb.so => /net/home/store/home/adhamija/anaconda2/lib/./liblmdb.so (0x00007fcb0a00f000)
    libgcc_s.so.1 => /net/home/store/home/adhamija/anaconda2/lib/./libgcc_s.so.1 (0x00007fcb09dfd000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fcb09bf9000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fcb099dc000)
    libopencv_imgcodecs.so.3.3 => /net/home/store/home/adhamija/anaconda2/lib/./libopencv_imgcodecs.so.3.3 (0x00007fcb092d7000)
    libopencv_imgproc.so.3.3 => /net/home/store/home/adhamija/anaconda2/lib/./libopencv_imgproc.so.3.3 (0x00007fcb06636000)
    libopencv_core.so.3.3 => /net/home/store/home/adhamija/anaconda2/lib/./libopencv_core.so.3.3 (0x00007fcb057b6000)
    libstdc++.so.6 => /net/home/store/home/adhamija/anaconda2/lib/./libstdc++.so.6 (0x00007fcb0547c000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fcb05173000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcb04da9000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fcb0b9a2000)
    libz.so.1 => /net/home/store/home/adhamija/anaconda2/lib/././libz.so.1 (0x00007fcb04b92000)
    libjpeg.so.9 => /net/home/store/home/adhamija/anaconda2/lib/././libjpeg.so.9 (0x00007fcb04956000)
    libpng16.so.16 => /net/home/store/home/adhamija/anaconda2/lib/././libpng16.so.16 (0x00007fcb0471f000)
    libtiff.so.5 => /net/home/store/home/adhamija/anaconda2/lib/././libtiff.so.5 (0x00007fcb044a1000)
    libjasper.so.1 => /net/home/store/home/adhamija/anaconda2/lib/././libjasper.so.1 (0x00007fcb0422f000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fcb04027000)
    libgomp.so.1 => /net/home/store/home/adhamija/anaconda2/lib/././libgomp.so.1 (0x00007fcb03e04000)
    liblzma.so.5 => /net/home/store/home/adhamija/anaconda2/lib/./././liblzma.so.5 (0x00007fcb03bde000)
adhamija@r2-d8:~$ ldd /net/home/store/home/adhamija/anaconda2/pkgs/caffe2-cuda9.0-cudnn7-0.8.dev-py27h4e2c0f2_0/lib/libcaffe2.so
    linux-vdso.so.1 =>  (0x00007ffc3d95b000)
    libprotobuf.so.14 => /home/adhamija/anaconda2/lib/libprotobuf.so.14 (0x00007fc9db5d7000)
    libgflags.so.2.2 => /home/adhamija/anaconda2/lib/libgflags.so.2.2 (0x00007fc9db3b2000)
    libglog.so.0 => /home/adhamija/anaconda2/lib/libglog.so.0 (0x00007fc9db181000)
    liblmdb.so => /home/adhamija/anaconda2/lib/liblmdb.so (0x00007fc9daf6c000)
    libgcc_s.so.1 => /home/adhamija/anaconda2/lib/libgcc_s.so.1 (0x00007fc9dad5a000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc9dab56000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc9da939000)
    libopencv_imgcodecs.so.3.3 => /home/adhamija/anaconda2/lib/libopencv_imgcodecs.so.3.3 (0x00007fc9da234000)
    libopencv_imgproc.so.3.3 => /home/adhamija/anaconda2/lib/libopencv_imgproc.so.3.3 (0x00007fc9d7593000)
    libopencv_core.so.3.3 => /home/adhamija/anaconda2/lib/libopencv_core.so.3.3 (0x00007fc9d6713000)
    libstdc++.so.6 => /home/adhamija/anaconda2/lib/libstdc++.so.6 (0x00007fc9d63d9000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc9d60d0000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc9d5d06000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fc9dc8ff000)
    libz.so.1 => /home/adhamija/anaconda2/lib/./libz.so.1 (0x00007fc9d5aef000)
    libjpeg.so.9 => /home/adhamija/anaconda2/lib/./libjpeg.so.9 (0x00007fc9d58b3000)
    libpng16.so.16 => /home/adhamija/anaconda2/lib/./libpng16.so.16 (0x00007fc9d567c000)
    libtiff.so.5 => /home/adhamija/anaconda2/lib/./libtiff.so.5 (0x00007fc9d53fe000)
    libjasper.so.1 => /home/adhamija/anaconda2/lib/./libjasper.so.1 (0x00007fc9d518c000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc9d4f84000)
    libgomp.so.1 => /home/adhamija/anaconda2/lib/./libgomp.so.1 (0x00007fc9d4d61000)
    liblzma.so.5 => /home/adhamija/anaconda2/lib/././liblzma.so.5 (0x00007fc9d4b3b000)
pjh5 commented 6 years ago

@akshay-raj-dhamija sorry, I actually meant to run ldd on libcaffe2_gpu.so. Can you show me the output of that command? The one under anaconda2/lib should be fine. The one in anaconda2/pkgs isn't actually active and shouldn't matter.

It looks like this might be a subtle bug in our cmake setup. Caffe2 seems to be linking against libcurand version 9.1 specifically. You have CUDA version 9.1 specifically, but you also have a stubs folder full of symlinks which point to the most recent CUDA version; this allows you to install a different CUDA version under /lib64 and then only have to change the symlinks under stubs/ to change the CUDA version used by everything else on your machine. In our case though, we want to use 9.1 specifically instead of whatever that symlink under stubs/ happens to point to. I think you can fix it by adding another symlink, like

ln -s /usr/local/cuda-9.1/lib64/libcurand.so.9.1 /usr/local/cuda-9.1/lib64/stubs/libcurand.so.9.1

for every libcuda* library that it complains about.

sotstas commented 6 years ago

Hello, I have the same problem, but it seems that it's a problem of the caffe2 version I installed. After running ldd, I get this output:

ldd /home/sotiris/anaconda2/envs/caffe2_env/lib/libcaffe2_gpu.so
    linux-vdso.so.1 => (0x00007ffc5ffb9000)
    /lib/$LIB/liblsp.so => /lib/lib/x86_64-linux-gnu/liblsp.so (0x00007fc5c74dc000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc5c72d8000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc5c70d0000)
    libcaffe2.so => /home/sotiris/anaconda2/envs/caffe2_env/lib/./libcaffe2.so (0x00007fc5c6203000)
    libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fc5c5663000)
    libcurand.so.9.0 => not found
    libcudnn.so.7 => /usr/local/cuda-9.1/lib64/libcudnn.so.7 (0x00007fc5b2614000)
    libnvrtc.so.9.0 => not found
    libprotobuf.so.14 => /home/sotiris/anaconda2/envs/caffe2_env/lib/./libprotobuf.so.14 (0x00007fc5b21b9000)
    libgflags.so.2.2 => /home/sotiris/anaconda2/envs/caffe2_env/lib/./libgflags.so.2.2 (0x00007fc5b1f94000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc5b1d77000)
    libglog.so.0 => /home/sotiris/anaconda2/envs/caffe2_env/lib/./libglog.so.0 (0x00007fc5b1b46000)
    libcublas.so.9.0 => not found
    libnccl.so.2 => /usr/lib/x86_64-linux-gnu/libnccl.so.2 (0x00007fc5a42a5000)
    libstdc++.so.6 => /home/sotiris/anaconda2/envs/caffe2_env/lib/./libstdc++.so.6 (0x00007fc5a3f6b000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc5a3c62000)
    libgcc_s.so.1 => /home/sotiris/anaconda2/envs/caffe2_env/lib/./libgcc_s.so.1 (0x00007fc5a3a50000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc5a3686000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fc5c8ae1000)
    liblmdb.so => /home/sotiris/anaconda2/envs/caffe2_env/lib/././liblmdb.so (0x00007fc5a3471000)
    libopencv_imgcodecs.so.3.3 => /home/sotiris/anaconda2/envs/caffe2_env/lib/././libopencv_imgcodecs.so.3.3 (0x00007fc5a2d6c000)
    libopencv_imgproc.so.3.3 => /home/sotiris/anaconda2/envs/caffe2_env/lib/././libopencv_imgproc.so.3.3 (0x00007fc5a00cb000)
    libopencv_core.so.3.3 => /home/sotiris/anaconda2/envs/caffe2_env/lib/././libopencv_core.so.3.3 (0x00007fc59f24b000)
    libnvidia-fatbinaryloader.so.390.25 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.390.25 (0x00007fc59efff000)
    libz.so.1 => /home/sotiris/anaconda2/envs/caffe2_env/lib/././libz.so.1 (0x00007fc59ede8000)
    libjpeg.so.9 => /home/sotiris/anaconda2/envs/caffe2_env/lib/./././libjpeg.so.9 (0x00007fc59ebac000)
    libpng16.so.16 => /home/sotiris/anaconda2/envs/caffe2_env/lib/./././libpng16.so.16 (0x00007fc59e975000)
    libtiff.so.5 => /home/sotiris/anaconda2/envs/caffe2_env/lib/./././libtiff.so.5 (0x00007fc59e6f7000)
    libjasper.so.1 => /home/sotiris/anaconda2/envs/caffe2_env/lib/./././libjasper.so.1 (0x00007fc59e485000)
    libgomp.so.1 => /home/sotiris/anaconda2/envs/caffe2_env/lib/./././libgomp.so.1 (0x00007fc59e262000)
    liblzma.so.5 => /home/sotiris/anaconda2/envs/caffe2_env/lib/././././liblzma.so.5 (0x00007fc59e03c000)

If I am getting this right, the caffe2 version I have installed is looking for libcurand.so.9.0, but I have libcurand.so.9.1, as verified by:

find /usr -name libcurand.so
/usr/local/cuda-9.1/lib64/libcurand.so
/usr/local/cuda-9.1/lib64/stubs/libcurand.so
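
The reasoning above (ldd reports a 9.0 soname as "not found" while a 9.1 file of the same library is installed) can be sketched as a small script. The helper names are hypothetical, and the parsing assumes the `name => not found` form that ldd prints in this thread.

```python
import re


def missing_libs(ldd_output):
    """Return the sonames that ldd reports as 'not found'."""
    return re.findall(r"(\S+) => not found", ldd_output)


def split_soname(soname):
    """Split 'libcurand.so.9.0' into ('libcurand.so', '9.0')."""
    base, sep, version = soname.partition(".so.")
    return (base + ".so", version) if sep else (soname, "")


def other_versions(missing, installed):
    """Map each missing soname to installed versioned files of the same library."""
    result = {}
    for want in missing:
        base, _ = split_soname(want)
        result[want] = sorted(
            have for have in installed
            if have != want
            and split_soname(have)[0] == base
            and split_soname(have)[1]  # skip the unversioned 'libX.so' name
        )
    return result
```

Fed with the ldd output above and the file names from a CUDA 9.1 lib64 directory, this would pair libcurand.so.9.0 with libcurand.so.9.1 and libcurand.so.9.1.85, which points to a version mismatch rather than a missing CUDA install.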

I tried linking the 9.1 library, but it didn't work. I guess I need to install the CUDA 9.1 version of caffe2. Where can this be found? The project site only has caffe2-cuda9.0-cudnn7, and I cannot find a 9.1 version with conda.

Thanks

sotstas commented 6 years ago

If that is not the issue and the caffe2-cuda9.0-cudnn7 works with cuda9.1, then maybe something else is needed. I linked the libraries:

ln -s /usr/local/cuda-9.1/lib64/libcurand.so.9.1 /usr/local/cuda-9.1/lib64/stubs/libcurand.so.9.1

but it didn't resolve the problem. I get the same error message:

python char_rnn.py --train_data shakespeare.txt
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: libcurand.so.9.0: cannot open shared object file: No such file or directory
Segmentation fault (core dumped)

pjh5 commented 6 years ago

@sotstas I haven't made a caffe2-cuda9.1-cudnn7 package yet. I can add one, but it won't be up until tomorrow at the earliest. The caffe2-cuda9.0-cudnn7 package will only work for CUDA 9.0 exactly. In the meantime, you can build from source with

conda build conda/cuda
conda install caffe2-cuda --use-local
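
Since the pre-built package only works with the exact CUDA version in its name, a quick compatibility check can be sketched as follows. It assumes the `cudaX.Y` naming convention seen in caffe2-cuda9.0-cudnn7; both helpers are hypothetical.

```python
import re


def cuda_version_of_package(package_name):
    """Extract 'X.Y' from a name like 'caffe2-cuda9.0-cudnn7', else None."""
    match = re.search(r"cuda(\d+\.\d+)", package_name)
    return match.group(1) if match else None


def package_matches_toolkit(package_name, toolkit_version):
    """True when the package's CUDA major.minor equals the toolkit's."""
    wanted = cuda_version_of_package(package_name)
    have = ".".join(toolkit_version.split(".")[:2])
    return wanted is not None and wanted == have


if __name__ == "__main__":
    # 9.1.85 is the toolkit version visible in the library listing above.
    print(package_matches_toolkit("caffe2-cuda9.0-cudnn7", "9.1.85"))
```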

sotstas commented 6 years ago

I had errors while building from source:

conda build conda/cuda
Adding in variants from internal_defaults
INFO:conda_build.variants:Adding in variants from internal_defaults
Adding in variants from /home/sotiris/libraries_extras/NN related/caffe2-master/conda/cuda/conda_build_config.yaml
INFO:conda_build.variants:Adding in variants from /home/sotiris/libraries_extras/NN related/caffe2-master/conda/cuda/conda_build_config.yaml
/home/sotiris/anaconda2/lib/python2.7/site-packages/conda_build/environ.py:369: UserWarning: The environment variable 'CONDA_CMAKE_ARGS' is undefined.
  UserWarning
Attempting to finalize metadata for caffe2-cuda
INFO:conda_build.metadata:Attempting to finalize metadata for caffe2-cuda
Solving environment: ...working... done
Solving environment: ...working... done
BUILD START: [u'caffe2-cuda-0.8.dev-py27h71f975e_0.tar.bz2']
Solving environment: ...working... done
Solving environment: ...working... done

Package Plan

environment location: /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol

The following NEW packages will be INSTALLED:

bzip2:           1.0.6-h9a117a8_4     
ca-certificates: 2017.08.26-h1d4fec5_0
cairo:           1.14.12-h77bcde2_0   
certifi:         2018.1.18-py27_0     
cmake:           3.9.4-h142f0e9_0     
curl:            7.58.0-h84994c4_0    
expat:           2.2.5-he0dffb1_0     
ffmpeg:          3.4-h7264315_0       
fontconfig:      2.12.6-h49f89f6_0    
freetype:        2.8-hab7d2ae_1       
future:          0.16.0-py27_1        
gflags:          2.2.1-hf484d3e_0     
glib:            2.53.6-h5d9569c_2    
glog:            0.3.5-hf484d3e_1     
graphite2:       1.3.10-hf63cedd_1    
harfbuzz:        1.7.4-hc5b324e_0     
hdf5:            1.10.1-h9caa474_1    
icu:             58.2-h9c2bf20_1      
intel-openmp:    2018.0.0-hc7b2577_8  
jasper:          1.900.1-hd497a04_4   
jpeg:            9b-h024ee3a_2        
libcurl:         7.58.0-h1ad7b7a_0    
libedit:         3.1-heed3624_0       
libffi:          3.2.1-hd88cf55_4     
libgcc-ng:       7.2.0-hdf63c60_3     
libgfortran-ng:  7.2.0-hdf63c60_3     
libopus:         1.2.1-hb9ed12e_0     
libpng:          1.6.34-hb9fc6fc_0    
libprotobuf:     3.4.1-h5b8497f_0     
libssh2:         1.8.0-h9cfc8f7_4     
libstdcxx-ng:    7.2.0-hdf63c60_3     
libtiff:         4.0.9-h28f6b97_0     
libuv:           1.19.2-h14c3975_0    
libvpx:          1.6.1-h888fd40_0     
libxcb:          1.12-hcd93eb1_4      
libxml2:         2.9.7-h26e45fe_0     
lmdb:            0.9.21-hf484d3e_1    
mkl:             2018.0.1-h19d6760_4  
ncurses:         6.0-h9df7e31_2       
numpy:           1.11.3-py27h3dfced4_4
opencv:          3.3.1-py27h6cbbc71_1 
openssl:         1.0.2n-hb7f436b_0    
pcre:            8.41-hc27e229_1      
pip:             9.0.1-py27_5         
pixman:          0.34.0-hceecf20_3    
protobuf:        3.4.1-py27h2ba6a9c_0 
python:          2.7.14-h1571d57_29   
readline:        7.0-ha6073c6_4       
rhash:           1.3.5-hbf7ad62_1     
setuptools:      38.5.1-py27_0        
six:             1.11.0-py27h5f960f1_1
sqlite:          3.22.0-h1bed415_0    
tk:              8.6.7-hc745277_3     
wheel:           0.30.0-py27h2bc6bb2_1
xz:              5.2.3-h55aa19d_2     
zlib:            1.2.11-ha838bed_2    

Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Copying /home/sotiris/libraries_extras/NN related/caffe2-master to /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/work
source tree in: /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/work

CMake Error at cmake/External/nnpack.cmake:78 (set_property):
  set_property could not find TARGET nnpack. Perhaps it has not yet been created.
Call Stack (most recent call first):
  cmake/Dependencies.cmake:71 (include)
  CMakeLists.txt:101 (include)

CMake Error at cmake/External/nnpack.cmake:79 (set_property):
  set_property could not find TARGET pthreadpool. Perhaps it has not yet been created.
Call Stack (most recent call first):
  cmake/Dependencies.cmake:71 (include)
  CMakeLists.txt:101 (include)

CMake Error at cmake/External/nnpack.cmake:80 (set_property):
  set_property could not find TARGET cpuinfo. Perhaps it has not yet been created.
Call Stack (most recent call first):
  cmake/Dependencies.cmake:71 (include)
  CMakeLists.txt:101 (include)

CMake Error at cmake/Dependencies.cmake:97 (add_subdirectory):
  The source directory

    /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/work/third_party/cpuinfo

  does not contain a CMakeLists.txt file.
Call Stack (most recent call first):
  CMakeLists.txt:101 (include)

CMake Error at cmake/Dependencies.cmake:102 (set_property):
  set_property could not find TARGET cpuinfo. Perhaps it has not yet been created.
Call Stack (most recent call first):
  CMakeLists.txt:101 (include)

-- Caffe2: Found gflags with new-style gflags target.
-- Caffe2: Cannot find glog automatically. Using legacy find.
-- Found glog: /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include
-- Caffe2: Found glog (include: /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include, library: /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/libglog.so)
CMake Error at cmake/Dependencies.cmake:152 (add_subdirectory):
  The source directory

    /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/work/third_party/googletest

  does not contain a CMakeLists.txt file.
Call Stack (most recent call first):
  CMakeLists.txt:101 (include)

CMake Error at cmake/Dependencies.cmake:159 (add_subdirectory):
  The source directory

    /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/work/third_party/benchmark

  does not contain a CMakeLists.txt file.
Call Stack (most recent call first):
  CMakeLists.txt:101 (include)

-- Found LMDB: /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include
-- Found lmdb (include: /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include, library: /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/liblmdb.so)
-- Found Numa: /usr/include
-- Found Numa (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libnuma.so)
-- OpenCV found (/home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/share/OpenCV)
-- Found system Eigen at /usr/include/eigen3
-- Found PythonInterp: /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/bin/python2.7 (found suitable version "2.7.14", minimum required is "2.7")
-- Found PythonLibs: /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/libpython2.7.so (found suitable version "2.7.14", minimum required is "2.7")
-- Found NumPy: /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/python2.7/site-packages/numpy/core/include (found version "1.11.3")
-- NumPy ver. 1.11.3 found (include: /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/python2.7/site-packages/numpy/core/include)
-- Could NOT find pybind11 (missing: pybind11_INCLUDE_DIR)
-- Found CUDA: /usr/local/cuda-9.1 (found suitable version "9.1", minimum required is "7.0")
-- Found CUDNN: /usr/local/cuda-9.1/include
-- Caffe2: CUDA detected: 9.1
-- Found cuDNN: v7.1.1 (include: /usr/local/cuda-9.1/include, library: /usr/local/cuda-9.1/lib64/libcudnn.so)
-- Automatic GPU detection returned 5.2.
-- Added CUDA NVCC flags for: sm_52
-- Found NCCL: /usr/include
-- Found NCCL (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libnccl.so)
-- Could NOT find CUB (missing: CUB_INCLUDE_DIR)
-- Could NOT find Gloo (missing: Gloo_INCLUDE_DIR Gloo_LIBRARY)
CMake Error at cmake/Dependencies.cmake:417 (add_subdirectory):
  The source directory

    /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/work/third_party/gloo

  does not contain a CMakeLists.txt file.
Call Stack (most recent call first):
  CMakeLists.txt:101 (include)

CMake Warning at cmake/Dependencies.cmake:457 (message):
  mobile opengl is only used in android or ios builds.
Call Stack (most recent call first):
  CMakeLists.txt:101 (include)

CMake Warning at cmake/Dependencies.cmake:533 (message):
  Metal is only used in ios builds.
Call Stack (most recent call first):
  CMakeLists.txt:101 (include)

CMake Error at cmake/Dependencies.cmake:566 (add_subdirectory):
  The source directory

    /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/work/third_party/onnx

  does not contain a CMakeLists.txt file.
Call Stack (most recent call first):
  CMakeLists.txt:101 (include)

CMake Error at cmake/public/utils.cmake:7 (get_target_property):
  get_target_property() called with non-existent target "onnx".
Call Stack (most recent call first):
  cmake/Dependencies.cmake:569 (caffe2_interface_library)
  CMakeLists.txt:101 (include)

-- GCC 5.4.0: Adding gcc and gcc_s libs to link line
-- Include NCCL operators
-- Including image processing operators
-- Excluding video processing operators due to no opencv
-- Excluding mkl operators as we are not using mkl
-- MPI operators skipped due to no MPI support
-- Include Observer library
-- Using lib/python2.7/site-packages as python relative installation path
-- Automatically generating missing __init__.py files.
--
-- **** Summary ****
-- General:
-- CMake version : 3.9.4
-- CMake command : /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/bin/cmake
-- Git version : unknown
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler version : 5.4.0
-- Protobuf compiler : /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/bin/protoc
-- Protobuf include path : /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include
-- Protobuf libraries : /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/libprotobuf.so;-lpthread
-- BLAS : Eigen
-- CXX flags : -Wno-deprecated -DONNX_NAMESPACE=onnx_c2 -O2 -fPIC -Wno-narrowing -Wno-invalid-partial-specialization
-- Build type : Release
-- Compile definitions :
--
-- BUILD_BINARY : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : ON
-- Python version : 2.7.14
-- Python includes : /home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include/python2.7
-- BUILD_SHARED_LIBS : ON
-- BUILD_TEST : ON
-- USE_ATEN : OFF
-- USE_ASAN : OFF
-- USE_CUDA : ON
-- CUDA version : 9.1
-- CuDNN version : 7.1.1
-- CUDA root directory : /usr/local/cuda-9.1
-- CUDA library : /usr/lib/x86_64-linux-gnu/libcuda.so
-- CUDA NVRTC library : /usr/local/cuda-9.1/lib64/libnvrtc.so
-- CUDA runtime library: /usr/local/cuda-9.1/lib64/libcudart.so
-- CUDA include path : /usr/local/cuda-9.1/include
-- NVCC executable : /usr/local/cuda-9.1/bin/nvcc
-- CUDA host compiler : /usr/bin/cc
-- USE_EIGEN_FOR_BLAS : 1
-- USE_FFMPEG : OFF
-- USE_GFLAGS : ON
-- USE_GLOG : ON
-- USE_GLOO : ON
-- USE_LEVELDB : OFF
-- USE_LITE_PROTO : OFF
-- USE_LMDB : ON
-- LMDB version : 0.9.21
-- USE_METAL : OFF
-- USE_MKL :
-- USE_MOBILE_OPENGL : OFF
-- USE_MPI : OFF
-- USE_NCCL : ON
-- USE_NERVANA_GPU : OFF
-- USE_NNPACK : ON
-- USE_OBSERVERS : ON
-- USE_OPENCV : ON
-- OpenCV version : 3.3.1
-- USE_OPENMP : OFF
-- USE_PROF : OFF
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_THREADS : ON
-- USE_ZMQ : OFF
-- Configuring incomplete, errors occurred!
See also "/home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/work/build/CMakeFiles/CMakeOutput.log".
See also "/home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/work/build/CMakeFiles/CMakeError.log".
Traceback (most recent call last):
  File "/home/sotiris/anaconda2/bin/conda-build", line 11, in <module>
    sys.exit(main())
  File "/home/sotiris/anaconda2/lib/python2.7/site-packages/conda_build/cli/main_build.py", line 413, in main
    execute(sys.argv[1:])
  File "/home/sotiris/anaconda2/lib/python2.7/site-packages/conda_build/cli/main_build.py", line 404, in execute
    verify=args.verify)
  File "/home/sotiris/anaconda2/lib/python2.7/site-packages/conda_build/api.py", line 193, in build
    need_source_download=need_source_download, config=config, variants=variants)
  File "/home/sotiris/anaconda2/lib/python2.7/site-packages/conda_build/build.py", line 1944, in build_tree
    notest=notest,
  File "/home/sotiris/anaconda2/lib/python2.7/site-packages/conda_build/build.py", line 1240, in build
    utils.check_call_env(cmd, env=env, cwd=src_dir)
  File "/home/sotiris/anaconda2/lib/python2.7/site-packages/conda_build/utils.py", line 678, in check_call_env
    return _func_defaulting_env_to_os_environ(subprocess.check_call, *popenargs, kwargs)
  File "/home/sotiris/anaconda2/lib/python2.7/site-packages/conda_build/utils.py", line 674, in _func_defaulting_env_to_os_environ
    return func(_args, kwargs)
  File "/home/sotiris/anaconda2/lib/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/bin/bash', '-e', '/home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520915194017/work/conda_build.sh']' returned non-zero exit status 1

And then "conda install caffe2-cuda --use-local" can't find that package. Maybe my repos need an update?

conda install caffe2-cuda --use-local
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

Current channels:

I think I may wait for your caffe2-cuda9.1-cudnn7 package

pjh5 commented 6 years ago

@sotstas the 9.1 package is coming. All those errors you got can be solved with git submodule update; I think you forgot to clone caffe2 with the --recursive flag.
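
The "does not contain a CMakeLists.txt file" errors earlier in the thread are the usual sign of uninitialized submodules. A minimal sketch for spotting them before running the build follows; the directory names come from the CMake errors in this thread, and `empty_submodule_dirs` is a hypothetical helper.

```python
import os

# third_party directories named in the CMake errors above.
REPORTED = ("cpuinfo", "googletest", "benchmark", "gloo", "onnx")


def empty_submodule_dirs(repo_root, names=REPORTED):
    """Return third_party subdirectories that lack a CMakeLists.txt."""
    missing = []
    for name in names:
        path = os.path.join(repo_root, "third_party", name)
        if not os.path.isfile(os.path.join(path, "CMakeLists.txt")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run from the root of the caffe2 checkout.
    print(empty_submodule_dirs("."))
```

An empty list suggests the submodules named here were fetched; any name in the list points at a directory that still needs to be populated.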

sotstas commented 6 years ago

After git submodule update and git clone https://github.com/caffe2/caffe2.git --recursive I managed to move forward with the building process a bit. It starts to build and gets to about 33%, and then crashes due to an Eigen library related error:

[ 33%] Building CXX object caffe2/CMakeFiles/caffe2.dir/operators/conv_transpose_gradient_op.cc.o
/home/sotiris/anaconda2/conda-bld/caffe2-cuda_1520997113826/work/caffe2/operators/conv_op_eigen.cc:24:2: error: #error "Caffe2 requires Eigen to be at least 3.3.0.";
 #error "Caffe2 requires Eigen to be at least 3.3.0.";
  ^
[ 34%] Building CXX object caffe2/CMakeFiles/caffe2.dir/operators/conv_transpose_op.cc.o
[ 34%] Building CXX object caffe2/CMakeFiles/caffe2.dir/operators/conv_transpose_op_mobile.cc.o
[ 34%] Building CXX object caffe2/CMakeFiles/caffe2.dir/operators/cos_op.cc.o
[ 34%] Building CXX object caffe2/CMakeFiles/caffe2.dir/operators/cosine_embedding_criterion_op.cc.o
cc1plus: warning: unrecognized command line option ‘-Wno-invalid-partial-specialization’
caffe2/CMakeFiles/caffe2.dir/build.make:2126: recipe for target 'caffe2/CMakeFiles/caffe2.dir/operators/conv_op_eigen.cc.o' failed
make[2]: *** [caffe2/CMakeFiles/caffe2.dir/operators/conv_op_eigen.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:2817: recipe for target 'caffe2/CMakeFiles/caffe2.dir/all' failed
make[1]: *** [caffe2/CMakeFiles/caffe2.dir/all] Error 2
Makefile:140: recipe for target 'all' failed

However, I have the latest version: libeigen3-dev is already the newest version (3.3~beta1-2).

I saw that when cloning, you include your own version of Eigen in third-parties, so mine should not be relevant to the problem, correct?

pjh5 commented 6 years ago

@sotstas the Eigen in third-parties is used as a backup if it can't find an Eigen on your machine. There will be a string in your cmake output "Did not find system Eigen. Using third party subdirectory." if it is using the eigen in third-party, and "Found system Eigen at " otherwise. Where is your libeigen? If you pull the latest source, then this should be using MKL instead of Eigen anyways, and shouldn't come across this problem.
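
To see which version the system Eigen headers actually declare, the version macros can be read directly. This is a sketch under two assumptions: that the headers live under /usr/include/eigen3 (the path the cmake log reports) and that the macros sit in Eigen/src/Core/util/Macros.h, as in Eigen 3.x. The helper names are hypothetical.

```python
import os
import re

# Assumed location, based on "Found system Eigen at /usr/include/eigen3".
MACROS_H = "/usr/include/eigen3/Eigen/src/Core/util/Macros.h"


def eigen_version(macros_text):
    """Parse (world, major, minor) from Eigen's version macros, else None."""
    parts = []
    for key in ("WORLD", "MAJOR", "MINOR"):
        match = re.search(r"#define\s+EIGEN_%s_VERSION\s+(\d+)" % key, macros_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def at_least(version, minimum=(3, 3, 0)):
    """True when a parsed version tuple meets the required minimum."""
    return version is not None and version >= minimum


if __name__ == "__main__":
    if os.path.exists(MACROS_H):
        with open(MACROS_H) as handle:
            found = eigen_version(handle.read())
        print(found, "ok" if at_least(found) else "older than 3.3.0")
    else:
        print("Macros.h not found at the assumed path")
```

A pre-release package such as 3.3~beta1 may declare a number below 3.3.0 in these macros, which would explain the #error despite the package name; that is worth confirming against the header rather than assuming.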

If no more problems arise on my end, the 9.1 packages should be up by tomorrow

sotstas commented 6 years ago

@pjh5 any news about the cuda 9.1 packages?

pjh5 commented 6 years ago

Sorry about forgetting to update here. I ran into a problem that I did not understand and am trying to push out gcc4.8 builds first.

If you're curious, the error is docker: Error response from daemon: OCI runtime create failed: container_linux.go:296: starting container process caused "process_linux.go:398: container init caused \"process_linux.go:381: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=9.1 --pid=8808 /data/docker/overlay2/40f5c64fe2faec250276e825846b620f876c693017ef9036a049f3cfd98f1626/merged]\\\\nnvidia-container-cli: requirement error: unsatisfied condition: cuda >= 9.1\\\\n\\\"\"": unknown. when trying to run the docker image to build the package.

I don't have an estimate on the timeline of this at the moment.

sotstas commented 6 years ago

Unfortunately I'm not very familiar with package building either... If you have any updates please let me know. I may try to go back to installing CUDA 9.0 and reinstall everything...

ghost commented 6 years ago

This may be a bit late for all of you.

Here's my configuration:

I managed to bypass the problem by copying manually

I don't know whether this bypass will hold indefinitely for CUDA 9.0, or whether it holds for CUDA 9.1. Either way, it eliminates the 'CPU Only' message. Let me know if you have any questions.