dnouri / cuda-convnet

My fork of Alex Krizhevsky's cuda-convnet from 2013 where I added dropout, among other features.
http://code.google.com/p/cuda-convnet/

Only make convnet a shared lib #8

Closed kashif closed 10 years ago

kashif commented 10 years ago

Fix for the shared lib not found error. @invisibleroads, can you check this? Thanks!

invisibleroads commented 10 years ago

Hmm, here is what I get.

[  5%] Built target common
[ 17%] Built target nvmatrix
[ 41%] Built target cudaconv2
Linking CXX shared library convnet.so
/bin/ld: src/common/libcommon.a(matrix.cpp.o): relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
src/common/libcommon.a: could not read symbols: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [convnet.so] Error 1
make[1]: *** [CMakeFiles/convnet.dir/all] Error 2
make: *** [all] Error 2

kashif commented 10 years ago

OK, cool. I thought the list(APPEND CUDA_NVCC_FLAGS -Xcompiler -fpic) line would do that... let me test it on my Linux box.

kashif commented 10 years ago

@invisibleroads oops, I needed to add the -fPIC flag to the C++ compiler. Can you try now?

invisibleroads commented 10 years ago

@kashif, thanks for the pointers. I was able to compile your fork after making the modification below, based on this hint.

IF(${CMAKE_SYSTEM_NAME} MATCHES "Linux")
    list(APPEND CMAKE_CXX_FLAGS -fPIC)
ENDIF(${CMAKE_SYSTEM_NAME} MATCHES "Linux")

However, I now get the following error.

import convnet
ImportError: ./convnet.so: undefined symbol: cblas_sgemm
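
One way to narrow down an undefined symbol error like this is to ask a candidate library whether it exports the symbol. The following is a minimal sketch using Python's standard ctypes module; the helper name exports_symbol is made up for illustration, and it is written for Python 3 rather than the Python 2.7 used in this thread.

```python
import ctypes


def exports_symbol(lib_path, symbol):
    """Return True if the shared library at lib_path exports symbol.

    lib_path=None asks ctypes to search the current process instead
    (this relies on dlopen(NULL), so it assumes a POSIX system).
    """
    lib = ctypes.CDLL(lib_path)
    try:
        getattr(lib, symbol)  # ctypes looks the name up in the library
    except AttributeError:
        return False  # the lookup failed, so the symbol is not exported
    return True


if __name__ == "__main__":
    # strlen comes from the C library the interpreter already links against.
    print(exports_symbol(None, "strlen"))
```

On the affected machine, a call such as exports_symbol("/usr/lib64/atlas/libcblas.so", "cblas_sgemm") would be the interesting one, assuming that path exists there.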

kashif commented 10 years ago

I see. Can you run ldd convnet.so and let me know what you get?

invisibleroads commented 10 years ago

$ ldd convnet.so

linux-vdso.so.1 =>  (0x00007fff43f8e000)
libcudart.so.6.0 => /usr/local/cuda/lib64/libcudart.so.6.0 (0x00007fed86410000)
libpython2.7.so.1.0 => /lib64/libpython2.7.so.1.0 (0x00007fed8602d000)
libblas.so.3 => /lib64/libblas.so.3 (0x00007fed85dd5000)
libcublas.so.6.0 => /usr/local/cuda/lib64/libcublas.so.6.0 (0x00007fed843fd000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fed840f4000)
libm.so.6 => /lib64/libm.so.6 (0x00007fed83ded000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fed83bd7000)
libc.so.6 => /lib64/libc.so.6 (0x00007fed83817000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fed83613000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fed833f6000)
librt.so.1 => /lib64/librt.so.1 (0x00007fed831ed000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007fed82fea000)
libgfortran.so.3 => /lib64/libgfortran.so.3 (0x00007fed82cc9000)
/lib64/ld-linux-x86-64.so.2 (0x00007fed8763e000)
libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00007fed82a8c000)
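
The listing above shows libblas.so.3 but no libcblas entry. As a rough sketch, a check like that could be automated by parsing the ldd output in Python; the function names parse_ldd and links_cblas are hypothetical, and the sample text is a shortened copy of the output above.

```python
import re


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path (or None)."""
    libs = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the trailing load address, e.g. "(0x00007fed85dd5000)".
        line = re.sub(r"\s*\(0x[0-9a-fA-F]+\)$", "", line)
        if "=>" in line:
            name, _, path = line.partition("=>")
            name, path = name.strip(), path.strip()
        else:
            # Lines such as the dynamic loader carry only a path.
            name, path = line.rsplit("/", 1)[-1], line
        libs[name] = path if path and path != "not found" else None
    return libs


def links_cblas(libs):
    """True if any linked library is named like a CBLAS build."""
    return any(name.startswith(("libcblas", "libptcblas")) for name in libs)


SAMPLE = """\
linux-vdso.so.1 =>  (0x00007fff43f8e000)
libblas.so.3 => /lib64/libblas.so.3 (0x00007fed85dd5000)
libcublas.so.6.0 => /usr/local/cuda/lib64/libcublas.so.6.0 (0x00007fed843fd000)
/lib64/ld-linux-x86-64.so.2 (0x00007fed8763e000)
"""

if __name__ == "__main__":
    libs = parse_ldd(SAMPLE)
    print(sorted(libs))
    print("links CBLAS:", links_cblas(libs))
```

The name test is only a heuristic: it checks file names, not which library actually defines cblas_sgemm.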

kashif commented 10 years ago

And are you using ATLAS or just the standard BLAS libraries on Fedora?

invisibleroads commented 10 years ago

I'm not sure. The following build.sh worked before without cmake.

# CUDA toolkit installation directory.
export CUDA_INSTALL_PATH=/usr/local/cuda

# CUDA SDK installation directory.
export CUDA_SDK_PATH=$CUDA_INSTALL_PATH

# Python include directory. This should contain the file Python.h, among others.
export PYTHON_INCLUDE_PATH=$VIRTUAL_ENV/include/python2.7

# Numpy include directory. This should contain the file arrayobject.h, among others.
export NUMPY_INCLUDE_PATH=$VIRTUAL_ENV/lib/python2.7/site-packages/numpy/core/include/numpy/

# ATLAS library directory. This should contain the file libcblas.so, among others.
export ATLAS_LIB_PATH=/usr/lib64/atlas

make $*

invisibleroads commented 10 years ago

Is it significant that in ldd convnet.so, most of the library paths are /lib64 instead of /usr/lib64?

e.g.
OLD libblas.so.3 => /lib64/libblas.so.3 (0x00007fed85dd5000)
NEW libblas.so.3 => /usr/lib64/libblas.so.3 (0x00007fed85dd5000)

kashif commented 10 years ago

No, that's fine. I think the issue is that CMake is not linking against libcblas.so, which is needed when ATLAS is used. Fixing that now...

kashif commented 10 years ago

ok can you try now with these changes: https://github.com/dnouri/cuda-convnet/pull/8/files

thanks! kashif

invisibleroads commented 10 years ago

$ make
[  5%] Built target common
[ 17%] Built target nvmatrix
[ 41%] Built target cudaconv2
Linking CXX shared library convnet.so
/bin/ld: cannot find -lcblas
/bin/ld: cannot find -lcblas
collect2: error: ld returned 1 exit status
make[2]: *** [convnet.so] Error 1
make[1]: *** [CMakeFiles/convnet.dir/all] Error 2
make: *** [all] Error 2

$ find /usr -name *cblas*
/usr/lib64/python2.7/site-packages/scipy/linalg/cblas.pyo
/usr/lib64/python2.7/site-packages/scipy/linalg/cblas.py
/usr/lib64/python2.7/site-packages/scipy/linalg/cblas.pyc
/usr/lib64/python2.7/site-packages/scipy/linalg/_cblas.so
/usr/lib64/python2.7/site-packages/scipy/lib/blas/cblas.so
/usr/lib64/atlas/libptcblas.so.3.0
/usr/lib64/atlas/libptcblas.so.3
/usr/lib64/atlas/libcblas.so.3.0
/usr/lib64/atlas/libcblas.so.3
/usr/lib64/atlas/libcblas.so
/usr/lib64/atlas/libptcblas.so
/usr/include/cblas.h
/usr/share/doc/atlas-devel/doc/cblasqref.pdf
/usr/share/doc/atlas-devel/doc/cblas.pdf

$ env | grep PATH
LD_LIBRARY_PATH=/home/rhh/.virtualenvs/crosscompute/lib:/usr/local/cuda/lib64
PATH=/home/rhh/.virtualenvs/crosscompute/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/rhh/.scripts:/home/rhh/bin:/home/rhh/.scripts:/home/rhh/bin:/usr/local/cuda/bin
NODE_PATH=/home/rhh/.virtualenvs/crosscompute/lib/node_modules

kashif commented 10 years ago

Ok I see, can you pull now and try with:

cmake -DBLAS_LIBRARIES=/usr/lib64/atlas/libcblas.so  .
make
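
For machines where the library lives somewhere else, the lookup and the cmake invocation above could be scripted. This is only a sketch: find_cblas and cmake_command are hypothetical helpers, /usr/lib64/atlas is the directory from the find output in this thread, and the other candidate directories are guesses for other distributions.

```python
import os


def find_cblas(search_dirs, filename="libcblas.so"):
    """Return the first existing search_dir/filename, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


def cmake_command(blas_library, source_dir="."):
    """Build the cmake argument list that sets BLAS_LIBRARIES."""
    return ["cmake", "-DBLAS_LIBRARIES=" + blas_library, source_dir]


if __name__ == "__main__":
    # Only the first directory comes from this thread; the rest are guesses.
    found = find_cblas(["/usr/lib64/atlas", "/usr/lib/atlas-base", "/usr/lib"])
    if found is None:
        print("libcblas.so not found in the candidate directories")
    else:
        print(" ".join(cmake_command(found)))
```

The argument list can be passed to subprocess.run as-is, followed by a separate make step.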

invisibleroads commented 10 years ago

Aha, success! import convnet works now.

invisibleroads commented 10 years ago

Thank you for solving this puzzle.

kashif commented 10 years ago

i thank you :smile:

kashif commented 10 years ago

thanks @dnouri :+1:

invisibleroads commented 10 years ago

@kashif, is there a way to expose the ConvNet class in convnet.so?

The convnet.so module seems to hide the convnet.py module and the following line in noccn no longer works:

from convnet import ConvNet

invisibleroads commented 10 years ago

I was able to get this working by renaming convnet.py to something else like convnet_.py

from convnet_ import ConvNet
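
The shadowing can be inspected without importing anything. In CPython 3, when one directory holds both an extension module and a .py file of the same name, the default file finder tries the extension-module suffixes first, which would match the behaviour described above. A minimal sketch follows; resolve_module_file is a made-up helper name, and the code is written for Python 3 rather than the Python 2.7 used in this thread.

```python
import importlib
import importlib.machinery


def resolve_module_file(name, search_dir):
    """Return the file the import system would pick for name in search_dir.

    Nothing is imported or executed; only the path finder is consulted.
    """
    importlib.invalidate_caches()  # make sure freshly created files are seen
    spec = importlib.machinery.PathFinder.find_spec(name, [search_dir])
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Run from the directory that holds both convnet.so and convnet.py.
    print(resolve_module_file("convnet", "."))
```

If the printed path ends in .so, the Python module of the same name is unreachable by plain import, and renaming one of the two files is the simplest way out.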