Closed skhushu closed 6 years ago
I ran into the same error. Have you found a resolution?
Me too. If you find a solution, please tell me. Thanks.
I removed OpenMPI (including libopenmpi-dev, openmpi-bin and openmpi-doc) from my Ubuntu system, then ran clean, make, and install. After that it works fine. Use: sudo apt remove libopenmpi-dev openmpi-bin openmpi-doc
It doesn't work on my computer.
Delete the 'build' folder and retry.
I did, and the error is still there:
/usr/bin/ld: CMakeFiles/mpi_test.dir/mpi/mpi_test.cc.o: undefined reference to symbol '_ZN3MPI8Datatype4FreeEv'
//usr/lib/libmpi_cxx.so.1: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
caffe2/CMakeFiles/mpi_test.dir/build.make:100: recipe for target 'bin/mpi_test' failed
make[2]: [bin/mpi_test] Error 1
CMakeFiles/Makefile2:2614: recipe for target 'caffe2/CMakeFiles/mpi_test.dir/all' failed
make[1]: [caffe2/CMakeFiles/mpi_test.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2
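For readers decoding the linker message above: the undefined symbol is a mangled C++ name. A minimal sketch, assuming `c++filt` from GNU binutils is installed, shows how to demangle it. The helper name `demangle_symbol` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: demangle one C++ symbol name using c++filt (GNU binutils).
demangle_symbol() {
    printf '%s\n' "$1" | c++filt
}

# The symbol reported by ld in this thread.
demangle_symbol '_ZN3MPI8Datatype4FreeEv'   # prints: MPI::Datatype::Free()
```

The demangled name belongs to the MPI C++ bindings, which matches the library named in the error (libmpi_cxx.so.1). "DSO missing from command line" generally means the linker found the symbol in a shared library that was not listed explicitly when linking the target.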
Sorry, I forgot. You should also change 'option(USE_MPI "Use MPI" ON)' to 'option(USE_MPI "Use MPI" OFF)' in the file 'CMakeLists.txt'.
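The edit described above can be scripted. This is only a sketch: the helper name `disable_mpi_option` is hypothetical, and it assumes the option line appears in CMakeLists.txt exactly as quoted in this thread. The demo runs on a throwaway file so a real source tree is not touched.

```shell
#!/bin/sh
# Hypothetical helper: flip the USE_MPI option from ON to OFF in a CMakeLists.txt.
# Assumes the line reads exactly: option(USE_MPI "Use MPI" ON)
# A backup of the original file is kept next to it with a .bak suffix.
disable_mpi_option() {
    file="$1"
    sed -i.bak 's/option(USE_MPI "Use MPI" ON)/option(USE_MPI "Use MPI" OFF)/' "$file"
}

# Demo on a temporary file.
tmp="$(mktemp)"
printf 'option(USE_MPI "Use MPI" ON)\n' > "$tmp"
disable_mpi_option "$tmp"
cat "$tmp"   # prints: option(USE_MPI "Use MPI" OFF)
rm -f "$tmp" "$tmp.bak"
```

On a real checkout this would be `disable_mpi_option CMakeLists.txt`, followed by deleting the 'build' folder and rebuilding, as suggested earlier in the thread.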
yes, it works, thank you!
you're welcome : )
So MPI is not essential when using only GPU mode?
It's optional
@yourlovedu Hello, I used your method today.
But something went wrong when I tested Caffe2 following the installation guide.
I found that there was no GPU support, so I checked the CMake summary.
It looked like cuDNN support was not installed properly, so I fixed that and compiled again with OpenMPI.
In the end, Caffe2 passed the test perfectly. :)
I just wonder whether this problem was caused by the wrong cuDNN installation rather than by OpenMPI?
Thanks a lot!!
@zkself Actually, I followed the same steps you did. After removing OpenMPI, Caffe2 installed successfully. However, the GPU count reported by the test was 0. At first I thought Caffe2 could work with only CUDA and without cuDNN, but after running the test and seeing 0, I figured cuDNN might be essential. So I installed cuDNN and reinstalled Caffe2, and then the test detected 1 GPU. Based on what I found online, here is my take. First, the OpenMPI error is caused by Ubuntu 16 (it doesn't happen on Ubuntu 14), but since MPI is not essential, we can just remove it. Second, Caffe2 needs both CUDA and cuDNN for GPU detection. I'm actually still wondering whether Caffe2 can work without cuDNN, because the compute capability of my GPU is 2.1. T-T
@yourlovedu
Thanks for your idea and help!!!
By the way, are you Chinese?
I ask because my English is poor and you can understand my words.
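@zkself Hahaha, two Chinese people have been talking to each other in English this whole time, lol.
@yourlovedu It felt awkward the whole time, lol, but it's not bad for the sake of our foreign friends. Could we add each other on QQ/WeChat? My email is on my profile page.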
Thank you everyone for helping sort out this issue. I tried installing Caffe2 on AWS EC2 using the Deep Learning AMI, which has CUDA and CuDNN pre-installed. It worked fine without any compile issues or errors, and I didn't have to disable the MPI feature. It looks like the original error may have been a CuDNN issue or just a Caffe2-CuDNN compatibility mismatch. Unless anyone has anything else to add, I am going to close this thread later today. Thanks!
To add a little more to this, Caffe2 with GPU does require both CUDA and CuDNN, but does not require MPI. MPI is used for training across multiple machines, such as in a cluster environment. Thanks everyone for helping each other out! I will close this now as everything is resolved.
I just got the same error building on Ubuntu 17.10 with USE_CUDA : OFF
I notice the issue has been closed. Is there really nothing that can be done to improve this or better explain the configuration, instead of letting it break with a link error? (This is one of many build issues I've had with Caffe2. A focus on improving build behaviour wouldn't be amiss.)
This was a successful workaround:
# cmake .. -DUSE_CUDA=OFF -DUSE_MPI=OFF
root@w8pc:~/caffe2/build# cmake .. -DUSE_CUDA=OFF
-- Does not need to define long separately.
-- std::exception_ptr is supported.
-- NUMA is available
-- Current compiler supports avx2 extention. Will build perfkernels.
-- Caffe2: Found protobuf with old-style protobuf targets.
-- Caffe2 protobuf include directory: /usr/include
-- The BLAS backend of choice:Eigen
-- Could NOT find NNPACK (missing: NNPACK_INCLUDE_DIR NNPACK_LIBRARY PTHREADPOOL_LIBRARY CPUINFO_LIBRARY)
-- Brace yourself, we are building NNPACK
-- Found PythonInterp: /usr/bin/python (found version "2.7.14")
-- Caffe2: Found gflags with new-style gflags target.
-- Caffe2: Cannot find glog automatically. Using legacy find.
-- Caffe2: Found glog (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libglog.so)
-- git Version: v0.0.0
-- Version: 0.0.0
-- Performing Test HAVE_STD_REGEX
-- Performing Test HAVE_STD_REGEX
-- Performing Test HAVE_STD_REGEX -- success
-- Performing Test HAVE_GNU_POSIX_REGEX
-- Performing Test HAVE_GNU_POSIX_REGEX
-- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
-- Performing Test HAVE_POSIX_REGEX
-- Performing Test HAVE_POSIX_REGEX
-- Performing Test HAVE_POSIX_REGEX -- success
-- Performing Test HAVE_STEADY_CLOCK
-- Performing Test HAVE_STEADY_CLOCK
-- Performing Test HAVE_STEADY_CLOCK -- success
-- Found lmdb (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/liblmdb.so)
-- Found LevelDB (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libleveldb.so)
-- Found Snappy (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libsnappy.so)
-- Found Numa (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libnuma.so)
-- OpenCV found (/usr/share/OpenCV)
CMake Warning at cmake/Dependencies.cmake:256 (find_package):
By not providing "FindEigen3.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "Eigen3", but
CMake did not find one.
Could not find a package configuration file provided by "Eigen3" with any
of the following names:
Eigen3Config.cmake
eigen3-config.cmake
Add the installation prefix of "Eigen3" to CMAKE_PREFIX_PATH or set
"Eigen3_DIR" to a directory containing one of the above files. If "Eigen3"
provides a separate development package or SDK, be sure it has been
installed.
Call Stack (most recent call first):
CMakeLists.txt:101 (include)
-- Did not find system Eigen. Using third party subdirectory.
-- Found PythonInterp: /usr/bin/python (found suitable version "2.7.14", minimum required is "2.7")
-- NumPy ver. 1.14.1 found (include: /usr/local/lib/python2.7/dist-packages/numpy/core/include)
-- Could NOT find pybind11 (missing: pybind11_INCLUDE_DIR)
-- MPI support found
-- MPI compile flags:
-- MPI include path: /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include/usr/lib/x86_64-linux-gnu/openmpi/include
-- MPI LINK flags path:
-- MPI libraries: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so
CMake Warning at cmake/Dependencies.cmake:310 (message):
OpenMPI found, but it is not built with CUDA support.
Call Stack (most recent call first):
CMakeLists.txt:101 (include)
CMake Warning at cmake/Dependencies.cmake:360 (message):
If not using cuda, one should not use NCCL either.
Call Stack (most recent call first):
CMakeLists.txt:101 (include)
-- Could NOT find Gloo (missing: Gloo_INCLUDE_DIR Gloo_LIBRARY)
-- MPI include path: /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include/usr/lib/x86_64-linux-gnu/openmpi/include
-- MPI libraries: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so
CMake Warning at cmake/Dependencies.cmake:443 (message):
mobile opengl is only used in android or ios builds.
Call Stack (most recent call first):
CMakeLists.txt:101 (include)
CMake Warning at cmake/Dependencies.cmake:519 (message):
Metal is only used in ios builds.
Call Stack (most recent call first):
CMakeLists.txt:101 (include)
-- GCC 7.2.0: Adding gcc and gcc_s libs to link line
-- NCCL operators skipped due to no CUDA support
-- CUDA RTC operators skipped due to no CUDA support
-- Including image processing operators
-- Excluding video processing operators due to no opencv
-- Excluding mkl operators as we are not using mkl
-- Include Observer library
-- Using lib/python2.7/dist-packages as python relative installation path
-- Automatically generating missing __init__.py files.
--
-- ******** Summary ********
-- General:
-- CMake version : 3.9.1
-- CMake command : /usr/bin/cmake
-- Git version : v0.8.1-1294-gb3e093613
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler version : 7.2.0
-- Protobuf compiler : /usr/bin/protoc
-- Protobuf include path : /usr/include
-- Protobuf libraries : /usr/lib/x86_64-linux-gnu/libprotobuf.so;-pthread
-- BLAS : Eigen
-- CXX flags : -O2 -fPIC -Wno-narrowing -Wno-invalid-partial-specialization
-- Build type : Release
-- Compile definitions :
--
-- BUILD_BINARY : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : ON
-- Python version : 2.7.14
-- Python includes : /usr/include/python2.7
-- BUILD_SHARED_LIBS : ON
-- BUILD_TEST : ON
-- USE_ATEN : OFF
-- USE_ASAN : OFF
-- USE_CUDA : OFF
-- USE_EIGEN_FOR_BLAS : 1
-- USE_FFMPEG : OFF
-- USE_GFLAGS : ON
-- USE_GLOG : ON
-- USE_GLOO : ON
-- USE_LEVELDB : ON
-- LevelDB version : 1.20
-- Snappy version : ..
-- USE_LITE_PROTO : OFF
-- USE_LMDB : ON
-- LMDB version : 0.9.21
-- USE_METAL : OFF
-- USE_MKL :
-- USE_MOBILE_OPENGL : OFF
-- USE_MPI : ON
-- USE_NCCL : OFF
-- USE_NERVANA_GPU : OFF
-- USE_NNPACK : ON
-- USE_OBSERVERS : ON
-- USE_OPENCV : ON
-- OpenCV version : 3.1.0
-- USE_OPENMP : OFF
-- USE_PROF : OFF
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_THREADS : ON
-- USE_ZMQ : OFF
-- Configuring done
-- Generating done
-- Build files have been written to: /root/caffe2/build
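The summary above still reports USE_MPI : ON, because this run passed only -DUSE_CUDA=OFF. One way to confirm what a build directory is actually configured with is to read CMakeCache.txt, where CMake stores cached values as NAME:TYPE=VALUE. The sketch below assumes a configured build directory; the helper name `cached_option` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the cached value of a CMake variable.
# Usage: cached_option <build_dir> <NAME>
# Relies on CMakeCache.txt entries having the form NAME:TYPE=VALUE.
cached_option() {
    build_dir="$1"
    name="$2"
    sed -n "s/^${name}:[A-Z]*=//p" "${build_dir}/CMakeCache.txt"
}

# Example: warn if MPI is still enabled in ./build (skipped if no cache exists).
if [ -f build/CMakeCache.txt ]; then
    if [ "$(cached_option build USE_MPI)" = "ON" ]; then
        echo "USE_MPI is still ON; re-run cmake with -DUSE_MPI=OFF"
    fi
fi
```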
For anyone new who finds this error: everybody on this thread so far has fixed the issue by passing -DUSE_MPI=OFF to their cmake command (or by changing ON to OFF at https://github.com/caffe2/caffe2/blob/master/CMakeLists.txt#L52). This works because MPI is an optional feature that is only used for distributed training across multiple machines, which many people do not need.
If you do need MPI and are running into this problem, then please help me debug by giving me the output of mpirun --version and nvidia-smi, how you installed Caffe2, the full output of your cmake command, and the output of find /usr -name 'libmpi*'.
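The requested command output can be gathered in one go. This is a rough sketch, not an official script: the helper names `report` and `collect_mpi_diagnostics` are made up, and the install steps and full cmake output still have to be added by hand.

```shell
#!/bin/sh
# Hypothetical helper: print a header, then run the command if it is installed.
report() {
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "($1 not found)"
    fi
}

# Hypothetical helper: collect the command output requested in this thread.
collect_mpi_diagnostics() {
    report mpirun --version
    report nvidia-smi
    echo "== find /usr -name 'libmpi*' =="
    # Permission-denied messages from unreadable directories are discarded.
    find /usr -name 'libmpi*' 2>/dev/null
}

# Usage (writes everything to a file that can be pasted into the issue):
#   collect_mpi_diagnostics > mpi_diagnostics.txt
```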
Same issue. Here are some of the test results:
jtang@1032:~$ mpirun --version
mpirun (Open MPI) 1.10.2
I installed Caffe2 according to the instructions on the Caffe2 website:
git clone --recursive https://github.com/pytorch/pytorch.git && cd pytorch
git submodule update --init
mkdir build && cd build
cmake ..
make install
cmake result:
jtang@1032:~/pytorch/build$ cmake ..
-- Does not need to define long separately.
-- std::exception_ptr is supported.
-- NUMA is available
-- Current compiler supports avx2 extention. Will build perfkernels.
-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
-- Caffe2 protobuf include directory: $<BUILD_INTERFACE:/home/jtang/pytorch/third_party/protobuf/src>$
Could not find a package configuration file provided by "Eigen3" with any of the following names:
Eigen3Config.cmake
eigen3-config.cmake
Add the installation prefix of "Eigen3" to CMAKE_PREFIX_PATH or set "Eigen3_DIR" to a directory containing one of the above files. If "Eigen3" provides a separate development package or SDK, be sure it has been installed.
Call Stack (most recent call first):
CMakeLists.txt:188 (include)
-- Did not find system Eigen. Using third party subdirectory.
-- Found PythonInterp: /usr/bin/python (found suitable version "2.7.12", minimum required is "2.7")
-- NumPy ver. 1.14.3 found (include: /usr/local/lib/python2.7/dist-packages/numpy/core/include)
-- Could NOT find pybind11 (missing: pybind11_INCLUDE_DIR)
-- MPI support found
-- MPI compile flags:
-- MPI include path: /usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include/usr/lib/openmpi/include/usr/lib/openmpi/include/openmpi
-- MPI LINK flags path: -Wl,-rpath -Wl,/usr/lib/openmpi/lib -Wl,--enable-new-dtags
-- MPI libraries: /usr/lib/openmpi/lib/libmpi_cxx.so/usr/lib/openmpi/lib/libmpi.so
CMake Warning at cmake/Dependencies.cmake:364 (message):
OpenMPI found, but it is not built with CUDA support.
Call Stack (most recent call first):
CMakeLists.txt:188 (include)
-- Caffe2: CUDA detected: 8.0
-- Found cuDNN: v5.1.10 (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/libcudnn.so)
-- Automatic GPU detection returned 6.1 6.1 6.1 6.1 6.1 6.1 6.1 6.1.
-- Added CUDA NVCC flags for: sm_61
-- Could NOT find NCCL (missing: NCCL_INCLUDE_DIRS NCCL_LIBRARIES)
-- Could NOT find CUB (missing: CUB_INCLUDE_DIR)
-- Could NOT find Gloo (missing: Gloo_INCLUDE_DIR Gloo_LIBRARY)
-- MPI include path: /usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include/usr/lib/openmpi/include/usr/lib/openmpi/include/openmpi
-- MPI libraries: /usr/lib/openmpi/lib/libmpi_cxx.so/usr/lib/openmpi/lib/libmpi.so
-- CUDA detected: 8.0
-- Found libcuda: /usr/lib/x86_64-linux-gnu/libcuda.so
-- Found libnvrtc: /usr/local/cuda/lib64/libnvrtc.so
-- GCC 5.4.0: Adding gcc and gcc_s libs to link line
-- Include NCCL operators
-- Excluding ideep operators as we are not using ideep
-- Including image processing operators
-- Excluding video processing operators due to no opencv
-- Excluding mkl operators as we are not using mkl
-- Include Observer library
-- Using lib/python2.7/dist-packages as python relative installation path
-- Automatically generating missing __init__.py files.
-- A previous caffe2 cmake run already created the __init__.py files.
CMake Warning at CMakeLists.txt:318 (message):
Generated cmake files are only fully tested if one builds with system glog, gflags, and protobuf. Other settings may generate files that are not well tested.
-- BUILD_CAFFE2 : ON
-- BUILD_ATEN : OFF
-- BUILD_BINARY : ON
-- BUILD_CUSTOM_PROTOBUF : ON
-- Link local protobuf : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : ON
-- Python version : 2.7.12
-- Python includes : /usr/include/python2.7
-- BUILD_SHARED_LIBS : ON
-- BUILD_TEST : ON
-- USE_ASAN : OFF
-- USE_CUDA : ON
-- CUDA static link : OFF
-- USE_CUDNN : ON
-- CUDA version : 8.0
-- cuDNN version : 5.1.10
-- CUDA root directory : /usr/local/cuda
-- CUDA library : /usr/lib/x86_64-linux-gnu/libcuda.so
-- cudart library : /usr/local/cuda/lib64/libcudart_static.a;-pthread;dl;/usr/lib/x86_64-linux-gnu/librt.so
-- cublas library : /usr/local/cuda/lib64/libcublas.so;/usr/local/cuda/lib64/libcublas_device.a
-- cufft library : /usr/local/cuda/lib64/libcufft.so
-- curand library : /usr/local/cuda/lib64/libcurand.so
-- cuDNN library : /usr/local/cuda/lib64/libcudnn.so
-- nvrtc : /usr/local/cuda/lib64/libnvrtc.so
-- CUDA include path : /usr/local/cuda/include
-- NVCC executable : /usr/local/cuda/bin/nvcc
-- CUDA host compiler : /usr/bin/cc
-- USE_TENSORRT : OFF
-- USE_ROCM : OFF
-- USE_EIGEN_FOR_BLAS : ON
-- USE_FFMPEG : OFF
-- USE_GFLAGS : ON
-- USE_GLOG : ON
-- USE_GLOO : ON
-- USE_GLOO_IBVERBS : OFF
-- USE_LEVELDB : ON
-- LevelDB version : 1.18
-- Snappy version : 1.1.3
-- USE_LITE_PROTO : OFF
-- USE_LMDB : ON
-- LMDB version : 0.9.17
-- USE_METAL : OFF
-- USE_MKL :
-- USE_MOBILE_OPENGL : OFF
-- USE_MPI : ON
-- USE_NCCL : ON
-- USE_SYSTEM_NCCL : OFF
-- USE_NERVANA_GPU : OFF
-- USE_NNPACK : ON
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : ON
-- OpenCV version : 2.4.9.1
-- USE_OPENMP : OFF
-- USE_PROF : OFF
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- Configuring done
-- Generating done
-- Build files have been written to: /home/jtang/pytorch/build
jtang@1032:~$ find /usr -name 'libmpi*'
find: ‘/usr/local/lost+found’: Permission denied
find: ‘/usr/local/dataset/DOTA/test_cut’: Permission denied
/usr/lib/libmpi.so
/usr/lib/libmpi_usempif08.so.11.1.0
/usr/lib/openmpi/lib/libmpi.so
/usr/lib/openmpi/lib/libmpi_usempif08.so.11.1.0
/usr/lib/openmpi/lib/libmpi_mpifh.so
/usr/lib/openmpi/lib/libmpi_cxx.so.1.1.3
/usr/lib/openmpi/lib/libmpi_cxx.so
/usr/lib/openmpi/lib/libmpi_usempi_ignore_tkr.so
/usr/lib/openmpi/lib/libmpi_usempi_ignore_tkr.so.6.1.0
/usr/lib/openmpi/lib/libmpi.so.12.0.2
/usr/lib/openmpi/lib/libmpi_usempif08.so
/usr/lib/openmpi/lib/libmpi_mpifh.so.12.0.0
/usr/lib/libmpi_mpifh.so
/usr/lib/libmpi_cxx.so.1.1.3
/usr/lib/libmpi.so.12
/usr/lib/libmpi_cxx.so
/usr/lib/libmpi++.so
/usr/lib/libmpi_usempi_ignore_tkr.so.6
/usr/lib/libmpi_usempi_ignore_tkr.so
/usr/lib/libmpi_usempif08.so.11
/usr/lib/libmpi_usempi_ignore_tkr.so.6.1.0
/usr/lib/libmpi.so.12.0.2
/usr/lib/libmpi_usempif08.so
/usr/lib/libmpi_mpifh.so.12.0.0
/usr/lib/libmpi_mpifh.so.12
/usr/lib/libmpi_cxx.so.1
Further tracking of this in the new Pytorch repo https://github.com/pytorch/pytorch/issues/8028
Hi,
I am getting an error while running make in Caffe2. This is what it says:
/usr/bin/ld: CMakeFiles/mpi_test.dir/mpi/mpi_test.cc.o: undefined reference to symbol '_ZN3MPI8Datatype4FreeEv'
/usr/lib/libmpi_cxx.so.1: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
caffe2/CMakeFiles/mpi_test.dir/build.make:100: recipe for target 'bin/mpi_test' failed
make[2]: [bin/mpi_test] Error 1
CMakeFiles/Makefile2:2518: recipe for target 'caffe2/CMakeFiles/mpi_test.dir/all' failed
make[1]: [caffe2/CMakeFiles/mpi_test.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2
Any idea how I can fix it? Any help will be appreciated.
Thanks!!
System information
CMake summary output
**** Summary ****
-- General:
-- CMake version : 3.5.1
-- CMake command : /usr/bin/cmake
-- Git version : v0.8.1-1240-g8f41717
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler version : 5.4.0
-- Protobuf compiler : /usr/bin/protoc
-- Protobuf include path : /usr/include
-- Protobuf libraries : optimized;/usr/lib/x86_64-linux-gnu/libprotobuf.so;debug;/usr/lib/x86_64-linux-gnu/libprotobuf.so;-pthread
-- BLAS : Eigen
-- CXX flags : -std=c++11 -O2 -fPIC -Wno-narrowing -Wno-invalid-partial-specialization
-- Build type : Release
-- Compile definitions :
--
-- BUILD_BINARY : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : ON
-- Python version : 2.7.12
-- Python library : /usr/lib/x86_64-linux-gnu/libpython2.7.so
-- BUILD_SHARED_LIBS : ON
-- BUILD_TEST : ON
-- USE_ATEN : OFF
-- USE_ASAN : OFF
-- USE_CUDA : OFF
-- USE_EIGEN_FOR_BLAS : 1
-- USE_FFMPEG : OFF
-- USE_GFLAGS : ON
-- USE_GLOG : ON
-- USE_GLOO : ON
-- USE_LEVELDB : ON
-- LevelDB version : 1.18
-- Snappy version : 1.1.3
-- USE_LITE_PROTO : OFF
-- USE_LMDB : ON
-- LMDB version : 0.9.17
-- USE_METAL : OFF
-- USE_MKL :
-- USE_MOBILE_OPENGL : OFF
-- USE_MPI : ON
-- USE_NCCL : OFF
-- USE_NERVANA_GPU : OFF
-- USE_NNPACK : ON
-- USE_OBSERVERS : ON
-- USE_OPENCV : ON
-- OpenCV version : 2.4.9.1
-- USE_OPENMP : OFF
-- USE_PROF : OFF
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_THREADS : ON
-- USE_ZMQ : OFF
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ubuntu/caffe2/build