apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0
20.77k stars 6.79k forks source link

Error when building mxnet with USE_DIST_KVSTORE #17299

Closed ChaokunChang closed 4 years ago

ChaokunChang commented 4 years ago

Description

(A clear and concise description of what the bug is.) I am building MxNet from source code, following the steps:

  1. git clone --recursive https://github.com/ChaokunChang/incubator-mxnet byteps-backend
  2. cd byteps-backend; mkdir build; cd build;
  3. cmake -DUSE_CUDA=1 -DUSE_MKLDNN=1 -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DBUILD_CYTHON_MODULES=1 -DUSE_DIST_KVSTORE=1 ..
  4. DEBUG=1 make -j48

Then I met an error like this:

Error Message

[ 1%] Built target libomp-needed-headers [5/1878] [ 2%] Built target dmlc [ 1%] Built target sample_lib [ 3%] Performing build step for 'ZMQ' [ 4%] Built target subgraph_lib make[3]: No targets specified and no makefile found. Stop. 3rdparty/ps-lite/CMakeFiles/ZMQ.dir/build.make:111: recipe for target 'external/ZMQ-prefix/src/ZMQ-stamp/ZMQ-build' failed make[2]: [external/ZMQ-prefix/src/ZMQ-stamp/ZMQ-build] Error 2 Scanning dependencies of target gtest_main CMakeFiles/Makefile2:2196: recipe for target '3rdparty/ps-lite/CMakeFiles/ZMQ.dir/all' failed make[1]: [3rdparty/ps-lite/CMakeFiles/ZMQ.dir/all] Error 2 make[1]: Waiting for unfinished jobs.... [ 9%] Built target dnnl_common [ 10%] Building CXX object 3rdparty/googletest/googletest/CMakeFiles/gtest_main.dir/src/gtest_main.cc.o [ 14%] Built target omp In file included from /home/ubuntu/byteps-backend/build/3rdparty/ps-lite/src/meta.pb.cc:4:0: /home/ubuntu/byteps-backend/build/3rdparty/ps-lite/src/meta.pb.h:10:10: fatal error: google/protobuf/port_def.inc: No such file or directory

include <google/protobuf/port_def.inc>

      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

compilation terminated. 3rdparty/ps-lite/CMakeFiles/pslite.dir/build.make:113: recipe for target '3rdparty/ps-lite/CMakeFiles/pslite.dir/src/meta.pb.cc.o' failed make[2]: [3rdparty/ps-lite/CMakeFiles/pslite.dir/src/meta.pb.cc.o] Error 1 make[2]: Waiting for unfinished jobs.... [ 32%] Built target dnnl_cpu In file included from /home/ubuntu/byteps-backend/3rdparty/ps-lite/src/van.cc:14:0: /home/ubuntu/byteps-backend/build/3rdparty/ps-lite/src/./meta.pb.h:10:10: fatal error: google/protobuf/port_def.inc: No such file or directory

include <google/protobuf/port_def.inc>

      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

compilation terminated. 3rdparty/ps-lite/CMakeFiles/pslite.dir/build.make:100: recipe for target '3rdparty/ps-lite/CMakeFiles/pslite.dir/src/van.cc.o' failed make[2]: [3rdparty/ps-lite/CMakeFiles/pslite.dir/src/van.cc.o] Error 1 CMakeFiles/Makefile2:2159: recipe for target '3rdparty/ps-lite/CMakeFiles/pslite.dir/all' failed make[1]: [3rdparty/ps-lite/CMakeFiles/pslite.dir/all] Error 2 [ 32%] Linking CXX static library ../../../lib/libgtest_main.a [ 32%] Built target gtest_main [ 95%] Built target mxnet_static Makefile:140: recipe for target 'all' failed

image

To Reproduce

The code is from https://github.com/ChaokunChang/incubator-mxnet; branck byteps-backend. The code change doesn't include settings related to compile.

Steps to reproduce

What have you tried to solve it?

  1. I have tried with ninja (add _GNinja when cmake), but with the same error.
  2. Everything will be ok if I remove the -DUSE_DIST_KVSTORE in cmake-step.

Environment

We recommend using our script for collecting the diagnositc information. Run the following command and paste the outputs below:

curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python

# paste outputs here
----------Python Info----------
Version      : 3.6.6
Compiler     : GCC 7.2.0
Build        : ('default', 'Jun 28 2018 17:14:51')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.3.1
Directory    : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
Platform     : Linux-4.15.0-1054-aws-x86_64-with-debian-buster-sid
system       : Linux
node         : ip-172-31-39-16
release      : 4.15.0-1054-aws
version      : #56-Ubuntu SMP Thu Nov 7 16:15:59 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               79
Model name:          Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:            1
CPU MHz:             2724.723
CPU max MHz:         3000.0000
CPU min MHz:         1200.0000
BogoMIPS:            4600.12
Hypervisor vendor:   Xen
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            46080K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0028 sec, LOAD: 0.3791 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0005 sec, LOAD: 0.3934 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0008 sec, LOAD: 0.0229 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0005 sec, LOAD: 0.0032 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0002 sec, LOAD: 0.1704 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0003 sec, LOAD: 0.0171 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0005 sec, LOAD: 0.0531 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0002 sec, LOAD: 0.0235 sec.
leezu commented 4 years ago

Please post the output of cmake -DUSE_CUDA=1 -DUSE_MKLDNN=1 -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DBUILD_CYTHON_MODULES=1 -DUSE_DIST_KVSTORE=1 ...

It seems protobuf is not found correctly on your system. If this is the case and if protobuf is required, we can change the CMakeLists.txt to error out in case of not finding protobuf.

leezu commented 4 years ago

Besides protobuf, zmq is also a required dependency. However, cmake currently treats it as optional, leading to build error

`../3rdparty/ps-lite/src/./zmq_van.h:8:10: fatal error: zmq.h: No such file or directory

include `

if -DUSE_DIST_KVSTORE=1 is used.

ChaokunChang commented 4 years ago

Please post the output of cmake -DUSE_CUDA=1 -DUSE_MKLDNN=1 -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DBUILD_CYTHON_MODULES=1 -DUSE_DIST_KVSTORE=1 ...

It seems protobuf is not found correctly on your system. If this is the case and if protobuf is required, we can change the CMakeLists.txt to error out in case of not finding protobuf.

This is the output of cmake:

-- The C compiler identification is GNU 7.4.0 -- The CXX compiler identification is GNU 7.4.0 -- Check for working C compiler: /usr/bin/cc -- Check for working C compiler: /usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Detecting C compile features -- Detecting C compile features - done -- Check for working CXX compiler: /usr/bin/c++ -- Check for working CXX compiler: /usr/bin/c++ -- works -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX compile features - done -- CMAKE_CROSSCOMPILING FALSE -- CMAKE_HOST_SYSTEM_PROCESSOR x86_64 -- CMAKE_SYSTEM_PROCESSOR x86_64 -- CMAKE_SYSTEM_NAME Linux -- CMake version '3.13.3' using generator 'Unix Makefiles' -- The CUDA compiler identification is NVIDIA 10.0.130 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works -- Detecting CUDA compiler ABI info -- Detecting CUDA compiler ABI info - done -- Performing Test SUPPORT_CXX11 -- Performing Test SUPPORT_CXX11 - Success -- Performing Test SUPPORT_CXX0X -- Performing Test SUPPORT_CXX0X - Success -- Performing Test SUPPORT_MSSE3 -- Performing Test SUPPORT_MSSE3 - Success -- Performing Test SUPPORT_MSSE2 -- Performing Test SUPPORT_MSSE2 - Success -- Determining F16C support -- Performing Test COMPILER_SUPPORT_MF16C -- Performing Test COMPILER_SUPPORT_MF16C - Success -- F16C enabled -- CMAKE_BUILD_TYPE is unset, defaulting to Release -- MKL-DNN compat: set DNNL_BUILD_EXAMPLES to MKLDNN_BUILD_EXAMPLES with value OFF -- MKL-DNN compat: set DNNL_BUILD_TESTS to MKLDNN_BUILD_TESTS with value OFF -- MKL-DNN compat: set DNNL_ENABLE_JIT_PROFILING to MKLDNN_ENABLE_JIT_PROFILING with value OFF -- MKL-DNN compat: set DNNL_LIBRARY_TYPE to MKLDNN_LIBRARY_TYPE with value STATIC -- MKL-DNN compat: set DNNL_ARCH_OPT_FLAGS to MKLDNN_ARCH_OPT_FLAGS with value `` -- Looking for pthread.h -- Looking for pthread.h - found -- Looking for pthread_create -- Looking for pthread_create - not found -- Looking for pthread_create in pthreads -- Looking for pthread_create in pthreads - not found -- Looking for pthread_create in pthread -- Looking for pthread_create in pthread - found -- Found Threads: TRUE
-- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5")
-- GPU support is disabled -- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) -- Found Git: /usr/bin/git (found version "2.17.1") -- Intel(R) VTune(TM) Amplifier JIT profiling disabled -- Could NOT find MKL (missing: MKL_INCLUDE_DIR MKL_RT_LIBRARY) -- Found OpenBLAS libraries: /usr/local/lib/libopenblas.so -- Found OpenBLAS include: /usr/local/include -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1") -- Could NOT find Jemalloc (missing: JEMALLOC_LIBRARY JEMALLOC_INCLUDE_DIR) -- Found OpenCV: /usr/local (found version "4.0.0") found components: core highgui imgproc imgcodecs -- OpenCV 4.0.0 found (/usr/local/lib/cmake/opencv4) -- OpenCV_LIBS=opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs -- Found OpenMP_C: -fopenmp
-- Found OpenMP_CXX: -fopenmp
-- Found OpenMP: TRUE
-- Performing Test OPENMP_HAVE_WERROR_FLAG -- Performing Test OPENMP_HAVE_WERROR_FLAG - Success -- Performing Test OPENMP_HAVE_STD_GNUPP11_FLAG -- Performing Test OPENMP_HAVE_STD_GNUPP11_FLAG - Success -- Performing Test OPENMP_HAVE_STD_CPP11_FLAG -- Performing Test OPENMP_HAVE_STD_CPP11_FLAG - Success -- Found PythonInterp: /home/ubuntu/anaconda3/bin/python (found version "3.6.6") -- Cannot find llvm-lit. -- Please put llvm-lit in your PATH, set OPENMP_LLVM_LIT_EXECUTABLE to its full path, or point OPENMP_LLVM_TOOLS_DIR to its directory. CMake Warning at 3rdparty/openmp/cmake/OpenMPTesting.cmake:22 (message): The check targets will not be available! Call Stack (most recent call first): 3rdparty/openmp/cmake/OpenMPTesting.cmake:40 (find_standalone_test_dependencies) 3rdparty/openmp/CMakeLists.txt:49 (include)

-- Performing Test LIBOMP_HAVE_FNO_EXCEPTIONS_FLAG -- Performing Test LIBOMP_HAVE_FNO_EXCEPTIONS_FLAG - Success -- Performing Test LIBOMP_HAVE_FNO_RTTI_FLAG -- Performing Test LIBOMP_HAVE_FNO_RTTI_FLAG - Success -- Performing Test LIBOMP_HAVE_X_CPP_FLAG -- Performing Test LIBOMP_HAVE_X_CPP_FLAG - Success -- Performing Test LIBOMP_HAVE_WCAST_QUAL_FLAG -- Performing Test LIBOMP_HAVE_WCAST_QUAL_FLAG - Success -- Performing Test LIBOMP_HAVE_WNO_UNUSED_FUNCTION_FLAG -- Performing Test LIBOMP_HAVE_WNO_UNUSED_FUNCTION_FLAG - Success -- Performing Test LIBOMP_HAVE_WNO_UNUSED_LOCAL_TYPEDEF_FLAG -- Performing Test LIBOMP_HAVE_WNO_UNUSED_LOCAL_TYPEDEF_FLAG - Failed -- Performing Test LIBOMP_HAVE_WNO_UNUSED_VALUE_FLAG -- Performing Test LIBOMP_HAVE_WNO_UNUSED_VALUE_FLAG - Success -- Performing Test LIBOMP_HAVE_WNO_UNUSED_VARIABLE_FLAG -- Performing Test LIBOMP_HAVE_WNO_UNUSED_VARIABLE_FLAG - Success -- Performing Test LIBOMP_HAVE_WNO_SWITCH_FLAG -- Performing Test LIBOMP_HAVE_WNO_SWITCH_FLAG - Success -- Performing Test LIBOMP_HAVE_WNO_COVERED_SWITCH_DEFAULT_FLAG -- Performing Test LIBOMP_HAVE_WNO_COVERED_SWITCH_DEFAULT_FLAG - Failed -- Performing Test LIBOMP_HAVE_WNO_DEPRECATED_REGISTER_FLAG -- Performing Test LIBOMP_HAVE_WNO_DEPRECATED_REGISTER_FLAG - Failed -- Performing Test LIBOMP_HAVE_WNO_SIGN_COMPARE_FLAG -- Performing Test LIBOMP_HAVE_WNO_SIGN_COMPARE_FLAG - Success -- Performing Test LIBOMP_HAVE_WNO_GNU_ANONYMOUS_STRUCT_FLAG -- Performing Test LIBOMP_HAVE_WNO_GNU_ANONYMOUS_STRUCT_FLAG - Failed -- Performing Test LIBOMP_HAVE_WNO_UNKNOWN_PRAGMAS_FLAG -- Performing Test LIBOMP_HAVE_WNO_UNKNOWN_PRAGMAS_FLAG - Success -- Performing Test LIBOMP_HAVE_WNO_MISSING_FIELD_INITIALIZERS_FLAG -- Performing Test LIBOMP_HAVE_WNO_MISSING_FIELD_INITIALIZERS_FLAG - Success -- Performing Test LIBOMP_HAVE_WNO_MISSING_BRACES_FLAG -- Performing Test LIBOMP_HAVE_WNO_MISSING_BRACES_FLAG - Success -- Performing Test LIBOMP_HAVE_WNO_COMMENT_FLAG -- Performing Test LIBOMP_HAVE_WNO_COMMENT_FLAG - Success -- Performing Test LIBOMP_HAVE_WNO_SELF_ASSIGN_FLAG -- Performing Test LIBOMP_HAVE_WNO_SELF_ASSIGN_FLAG - Failed -- Performing Test LIBOMP_HAVE_WNO_VLA_EXTENSION_FLAG -- Performing Test LIBOMP_HAVE_WNO_VLA_EXTENSION_FLAG - Failed -- Performing Test LIBOMP_HAVE_WNO_FORMAT_PEDANTIC_FLAG -- Performing Test LIBOMP_HAVE_WNO_FORMAT_PEDANTIC_FLAG - Failed -- Performing Test LIBOMP_HAVE_WSTRINGOP_OVERFLOW_FLAG -- Performing Test LIBOMP_HAVE_WSTRINGOP_OVERFLOW_FLAG - Success -- Performing Test LIBOMP_HAVE_MSSE2_FLAG -- Performing Test LIBOMP_HAVE_MSSE2_FLAG - Success -- Performing Test LIBOMP_HAVE_FTLS_MODEL_FLAG -- Performing Test LIBOMP_HAVE_FTLS_MODEL_FLAG - Success -- Performing Test LIBOMP_HAVE_MMIC_FLAG -- Performing Test LIBOMP_HAVE_MMIC_FLAG - Failed -- Performing Test LIBOMP_HAVE_M32_FLAG -- Performing Test LIBOMP_HAVE_M32_FLAG - Failed -- Performing Test LIBOMP_HAVE_X_FLAG -- Performing Test LIBOMP_HAVE_X_FLAG - Success -- Performing Test LIBOMP_HAVE_WARN_SHARED_TEXTREL_FLAG -- Performing Test LIBOMP_HAVE_WARN_SHARED_TEXTREL_FLAG - Success -- Performing Test LIBOMP_HAVE_AS_NEEDED_FLAG -- Performing Test LIBOMP_HAVE_AS_NEEDED_FLAG - Success -- Performing Test LIBOMP_HAVE_VERSION_SCRIPT_FLAG -- Performing Test LIBOMP_HAVE_VERSION_SCRIPT_FLAG - Success -- Performing Test LIBOMP_HAVE_STATIC_LIBGCC_FLAG -- Performing Test LIBOMP_HAVE_STATIC_LIBGCC_FLAG - Success -- Performing Test LIBOMP_HAVE_Z_NOEXECSTACK_FLAG -- Performing Test LIBOMP_HAVE_Z_NOEXECSTACK_FLAG - Success -- Performing Test LIBOMP_HAVE_FINI_FLAG -- Performing Test LIBOMP_HAVE_FINI_FLAG - Success -- Found Perl: /usr/bin/perl (found version "5.26.1") -- Performing Test LIBOMP_HAVE_VERSION_SYMBOLS -- Performing Test LIBOMP_HAVE_VERSION_SYMBOLS - Success -- Performing Test LIBOMP_HAVE_BUILTIN_FRAME_ADDRESS -- Performing Test LIBOMP_HAVE___BUILTIN_FRAME_ADDRESS - Success -- Performing Test LIBOMP_HAVE_WEAK_ATTRIBUTE -- Performing Test LIBOMP_HAVE_WEAK_ATTRIBUTE - Success -- Looking for include files windows.h, psapi.h -- Looking for include files windows.h, psapi.h - not found -- Looking for EnumProcessModules in psapi -- Looking for EnumProcessModules in psapi - not found -- LIBOMP: Operating System -- Linux -- LIBOMP: Target Architecture -- x86_64 -- LIBOMP: Build Type -- Release -- LIBOMP: Library Kind -- SHARED -- LIBOMP: Library Type -- normal -- LIBOMP: Fortran Modules -- FALSE -- LIBOMP: Build -- 20140926 -- LIBOMP: Use Stats-gathering -- FALSE -- LIBOMP: Use Debugger-support -- FALSE -- LIBOMP: Use ITT notify -- TRUE -- LIBOMP: Use OMPT-support -- TRUE -- LIBOMP: Use OMPT-optional -- TRUE -- LIBOMP: Use Adaptive locks -- TRUE -- LIBOMP: Use quad precision -- TRUE -- LIBOMP: Use TSAN-support -- FALSE -- LIBOMP: Use Hwloc library -- FALSE -- Looking for sqrt in m -- Looking for sqrt in m - found -- Looking for atomic_load_1 -- Looking for atomic_load_1 - not found -- Looking for atomic_load_1 in atomic -- Looking for __atomic_load_1 in atomic - found -- check-libomp does nothing. -- check-ompt does nothing. -- check-openmp does nothing. USE_LAPACK is ON -- Could NOT find Jemalloc (missing: JEMALLOC_LIBRARY JEMALLOC_INCLUDE_DIR) CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project): VERSION keyword not followed by a value or was followed by a value that expanded to nothing.

-- Found GTest: gtest
-- Found CUDNN: /usr/local/cuda/lib64/libcudnn.so
-- Looking for clock_gettime in rt -- Looking for clock_gettime in rt - found -- Looking for fopen64 -- Looking for fopen64 - not found -- Looking for C++ include cxxabi.h -- Looking for C++ include cxxabi.h - found -- Looking for nanosleep -- Looking for nanosleep - found -- Looking for backtrace -- Looking for backtrace - found -- backtrace facility detected in default set of libraries -- Found Backtrace: /usr/include
-- Check if the system is big endian -- Searching 16 bit integer -- Looking for sys/types.h -- Looking for sys/types.h - found -- Looking for stdint.h -- Looking for stdint.h - found -- Looking for stddef.h -- Looking for stddef.h - found -- Check size of unsigned short -- Check size of unsigned short - done -- Using unsigned short -- Check if the system is big endian - little endian -- /home/ubuntu/incubator-mxnet/3rdparty/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h -- Found OpenMP_C: -fopenmp
-- Found OpenMP_CXX: -fopenmp
-- Automatic GPU detection failed. Building for common architectures. -- Autodetected CUDA architecture(s): 3.0;3.5;5.0;5.2;6.0;6.1;7.0;7.0+PTX;7.5;7.5+PTX -- CUDA: Using the following NVCC architecture flags -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75 -- Found CUDAToolkit: /usr/local/cuda/include (found version "10.0.130") -- Could NOT find ZMQ (missing: ZMQ_LIBRARY ZMQ_INCLUDE_DIR) CMake Warning at /usr/local/lib/python3.6/dist-packages/cmake/data/share/cmake-3.13/Modules/FindProtobuf.cmake:495 (message): Protobuf compiler version 3.8.0 doesn't match library version 3.0.0 Call Stack (most recent call first): 3rdparty/ps-lite/cmake/ProtoBuf.cmake:4 (find_package) 3rdparty/ps-lite/CMakeLists.txt:22 (include)

-- Found Protobuf: /usr/lib/x86_64-linux-gnu/libprotobuf.so;-lpthread (found version "3.0.0") -- Found PROTOBUF Compiler: /home/ubuntu/anaconda3/bin/protoc -- Found PythonInterp: /usr/bin/python2 (found suitable exact version "2.7.15") -- Cython modules for python2 will be built -- Found PythonInterp: /home/ubuntu/anaconda3/bin/python3 (found suitable exact version "3.6.6") -- Cython modules for python3 will be built -- Configuring done -- Generating done -- Build files have been written to: /home/ubuntu/incubator-mxnet/build

leezu commented 4 years ago

Looking at your output, you can see

 -- Could NOT find ZMQ (missing: ZMQ_LIBRARY ZMQ_INCLUDE_DIR)

You need to install ZMQ. This should have been a fatal error preventing build.

CMake Warning at /usr/local/lib/python3.6/dist-packages/cmake/data/share/cmake-3.13/Modules/FindProtobuf.cmake:495 (message):
Protobuf compiler version 3.8.0 doesn't match library version 3.0.0
Call Stack (most recent call first):
3rdparty/ps-lite/cmake/ProtoBuf.cmake:4 (find_package)
3rdparty/ps-lite/CMakeLists.txt:22 (include)

-- Found Protobuf: /usr/lib/x86_64-linux-gnu/libprotobuf.so;-lpthread (found version "3.0.0")
-- Found PROTOBUF Compiler: /home/ubuntu/anaconda3/bin/protoc

Your system is not setup correctly. You're mixing system protobuf and some conda protobuf. Conda messes a lot with your system. You shouldn't use it if you want to compile MXNet.

ChaokunChang commented 4 years ago

Looking at your output, you can see

 -- Could NOT find ZMQ (missing: ZMQ_LIBRARY ZMQ_INCLUDE_DIR)

You need to install ZMQ. This should have been a fatal error preventing build.

CMake Warning at /usr/local/lib/python3.6/dist-packages/cmake/data/share/cmake-3.13/Modules/FindProtobuf.cmake:495 (message):
Protobuf compiler version 3.8.0 doesn't match library version 3.0.0
Call Stack (most recent call first):
3rdparty/ps-lite/cmake/ProtoBuf.cmake:4 (find_package)
3rdparty/ps-lite/CMakeLists.txt:22 (include)

-- Found Protobuf: /usr/lib/x86_64-linux-gnu/libprotobuf.so;-lpthread (found version "3.0.0")
-- Found PROTOBUF Compiler: /home/ubuntu/anaconda3/bin/protoc

Your system is not setup correctly. You're mixing system protobuf and some conda protobuf. Conda messes a lot with your system. You shouldn't use it if you want to compile MXNet.

Thank you a lot. Do you have any suggestions to disable Conda protobuf without uninstalling? (I didn't use conda meaningly.)

leezu commented 4 years ago

Thank you a lot. Do you have any suggestions to disable Conda protobuf without uninstalling? (I didn't use conda meaningly.)

I suppose you activated conda after logging in to your machine. Something like source activate mxnet_p36. If you did so, try not doing so.

ChaokunChang commented 4 years ago

so

I have never used source activate mxnet_p3 command before (In fact I even didn't know what is it ...). I thought I didn't use conda because there is no something like (base) before ubuntu@ip-xx-xx-xx-xx:~$

leezu commented 4 years ago

It means you are using the (base) conda environment.

Maybe it's activated by default on the Deep Learning AMI. You can check the .bashrc file and delete any lines like source activate.

ChaokunChang commented 4 years ago

close in mistake, reopen

leezu commented 4 years ago

@ChaokunChang Sorry, I misread your comment at https://github.com/apache/incubator-mxnet/issues/17299#issuecomment-575147222

I understand you're not using conda. Unfortunately the environment you're using (DLAMI) activates some conda features by default (for example you're using /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip and /home/ubuntu/anaconda3/bin/protoc). This is a problem with DLAMI.

I recommend you use Base-DLAMI instead. We'll report this problem to DLAMI team.

apeforest commented 4 years ago

@leezu I encountered the same problem trying to install mxnet with USE_DIST_KVSTORE ON in the latest AWS DLAMI. I think we need to solve this problem for a better user experience.

leezu commented 4 years ago

@apeforest may be able to add a workaround for DLAMI at https://github.com/dmlc/ps-lite/blob/master/cmake/ProtoBuf.cmake

I think the problem here is that DLAMI has a strange setup including /home/ubuntu/anaconda3/bin/protoc by default on PATH. What do you think?

apeforest commented 4 years ago

@leezu Just FYI, if I use make with the USE_DIST_KVSTORE=ON in config.mk, then the build is fine.

leezu commented 4 years ago

@apeforest thank you. cmake build fails due to DLAMI violating some assumption taken when writing https://github.com/dmlc/ps-lite/blob/master/cmake/ProtoBuf.cmake in 2017 (cc @yajiedesign). I'll look into finding a workaround it in a few days

leezu commented 4 years ago

I verified this error is due to DLAMI shipping protoc version 3.8.0 but headers from version 3.0.0.

google/protobuf/port_def.inc is included in headers generated by protoc version >= 3.7.0, but of course if your system comes only with headers from 3.0.0 compilation will fail.

leezu commented 4 years ago

You can workaround the DLAMI bugs by running with

cmake -DProtobuf_PROTOC_EXECUTABLE=/usr/bin/protoc [...]

leezu commented 4 years ago

Reported an issue upstream requesting if cmake may have any strategy to handle broken systems such as DLAMI https://gitlab.kitware.com/cmake/cmake/issues/20403

leezu commented 4 years ago

With respect to the Makefile build @apeforest, the reason is that Makefile based build doesn't use system protobuf, but downloads and compiles protobuf-cpp-3.5.1.tar.gz as part of the build.

leezu commented 4 years ago

https://github.com/dmlc/ps-lite/pull/170