Open lgg opened 3 years ago
Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue. Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly. If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.
Can you please check whether the behavior is reproducible on the latest v0.x and master branches?
@leezu I successfully compiled from the master branch, but it gives mxnet==2.0.0.
I will try to run it with the v0.x branch.
@leezu Hmm, I can't find a v0.x branch ;(
user@ml-dev:~/mxnet1.8/mxnet1.8$ git checkout v0.x
error: pathspec 'v0.x' did not match any file(s) known to git
user@ml-dev:~/mxnet1.8/mxnet1.8$ git checkout 0.x
error: pathspec '0.x' did not match any file(s) known to git
user@ml-dev:~/mxnet1.8/mxnet1.8$ git checkout 0.x
error: pathspec '0.x' did not match any file(s) known to git
user@ml-dev:~/mxnet1.8/mxnet1.8$ git checkout v0.x
error: pathspec 'v0.x' did not match any file(s) known to git
I also tried the v1.x branch: the same issue with the infinite loop.
From the master branch it works fine:
$ cmake ..
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_CROSSCOMPILING FALSE
-- CMAKE_HOST_SYSTEM_PROCESSOR x86_64
-- CMAKE_SYSTEM_PROCESSOR x86_64
-- CMAKE_SYSTEM_NAME Linux
-- CMake version '3.16.3' using generator 'Unix Makefiles'
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda-11.2/bin/nvcc
-- The CUDA compiler identification is NVIDIA 11.2.142
-- Check for working CUDA compiler: /usr/local/cuda-11.2/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-11.2/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- CMAKE_BUILD_TYPE is unset, defaulting to Release
-- Intel MKL-DNN compat: set DNNL_BUILD_EXAMPLES to MKLDNN_BUILD_EXAMPLES with value `OFF`
-- Intel MKL-DNN compat: set DNNL_BUILD_TESTS to MKLDNN_BUILD_TESTS with value `OFF`
-- Intel MKL-DNN compat: set DNNL_ENABLE_JIT_PROFILING to MKLDNN_ENABLE_JIT_PROFILING with value `OFF`
-- Intel MKL-DNN compat: set DNNL_LIBRARY_TYPE to MKLDNN_LIBRARY_TYPE with value `STATIC`
-- Intel MKL-DNN compat: set DNNL_ARCH_OPT_FLAGS to MKLDNN_ARCH_OPT_FLAGS with value ``
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Found Git: /usr/bin/git (found version "2.25.1")
-- Primitive cache is enabled
-- Using intgemm
-- Compiling with OpenMP
-- Found OpenMP_C: -fopenmp
-- Found OpenMP_CXX: -fopenmp
-- Found OpenMP: TRUE
-- Found OpenBLAS libraries: /usr/lib/x86_64-linux-gnu/libopenblas.a
-- Found OpenBLAS include: /usr/include/x86_64-linux-gnu
Openblas uses GFortran, automatically linking to it
-- The Fortran compiler identification is GNU 9.3.0
-- Check for working Fortran compiler: /usr/bin/f95
-- Check for working Fortran compiler: /usr/bin/f95 -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Checking whether /usr/bin/f95 supports Fortran 90
-- Checking whether /usr/bin/f95 supports Fortran 90 -- yes
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/mxnet1.8/mxnet1.8/build/temp
FORTRAN_DIR is /usr/lib/gcc/x86_64-linux-gnu/9;/usr/lib/x86_64-linux-gnu;/usr/lib;/lib/x86_64-linux-gnu;/lib
FORTRAN_LIB is /usr/lib/gcc/x86_64-linux-gnu/9/libgfortran.so
-- Looking for OPENBLAS_USE64BITINT
-- Looking for OPENBLAS_USE64BITINT - not found
Using LP64 OpenBLAS
After choosing blas, linking to dnnl;/usr/lib/x86_64-linux-gnu/libopenblas.a;/usr/lib/gcc/x86_64-linux-gnu/9/libgfortran.so
-- Found OpenCV: /usr (found version "4.2.0") found components: core highgui imgproc imgcodecs
-- OpenCV 4.2.0 found (/usr/lib/x86_64-linux-gnu/cmake/opencv4)
-- OpenCV_LIBS=opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs
USE_LAPACK is ON
CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project):
VERSION keyword not followed by a value or was followed by a value that
expanded to nothing.
-- Found PythonInterp: /usr/bin/python (found version "2.7.18")
-- Found GTest: gtest
-- Could NOT find CUDNN (missing: CUDNN_LIBRARY CUDNN_INCLUDE)
-- Could NOT find CUTENSOR (missing: CUTENSOR_LIBRARY CUTENSOR_INCLUDE)
CMake Warning (dev) at 3rdparty/dmlc-core/cmake/Utils.cmake:196 (option):
Policy CMP0077 is not set: option() honors normal variables. Run "cmake
--help-policy CMP0077" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.
For compatibility with older versions of CMake, option is clearing the
normal variable 'DMLC_FORCE_SHARED_CRT'.
Call Stack (most recent call first):
3rdparty/dmlc-core/CMakeLists.txt:23 (dmlccore_option)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found OpenMP_C: -fopenmp
-- Found OpenMP_CXX: -fopenmp
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for fopen64
-- Looking for fopen64 - not found
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for nanosleep
-- Looking for nanosleep - found
-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /usr/include
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Using unsigned short
-- Check if the system is big endian - little endian
-- /home/user/mxnet1.8/mxnet1.8/3rdparty/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- Performing Test SUPPORT_MSSE2
-- Performing Test SUPPORT_MSSE2 - Success
-- Autodetected CUDA architecture(s): 6.1 6.1
-- CUDA: Using the following NVCC architecture flags -gencode;arch=compute_61,code=sm_61
-- Found CUDAToolkit: /usr/local/cuda-11.2/include (found version "11.2.142")
-- Could NOT find NCCL (missing: NCCL_INCLUDE_DIRS NCCL_LIBRARIES)
CMake Warning at CMakeLists.txt:614 (message):
Could not find NCCL libraries
-- Performing Test SUPPORT_MSSE3
-- Performing Test SUPPORT_MSSE3 - Success
-- Determining F16C support
-- Performing Test COMPILER_SUPPORT_MF16C
-- Performing Test COMPILER_SUPPORT_MF16C - Success
-- Using 64-bit integer for tensor size
-- Found Python3: /usr/bin/python3.8 (found version "3.8.5") found components: Interpreter
-- CUDA: Adding NVCC options: --fatbin-options --compress-all
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/mxnet1.8/mxnet1.8/build
Note: on every attempt to build from a different git branch or tag, I cleared the build folder with rm -rf build
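Clearing the build folder matters here because CMake stores values such as CMAKE_CUDA_COMPILER in build/CMakeCache.txt, and a stale cache entry can hide a change made in config.cmake. A minimal sketch of that cleanup step follows; fresh_build_dir is a hypothetical helper name used only for illustration, not something shipped with MXNet or CMake.

```shell
#!/bin/sh
# fresh_build_dir: hypothetical helper, not part of MXNet or CMake.
# Removes the given directory (default: ./build) together with any cached
# CMake state such as CMakeCache.txt, then recreates it empty.
fresh_build_dir() {
    dir=${1:-build}
    rm -rf -- "$dir" && mkdir -p -- "$dir"
}

# Typical use from the source root (left commented out on purpose):
# fresh_build_dir build && cd build && cmake ..
```

Running it before each configure attempt should give every branch or tag the same clean starting point, which is what the note above describes doing by hand.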
I think I found a solution.
Manually setting this in config.cmake:
set(CMAKE_CUDA_COMPILER "/usr/local/cuda/bin/nvcc" CACHE BOOL "Cuda compiler (nvcc)")
helped and produced this output:
$ cmake ..
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_CROSSCOMPILING FALSE
-- CMAKE_HOST_SYSTEM_PROCESSOR x86_64
-- CMAKE_SYSTEM_PROCESSOR x86_64
-- CMAKE_SYSTEM_NAME Linux
-- CMake version '3.16.3' using generator 'Unix Makefiles'
CMake Warning at CMakeLists.txt:109 (message):
CMAKE_CUDA_COMPILER guessed: /usr/local/cuda/bin/nvcc
-- The CUDA compiler identification is NVIDIA 11.2.142
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Performing Test SUPPORT_CXX11
-- Performing Test SUPPORT_CXX11 - Success
-- Performing Test SUPPORT_CXX0X
-- Performing Test SUPPORT_CXX0X - Success
-- CMAKE_BUILD_TYPE is unset, defaulting to Release
-- Intel MKL-DNN compat: set DNNL_BUILD_EXAMPLES to MKLDNN_BUILD_EXAMPLES with value `OFF`
-- Intel MKL-DNN compat: set DNNL_BUILD_TESTS to MKLDNN_BUILD_TESTS with value `OFF`
-- Intel MKL-DNN compat: set DNNL_ENABLE_JIT_PROFILING to MKLDNN_ENABLE_JIT_PROFILING with value `OFF`
-- Intel MKL-DNN compat: set DNNL_LIBRARY_TYPE to MKLDNN_LIBRARY_TYPE with value `STATIC`
-- Intel MKL-DNN compat: set DNNL_ARCH_OPT_FLAGS to MKLDNN_ARCH_OPT_FLAGS with value ``
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Found Git: /usr/bin/git (found version "2.25.1")
-- Primitive cache is enabled
-- Using intgemm
-- Compiling with OpenMP
-- Found OpenMP_C: -fopenmp
-- Found OpenMP_CXX: -fopenmp
-- Found OpenMP: TRUE
-- Found OpenBLAS libraries: /usr/lib/x86_64-linux-gnu/libopenblas.so
-- Found OpenBLAS include: /usr/include/x86_64-linux-gnu
-- Found OpenCV: /usr (found version "4.2.0") found components: core highgui imgproc imgcodecs
-- OpenCV 4.2.0 found (/usr/lib/x86_64-linux-gnu/cmake/opencv4)
-- OpenCV_LIBS=opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs
USE_LAPACK is ON
CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project):
VERSION keyword not followed by a value or was followed by a value that
expanded to nothing.
-- Found PythonInterp: /usr/bin/python (found version "2.7.18")
-- Found GTest: gtest
-- Could NOT find CUDNN (missing: CUDNN_LIBRARY CUDNN_INCLUDE)
-- Found OpenMP_C: -fopenmp
-- Found OpenMP_CXX: -fopenmp
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for fopen64
-- Looking for fopen64 - not found
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for nanosleep
-- Looking for nanosleep - found
-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /usr/include
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Using unsigned short
-- Check if the system is big endian - little endian
-- /home/user/mxnet1.8/mxnet1.8/3rdparty/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- Performing Test SUPPORT_MSSE2
-- Performing Test SUPPORT_MSSE2 - Success
-- Autodetected CUDA architecture(s): 6.1 6.1
-- CUDA: Using the following NVCC architecture flags -gencode;arch=compute_61,code=sm_61
-- Found CUDAToolkit: /usr/local/cuda/include (found version "11.2.142")
-- Could NOT find NCCL (missing: NCCL_INCLUDE_DIRS NCCL_LIBRARIES)
CMake Warning at CMakeLists.txt:668 (message):
Could not find NCCL libraries
-- Performing Test SUPPORT_MSSE3
-- Performing Test SUPPORT_MSSE3 - Success
-- Determining F16C support
-- Performing Test COMPILER_SUPPORT_MF16C
-- Performing Test COMPILER_SUPPORT_MF16C - Success
-- CUDA: Adding NVCC options: --fatbin-options --compress-all
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/mxnet1.8/mxnet1.8/build
But if I change this path to:
set(CMAKE_CUDA_COMPILER "/usr/local/cuda-11.2/bin/nvcc" CACHE BOOL "Cuda compiler (nvcc)")
it still goes into an infinite loop.
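One thing that may be worth checking when two nvcc paths behave differently is whether they point at the same binary. On many installs /usr/local/cuda is a symlink to a versioned directory such as /usr/local/cuda-11.2, but that is an assumption about this machine and is not confirmed anywhere in the thread. The sketch below uses readlink -f to compare the resolved paths; resolve_same is a hypothetical helper name.

```shell
#!/bin/sh
# resolve_same: hypothetical helper, not part of MXNet, CMake, or CUDA.
# Prints "same" if both arguments resolve (after following symlinks)
# to the same path, and "different" otherwise.
resolve_same() {
    a=$(readlink -f -- "$1") || return 2
    b=$(readlink -f -- "$2") || return 2
    if [ "$a" = "$b" ]; then
        echo same
    else
        echo different
    fi
}

# Example with the two paths from this thread (only meaningful on that machine):
# resolve_same /usr/local/cuda/bin/nvcc /usr/local/cuda-11.2/bin/nvcc
```

If the result is "same", the compiler binary itself is unlikely to explain the difference, and the path string that CMake sees becomes the more likely suspect. If it is "different", two CUDA installs may be present.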
@lgg Thanks for sharing the tip. Your solution also seems to work with CUDA 11.4 on an Arm processor.
I am just curious: how did you find the solution from the symptom?
Description
I tried to build mxnet from source from tag 1.8.0 and from branch v1.8.x for CUDA 11 support.
My steps to reproduce:
git clone --recursive https://github.com/apache/incubator-mxnet mxnet1.8
cd mxnet1.8/
git checkout v1.8.x (I also tried git checkout 1.8.0)
cp config/linux_gpu.cmake config.cmake (I also added content from the distribution file; see my config.cmake below)
mkdir build; cd build
cmake .. (see output below for details), which gets stuck in an infinite loop
cmake loop output
nvcc paths:
In the output above, all paths for nvcc are valid.
I checked these paths:
What have you tried to solve it?
Environment
We recommend using our script for collecting the diagnostic information with the following command
curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3
Environment Information
More env info from me:
More Environment Information
Config.cmake
My config.cmake