Error while building Pytorch

Jabbo16 commented 5 years ago

Hi, I'm getting the following error while building PyTorch:

[ 93%] Linking CXX executable ../bin/cuda_half_test
[ 93%] Built target cuda_half_test
[ 93%] Building CXX object caffe2/torch/lib/c10d/test/CMakeFiles/TCPStoreTest.dir/TCPStoreTest.cpp.o
[ 93%] Building CXX object caffe2/torch/CMakeFiles/torch.dir/csrc/autograd/generated/VariableType_1.cpp.o
[ 93%] Linking CXX executable ../bin/cuda_optional_test
[ 93%] Built target cuda_optional_test
[ 93%] Building CXX object caffe2/torch/lib/c10d/test/CMakeFiles/FileStoreTest.dir/FileStoreTest.cpp.o
[ 93%] Linking CXX executable ../../../../../bin/FileStoreTest
/usr/bin/ld: can't find -l__caffe2_nccl
collect2: error: ld returned 1 exit status
make[2]: *** [caffe2/torch/lib/c10d/test/CMakeFiles/FileStoreTest.dir/build.make:106: bin/FileStoreTest] Error 1
make[1]: *** [CMakeFiles/Makefile2:9552: caffe2/torch/lib/c10d/test/CMakeFiles/FileStoreTest.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 93%] Building CXX object caffe2/torch/CMakeFiles/torch.dir/csrc/autograd/generated/VariableType_2.cpp.o
[ 93%] Linking CXX executable ../bin/cuda_packedtensoraccessor_test
[ 93%] Linking CXX executable ../../../../../bin/TCPStoreTest
/usr/bin/ld: can't find -l__caffe2_nccl
collect2: error: ld returned 1 exit status
make[2]: *** [caffe2/torch/lib/c10d/test/CMakeFiles/TCPStoreTest.dir/build.make:106: bin/TCPStoreTest] Error 1
make[1]: *** [CMakeFiles/Makefile2:9416: caffe2/torch/lib/c10d/test/CMakeFiles/TCPStoreTest.dir/all] Error 2
[ 93%] Building CXX object caffe2/torch/CMakeFiles/torch.dir/csrc/autograd/generated/VariableType_3.cpp.o
[ 93%] Built target cuda_packedtensoraccessor_test
[ 93%] Building CXX object caffe2/torch/CMakeFiles/torch.dir/csrc/autograd/generated/VariableType_4.cpp.o
[ 93%] Linking CXX shared library ../../../../../lib/libc10d_cuda_test.so
/usr/bin/ld: can't find  -l__caffe2_nccl
collect2: error: ld returned 1 exit status
make[2]: *** [caffe2/torch/lib/c10d/test/CMakeFiles/c10d_cuda_test.dir/build.make:429: lib/libc10d_cuda_test.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:9461: caffe2/torch/lib/c10d/test/CMakeFiles/c10d_cuda_test.dir/all] Error 2
[ 93%] Building CXX object caffe2/torch/CMakeFiles/torch.dir/csrc/autograd/grad_mode.cpp.o
[ 94%] Building CXX object caffe2/torch/CMakeFiles/torch.dir/csrc/autograd/input_buffer.cpp.o
[ 94%] Building CXX object caffe2/torch/CMakeFiles/torch.dir/csrc/autograd/profiler.cpp.o

Extract from the startup log:

(base) jabbo@jabbo-pc:~/TorchCraftAI/3rdparty/pytorch/tools$ REL_WITH_DEB_INFO=1 python build_libtorch.py
-- std::exception_ptr is supported.
-- NUMA is available
-- Current compiler supports avx2 extension. Will build perfkernels.
-- Current compiler supports avx512f extension. Will build fbgemm.
-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
-- Caffe2 protobuf include directory: $<BUILD_INTERFACE:/home/jabbo/TorchCraftAI/3rdparty/pytorch/third_party/protobuf/src>$<INSTALL_INTERFACE:include>
-- The BLAS backend of choice:MKL
-- Checking for [mkl_intel_lp64 - mkl_gnu_thread - mkl_core - gomp - pthread - m - dl]
--   Library mkl_intel_lp64: /home/jabbo/anaconda3/lib/libmkl_intel_lp64.so
--   Library mkl_gnu_thread: /home/jabbo/anaconda3/lib/libmkl_gnu_thread.so
--   Library mkl_core: /home/jabbo/anaconda3/lib/libmkl_core.so
-- Found OpenMP_C: -fopenmp  
-- Found OpenMP_CXX: -fopenmp  
--   Library gomp: -fopenmp
--   Library pthread: /usr/lib/x86_64-linux-gnu/libpthread.so
--   Library m: /usr/lib/x86_64-linux-gnu/libm.so
--   Library dl: /usr/lib/x86_64-linux-gnu/libdl.so
-- Brace yourself, we are building NNPACK
-- Failed to find LLVM FileCheck
-- git Version: v1.4.0-505be96a
-- Version: 1.4.0
-- Performing Test HAVE_STD_REGEX -- success
-- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
-- Performing Test HAVE_POSIX_REGEX -- success
-- Performing Test HAVE_STEADY_CLOCK -- success
-- Found OpenMP_C: -fopenmp  
-- Found OpenMP_CXX: -fopenmp  
CMake Warning (dev) at tools/third_party/fbgemm/third_party/asmjit/CMakeLists.txt:34 (set):
  implicitly converting 'BOOLEAN' to 'STRING' type.
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) at tools/third_party/fbgemm/third_party/asmjit/CMakeLists.txt:35 (set):
  implicitly converting 'BOOLEAN' to 'STRING' type.
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) at tools/third_party/fbgemm/third_party/asmjit/CMakeLists.txt:36 (set):
  implicitly converting 'BOOLEAN' to 'STRING' type.
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) at tools/third_party/fbgemm/third_party/asmjit/CMakeLists.txt:37 (set):
  implicitly converting 'BOOLEAN' to 'STRING' type.
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) at tools/third_party/fbgemm/third_party/asmjit/CMakeLists.txt:38 (set):
  implicitly converting 'BOOLEAN' to 'STRING' type.
This warning is for project developers.  Use -Wno-dev to suppress it.

-- [asmjit]
   BuildMode=Static
   BuildTest=Off
   ASMJIT_DIR=/home/jabbo/TorchCraftAI/3rdparty/pytorch/tools/third_party/fbgemm/third_party/asmjit
   ASMJIT_DEPS=pthread;rt
   ASMJIT_LIBS=asmjit;pthread;rt
   ASMJIT_CFLAGS=-DASMJIT_STATIC
   ASMJIT_SOURCE_DIR=/home/jabbo/TorchCraftAI/3rdparty/pytorch/tools/third_party/fbgemm/third_party/asmjit/src
   ASMJIT_INCLUDE_DIR=/home/jabbo/TorchCraftAI/3rdparty/pytorch/tools/third_party/fbgemm/third_party/asmjit/src
   ASMJIT_PRIVATE_CFLAGS=
     -DASMJIT_STATIC
     -std=c++17
     -fno-tree-vectorize
     -fvisibility=hidden
     -O2 [RELEASE]
     -fno-keep-static-consts [RELEASE]
     -fmerge-all-constants [RELEASE]
-- Found Numa  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libnuma.so)
-- Using third party subdirectory Eigen.
-- Could NOT find pybind11 (missing: pybind11_DIR)
-- Could NOT find pybind11 (missing: pybind11_INCLUDE_DIR) 
-- Using third_party/pybind11.
-- Caffe2: CUDA detected: 9.2
-- Caffe2: CUDA nvcc is: /usr/lib/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/lib/cuda
-- Caffe2: Header version is: 9.2
-- Found cuDNN: v7.4.1  (include: /usr/lib/cuda/include, library: /usr/lib/cuda/lib64/libcudnn.so.7)
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 3.0;3.5;5.0;5.2;6.0;6.1;7.0;7.0+PTX
-- Added CUDA NVCC flags for: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70
-- Could NOT find CUB (missing: CUB_INCLUDE_DIR) 
-- Found CUDA: /usr/lib/cuda (found suitable version "9.2", minimum required is "7.0") 
-- CUDA detected: 9.2
-- 
-- ******** Summary ********
--   CMake version         : 3.14.0
--   CMake command         : /home/jabbo/anaconda3/bin/cmake
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler version  : 7.4.0
--   CXX flags             :  -fvisibility-inlines-hidden -Wnon-virtual-dtor
--   Build type            : RelWithDebInfo
--   Compile definitions   : TH_BLAS_MKL
--   CMAKE_PREFIX_PATH     : /home/jabbo/anaconda3/bin/../
--   CMAKE_INSTALL_PREFIX  : /home/jabbo/TorchCraftAI/3rdparty/pytorch/torch/lib/tmp_install
--   CMAKE_MODULE_PATH     : /home/jabbo/TorchCraftAI/3rdparty/pytorch/cmake/Modules;/home/jabbo/TorchCraftAI/3rdparty/pytorch/cmake/public/../Modules_CUDA_fix
-- 
--   ONNX version          : 1.3.0
--   ONNX NAMESPACE        : onnx_torch
--   ONNX_BUILD_TESTS      : OFF
--   ONNX_BUILD_BENCHMARKS : OFF
--   ONNX_USE_LITE_PROTO   : OFF
--   ONNXIFI_DUMMY_BACKEND : OFF
-- 
--   Protobuf compiler     : 
--   Protobuf includes     : 
--   Protobuf libraries    : 
--   BUILD_ONNX_PYTHON     : OFF
-- Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor
-- Removing -DNDEBUG from compile flags
-- Compiling with OpenMP support
-- Compiling with MAGMA support
-- MAGMA INCLUDE DIRECTORIES: /home/jabbo/anaconda3/include
-- MAGMA LIBRARIES: /home/jabbo/anaconda3/lib/libmagma.a
-- MAGMA V2 check: 1
-- Could not find hardware support for NEON on this machine.
-- No OMAP3 processor on this machine.
-- No OMAP4 processor on this machine.
-- AVX compiler support found
-- AVX2 compiler support found
-- Atomics: using C11 intrinsics
-- Found a library with BLAS API (mkl).
-- Found a library with LAPACK API. (mkl)
-- Found CUDA: /usr/lib/cuda (found suitable version "9.2", minimum required is "5.5") 
disabling ROCM because NOT USE_ROCM is set
-- MIOpen not found. Compiling without MIOpen support
-- OpenMP lib: provided by compiler
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) 
-- VTune profiling environment is unset
-- Found MKL-DNN: TRUE
-- GCC 7.4.0: Adding gcc and gcc_s libs to link line
-- NUMA paths:
-- /usr/include
-- /usr/lib/x86_64-linux-gnu/libnuma.so
-- Using python found in /home/jabbo/anaconda3/bin/python
-- Configuring build for SLEEF-v3.2
   Target system: Linux-5.0.0-21-generic
   Target processor: x86_64
   Host system: Linux-5.0.0-21-generic
   Host processor: x86_64
   Detected C compiler: GNU @ /usr/bin/cc
-- Using option `-Wall -Wno-unused -Wno-attributes -Wno-unused-result -Wno-psabi -ffp-contract=off -fno-math-errno -fno-trapping-math` to compile libsleef
-- Building shared libs : OFF
-- MPFR : /home/jabbo/anaconda3/lib/libmpfr.so
-- MPFR header file in /home/jabbo/anaconda3/include
-- GMP : /home/jabbo/anaconda3/lib/libgmp.so
-- RUNNING_ON_TRAVIS : 0
-- COMPILER_SUPPORTS_OPENMP : 1
-- Using python found in /home/jabbo/anaconda3/bin/python
-- /usr/bin/c++ /home/jabbo/TorchCraftAI/3rdparty/pytorch/torch/abi-check.cpp -o /home/jabbo/TorchCraftAI/3rdparty/pytorch/tools/abi-check
-- Determined _GLIBCXX_USE_CXX11_ABI=1
-- Could NOT find MPI_C (missing: MPI_C_LIB_NAMES MPI_C_HEADER_DIR MPI_C_WORKS) 
-- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS) 
-- Could NOT find MPI (missing: MPI_C_FOUND MPI_CXX_FOUND) 
-- Found CUDA: /usr/lib/cuda (found suitable version "9.2", minimum required is "7.5") 
-- Building the gloo backend with TCP support only
-- Found CUDA: /usr/lib/cuda (found version "9.2") 
-- Building C10D with CUDA support
-- Could NOT find MPI_C (missing: MPI_C_LIB_NAMES MPI_C_HEADER_DIR MPI_C_WORKS) 
-- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS) 
-- Could NOT find MPI (missing: MPI_C_FOUND MPI_CXX_FOUND) 
-- Not able to find MPI, will compile c10d without MPI support
-- NCCL operators skipped due to no CUDA support
-- Including IDEEP operators
-- Excluding image processing operators due to no opencv
-- Excluding video processing operators due to no opencv
-- MPI operators skipped due to no MPI support
CMake Warning at CMakeLists.txt:391 (message):
  Generated cmake files are only fully tested if one builds with system glog,
  gflags, and protobuf.  Other settings may generate files that are not well
  tested.

-- 
-- ******** Summary ********
-- General:
--   CMake version         : 3.14.0
--   CMake command         : /home/jabbo/anaconda3/bin/cmake
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler version  : 7.4.0
--   BLAS                  : MKL
--   CXX flags             :  -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -Wno-stringop-overflow
--   Build type            : RelWithDebInfo
--   Compile definitions   : TH_BLAS_MKL;ONNX_NAMESPACE=onnx_torch;MAGMA_V2;USE_C11_ATOMICS=1;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1
--   CMAKE_PREFIX_PATH     : /home/jabbo/anaconda3/bin/../
--   CMAKE_INSTALL_PREFIX  : /home/jabbo/TorchCraftAI/3rdparty/pytorch/torch/lib/tmp_install
-- 
--   TORCH_VERSION         : 1.0.0
--   CAFFE2_VERSION        : 1.0.0
--   BUILD_ATEN_MOBILE     : OFF
--   BUILD_ATEN_ONLY       : OFF
--   BUILD_BINARY          : 
--   BUILD_CUSTOM_PROTOBUF : ON
--     Link local protobuf : ON
--   BUILD_DOCS            : OFF
--   BUILD_PYTHON          : 
--   BUILD_CAFFE2_OPS      : ON
--   BUILD_SHARED_LIBS     : ON
--   BUILD_TEST            : ON
--   USE_ASAN              : OFF
--   USE_CUDA              : 1
--     CUDA static link    : 0
--     USE_CUDNN           : 0
--     CUDA version        : 9.2
--     CUDA root directory : /usr/lib/cuda
--     CUDA library        : /usr/lib/x86_64-linux-gnu/libcuda.so
--     cudart library      : /usr/lib/cuda/lib64/libcudart_static.a;-pthread;dl;/usr/lib/x86_64-linux-gnu/librt.so
--     cublas library      : /usr/lib/cuda/lib64/libcublas.so
--     cufft library       : /usr/lib/cuda/lib64/libcufft.so
--     curand library      : /usr/lib/cuda/lib64/libcurand.so
--     nvrtc               : /usr/lib/cuda/lib64/libnvrtc.so
--     CUDA include path   : /usr/lib/cuda/include
--     NVCC executable     : /usr/lib/cuda/bin/nvcc
--     CUDA host compiler  : /usr/bin/cc
--     USE_TENSORRT        : OFF
--   USE_ROCM              : 0
--   USE_EIGEN_FOR_BLAS    : 
--   USE_FBGEMM            : ON
--   USE_FFMPEG            : OFF
--   USE_GFLAGS            : OFF
--   USE_GLOG              : OFF
--   USE_LEVELDB           : OFF
--   USE_LITE_PROTO        : OFF
--   USE_LMDB              : OFF
--   USE_METAL             : OFF
--   USE_MKL               : ON
--   USE_MKLDNN            : ON
--   USE_NCCL              : OFF
--   USE_NNPACK            : 1
--   USE_NUMPY             : 
--   USE_OBSERVERS         : OFF
--   USE_OPENCL            : OFF
--   USE_OPENCV            : OFF
--   USE_OPENMP            : OFF
--   USE_PROF              : OFF
--   USE_QNNPACK           : 1
--   USE_REDIS             : OFF
--   USE_ROCKSDB           : OFF
--   USE_ZMQ               : OFF
--   USE_DISTRIBUTED       : 1
--     USE_MPI             : OFF
--     USE_GLOO            : ON
--     USE_GLOO_IBVERBS    : OFF
--   Public Dependencies  : Threads::Threads;caffe2::mkl;caffe2::mkldnn
--   Private Dependencies : qnnpack;nnpack;cpuinfo;fbgemm;/usr/lib/x86_64-linux-gnu/libnuma.so;fp16;gloo;aten_op_header_gen;onnxifi_loader;rt;gcc_s;gcc;dl
-- Configuring done
CMake Warning (dev) at cmake/Dependencies.cmake:822 (add_dependencies):
  Policy CMP0046 is not set: Error on non-existent dependency in
  add_dependencies.  Run "cmake --help-policy CMP0046" for policy details.
  Use the cmake_policy command to set the policy and suppress this warning.

  The dependency target "nccl_external" of target "gloo_cuda" does not exist.
Call Stack (most recent call first):
  CMakeLists.txt:199 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Generating done
-- Build files have been written to: /home/jabbo/TorchCraftAI/3rdparty/pytorch/tools

I have also installed the NCCL packages for CUDA 9.2 (libnccl-dev and libnccl2), as well as the cuDNN one.
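
One way to narrow down a linker failure like this is to pull only the NCCL-related lines out of the build output and compare what CMake configured with what the linker was asked for. Below is a minimal sketch in Python. The `nccl_lines` helper is hypothetical (it is not part of PyTorch's tooling), and the sample text is just a handful of lines copied from the output above:

```python
import re

# Hypothetical helper: return (line_number, text) for every line mentioning NCCL.
def nccl_lines(log_text):
    pattern = re.compile(r"nccl", re.IGNORECASE)
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if pattern.search(line)
    ]

# A few lines copied from the build output; in practice, read a saved log instead.
sample = """\
-- NCCL operators skipped due to no CUDA support
--   USE_NCCL              : OFF
--   USE_GLOO              : ON
/usr/bin/ld: can't find -l__caffe2_nccl
"""

for number, line in nccl_lines(sample):
    print(f"{number}: {line}")
```

Run against the full output, this would put `USE_NCCL : OFF` and the `nccl_external` dependency warning next to the `-l__caffe2_nccl` linker errors. That hints that the build was configured without NCCL while some targets still tried to link against it.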

dgant commented 5 years ago

I saw from https://github.com/facebookarchive/caffe2/issues/2097 that this error -- can't find -l__caffe2_nccl -- pops up when using an older CMake, and that it works correctly with 3.10 (if that's the case, we should update the minimum required version).

What version are you on?
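
To answer the version question reliably, it helps to check which `cmake` binary is actually first on `PATH`, since the configure summary above reports `/home/jabbo/anaconda3/bin/cmake` rather than a system one. Below is a minimal sketch in Python. The helper names `parse_cmake_version` and `cmake_on_path` are made up for illustration, and the 3.10 threshold is simply the version mentioned in the linked Caffe2 issue:

```python
import re
import shutil
import subprocess

def parse_cmake_version(output):
    """Extract (major, minor, patch) from `cmake --version` output, or None."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))

def cmake_on_path():
    """Return (path, version) for the first cmake on PATH, or None if absent."""
    path = shutil.which("cmake")
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return path, parse_cmake_version(result.stdout)

found = cmake_on_path()
if found is None:
    print("no cmake found on PATH")
else:
    path, version = found
    print(f"{path}: {version}")
    # 3.10 is the version reported as working in the linked Caffe2 issue.
    if version is not None and version < (3, 10, 0):
        print("older than 3.10; the linked issue suggests upgrading")
```

Running it both inside and outside an activated conda environment may report different binaries and versions.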

dgant commented 5 years ago

You may also want to ask on the PyTorch repository.

Jabbo16 commented 5 years ago

> I saw from facebookarchive/caffe2#2097 that this error -- can't find -l__caffe2_nccl -- pops up when using an older CMake, and that it works correctly with 3.10 (if that's the case, we should update the minimum required version).
>
> What version are you on?

I'm on 3.13.4, and conda uses CMake 3.14.0. I will try downgrading.

dgant commented 5 years ago

If the issue was previously caused by an older CMake, I doubt that downgrading will fix it. I recommend asking on the PyTorch repository.

Jabbo16 commented 5 years ago

My idea was to downgrade to the known working version (3.10), since my CMake may be too new and that could be the cause. I will also try a few other things, and if I cannot get it to work, I will ask directly on the PyTorch repo.

Thanks! :purple_heart:

Jabbo16 commented 5 years ago

I think I managed to fix this by just reinstalling the NVIDIA drivers and re-cloning the repo. Weird errors, as always :sweat_smile:

After that, I could compile it without any problems.

Also, by the way, while trying to build TorchCraftAI (the last step of the guide), it complained about missing doxygen, hiredis, and glog (the last one is only a warning), but I fixed that by manually downloading and building them from source.
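
Before building those from source, a quick check can show which of the three are already visible on the system. This is a rough sketch in Python, not part of the TorchCraftAI guide, and the `missing_dependencies` helper is hypothetical. It only looks for the `doxygen` executable and for the hiredis and glog shared libraries; it does not verify that the development headers CMake needs are installed:

```python
import ctypes.util
import shutil

def missing_dependencies():
    """Return the names of dependencies that could not be located."""
    checks = {
        # Executable expected on PATH.
        "doxygen": shutil.which("doxygen") is not None,
        # Shared libraries looked up through the platform's library search.
        "hiredis": ctypes.util.find_library("hiredis") is not None,
        "glog": ctypes.util.find_library("glog") is not None,
    }
    return [name for name, found in checks.items() if not found]

missing = missing_dependencies()
print("missing:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can then be installed from the distribution's packages or built from source.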

I will close the issue once I confirm that it works fine and I can run a game with the default environment and CPi.

Jabbo16 commented 5 years ago

I managed to compile everything and was able to run a game between CPi and Hao Pan using sc-docker. Everything works fine now. Thanks!

BryanSWeber commented 5 years ago

The specific hiredis package that worked for me with this problem was:

libhiredis-dev

TorchCraft / TorchCraftAI

Error while building Pytorch #19