apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

mxnet installation from source: C++ linkage error on HPC #8874

Open jerrin92 opened 6 years ago

jerrin92 commented 6 years ago

I tried building MXNet from source on our HPC machine, but make fails with the following error message:

/opt/gcc/5.3.0/snos/include/g++/complex:1873:3: error: template with C linkage
   template<typename _Tp, typename _Up>
   ^
/opt/gcc/5.3.0/snos/include/g++/complex:1884:3: error: template with C linkage
   template<typename _Tp> std::complex<_Tp> proj(const std::complex<_Tp>&);
   ^
/opt/gcc/5.3.0/snos/include/g++/complex:1886:3: error: template with C linkage
   template<typename _Tp>
   ^
/opt/gcc/5.3.0/snos/include/g++/complex: In function '__complex__ double std::__complex_proj(__complex__ double)':
/opt/gcc/5.3.0/snos/include/g++/complex:1903:40: error: conflicting declaration of C function '__complex__ double std::__complex_proj(__complex__ double)'
   __complex_proj(__complex__ double __z)
                                        ^
/opt/gcc/5.3.0/snos/include/g++/complex:1899:3: note: previous declaration '__complex__ float std::__complex_proj(__complex__ float)'
   __complex_proj(__complex__ float __z)
   ^

Environment info (Required)

----------Python Info----------
('Version      :', '2.7.8')
('Compiler     :', 'GCC 4.9.1 20140716 (Cray Inc.)')
('Build        :', ('default', 'Dec  8 2014 16:10:59'))
('Arch         :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version      :', '9.0.1')
('Directory    :', '/N/soft/cle4/python/2.7.8/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
('Platform     :', 'Linux-3.0.101-0.47.106.8-default-x86_64-with-SuSE-11-x86_64')
('system       :', 'Linux')
('node         :', 'login2')
('release      :', '3.0.101-0.47.106.8-default')
('version      :', '#1 SMP Wed Oct 11 18:58:12 UTC 2017 (4355936)')
----------Hardware Info----------
('machine      :', 'x86_64')
('processor    :', 'x86_64')
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             4
NUMA node(s):          1
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 1
Stepping:              2
CPU MHz:               2600.290
BogoMIPS:              5200.86
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              6144K
NUMA node0 CPU(s):     0-31

We have CUDA 7.5 with cuDNN v5, and the GCC version is 5.3.0.

thirdwing commented 6 years ago

Can you show the config.mk used? Are you using MKL?

jerrin92 commented 6 years ago

Copied below is the config.mk


#-------------------------------------------------------------------------------
#  Template configuration for compiling mxnet
#
#  If you want to change the configuration, please use the following
#  steps. Assume you are on the root directory of mxnet. First copy the this
#  file so that any local changes will be ignored by git
#
#  $ cp make/config.mk .
#
#  Next modify the according entries, and then compile by
#
#  $ make
#
#  or build in parallel with 8 threads
#
#  $ make -j8
#-------------------------------------------------------------------------------

#---------------------
# choice of compiler
#--------------------

export CC = gcc
export CXX = g++
export NVCC = nvcc

# whether compile with options for MXNet developer
DEV = 0

# whether compile with debug
DEBUG = 0

# whether compiler with profiler
USE_PROFILER =

# the additional link flags you want to add
ADD_LDFLAGS =

# the additional compile flags you want to add
ADD_CFLAGS =

#---------------------------------------------
# matrix computation libraries for CPU/GPU
#---------------------------------------------

# whether use CUDA during compile
USE_CUDA = 0

# add the path to CUDA library to link and compile flag
# if you have already add them to environment variable, leave it as NONE
# USE_CUDA_PATH = /usr/local/cuda
USE_CUDA_PATH = NONE

# whether use CuDNN R3 library
USE_CUDNN = 0

# whether use cuda runtime compiling for writing kernels in native language (i.e. Python)
USE_NVRTC = 0

# whether use opencv during compilation
# you can disable it, however, you will not able to use
# imbin iterator
USE_OPENCV = 1

# use openmp for parallelization
USE_OPENMP = 1

# MKL ML Library for Intel CPU/Xeon Phi
# Please refer to MKL_README.md for details

# MKL ML Library folder, need to be root for /usr/local
# Change to User Home directory for standard user
# For USE_BLAS!=mkl only
MKLML_ROOT=/usr/local

# whether use MKL2017 library
USE_MKL2017 = 0

# whether use MKL2017 experimental feature for high performance
# Prerequisite USE_MKL2017=1
USE_MKL2017_EXPERIMENTAL = 0

# whether use NNPACK library
USE_NNPACK = 0

# choose the version of blas you want to use
# can be: mkl, blas, atlas, openblas
# in default use atlas for linux while apple for osx
UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S), Darwin)
USE_BLAS = apple
else
USE_BLAS = atlas
endif

# whether use lapack during compilation
# only effective when compiled with blas versions openblas/apple/atlas/mkl
USE_LAPACK = 1

# path to lapack library in case of a non-standard installation
USE_LAPACK_PATH =

# add path to intel library, you may need it for MKL, if you did not add the path
# to environment variable
USE_INTEL_PATH = NONE

# If use MKL only for BLAS, choose static link automatically to allow python wrapper
ifeq ($(USE_MKL2017), 0)
ifeq ($(USE_BLAS), mkl)
USE_STATIC_MKL = 1
endif
else
USE_STATIC_MKL = NONE
endif

#----------------------------
# Settings for power and arm arch
#----------------------------
ARCH := $(shell uname -a)
ifneq (,$(filter $(ARCH), armv6l armv7l powerpc64le ppc64le aarch64))
    USE_SSE=0
else
    USE_SSE=1
endif

#----------------------------
# distributed computing
#----------------------------

# whether or not to enable multi-machine supporting
USE_DIST_KVSTORE = 0

# whether or not allow to read and write HDFS directly. If yes, then hadoop is
# required
USE_HDFS = 0

# path to libjvm.so. required if USE_HDFS=1
LIBJVM=$(JAVA_HOME)/jre/lib/amd64/server

# whether or not allow to read and write AWS S3 directly. If yes, then
# libcurl4-openssl-dev is required, it can be installed on Ubuntu by
# sudo apt-get install -y libcurl4-openssl-dev
USE_S3 = 0

#----------------------------
# additional operators
#----------------------------

# path to folders containing projects specific operators that you don't want to put in src/operators
EXTRA_OPERATORS =

#----------------------------
# other features
#----------------------------

# Create C++ interface package
USE_CPP_PACKAGE = 0

#----------------------------
# plugins
#----------------------------

# whether to use caffe integration. This requires installing caffe.
# You also need to add CAFFE_PATH/build/lib to your LD_LIBRARY_PATH
# CAFFE_PATH = $(HOME)/caffe
# MXNET_PLUGINS += plugin/caffe/caffe.mk

# whether to use torch integration. This requires installing torch.
# You also need to add TORCH_PATH/install/lib to your LD_LIBRARY_PATH
# TORCH_PATH = $(HOME)/torch
# MXNET_PLUGINS += plugin/torch/torch.mk

# WARPCTC_PATH = $(HOME)/warp-ctc
# MXNET_PLUGINS += plugin/warpctc/warpctc.mk

# whether to use sframe integration. This requires build sframe
# git@github.com:dato-code/SFrame.git
# SFRAME_PATH = $(HOME)/SFrame
# MXNET_PLUGINS += plugin/sframe/plugin.mk

I believe we are not using MKL.

larroy commented 6 years ago

More compiler error context is needed. Which .cc file is failing to build? Please paste more output.

jerrin92 commented 6 years ago

The error seems to be from count_sketch.cc. You can view the error at https://pastebin.com/ept80t7f

larroy commented 6 years ago

I think it's related to your STL / G++ version.

jerrin92 commented 6 years ago

But per the installation docs, we only need a GCC version above 4.8, and we have 4.9.

larroy commented 6 years ago

This is difficult to diagnose without access to your platform, but it looks like the template is inside an extern "C" block. Check the include chain in the error message to see what is going on.
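For readers following along, here is a minimal sketch of the language rule behind the "template with C linkage" message. The names c_add, twice and real_of_twice are hypothetical and are not taken from MXNet or OpenBLAS; the sketch only illustrates the general pattern and does not claim to reproduce the include chain on this particular system.

```cpp
// Illustrative only: hypothetical names, not code from MXNet or OpenBLAS.
#include <complex>  // C++ headers belong OUTSIDE any extern "C" block.

// Failing pattern (kept as a comment because it does not compile):
//
//   extern "C" {
//   #include <complex>   // GCC: "error: template with C linkage"
//   }
//
// Templates cannot have C linkage, so a header that declares templates
// is rejected when it is first pulled in inside such a block.

extern "C" {
// Only plain C-style declarations go inside the linkage block.
int c_add(int a, int b) { return a + b; }
}

// Templates are declared with ordinary C++ linkage.
template <typename T>
T twice(T value) { return value + value; }

// Uses the std::complex template from the header included above.
double real_of_twice(double re, double im) {
  return twice(std::complex<double>(re, im)).real();
}
```

If a C header opens extern "C" and then includes another header that in turn reaches a C++ standard header, the effect is the same as the commented-out pattern, which is why the include chain in the error output is worth reading.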

Try copying the compiler command line that produces the error and adding "-E", then open the output file to see what the preprocessed code looks like. You can then check whether the template is surrounded by extern "C" and confirm whether the problem is in your STL version. Alternatively, you can use a different STL version.

jerrin92 commented 6 years ago

Hi @larroy

Thank you for the update. I tried compiling with different versions of GCC/STL (4.9.3 and 5.1.0), and the errors remain the same. The compilation results are linked below:

https://pastebin.com/KTdVA6Sq
https://pastebin.com/9AXhVzi6

I am not sure how to proceed. Maybe MXNet is not supported on our HPC platform.

Thank you again :)

larroy commented 6 years ago

MXNet should be portable. It could be the version of OpenBLAS that you are using. How did you install it?

Could you paste the code around the include reported as "from /cm/shared/apps/openblas/dynamic/0.2.6/include/openblas_config.h:85,"?

Judging from the first error message that you pasted, I think the STL is being included inside extern "C". Could you quickly check whether that is the case?

Without access to that system, it is difficult to diagnose. Can you try with your own checkout of OpenBLAS? You can see how we compile with our own OpenBLAS in the docker_multiarch files.

harshitajain1994 commented 6 years ago

Proposed Labels : "Linux", "C++", "MKL", "Breaking", "Installation"

pengzhao-intel commented 6 years ago

@jerrin92 has the issue been resolved?

pengzhao-intel commented 6 years ago

@marcoabreu @szha this issue is not related to MKL. Please help remove the MKL label.

Quoting @jerrin92: "I believe we are not using MKL"