Closed chinakook closed 6 years ago
The same errors occur on Arch Linux with the following build toolchain:
$ yaourt -Q gcc6 python nvidia-dkms cuda
community/gcc6 6.4.1-5
extra/python 3.6.4-2
extra/nvidia-dkms 390.48-3
community/cuda 9.1.85.2-1
My other PC with Ubuntu 16.04 (gcc5/cuda8) builds successfully. I think there is some error in nnvm/tvm with gcc 6.4 or cuda 9.1 after April 11.
From my testing, the problem comes from https://github.com/dmlc/mshadow/pull/330 in mshadow. @rahul003 @piiswrong
@chinakook So ubuntu with gcc6.4 fails? I see. Could you help confirm two things?
Same here with gcc 6.4 / cuda 9.1 on Arch Linux. Building with make and USE_F16C=0 removes the error.
Hi @rahul003, @asitstands has confirmed the first one. The mxnet CMake build is not very friendly with gcc6.
For the CMake build I checked both that the CPU supports f16c and compiler supports f16c and then set the flag on. For Make I couldn't figure out how to identify if the compiler supports the flag, so I only checked that the CPU supports it. I'll try to see if I can figure out compiler support automatically and then set USE_F16C=0
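The two checks described above (CPU support and compiler support) could be sketched for the Make build roughly as follows. This is only an illustration: the helper names `cpu_has_flag` and `compiler_accepts_mf16c` are hypothetical, it assumes Linux with `/proc/cpuinfo`, and it is not what the MXNet Makefile actually does.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and default paths are assumptions, not MXNet code.

# Succeeds if the given CPU flag is listed in a cpuinfo-style file (Linux).
cpu_has_flag() {
  local flag="$1" cpuinfo="${2:-/proc/cpuinfo}"
  grep -E '^flags[[:space:]]*:' "$cpuinfo" 2>/dev/null | grep -qw -- "$flag"
}

# Succeeds if the compiler accepts -mf16c for an empty program.
compiler_accepts_mf16c() {
  local cxx="${1:-g++}"
  echo 'int main() { return 0; }' \
    | "$cxx" -mf16c -x c++ -fsyntax-only - >/dev/null 2>&1
}

# Only suggest USE_F16C=1 when both the CPU and the compiler support it.
if cpu_has_flag f16c && compiler_accepts_mf16c "${CXX:-g++}"; then
  echo "USE_F16C=1"
else
  echo "USE_F16C=0"
fi
```

Note that both checks can pass and the build can still fail under nvcc, as later comments in this thread show, so this only covers the detection step.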
How can I disable f16c in cmake?
-- Performing Test COMPILER_SUPPORT_MF16C
-- Performing Test COMPILER_SUPPORT_MF16C - Success
My processor and compiler support it, but the build fails as described in #10644 and here.
EDIT: NVM, setting -DCOMPILER_SUPPORT_MF16C=OFF does the trick.
Build with gcc-6.4 on Ubuntu16.04 works for me with cuda9.0. I'm not sure what changes with ubuntu17.04. Cuda9.1 should not affect this
What CPU do you have? Could you run the diagnose script mentioned in the issue and paste the output to the issue?
i7-6700k. I've tested on Ubuntu 17.10 and 18.04, both failed without USE_F16C=0.
I'll try later
@rahul003 However, the version of gcc6 in Ubuntu 16.04 is 6.0.1
Xeon (dual) E5-2650-V4 (ES) here. Failed without USE_F16C=0 or -DCOMPILER_SUPPORT_MF16C=OFF
I did install gcc6.4 and g++6.4.
The issue is similar to https://github.com/apache/incubator-mxnet/issues/8576 and https://github.com/tensorflow/tensorflow/issues/10220 . The combination of Ubuntu version, gcc version, and NVCC seems to be problematic.
I'll try later w/o CUDA/CUDNN and w/ USE_F16C=1
Please do and let me know, also when trying that turn off the multi threaded compile. Just do a simple make so we get to see the exact file which fails to compile.
Here is the minimal example to reproduce the same errors.
#include <x86intrin.h> // mshadow/half.h includes this
int main() {
}
Compiling this with nvcc -O3 causes the errors. If I remove -O3, the compilation is successful. My Linux distro is Arch. CUDA is 9.1.85. I tested this with gcc 6.4.1. CUDA 9 does not support gcc 7, but gcc 7.3.1 shows exactly the same behavior anyway. Compiling directly with g++ does not cause any errors. So the problem may come from nvcc. I'm not sure whether there is any workaround.
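For anyone who wants to rerun the minimal example, the steps could be scripted roughly like this. The file name `repro.cu`, the helper name `write_repro`, and the `-c`/`-o` options are assumptions added for illustration; the comment above only states that `nvcc -O3` fails and that dropping `-O3` succeeds.

```shell
#!/usr/bin/env bash
# Sketch only: file and helper names are assumptions.

# Write the minimal source file from the comment above.
write_repro() {
  local out="${1:-repro.cu}"
  cat > "$out" <<'EOF'
#include <x86intrin.h>  // mshadow/half.h includes this
int main() {
}
EOF
}

workdir="$(mktemp -d)"
write_repro "$workdir/repro.cu"

if command -v nvcc >/dev/null 2>&1; then
  # Reported to fail with gcc 6.4.1 / CUDA 9.1.85 on Arch.
  if nvcc -O3 -c "$workdir/repro.cu" -o "$workdir/repro_O3.o"; then
    echo "with -O3: ok"
  else
    echo "with -O3: failed"
  fi
  # Reported to succeed once -O3 is removed.
  if nvcc -c "$workdir/repro.cu" -o "$workdir/repro_noO3.o"; then
    echo "without -O3: ok"
  else
    echo "without -O3: failed"
  fi
else
  echo "nvcc not found; wrote $workdir/repro.cu only"
fi
```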
I tried to report this as a bug on the nvidia developer site, but the reporting system does not work in my firefox or chromium. It would be helpful if someone else could report this.
This seems to work: simply remove every -O3 from the CMakeLists.txt files (if you use cmake)
(cd incubator-mxnet; for i in $(grep -l -R \\-O3 | grep CMake); do sed -e 's|-O3 ||g' -e 's| -O3||g' -i ${i}; done)
after updating the submodules.
The "conflicting" files are:
└───╼ grep -R \\-O3 | grep CMake
CMakeLists.txt: set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O3 -g")
CMakeLists.txt: set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O3")
3rdparty/nnvm/tvm/dmlc-core/CMakeLists.txt: set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O3")
3rdparty/nnvm/tvm/CMakeLists.txt: set(CMAKE_C_FLAGS "-O3 -Wall -fPIC")
3rdparty/nnvm/dmlc-core/CMakeLists.txt: set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O3")
3rdparty/nnvm/CMakeLists.txt: set(CMAKE_C_FLAGS "-O3 -Wall -std=c++11 -fPIC")
3rdparty/dmlc-core/CMakeLists.txt: set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O3")
As per https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html, "CUDA development environment relies on tight integration with the host development environment, including the host compiler and C runtime libraries, and is therefore only supported on distribution versions that have been qualified for this CUDA Toolkit release". For ubuntu 17.04 the gcc version 6.3 seems supported.
What is the problem with using -O2 instead of -O3?
With -O2 everything builds OK, with USE_F16C=1 / -DCOMPILER_SUPPORT_MF16C=ON
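If lowering the optimization level is preferred over stripping it entirely, a variant of the earlier sed loop could swap -O3 for -O2 instead. This is a minimal sketch: the function name `downgrade_o3` is hypothetical, and it assumes a grep that supports `--include` (GNU and BSD grep both do).

```shell
#!/usr/bin/env bash
# Sketch only: the function name is an assumption.

# Replace -O3 with -O2 in every CMakeLists.txt below the given directory.
# A .bak copy of each patched file is kept next to it.
downgrade_o3() {
  local root="${1:-.}"
  grep -rl --include=CMakeLists.txt -e '-O3' "$root" | while IFS= read -r f; do
    sed -i.bak 's/-O3/-O2/g' "$f"
    echo "patched $f"
  done
}

# Usage (after updating the submodules):
#   downgrade_o3 incubator-mxnet
```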
@rahul003 How about enabling the x86 intrinsic conversions for half_t only if __CUDACC__ is not defined? As long as nvcc does not compile the code path that converts half_t.half_, there should be no harm, and it could solve this issue.
@sl1pkn07 can you please provide lscpu output.
└───╼ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Genuine Intel(R) CPU 0000 @ 2.20GHz
Stepping: 0
CPU MHz: 2195.229
CPU max MHz: 2400,0000
CPU min MHz: 1200,0000
BogoMIPS: 4392.39
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
E5-2650-V4 ES (engineering sample)
@sl1pkn07 It seems your issue is different from the one in #10705. For #10705, the issue is that the hardware doesn't support f16c, whereas in your case it seems to be a compiler issue. What version of gcc are you using by default?
7.3.1 by default, but I set 6.4.1 for CUDA applications (-DCUDA_HOST_COMPILER) because cuda 9.1.x does not support gcc 7 or newer
edit:typo
@sl1pkn07 Would it be possible to try gcc6.3? Cuda9.1 says it is compatible with that version
@asitstands Thanks for the suggestion. I'm trying that change in the below PR. Would you or @sl1pkn07 be able to verify that this works for you guys with USE_F16C=1?
@rahul003 I was able to test this on g++-6.4 before and after the change. It seems to build fine after the change. I tried the following command: make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDNN=1 USE_CUDA_PATH=/usr/local/cuda
Thanks @anirudh2290
I'm not sure if I'm doing something wrong, but with the latest changes pushed to this repo (97da5e3b35b0725d10deffef0032f203df51d271), keeping -O3 in all submodule makefiles/cmake files, and patching incubator-mxnet/3rdparty/mshadow with https://github.com/rahul003/mshadow/pull/1, I still have the same issue.
In my case, the problem is in:
- Generating /tmp/makepkg/sl1-mxnet-git/src/build/CMakeFiles/cuda_compile_1.dir/src/operator/tensor/./cuda_compile_1_generated_square_sum.cu.o
/opt/cuda/bin/nvcc /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/operator/tensor/square_sum.cu -c -o /tmp/makepkg/sl1-mxnet-git/src/build/CMakeFiles/cuda_compile_1.dir/src/operator/tensor/./cuda_compile_1_generated_square_sum.cu.o -ccbin /opt/cuda/bin/gcc -m64 --std c++11 -DNDEBUG=1 -DDMLC_USE_CXX11=1 -DMSHADOW_USE_CUDA=1 -DMXNET_USE_NCCL=1 -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_SSE=1 -DMSHADOW_USE_F16C=1 -DMSHADOW_FORCE_STREAM -DUSE_CUDNN -DMXNET_USE_OPENCV=1 -DMXNET_USE_LAPACK=1 -DMSHADOW_USE_CUDNN=1 -DMXNET_USE_OPERATOR_TUNING=1 -Xcompiler ,\"-Wall\",\"-Wno-unknown-pragmas\",\"-fPIC\",\"-Wno-sign-compare\",\"-O3\",\"-msse2\",\"-mf16c\",\"-fopenmp\" -gencode arch=compute_61,code=sm_61 -Xcudafe --diag_suppress=cc_clobber_ignored -Xcudafe --diag_suppress=integer_sign_change -Xcudafe --diag_suppress=useless_using_declaration -Xcudafe --diag_suppress=set_but_not_used -Xcompiler -fPIC -DNVCC -I/opt/cuda/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/mkldnn/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src -I/usr/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/mshadow -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/cub -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/nnvm/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/nnvm/tvm/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/dmlc-core/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/dlpack/include -I/usr/include/opencv
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9220): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9231): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9244): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9255): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9268): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9279): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9292): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9303): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9316): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9327): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9340): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9352): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9365): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9376): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9389): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9401): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9410): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9419): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9428): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9437): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9445): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9454): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9463): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9472): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9481): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9490): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9499): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9508): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9517): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9526): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9535): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512fintrin.h(9544): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512pfintrin.h(55): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512pfintrin.h(63): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512pfintrin.h(73): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512pfintrin.h(81): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512pfintrin.h(91): error: argument of type "void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512pfintrin.h(100): error: argument of type "void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512pfintrin.h(109): error: argument of type "void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512pfintrin.h(117): error: argument of type "void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512pfintrin.h(127): error: argument of type "void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512pfintrin.h(136): error: argument of type "void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512pfintrin.h(145): error: argument of type "void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512pfintrin.h(153): error: argument of type "void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10799): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10811): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10823): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10835): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10847): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10859): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10871): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10883): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10895): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10907): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10919): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10931): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10943): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10955): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10967): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10979): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(10989): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11000): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11009): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11020): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11029): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11040): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11049): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11060): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11069): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11080): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11089): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11100): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11109): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11120): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11129): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11140): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11149): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11160): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11169): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11180): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11189): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11200): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11209): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11220): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11229): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11240): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11249): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11260): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11269): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11280): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11289): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-pc-linux-gnu/6.4.1/include/avx512vlintrin.h(11300): error: argument of type "void *" is incompatible with parameter of type "long long *"
92 errors detected in the compilation of "/tmp/tmpxft_0000bc28_00000000-6_square_sum.cpp1.ii".
-- Removing /tmp/makepkg/sl1-mxnet-git/src/build/CMakeFiles/cuda_compile_1.dir/src/operator/tensor/./cuda_compile_1_generated_square_sum.cu.o
/usr/bin/cmake -E remove /tmp/makepkg/sl1-mxnet-git/src/build/CMakeFiles/cuda_compile_1.dir/src/operator/tensor/./cuda_compile_1_generated_square_sum.cu.o
CMake Error at cuda_compile_1_generated_square_sum.cu.o.None.cmake:281 (message):
Error generating file
/tmp/makepkg/sl1-mxnet-git/src/build/CMakeFiles/cuda_compile_1.dir/src/operator/tensor/./cuda_compile_1_generated_square_sum.cu.o
make[2]: *** [CMakeFiles/mxnet_static.dir/build.make:744: CMakeFiles/cuda_compile_1.dir/src/operator/tensor/cuda_compile_1_generated_square_sum.cu.o] Error 1
make[2]: Leaving directory '/tmp/makepkg/sl1-mxnet-git/src/build'
make[1]: *** [CMakeFiles/Makefile2:106: CMakeFiles/mxnet_static.dir/all] Error 2
make[1]: Leaving directory '/tmp/makepkg/sl1-mxnet-git/src/build'
make: *** [Makefile:130: all] Error 2
make: Leaving directory '/tmp/makepkg/sl1-mxnet-git/src/build'
-ccbin /opt/cuda/bin/gcc is GCC 6.4.1
It seems Arch pushed a new revision of cuda, now with GCC 5.5.0 as the cuda host compiler.
Unfortunately, I have the same issue :(
greetings
EDIT: tested again, this time including:
-DCMAKE_C_COMPILER=/usr/bin/gcc-5 \
-DCMAKE_C_COMPILER_AR=/usr/bin/gcc-ar-5 \
-DCMAKE_C_COMPILER_RANLIB=/usr/bin/gcc-ranlib-5 \
-DCMAKE_CXX_COMPILER=/usr/bin/g++-5 \
-DCMAKE_CXX_COMPILER_AR=/usr/bin/gcc-ar-5 \
-DCMAKE_CXX_COMPILER_RANLIB=/usr/bin/gcc-ranlib-5
same issue
Hi @sl1pkn07. Can you please provide reproduction steps? I did the following without issues:
cd incubator-mxnet
git submodule init && git submodule sync && git submodule update --recursive --init
make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDNN=1 USE_CUDA_PATH=/usr/local/cuda
My ccbin is g++-6 6.4.0.
Same as you, except I use cmake.
Installed by pacman (-- means pulled in by pacman as a dependency):
cuda (9.1.85.3)
-- gcc5 (5.5.0)
-- nvidia-utils (390.48) # superseded by my own package (396.18.05)
cudnn (7.1.2)
opencv (3.4.1)
-- lapack (3.8.0)
cmake (3.10.3)
make (4.2.1)
-- glibc (2.27)
nccl [AUR] (2.1.15.1)
libopenblas [AUR] (0.2.20)
gtest (1.8.0)
# set by makepkg (the Arch Linux package builder)
CPPFLAGS="-D_FORTIFY_SOURCE=2"
CFLAGS="-march=native -O2 -pipe -fstack-protector-strong -fno-plt"
CXXFLAGS="-march=native -O2 -pipe -fstack-protector-strong -fno-plt"
LDFLAGS="-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now"
MAKEFLAGS="-j40" #used internally by makepkg make wrapper
# use system openmp (?)
rm -fr 3rdparty/openmp
# use system gtest
sed '/GTEST/s/^/#/g' -i incubator-mxnet/CMakeLists.txt
# Fix typo (?) in cpp-package/CMakeLists.txt. https://github.com/apache/incubator-mxnet/issues/10742
sed 's|NOT DO_NOT_BUILD_EXAMPLES|BUILD_CPP_EXAMPLES|' -i incubator-mxnet/cpp-package/CMakeLists.txt
# gcc5 doesn't like -fno-plt
export CFLAGS="${CFLAGS/-fno-plt/}"
export CXXFLAGS="${CXXFLAGS/-fno-plt/}"
# https://github.com/apache/incubator-mxnet/issues/10558#issuecomment-385551954
patch -d incubator-mxnet/3rdparty/mshadow -p1 -i "${srcdir}/1.diff"
# # Remove -O3. see https://github.com/apache/incubator-mxnet/issues/10558
# (cd incubator-mxnet; for i in $(grep -l -R \\-O3 | grep CMakeList.txt); do sed -e 's|-O3 ||g' -e 's| -O3||g' -i ${i}; done)
mkdir build
cd build
# /opt/cuda/bin/gcc points to gcc5 by symlink
cmake ../incubator-mxnet \
-DCMAKE_BUILD_TYPE=None \
-DCMAKE_INSTALL_PREFIX=/usr \
-DCMAKE_INSTALL_LIBDIR=lib \
-DCUDA_HOST_COMPILER=/opt/cuda/bin/gcc \
-DCUDA_SDK_ROOT_DIR=/opt/cuda \
-DBUILD_SHARED_LIBS=ON \
-DBUILD_TESTING=OFF \
-DBUILD_CPP_EXAMPLES=OFF \
-DUSE_CPP_PACKAGE=ON \
-DUSE_CUDNN=ON \
-DUSE_NCCL=ON \
-DUSE_OPENCV=ON \
-DUSE_OPENMP=ON \
-DUSE_LAPACK=ON \
-DUSE_JEMALLOC=OFF \
-DUSE_GPERFTOOLS=OFF \
-DWITH_EXAMPLE=OFF \
-DWITH_TEST=OFF \
-DUSE_OLDCMAKECUDA=ON \
-DCMAKE_C_COMPILER=/usr/bin/gcc-5 \
-DCMAKE_C_COMPILER_AR=/usr/bin/gcc-ar-5 \
-DCMAKE_C_COMPILER_RANLIB=/usr/bin/gcc-ranlib-5 \
-DCMAKE_CXX_COMPILER=/usr/bin/g++-5 \
-DCMAKE_CXX_COMPILER_AR=/usr/bin/gcc-ar-5 \
-DCMAKE_CXX_COMPILER_RANLIB=/usr/bin/gcc-ranlib-5
make
This fails to build. But if I uncomment the lines:
# # Remove -O3. see https://github.com/apache/incubator-mxnet/issues/10558
# (cd incubator-mxnet; for i in $(grep -l -R \\-O3 | grep CMakeList.txt); do sed -e 's|-O3 ||g' -e 's| -O3||g' -i ${i}; done)
everything builds without problems.
Hey could you remove this line and retry https://github.com/dmlc/mshadow/blob/5da1d9084e56bf1d7af246f632f4d59a995c76cd/cmake/mshadow.cmake#L73
In my checkout that line is at L70; it seems the commit reference needs to be updated when using git submodule update --init --recursive
Submodule path '3rdparty/mshadow': checked out '317fad64cc234c458e3f01ff47fffe3b8b3e5f63'
And nope, removing that line does nothing :( Same error.
Can you do make VERBOSE=1 and share log of the first few files being compiled?
@sl1pkn07 I don't understand what you mean about the submodules.
Please try the following
Fresh clone. Don't change any of the submodules: git clone --recursive incubator-mxnet
Remove line 73 from 3rdparty/mshadow/cmake/mshadow.cmake (the line which says add_definitions(-DMSHADOW_USE_F16C=1))
Then build.
If that fails, please run make VERBOSE=1 and paste the first 50-100 lines of the log.
Thanks!
I can reproduce your error when that line exists with cmake and gcc6.4. Removing that line fixes it for me. I've created a PR to do that. There might be something simple going wrong. Can you clean and try from scratch?
Fresh clone. Don't change any of the submodules git clone --recursive incubator-mxnet
done
Cloning into '/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/cub'...
done.
Cloning into '/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/dlpack'...
done.
Cloning into '/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/dmlc-core'...
done.
Cloning into '/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/mkldnn'...
done.
Cloning into '/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/mshadow'...
done.
Cloning into '/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/nnvm'...
done.
Submodule path '3rdparty/cub': checked out '05eb57faa0a4cac37c2a86fdf4b4dc865a95a1a3'
Submodule path '3rdparty/dlpack': checked out '10892ac964f1af7c81aae145cd3fab78bbccd297'
Submodule path '3rdparty/dmlc-core': checked out 'e9446f5a53cf5e61273deff7ce814093d2791766'
Submodule path '3rdparty/mkldnn': checked out 'b4137dfc88e3bf5c6b62e833121802eb8c6696da'
Submodule path '3rdparty/mshadow': checked out '317fad64cc234c458e3f01ff47fffe3b8b3e5f63'
Submodule path '3rdparty/nnvm': checked out '2bc5144cd3733fd239287e3560c7db8285d21f02'
Cloning into '/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/nnvm/dmlc-core'...
done.
Cloning into '/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/nnvm/tvm'...
done.
Submodule path 'dmlc-core': checked out '42823a731bdb2c22aa44775c0937466046400c02'
Submodule path 'tvm': checked out 'fdba6cc9bd3bec9ccd0592fa3900b7fe25d6cb97'
Cloning into '/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/nnvm/tvm/HalideIR'...
done.
Cloning into '/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/nnvm/tvm/dlpack'...
done.
Cloning into '/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/nnvm/tvm/dmlc-core'...
done.
Submodule path 'HalideIR': checked out 'e20e5e9abb3aa43147a90a4ffb3e190f62862970'
Submodule path 'dlpack': checked out '10892ac964f1af7c81aae145cd3fab78bbccd297'
Submodule path 'dmlc-core': checked out 'd3f7fbb53e5b037c0f5bf6bd21871ccc720690cc'
Remove line 73 from 3rdparty/mshadow/cmake/mshadow.cmake (the line which says add_definitions(-DMSHADOW_USE_F16C=1))
Line 73 says add_definitions(-DMSHADOW_USE_F16C=0).
This is because of this:
Submodule path '3rdparty/mshadow': checked out '317fad64cc234c458e3f01ff47fffe3b8b3e5f63'
I'm not sure why git submodule update --init --recursive or git clone --recursive incubator-mxnet does not check out the submodules at the latest upstream commit instead of using the commit set in the repo (sorry, I can't explain it better, my English is not good).
clone repo
apply 'https://patch-diff.githubusercontent.com/raw/apache/incubator-mxnet/pull/10771.diff'
init submodules with: git submodule update --init --recursive
apply:
'https://patch-diff.githubusercontent.com/raw/dmlc/mshadow/pull/335.diff'
'https://patch-diff.githubusercontent.com/raw/dmlc/mshadow/pull/336.diff'
the output https://pastebin.com/7Kf7k2C1
There seems to be something wrong with the patch because when the error shows up -DMSHADOW_USE_F16C=1 is seen which should not be there. This flag can not be introduced if all the following changes are done correctly.
Either cd 3rdparty/mshadow && git pull origin master && cd ../../
or apply the PR https://github.com/apache/incubator-mxnet/pull/10760/
Same error, but different output: the -DMSHADOW_USE_F16C=1 has disappeared.
(I'm going to bed; more tomorrow, see ya)
Hi @sl1pkn07, I realized this could have nothing to do with USE_F16C. The same failure exists on code from Nov 2017 with gcc5.5. Similar failure exists for tensorflow as well. This has to do with the compatibility of nvcc with host OS and compiler.
Ref: apache/incubator-mxnet#8576 tensorflow/tensorflow#10220
Nvidia doesn't have a guideline for archlinux. Could you please try changing compiler versions? On my ubuntu 16.04 machine, gcc 5.4 and gcc6.4 work but gcc5.5 doesn't work.
I can try building another gcc version, but I need time to prepare the environment.
Yes, it seems GCC 5.4.0 and GCC 6.3.1 (the snapshot used in Arch Linux) build OK without removing -O3 from all the makefiles/CMakeLists scripts.
In both cases, I used GCC 7.3.1 (it failed with GCC 8.1.0) as the main compiler and GCC 5.4.0/6.3.1 as CUDA_HOST_COMPILER (ccbin).
To avoid this, would it be possible to add a check in the makefile/cmake for a "bad" version of gcc used as the CUDA ccbin (or as the main compiler if CMAKE_HOST_COMPILER is not set)?
greetings
Good to know @sl1pkn07. Thanks for verifying that this works. That's a good idea; the only concern is that we might need to come up with an exhaustive list of which compilers work and which don't on different platforms, and I'm not sure how that would work.
It seems better to list what is not valid instead of what is. In my case, that is everything newer than 6.3.1, plus 5.5.0.
I still need to try 6.4.0, because 6.4.1 fails.
greetings
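A deny-list check along these lines could be sketched as below. The function name `gcc_reported_bad_for_cuda91` is hypothetical, and the list contains only the two versions that were reported as failing in this thread with CUDA 9.1 (5.5.0 and 6.4.1); it is not an official compatibility table, and 6.4.0 is left out because the reports for it are mixed.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and the deny-list are assumptions
# drawn from reports in this thread, not from NVIDIA documentation.

# Succeeds if the given gcc version was reported here as failing
# when used as the nvcc host compiler (ccbin) with CUDA 9.1.
gcc_reported_bad_for_cuda91() {
  case "$1" in
    5.5.0|6.4.1) return 0 ;;
    *) return 1 ;;
  esac
}

host_cc="${CUDA_HOST_COMPILER:-gcc}"
if command -v "$host_cc" >/dev/null 2>&1; then
  # -dumpfullversion exists in newer gcc; older releases print the
  # full version with -dumpversion.
  ver="$("$host_cc" -dumpfullversion 2>/dev/null || "$host_cc" -dumpversion)"
  if gcc_reported_bad_for_cuda91 "$ver"; then
    echo "warning: $host_cc $ver was reported to fail as CUDA 9.1 ccbin" >&2
  else
    echo "$host_cc $ver is not on the reported-bad list"
  fi
fi
```

Listing only known-bad versions keeps the check small, at the cost of missing combinations nobody has reported yet.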
Just a note: the new CUDA 9.2.88.1 builds OK with GCC 7.3.1.
I tried all these solutions but none worked for me. I get this error:
error: 'nnvm:NodePtr' has not been declared
Description
The NNVM module fails to build in the newest mxnet version. However, it builds successfully at commit 55e74350 (April 11).
Environment info (Required)
Ubuntu 17.10.1 / CUDA 9.1 / gcc 6.4
Build info (Required if built from source)
Compiler (gcc/clang/mingw/visual studio):
MXNet commit hash: e3b1daf543374094bff6b39ec004d034fcf18b43
@tqchen