Open leezu opened 4 years ago
To solve this, I think we can instruct the compiler to always use 64 bit relocations instead of 32 bit relocations (that may overflow), use -O2 (or in the extreme case -Os) instead of -O3 to reduce code bloat [1], or use some linker relaxation techniques.
[1]: Still happens with -O2
My personal experience is that using 64bit relocation is fine on x86-64, so I am in favor of such change :-)
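As a rough illustration of why a 32 bit relocation "may overflow": an R_X86_64_PC32 relocation stores the value S + A - P (symbol address plus addend minus the patched location) in a signed 32 bit field, so it can only reach about 2 GiB in either direction. The sketch below checks that condition; the addresses are hypothetical and chosen only for illustration, they are not taken from the MXNet build.

```python
# Signed 32-bit PC-relative reach, as used by R_X86_64_PC32.
PC32_MIN = -(1 << 31)
PC32_MAX = (1 << 31) - 1


def fits_pc32(symbol_addr: int, place_addr: int, addend: int = 0) -> bool:
    """Return True if S + A - P fits in a signed 32-bit field."""
    displacement = symbol_addr + addend - place_addr
    return PC32_MIN <= displacement <= PC32_MAX


# Hypothetical addresses (assumptions for this example only).
text_addr = 0x0040_0000
near_bss = text_addr + 0x1000_0000  # 256 MiB away from the code
far_bss = text_addr + 0x9000_0000   # 2.25 GiB away from the code

print(fits_pc32(near_bss, text_addr))  # True: within +/-2 GiB
print(fits_pc32(far_bss, text_addr))   # False: would be "truncated to fit"
```

Once the image grows enough that code and the data it references end up more than 2 GiB apart, the second case applies and the linker reports the relocation as truncated.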
Linking master works fine when using ninja instead of make. Not sure about the reason.
Looking at the cmake -GNinja -DUSE_SIGNAL_HANDLER=ON -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_TVM_OP=ON -DPython3_EXECUTABLE=/usr/bin/python3 -DUSE_MKL_IF_AVAILABLE=OFF -DUSE_MKLDNN=OFF -DUSE_DIST_KVSTORE=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH_NAME=Manual -DUSE_INT64_TENSOR_SIZE=ON .. build with #17031, I make the following observations:
"By default" it fails like
libmxnet.a(utils.cc.o): In function `mxnet::common::ExecuteMonInputCallback(nnvm::IndexedGraph const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, unsigned long, std::function<void (char const*, char const*, void*)> const&)':
utils.cc:(.text+0xa5d): relocation truncated to fit: R_X86_64_PC32 against `.bss'
utils.cc:(.text+0xa6c): relocation truncated to fit: R_X86_64_PC32 against `.bss'
utils.cc:(.text+0xb48): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
utils.cc:(.text+0xd86): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libpthread.so.0
utils.cc:(.text+0xeab): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libpthread.so.0
utils.cc:(.text+0x1665): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_ios<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
utils.cc:(.text+0x169d): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `VTT for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4.21' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
utils.cc:(.text+0x16e0): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4.21' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
utils.cc:(.text+0x1724): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_streambuf<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
utils.cc:(.text+0x1742): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
Enabling -mcmodel=large to use 64bit relocation, the failure is moved to a later stage:
libmxnet.a(utils.cc.o):(.eh_frame+0x6c): relocation truncated to fit: R_X86_64_PC32 against `.text'
libmxnet.a(utils.cc.o):(.eh_frame+0xb8): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common16csr_indptr_checkEN7mshadow3cpuEE6LaunchIJPfPlllEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.1'
libmxnet.a(utils.cc.o):(.eh_frame+0xe8): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common13csr_idx_checkEN7mshadow3cpuEE6LaunchIJPfPlSA_lEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.2'
libmxnet.a(utils.cc.o):(.eh_frame+0x118): relocation truncated to fit: R_X86_64_PC32 against `.text'
libmxnet.a(utils.cc.o):(.eh_frame+0x164): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common16csr_indptr_checkEN7mshadow3cpuEE6LaunchIJPdPlllEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.4'
libmxnet.a(utils.cc.o):(.eh_frame+0x194): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common13csr_idx_checkEN7mshadow3cpuEE6LaunchIJPdPlSA_lEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.5'
libmxnet.a(utils.cc.o):(.eh_frame+0x1e4): relocation truncated to fit: R_X86_64_PC32 against `.text'
libmxnet.a(utils.cc.o):(.eh_frame+0x21c): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common16csr_indptr_checkEN7mshadow3cpuEE6LaunchIJPNS5_4half6half_tEPlllEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.7'
libmxnet.a(utils.cc.o):(.eh_frame+0x24c): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common13csr_idx_checkEN7mshadow3cpuEE6LaunchIJPNS5_4half6half_tEPlSC_lEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.8'
libmxnet.a(utils.cc.o):(.eh_frame+0x27c): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
And when setting -Wl,--no-relax, we get back to the state reported by CI at http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-gpu/branches/PR-17031/runs/6/nodes/52/steps/84/log/?start=0 (which builds with clang, unlike my build here with gcc).
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o: In function `_start':
(.text+0x12): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `__libc_csu_fini' defined in .text section in /usr/lib/x86_64-linux-gnu/libc_nonshared.a(elf-init.oS)
(.text+0x19): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `__libc_csu_init' defined in .text section in /usr/lib/x86_64-linux-gnu/libc_nonshared.a(elf-init.oS)
(.text+0x20): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `main' defined in .text.startup section in tests/CMakeFiles/mxnet_unit_tests.dir/cpp/test_main.cc.o
(.text+0x26): relocation truncated to fit: R_X86_64_GOTPCRELX against symbol `__libc_start_main@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_PC32 against `.text'
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_PC32 against `.text._ZNKSt5ctypeIcE8do_widenEc'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0x48): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN7testing8internal15TestFactoryImplI38ContextHashTest_ContextHashUnique_TestED2Ev'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0x5c): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN7testing8internal15TestFactoryImplI38ContextHashTest_ContextHashUnique_TestED0Ev'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0xc0): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN38ContextHashTest_ContextHashUnique_TestD2Ev'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0xdc): additional relocation overflows omitted from the output
tests/mxnet_unit_tests: PC-relative offset overflow in PLT entry for `cudnnBatchNormalizationForwardInference@@libcudnn.so.7'
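When comparing linker logs like the three above, it can help to tally which relocation types overflowed. The following is a minimal sketch, assuming the GNU ld message format seen in this thread; the helper name `count_truncated_relocations` is made up for this example. Since ld prints "additional relocation overflows omitted from the output", the counts are only a lower bound.

```python
import re
from collections import Counter

# Matches the relocation type in GNU ld "relocation truncated to fit" lines.
RELOC_RE = re.compile(r"relocation truncated to fit: (R_[A-Z0-9_]+)")


def count_truncated_relocations(log_text: str) -> Counter:
    """Count how often each relocation type overflowed in a linker log."""
    return Counter(RELOC_RE.findall(log_text))


# Shortened excerpt of the first log in this thread.
sample_log = """\
utils.cc:(.text+0xa5d): relocation truncated to fit: R_X86_64_PC32 against `.bss'
utils.cc:(.text+0xa6c): relocation truncated to fit: R_X86_64_PC32 against `.bss'
utils.cc:(.text+0xd86): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__pthread_key_create@@GLIBC_2.2.5'
utils.cc:(.text+0x1742): additional relocation overflows omitted from the output
"""

for reloc, count in count_truncated_relocations(sample_log).most_common():
    print(f"{reloc}: {count}")
```

Running the same helper over the -mcmodel=large log would show only R_X86_64_PC32 entries in .eh_frame, which matches the observation that the failure moved to a later stage rather than going away.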
Any updates? I ran into similar issues recently.
I ran into a similar issue with the latest master.
I run into the same issue with the latest master.
Same issue
@ptrendx is working on a fix (cf https://github.com/apache/incubator-mxnet/issues/18280#issuecomment-627010252)
Same issue on the latest master branch.
I met the same issue. Is there any fix or workaround for it? I tried the master branch, v1.4.x and v1.5.x, and got the same result. Environment: Ubuntu 18.04, GCC 7.6, CUDA 10.2, CUDNN 7.6.5
Set -DMXNET_CUDA_ARCH=7.0 or whatever arch you're targeting as workaround.
Thanks leezu, the build succeeded after setting the CUDA_ARCH.
We recently hit the same issue in PyTorch with CUDA 11: https://github.com/pytorch/pytorch/issues/39968
Happened again for the cu101 build: https://jenkins.mxnet-ci.amazon-ml.com/job/restricted-mxnet-cd/job/mxnet-cd-release-job/1525/execution/node/177/log/
@eric-haibin-lin that pipeline isn't the one that produces the nightly builds. ~Currently the nightly builds for cu101 has stopped because MXNet follows the NVIDIA's supporting strategy on CUDA, which is only the latest two major and minor versions.~ The nightly build was failing due to a recent change, which has been reverted.
The problem still exists when building on Jetson NX
[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/matrix_op.cu.o
[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/ordering_op.cu.o
[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/ravel.cu.o
[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/sparse_retain.cu.o
[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/square_sum.cu.o
[ 97%] Linking CXX shared library libmxnet.so
CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/print_graph_ir.cc.o: In function `std::_Function_handler<void (unsigned int, std::ostream&), nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#1}>::_M_invoke(std::_Any_data const&, unsigned int&&, std::ostream&)':
print_graph_ir.cc:(.text+0x1c4): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__throw_bad_function_call()@@GLIBCXX_3.4.14' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x204): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__throw_bad_function_call()@@GLIBCXX_3.4.14' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/print_graph_ir.cc.o: In function `std::_Function_handler<void (unsigned int, std::ostream&), nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#2}>::_M_invoke(std::_Any_data const&, unsigned int&&, std::ostream&)':
print_graph_ir.cc:(.text+0x2b4): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__throw_bad_function_call()@@GLIBCXX_3.4.14' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/print_graph_ir.cc.o: In function `std::_Function_base::_Base_manager<nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#2}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#2}> const&, std::_Manager_operation)':
print_graph_ir.cc:(.text+0x318): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator delete(void*)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x324): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator delete(void*, unsigned long)@@CXXABI_1.3.9' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x34c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator new(unsigned long)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x414): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator new(unsigned long)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x46c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator delete(void*)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x478): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator delete(void*, unsigned long)@@CXXABI_1.3.9' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x480): relocation truncated to fit: R_AARCH64_CALL26 against symbol `_Unwind_Resume@@GCC_3.0' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libgcc_s.so
print_graph_ir.cc:(.text+0x4a0): additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
CMakeFiles/mxnet.dir/build.make:9471: recipe for target 'libmxnet.so' failed
make[2]: *** [libmxnet.so] Error 1
CMakeFiles/Makefile2:740: recipe for target 'CMakeFiles/mxnet.dir/all' failed
make[1]: *** [CMakeFiles/mxnet.dir/all] Error 2
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2
Here's my cmake config
set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(CUDACXX "/usr/local/cuda-10.2/bin/nvcc" CACHE STRING "Cuda compiler")
set(MXNET_CUDA_ARCH "7.2" CACHE STRING "Cuda architectures")
@wms2537 did you include https://github.com/apache/incubator-mxnet/pull/19123 ?
Isn't it turned on by default? I used the code pulled from master and the problem still exists. I can compile it on a normal PC but not on the Jetson.
Please paste the full cmake configure log. Also note that your Jetson uses the AARCH64 architecture, not X86. Its code memory models differ from those on X86, and compiler support is generally much worse than on X86 (for example, if position independent code is required, gcc / clang may not implement anything but the default model, which limits the size of the binary and causes the relocation issue above).
We do test compiling MXNet on the Jetson AARCH64 architecture (https://github.com/apache/incubator-mxnet/blob/master/ci/docker/Dockerfile.build.jetson), so in principle things should work and we just need to figure out how your environment differs from the tested one.
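For context on the R_AARCH64_CALL26 errors in the Jetson log: this relocation patches a BL instruction, whose target is a signed 26 bit immediate counted in 4 byte instructions, so a direct call can reach only about 128 MiB in either direction. That is far less than the 2 GiB of an x86-64 PC32 relocation. A small sketch of the range check follows; the addresses are hypothetical.

```python
# R_AARCH64_CALL26 (used by the BL instruction) stores a signed 26-bit
# immediate counted in 4-byte instructions, so the reach is +/-128 MiB.
CALL26_BITS = 26
INSN_SIZE = 4
MIB = 1 << 20


def fits_call26(target_addr: int, call_site_addr: int) -> bool:
    """Return True if a direct BL from call_site_addr can reach target_addr."""
    offset = target_addr - call_site_addr
    if offset % INSN_SIZE != 0:
        return False  # branch targets must be 4-byte aligned
    words = offset // INSN_SIZE
    limit = 1 << (CALL26_BITS - 1)
    return -limit <= words < limit


call_site = 0x1000  # hypothetical address, for illustration only
print(fits_call26(call_site + 64 * MIB, call_site))   # True: within 128 MiB
print(fits_call26(call_site + 200 * MIB, call_site))  # False: out of reach
```

This is why a library that links fine on x86-64 can still fail on AArch64 once its code spans more than 128 MiB.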
Here's the cmake output:
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc - works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_CROSSCOMPILING FALSE
-- CMAKE_HOST_SYSTEM_PROCESSOR aarch64
-- CMAKE_SYSTEM_PROCESSOR aarch64
-- CMAKE_SYSTEM_NAME Linux
-- CMake version '3.17.3' using generator 'Unix Makefiles'
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
-- The CUDA compiler identification is NVIDIA 10.2.89
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Could NOT find MKL (missing: MKL_INCLUDE_DIR MKL_INTEL_LP64_LIBRARY MKL_INTEL_THREAD_LIBRARY MKL_CORE_LIBRARY IOMP_LIBRARY)
-- Found OpenBLAS libraries: /usr/lib/aarch64-linux-gnu/libopenblas.so
-- Found OpenBLAS include: /usr/include/aarch64-linux-gnu
-- Found OpenCV: /usr (found version "4.1.1") found components: core highgui imgproc imgcodecs
-- OpenCV 4.1.1 found (/usr/lib/aarch64-linux-gnu/cmake/opencv4)
-- OpenCV_LIBS=opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
USE_LAPACK is ON
CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project):
VERSION keyword not followed by a value or was followed by a value that
expanded to nothing.
-- Found PythonInterp: /usr/bin/python (found version "2.7.17")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found GTest: gtest
-- Found CUDNN: /usr/lib/aarch64-linux-gnu/libcudnn.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for fopen64
-- Looking for fopen64 - not found
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for nanosleep
-- Looking for nanosleep - found
-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /usr/include
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Searching 16 bit integer - Using unsigned short
-- Check if the system is big endian - little endian
-- /home/chkl/mxnet/3rdparty/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- Performing Test SUPPORT_MSSE2
-- Performing Test SUPPORT_MSSE2 - Failed
-- CUDA: Using the following NVCC architecture flags -gencode;arch=compute_72,code=sm_72
-- Found CUDAToolkit: /usr/local/cuda/include (found version "10.2.89")
-- Found NVML: /usr/local/cuda/include
-- Found NVML (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/stubs/libnvidia-ml.so)
-- Found Python3: /usr/bin/python3.6 (found version "3.6.9") found components: Interpreter
-- CUDA: Adding NVCC options: --fatbin-options --compress-all
-- Configuring done
-- Generating done
-- Build files have been written to: /home/chkl/mxnet/build
Could you try matching the following build configuration (modulo DCMAKE_TOOLCHAIN_FILE and the CUDA version)
I.e., our test suite builds for Jetson without the OpenCV and LAPACK features. You may also want to ensure that you specify the cmake -DCMAKE_BUILD_TYPE=Release option when configuring the build.
Still the same:
io.cc:(.text+0xa8): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)@@GLIBCXX_3.4.21' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
io.cc:(.text+0xc0): relocation truncated to fit: R_AARCH64_CALL26 against symbol `memcpy@@GLIBC_2.17' defined in .text section in /lib/aarch64-linux-gnu/libc.so.6
io.cc:(.text+0xd8): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__stack_chk_fail@@GLIBC_2.17' defined in .text section in /lib/aarch64-linux-gnu/libc.so.6
io.cc:(.text+0xe4): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__throw_logic_error(char const*)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
3rdparty/dmlc-core/libdmlc.a(io.cc.o): In function `dmlc::io::FileSystem::GetInstance(dmlc::io::URI const&)':
io.cc:(.text+0x118): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(char const*) const' defined in .text._ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc[_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc] section in CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/core/symbolic.cc.o
io.cc:(.text+0x168): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__cxa_guard_acquire@@CXXABI_1.3' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
io.cc:(.text+0x18c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__cxa_guard_release@@CXXABI_1.3' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
io.cc:(.text+0x1a4): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__cxa_atexit@@GLIBC_2.17' defined in .text section in /lib/aarch64-linux-gnu/libc.so.6
io.cc:(.text+0x1c0): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(char const*) const' defined in .text._ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc[_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc] section in CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/core/symbolic.cc.o
io.cc:(.text+0x1dc): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dmlc::LogMessageFatal::LogMessageFatal(char const*, int)' defined in .text._ZN4dmlc15LogMessageFatalC2EPKci[_ZN4dmlc15LogMessageFatalC5EPKci] section in CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/c_api/c_api_graph.cc.o
io.cc:(.text+0x1f0): additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
Here's my cmake log:
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc - works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_CROSSCOMPILING FALSE
-- CMAKE_HOST_SYSTEM_PROCESSOR aarch64
-- CMAKE_SYSTEM_PROCESSOR aarch64
-- CMAKE_SYSTEM_NAME Linux
-- CMake version '3.17.3' using generator 'Ninja'
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
-- The CUDA compiler identification is NVIDIA 10.2.89
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found OpenBLAS libraries: /usr/lib/aarch64-linux-gnu/libopenblas.so
-- Found OpenBLAS include: /usr/include/aarch64-linux-gnu
-- OpenCV Disabled
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project):
VERSION keyword not followed by a value or was followed by a value that
expanded to nothing.
-- Found PythonInterp: /usr/bin/python (found version "2.7.17")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found GTest: gtest
-- Found CUDNN: /usr/lib/aarch64-linux-gnu/libcudnn.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for fopen64
-- Looking for fopen64 - not found
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for nanosleep
-- Looking for nanosleep - found
-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /usr/include
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Searching 16 bit integer - Using unsigned short
-- Check if the system is big endian - little endian
-- /home/chkl/mxnet/3rdparty/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- Performing Test SUPPORT_MSSE2
-- Performing Test SUPPORT_MSSE2 - Failed
-- CUDA: Using the following NVCC architecture flags -gencode;arch=compute_52,code=sm_52
-- Found CUDAToolkit: /usr/local/cuda/include (found version "10.2.89")
-- Found NVML: /usr/local/cuda/include
-- Found NVML (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/stubs/libnvidia-ml.so)
-- Found Python3: /usr/bin/python3.6 (found version "3.6.9") found components: Interpreter
-- CUDA: Adding NVCC options: --fatbin-options --compress-all
CMake Warning at CMakeLists.txt:839 (message):
OpenCV_VERSION_MAJOR: , version 3 with imgcodecs is required for im2rec,
im2rec will not be available
-- Configuring done
-- Generating done
-- Build files have been written to: /home/chkl/mxnet/build
Please ensure your system toolchain is up to date (includes https://bugzilla.redhat.com/show_bug.cgi?id=1243559 fix)
You may also simply use the cross-compilation option by installing the cross-toolchain on your host system analogous to
I think my system toolchain is up to date; I am using JetPack 4.3. If it is not, how do I update the system toolchain?
binutils is not part of JetPack. It is part of the operating system. You can check which package version is provided by the operating system used by your device.
With respect to JetPack, we recommend you update to 4.4, as this is the version tested by our CI. If you still face problems, I really recommend you follow the cross-compilation approach, as it is much faster and is tested by our CI server.
cc @TristonC @mseth10 do you have any recommendations for @wms2537's issues on Jetson NX device?
After some testing, I finally managed to build it. I updated ccache and openblas similar to
> Please ensure your system toolchain is up to date (includes https://bugzilla.redhat.com/show_bug.cgi?id=1243559 fix)
> You may also simply use the cross-compilation option by installing the cross-toolchain on your host system analogous to
Then, I restarted the Jetson and built it with these commands
> Could you try matching the following build configuration (modulo DCMAKE_TOOLCHAIN_FILE and the CUDA version)
> I.e., our test suite builds for Jetson without the OpenCV and LAPACK features. You may also want to ensure that you specify the cmake -DCMAKE_BUILD_TYPE=Release option when configuring the build.
I also added an 8 GB swap so that I can build with all 6 cores.
I don't know which of the changes above actually solved the issue. Thanks @leezu for your help.
@wms2537 Thanks for sharing your tip.
You listed several changes. Are they applied at the cmake or the make step?
Would you mind sharing your "CMakeLists.txt" (if you modified it) or the modified command at the make step?
I tried to build on an NVIDIA Jetson (AGX Orin) and am hitting the same error:
additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/mxnet.dir/build.make:11134: libmxnet.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:645: CMakeFiles/mxnet.dir/all] Error 2
make: *** [Makefile:141: all] Error 2
This happens at 98% of the 'make' step.
Runtime environment: Ubuntu 20.04, JetPack 5.0 (R 34), CUDA 11.4
Description
libmxnet.so gets too large (depending on compile options), so that linking fails. This was observed before on CI with test coverage functionality enabled (https://github.com/apache/incubator-mxnet/issues/15971), but can also happen with non-test-coverage builds, such as the -DUSE_INT64_TENSOR_SIZE=ON build.
I first observed this in #17031 (http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-gpu/branches/PR-17031/runs/6/nodes/52/steps/84/log/?start=0), but can easily reproduce it on the master branch when building with GCC 7.4.
Error Message
From the CI
Compiling master version with GCC on Ubuntu 18.04 (Deep Learning AMI) gives an equivalent error message (though slightly different wording due to GCC vs Clang).
To Reproduce
cmake -DUSE_SIGNAL_HANDLER=ON -DUSE_CUDA=ON -DUSE_CUDNN=ON -DPython3_EXECUTABLE=/usr/bin/python3 -DUSE_MKL_IF_AVAILABLE=OFF -DUSE_MKLDNN=OFF -DUSE_DIST_KVSTORE=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN=52,70 -DUSE_INT64_TENSOR_SIZE=ON ..
on Ubuntu 18.04 (gcc 7.4, ld 2.3), where the CMake options here are taken from the build_ubuntu_gpu_large_tensor CI run.
Environment
Environment used for reproducing the error with master version of MXNet.