Relocation truncation issues

leezu commented 4 years ago

Description

libmxnet.so can grow too large (depending on compile options) for linking to succeed. This was observed before on CI with test coverage functionality enabled (https://github.com/apache/incubator-mxnet/issues/15971), but it can also happen with non-test-coverage builds, such as a -DUSE_INT64_TENSOR_SIZE=ON build.

I first observed this in #17031 (http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-gpu/branches/PR-17031/runs/6/nodes/52/steps/84/log/?start=0), but can easily reproduce it on the master branch when building with GCC 7.4.

Error Message

From the CI

/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in libmxnet.so
crtstuff.c:(.text+0x1e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_deregisterTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `register_tm_clones':
crtstuff.c:(.text+0x43): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x4a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in libmxnet.so
crtstuff.c:(.text+0x6b): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_registerTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x92): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x9c): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__cxa_finalize@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
crtstuff.c:(.text+0xaa): relocation truncated to fit: R_X86_64_PC32 against symbol `__dso_handle' defined in .data.rel.local section in /usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o
crtstuff.c:(.text+0xbb): additional relocation overflows omitted from the output
libmxnet.so: PC-relative offset overflow in PLT entry for `_ZN5mxnet2op8mxnet_op6KernelINS0_9pick_gradILi3ELb0EEEN7mshadow3gpuEE6LaunchIJPdS9_PfiiNS5_5ShapeILi3EEESC_EEEvPNS5_6StreamIS6_EEiDpT_'
collect2: error: ld returned 1 exit status
FAILED: : && /tmp/ccache-redirects/g++  -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -std=c++11 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -fopenmp -std=c++0x -O3 -DNDEBUG   tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/engine_shutdown_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/thread_local_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/threaded_engine_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/kvstore/gpu_topology_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/libinfo_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/activation_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/batchnorm_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/coreop_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/dropout_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/fully_conn_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/krprod_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/mkldnn_operator_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/mkldnn_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/runner/core_op_runner_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/slice_channel_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/tune/operator_tune_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/storage/storage_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/test_main.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cmake_device_link.o  -o tests/mxnet_unit_tests -L/usr/local/cuda/lib64  -L/work/build/3rdparty/tvm  -L/usr/local/cuda/targets/x86_64-linux/lib -Wl,-rpath,/usr/local/cuda/lib64:/work/build/3rdparty/openmp/runtime/src:/work/build/3rdparty/tvm lib/libgtest.a -Wl,--whole-archive libmxnet.a -Wl,--no-whole-archive 3rdparty/dmlc-core/libdmlc.a /usr/local/cuda/lib64/libnvToolsExt.so 
/usr/lib/libopenblas.so /usr/lib/x86_64-linux-gnu/librt.so /usr/lib/x86_64-linux-gnu/libjemalloc.so /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9 /usr/lib/x86_64-linux-gnu/libopencv_imgproc.so.2.4.9 3rdparty/openmp/runtime/src/libomp.so -lpthread -llapack /usr/lib/x86_64-linux-gnu/libjemalloc.so /usr/lib/x86_64-linux-gnu/libcudnn.so -lcublas -lcufft -lcusolver -lcurand -lnvrtc -lcuda /usr/lib/x86_64-linux-gnu/libprotobuf.so /usr/lib/x86_64-linux-gnu/libzmq.so 3rdparty/ps-lite/libpslite.a -lprotobuf -ltvm_runtime /usr/lib/x86_64-linux-gnu/libzmq.so 3rdparty/ps-lite/libpslite.a -lprotobuf -lrt -lpthread -llapack /usr/lib/x86_64-linux-gnu/libcudnn.so -lcublas -lcufft -lcusolver -lcurand -lnvrtc -lcuda /usr/lib/x86_64-linux-gnu/libprotobuf.so /usr/lib/x86_64-linux-gnu/libzmq.so -lprotobuf -ltvm_runtime /usr/lib/x86_64-linux-gnu/libzmq.so -lprotobuf -ltvm_runtime /usr/lib/x86_64-linux-gnu/libopencv_core.so.2.4.9 -ldl -lpthread -lcudadevrt -lcudart_static -lrt -lpthread -ldl && :
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_PC32 against `.text'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x8): relocation truncated to fit: R_X86_64_32S against `.tm_clone_table'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `register_tm_clones':
crtstuff.c:(.text+0x49): relocation truncated to fit: R_X86_64_32S against `.tm_clone_table'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x82): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x95): relocation truncated to fit: R_X86_64_PC32 against `.bss'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/engine_shutdown_test.cc.o: In function `EngineShutdown_stop_without_crashing_Test::TestBody()':
engine_shutdown_test.cc:(.text+0xf8): relocation truncated to fit: R_X86_64_PC32 against `.bss'
engine_shutdown_test.cc:(.text+0x130): relocation truncated to fit: R_X86_64_PC32 against `.bss'
engine_shutdown_test.cc:(.text+0x137): relocation truncated to fit: R_X86_64_PC32 against `.bss'
engine_shutdown_test.cc:(.text+0x15d): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libpthread.so.0
engine_shutdown_test.cc:(.text+0x18d): additional relocation overflows omitted from the output
tests/mxnet_unit_tests: PC-relative offset overflow in PLT entry for `nvrtcGetPTX@@libnvrtc.so.10.1'
collect2: error: ld returned 1 exit status

Compiling the master version with GCC on Ubuntu 18.04 (Deep Learning AMI) gives an equivalent error message (though with slightly different wording due to GCC vs Clang).

To Reproduce

cmake -DUSE_SIGNAL_HANDLER=ON -DUSE_CUDA=ON -DUSE_CUDNN=ON -DPython3_EXECUTABLE=/usr/bin/python3 -DUSE_MKL_IF_AVAILABLE=OFF -DUSE_MKLDNN=OFF -DUSE_DIST_KVSTORE=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN=52,70 -DUSE_INT64_TENSOR_SIZE=ON ..

on Ubuntu 18.04 (gcc 7.4, ld 2.3), where the CMake options here are taken from the build_ubuntu_gpu_large_tensor CI run.

Environment

Environment used for reproducing the error with master version of MXNet.

----------Python Info----------
Version      : 3.8.0
Compiler     : GCC 7.4.0
Build        : ('default', 'Dec  8 2019 08:07:09')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.2.3
Directory    : /home/ubuntu/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pip
----------MXNet Info-----------
Version      : 1.6.0
Directory    : /home/ubuntu/src/mxnet-dc/python/mxnet
Num GPUs     : 0
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.15.0-1056-aws-x86_64-with-glibc2.27
system       : Linux
node         : ip-172-31-26-35
release      : 4.15.0-1056-aws
version      : #58-Ubuntu SMP Tue Nov 26 15:14:34 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
Stepping:            7
CPU MHz:             3600.024
BogoMIPS:            6000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0-23,48-71
NUMA node1 CPU(s):   24-47,72-95
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0021 sec, LOAD: 0.3891 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0003 sec, LOAD: 0.3134 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0450 sec, LOAD: 0.0738 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0034 sec, LOAD: 0.0103 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0159 sec, LOAD: 0.1406 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0432 sec, LOAD: 0.3530 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0021 sec, LOAD: 0.0701 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0313 sec, LOAD: 0.1727 sec.

leezu commented 4 years ago

To solve this, I think we can instruct the compiler to always use 64-bit relocations instead of 32-bit relocations (which may overflow), ~~use -O2 (or in the extreme case -Os) instead of -O3 to reduce code bloat~~ [1], or use some linker relaxation techniques.

[1]: Still happens with -O2
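
To make the overflow concrete: a relocation such as R_X86_64_PC32 stores a signed 32-bit displacement, so the referenced location has to lie within roughly 2 GiB of the referencing instruction. The following is a simplified sketch of that range check, not the linker's actual logic; the helper name and the addresses are made up for illustration and the relocation addend is ignored.

```python
def fits_signed(displacement: int, bits: int) -> bool:
    """Return True if `displacement` is representable as a signed `bits`-bit integer."""
    limit = 1 << (bits - 1)
    return -limit <= displacement < limit


# Hypothetical addresses, not taken from the failing build.
site = 0x1000                    # address of the instruction being relocated
near_target = site + (1 << 30)   # 1 GiB away
far_target = site + (3 << 30)    # 3 GiB away

print(fits_signed(near_target - site, 32))  # True: a 32-bit PC-relative field can encode it
print(fits_signed(far_target - site, 32))   # False: this is the "truncated to fit" case
print(fits_signed(far_target - site, 64))   # True: a 64-bit field has room
```

Once the linked image spans more than 2 GiB, some pairs of locations necessarily end up in the second case.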

junrushao commented 4 years ago

My personal experience is that using 64-bit relocations is fine on x86-64, so I am in favor of such a change :-)

leezu commented 4 years ago

Linking master works fine when using ninja instead of make. I'm not sure about the reason.

leezu commented 4 years ago

Looking at the cmake -GNinja -DUSE_SIGNAL_HANDLER=ON -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_TVM_OP=ON -DPython3_EXECUTABLE=/usr/bin/python3 -DUSE_MKL_IF_AVAILABLE=OFF -DUSE_MKLDNN=OFF -DUSE_DIST_KVSTORE=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH_NAME=Manual -DUSE_INT64_TENSOR_SIZE=ON .. build with #17031, I make the following observations:

"By default" it fails like

libmxnet.a(utils.cc.o): In function `mxnet::common::ExecuteMonInputCallback(nnvm::IndexedGraph const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, unsigned long, std::function<void (char const*, char const*, void*)> const&)':
utils.cc:(.text+0xa5d): relocation truncated to fit: R_X86_64_PC32 against `.bss'
utils.cc:(.text+0xa6c): relocation truncated to fit: R_X86_64_PC32 against `.bss'
utils.cc:(.text+0xb48): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
utils.cc:(.text+0xd86): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libpthread.so.0
utils.cc:(.text+0xeab): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libpthread.so.0
utils.cc:(.text+0x1665): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_ios<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
utils.cc:(.text+0x169d): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `VTT for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4.21' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
utils.cc:(.text+0x16e0): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4.21' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
utils.cc:(.text+0x1724): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_streambuf<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
utils.cc:(.text+0x1742): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax

With -mcmodel=large enabled to use 64-bit relocations, the failure moves to a later stage:

libmxnet.a(utils.cc.o):(.eh_frame+0x6c): relocation truncated to fit: R_X86_64_PC32 against `.text'
libmxnet.a(utils.cc.o):(.eh_frame+0xb8): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common16csr_indptr_checkEN7mshadow3cpuEE6LaunchIJPfPlllEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.1'
libmxnet.a(utils.cc.o):(.eh_frame+0xe8): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common13csr_idx_checkEN7mshadow3cpuEE6LaunchIJPfPlSA_lEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.2'
libmxnet.a(utils.cc.o):(.eh_frame+0x118): relocation truncated to fit: R_X86_64_PC32 against `.text'
libmxnet.a(utils.cc.o):(.eh_frame+0x164): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common16csr_indptr_checkEN7mshadow3cpuEE6LaunchIJPdPlllEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.4'
libmxnet.a(utils.cc.o):(.eh_frame+0x194): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common13csr_idx_checkEN7mshadow3cpuEE6LaunchIJPdPlSA_lEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.5'
libmxnet.a(utils.cc.o):(.eh_frame+0x1e4): relocation truncated to fit: R_X86_64_PC32 against `.text'
libmxnet.a(utils.cc.o):(.eh_frame+0x21c): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common16csr_indptr_checkEN7mshadow3cpuEE6LaunchIJPNS5_4half6half_tEPlllEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.7'
libmxnet.a(utils.cc.o):(.eh_frame+0x24c): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common13csr_idx_checkEN7mshadow3cpuEE6LaunchIJPNS5_4half6half_tEPlSC_lEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.8'
libmxnet.a(utils.cc.o):(.eh_frame+0x27c): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax

And when setting -Wl,--no-relax, we get back to the state reported by CI at http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-gpu/branches/PR-17031/runs/6/nodes/52/steps/84/log/?start=0 (which builds with clang, unlike my build here with gcc).

/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o: In function `_start':
(.text+0x12): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `__libc_csu_fini' defined in .text section in /usr/lib/x86_64-linux-gnu/libc_nonshared.a(elf-init.oS)
(.text+0x19): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `__libc_csu_init' defined in .text section in /usr/lib/x86_64-linux-gnu/libc_nonshared.a(elf-init.oS)
(.text+0x20): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `main' defined in .text.startup section in tests/CMakeFiles/mxnet_unit_tests.dir/cpp/test_main.cc.o
(.text+0x26): relocation truncated to fit: R_X86_64_GOTPCRELX against symbol `__libc_start_main@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_PC32 against `.text'
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_PC32 against `.text._ZNKSt5ctypeIcE8do_widenEc'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0x48): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN7testing8internal15TestFactoryImplI38ContextHashTest_ContextHashUnique_TestED2Ev'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0x5c): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN7testing8internal15TestFactoryImplI38ContextHashTest_ContextHashUnique_TestED0Ev'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0xc0): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN38ContextHashTest_ContextHashUnique_TestD2Ev'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0xdc): additional relocation overflows omitted from the output
tests/mxnet_unit_tests: PC-relative offset overflow in PLT entry for `cudnnBatchNormalizationForwardInference@@libcudnn.so.7'
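
To compare failure modes like the three above, it can help to tally which relocation types overflow in each log. The snippet below is a small sketch of such a tally; the function name is hypothetical and the sample lines are copied from the logs in this comment. Because ld prints "additional relocation overflows omitted from the output", the counts are only a lower bound.

```python
import re
from collections import Counter

# Matches the relocation type in ld's "relocation truncated to fit" diagnostics.
RELOC_RE = re.compile(r"relocation truncated to fit: (R_[A-Z0-9_]+)")


def count_truncated_relocations(log_text: str) -> Counter:
    """Count overflowing relocations per relocation type in a linker log."""
    return Counter(RELOC_RE.findall(log_text))


sample_log = """\
utils.cc:(.text+0xa5d): relocation truncated to fit: R_X86_64_PC32 against `.bss'
utils.cc:(.text+0xb48): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
libmxnet.a(utils.cc.o):(.eh_frame+0x6c): relocation truncated to fit: R_X86_64_PC32 against `.text'
"""

for reloc_type, count in count_truncated_relocations(sample_log).most_common():
    print(reloc_type, count)
```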

hubutui commented 4 years ago

Any updates? I ran into similar issues recently.

sxjscience commented 4 years ago

I ran into a similar issue with the latest master.

schliffen commented 4 years ago

I ran into the same issue with the latest master.

BogdanovKirill commented 4 years ago

Same issue

leezu commented 4 years ago

@ptrendx is working on a fix (cf https://github.com/apache/incubator-mxnet/issues/18280#issuecomment-627010252)

ghost commented 4 years ago

Same issue on the latest master branch.

armdebugger commented 4 years ago

I hit the same issue. Is there any fix or workaround for it? I tried the master branch, v1.4.x, and v1.5.x, and got the same result. Environment: Ubuntu 18.04, GCC 7.6, CUDA 10.2, CUDNN 7.6.5

leezu commented 4 years ago

Set -DMXNET_CUDA_ARCH=7.0 (or whatever arch you're targeting) as a workaround.
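
A rough way to see why this helps: every listed architecture adds another copy of the device code to the binary, so targeting a single arch keeps it smaller. The sketch below only mirrors the flag format that appears in the configure log later in this thread (-gencode;arch=compute_72,code=sm_72); the helper is hypothetical and is not MXNet's actual CMake logic.

```python
from typing import List


def gencode_flags(arch_list: str) -> List[str]:
    """Map an arch list such as "5.2 7.0" to one -gencode flag per architecture.

    Hypothetical helper for illustration only; separators may be spaces,
    commas or semicolons.
    """
    flags = []
    for arch in arch_list.replace(",", " ").replace(";", " ").split():
        number = arch.replace(".", "")  # "7.0" -> "70"
        flags.append(f"-gencode arch=compute_{number},code=sm_{number}")
    return flags


print(gencode_flags("7.0"))      # one device code image
print(gencode_flags("5.2 7.0"))  # two device code images
```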

armdebugger commented 4 years ago

Thanks leezu, the build succeeded after setting the CUDA_ARCH.

zasdfgbnm commented 4 years ago

We recently hit the same issue in PyTorch with CUDA 11: https://github.com/pytorch/pytorch/issues/39968

eric-haibin-lin commented 4 years ago

Happened again for the cu101 build: https://jenkins.mxnet-ci.amazon-ml.com/job/restricted-mxnet-cd/job/mxnet-cd-release-job/1525/execution/node/177/log/

szha commented 4 years ago

@eric-haibin-lin that pipeline isn't the one that produces the nightly builds. ~Currently the nightly builds for cu101 has stopped because MXNet follows the NVIDIA's supporting strategy on CUDA, which is only the latest two major and minor versions.~ The nightly build was failing due to a recent change, which has been reverted.

wms2537 commented 4 years ago

The problem still exists when building on Jetson NX:

[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/matrix_op.cu.o
[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/ordering_op.cu.o
[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/ravel.cu.o
[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/sparse_retain.cu.o
[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/square_sum.cu.o
[ 97%] Linking CXX shared library libmxnet.so
CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/print_graph_ir.cc.o: In function `std::_Function_handler<void (unsigned int, std::ostream&), nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#1}>::_M_invoke(std::_Any_data const&, unsigned int&&, std::ostream&)':
print_graph_ir.cc:(.text+0x1c4): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__throw_bad_function_call()@@GLIBCXX_3.4.14' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x204): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__throw_bad_function_call()@@GLIBCXX_3.4.14' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/print_graph_ir.cc.o: In function `std::_Function_handler<void (unsigned int, std::ostream&), nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#2}>::_M_invoke(std::_Any_data const&, unsigned int&&, std::ostream&)':
print_graph_ir.cc:(.text+0x2b4): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__throw_bad_function_call()@@GLIBCXX_3.4.14' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/print_graph_ir.cc.o: In function `std::_Function_base::_Base_manager<nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#2}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#2}> const&, std::_Manager_operation)':
print_graph_ir.cc:(.text+0x318): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator delete(void*)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x324): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator delete(void*, unsigned long)@@CXXABI_1.3.9' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x34c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator new(unsigned long)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x414): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator new(unsigned long)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x46c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator delete(void*)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x478): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator delete(void*, unsigned long)@@CXXABI_1.3.9' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x480): relocation truncated to fit: R_AARCH64_CALL26 against symbol `_Unwind_Resume@@GCC_3.0' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libgcc_s.so
print_graph_ir.cc:(.text+0x4a0): additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
CMakeFiles/mxnet.dir/build.make:9471: recipe for target 'libmxnet.so' failed
make[2]: *** [libmxnet.so] Error 1
CMakeFiles/Makefile2:740: recipe for target 'CMakeFiles/mxnet.dir/all' failed
make[1]: *** [CMakeFiles/mxnet.dir/all] Error 2
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2

Here's my cmake config:

set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(CUDACXX "/usr/local/cuda-10.2/bin/nvcc" CACHE STRING "Cuda compiler")
set(MXNET_CUDA_ARCH "7.2" CACHE STRING "Cuda architectures")

leezu commented 4 years ago

@wms2537 did you include https://github.com/apache/incubator-mxnet/pull/19123 ?

wms2537 commented 4 years ago

Isn't it turned on by default? I used the code pulled from master, and the problem still exists. I can compile it on a normal PC but not on the Jetson.

leezu commented 4 years ago

Please paste the full cmake configure log. Also note that your Jetson uses the AARCH64 architecture, not X86. The code memory model differs from the one on X86, and compiler support is generally much worse than on X86 (for example, if position-independent code is required, gcc / clang may not implement anything but the default model, which limits the size of the binary and causes the relocation issues above).

We do test compiling MXNet on the Jetson AARCH64 architecture (https://github.com/apache/incubator-mxnet/blob/master/ci/docker/Dockerfile.build.jetson), so in principle things should work and we just need to figure out how your environment differs from the tested one.
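
For reference, the R_AARCH64_CALL26 relocation in the log above belongs to the BL instruction, which encodes a signed 26-bit offset counted in 4-byte instruction words, so a direct call can only reach about 128 MiB in either direction. A minimal sketch of that range check follows; the function name and addresses are hypothetical.

```python
def call26_reachable(site: int, target: int) -> bool:
    """Return True if a BL at `site` can directly encode a branch to `target`."""
    offset = target - site
    if offset % 4 != 0:
        return False  # AArch64 instructions are 4-byte aligned
    limit = 1 << 27   # 2**25 words * 4 bytes = 128 MiB
    return -limit <= offset < limit


MIB = 1 << 20
site = 0x400000  # hypothetical call site

print(call26_reachable(site, site + 100 * MIB))  # True: within range
print(call26_reachable(site, site + 200 * MIB))  # False: beyond 128 MiB
```

This window is far smaller than the roughly 2 GiB available to 32-bit PC-relative relocations on X86, which is consistent with the same sources linking on a PC but not on the Jetson.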

wms2537 commented 4 years ago

Here's the cmake output:

-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc - works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_CROSSCOMPILING FALSE
-- CMAKE_HOST_SYSTEM_PROCESSOR aarch64
-- CMAKE_SYSTEM_PROCESSOR aarch64
-- CMAKE_SYSTEM_NAME Linux
-- CMake version '3.17.3' using generator 'Unix Makefiles'
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
-- The CUDA compiler identification is NVIDIA 10.2.89
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Could NOT find MKL (missing: MKL_INCLUDE_DIR MKL_INTEL_LP64_LIBRARY MKL_INTEL_THREAD_LIBRARY MKL_CORE_LIBRARY IOMP_LIBRARY) 
-- Found OpenBLAS libraries: /usr/lib/aarch64-linux-gnu/libopenblas.so
-- Found OpenBLAS include: /usr/include/aarch64-linux-gnu
-- Found OpenCV: /usr (found version "4.1.1") found components: core highgui imgproc imgcodecs 
-- OpenCV 4.1.1 found (/usr/lib/aarch64-linux-gnu/cmake/opencv4)
--  OpenCV_LIBS=opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
USE_LAPACK is ON
CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project):
  VERSION keyword not followed by a value or was followed by a value that
  expanded to nothing.

-- Found PythonInterp: /usr/bin/python (found version "2.7.17") 
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found GTest: gtest  
-- Found CUDNN: /usr/lib/aarch64-linux-gnu/libcudnn.so  
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for fopen64
-- Looking for fopen64 - not found
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for nanosleep
-- Looking for nanosleep - found
-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /usr/include  
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Searching 16 bit integer - Using unsigned short
-- Check if the system is big endian - little endian
-- /home/chkl/mxnet/3rdparty/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- Performing Test SUPPORT_MSSE2
-- Performing Test SUPPORT_MSSE2 - Failed
-- CUDA: Using the following NVCC architecture flags -gencode;arch=compute_72,code=sm_72
-- Found CUDAToolkit: /usr/local/cuda/include (found version "10.2.89") 
-- Found NVML: /usr/local/cuda/include  
-- Found NVML (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/stubs/libnvidia-ml.so)
-- Found Python3: /usr/bin/python3.6 (found version "3.6.9") found components: Interpreter 
-- CUDA: Adding NVCC options: --fatbin-options --compress-all
-- Configuring done
-- Generating done
-- Build files have been written to: /home/chkl/mxnet/build

leezu commented 4 years ago

Could you try matching the following build configuration (modulo -DCMAKE_TOOLCHAIN_FILE and the CUDA version)?

https://github.com/apache/incubator-mxnet/blob/db171a89c5e7e0d651ae1578bd8ae8da953417cc/ci/docker/runtime_functions.sh#L140-L155

I.e., our test suite builds for Jetson without the OpenCV and LAPACK features. You may also want to make sure that you specify the -DCMAKE_BUILD_TYPE=Release option when configuring the build with cmake.

wms2537 commented 4 years ago

Still the same:

io.cc:(.text+0xa8): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)@@GLIBCXX_3.4.21' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
io.cc:(.text+0xc0): relocation truncated to fit: R_AARCH64_CALL26 against symbol `memcpy@@GLIBC_2.17' defined in .text section in /lib/aarch64-linux-gnu/libc.so.6
io.cc:(.text+0xd8): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__stack_chk_fail@@GLIBC_2.17' defined in .text section in /lib/aarch64-linux-gnu/libc.so.6
io.cc:(.text+0xe4): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__throw_logic_error(char const*)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
3rdparty/dmlc-core/libdmlc.a(io.cc.o): In function `dmlc::io::FileSystem::GetInstance(dmlc::io::URI const&)':
io.cc:(.text+0x118): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(char const*) const' defined in .text._ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc[_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc] section in CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/core/symbolic.cc.o
io.cc:(.text+0x168): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__cxa_guard_acquire@@CXXABI_1.3' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
io.cc:(.text+0x18c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__cxa_guard_release@@CXXABI_1.3' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
io.cc:(.text+0x1a4): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__cxa_atexit@@GLIBC_2.17' defined in .text section in /lib/aarch64-linux-gnu/libc.so.6
io.cc:(.text+0x1c0): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(char const*) const' defined in .text._ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc[_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc] section in CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/core/symbolic.cc.o
io.cc:(.text+0x1dc): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dmlc::LogMessageFatal::LogMessageFatal(char const*, int)' defined in .text._ZN4dmlc15LogMessageFatalC2EPKci[_ZN4dmlc15LogMessageFatalC5EPKci] section in CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/c_api/c_api_graph.cc.o
io.cc:(.text+0x1f0): additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
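
As background on the error above: `R_AARCH64_CALL26` is the relocation used for the AArch64 `bl` instruction, whose target is encoded as a signed 26-bit offset counted in 4-byte instructions. The arithmetic below is a small sketch (`call26_reach_mib` is a hypothetical helper name) of the resulting reach, which is why a very large libmxnet.so can push calls out of range.

```shell
#!/bin/sh
# Reach of an AArch64 BL/B instruction: signed 26-bit immediate, scaled by
# the 4-byte instruction size. One bit is the sign, so 2^25 steps each way.
call26_reach_mib() {
  imm_bits=26
  insn_bytes=4
  reach_bytes=$(( (1 << (imm_bits - 1)) * insn_bytes ))
  echo $(( reach_bytes / 1024 / 1024 ))
}

echo "R_AARCH64_CALL26 reach: +/- $(call26_reach_mib) MiB"
```

A caller and callee that end up further apart than this in the linked image cannot be connected by a plain `bl`.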

Here's my cmake log:

-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc - works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_CROSSCOMPILING FALSE
-- CMAKE_HOST_SYSTEM_PROCESSOR aarch64
-- CMAKE_SYSTEM_PROCESSOR aarch64
-- CMAKE_SYSTEM_NAME Linux
-- CMake version '3.17.3' using generator 'Ninja'
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
-- The CUDA compiler identification is NVIDIA 10.2.89
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found OpenBLAS libraries: /usr/lib/aarch64-linux-gnu/libopenblas.so
-- Found OpenBLAS include: /usr/include/aarch64-linux-gnu
-- OpenCV Disabled
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project):
  VERSION keyword not followed by a value or was followed by a value that
  expanded to nothing.

-- Found PythonInterp: /usr/bin/python (found version "2.7.17") 
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found GTest: gtest  
-- Found CUDNN: /usr/lib/aarch64-linux-gnu/libcudnn.so  
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for fopen64
-- Looking for fopen64 - not found
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for nanosleep
-- Looking for nanosleep - found
-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /usr/include  
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Searching 16 bit integer - Using unsigned short
-- Check if the system is big endian - little endian
-- /home/chkl/mxnet/3rdparty/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- Performing Test SUPPORT_MSSE2
-- Performing Test SUPPORT_MSSE2 - Failed
-- CUDA: Using the following NVCC architecture flags -gencode;arch=compute_52,code=sm_52
-- Found CUDAToolkit: /usr/local/cuda/include (found version "10.2.89") 
-- Found NVML: /usr/local/cuda/include  
-- Found NVML (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/stubs/libnvidia-ml.so)
-- Found Python3: /usr/bin/python3.6 (found version "3.6.9") found components: Interpreter 
-- CUDA: Adding NVCC options: --fatbin-options --compress-all
CMake Warning at CMakeLists.txt:839 (message):
  OpenCV_VERSION_MAJOR: , version 3 with imgcodecs is required for im2rec,
  im2rec will not be available

-- Configuring done
-- Generating done
-- Build files have been written to: /home/chkl/mxnet/build

leezu commented 4 years ago

Please ensure your system toolchain is up to date (i.e. that it includes the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1243559).

Alternatively, you can simply cross-compile by installing the cross-toolchain on your host system, analogous to

https://github.com/apache/incubator-mxnet/blob/95f5cc60904a2d88d4861fff0f6dbad15f8cdbe3/ci/docker/Dockerfile.build.jetson#L41-L91

wms2537 commented 4 years ago

I think my system toolchain is up to date; I am using JetPack 4.3. If it is not, how do I update the system toolchain?

leezu commented 4 years ago

binutils is not part of JetPack; it is part of the operating system. You can check which package version the operating system on your device provides.
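
One way to check this is sketched below, assuming a Debian/Ubuntu-based system where `dpkg-query` is available. `linker_version` is a hypothetical helper name, and the script only reports versions; it does not decide whether a given version contains the fix.

```shell
#!/bin/sh
# Report the version banner of the linker found on PATH.
linker_version() {
  if command -v ld >/dev/null 2>&1; then
    out="$(ld --version 2>/dev/null | head -n 1)"
    [ -n "$out" ] || out="ld found, but version could not be read"
    echo "$out"
  else
    echo "ld not found on PATH"
  fi
}

linker_version

# On Debian/Ubuntu-based systems, also show the installed binutils package version.
if command -v dpkg-query >/dev/null 2>&1; then
  dpkg-query -W -f='${Package} ${Version}\n' binutils 2>/dev/null \
    || echo "binutils package not installed"
fi
```

The package version can then be compared against what the distribution's update channel currently offers.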

With respect to JetPack, we recommend that you update to 4.4, as this is the version tested by our CI. If you still face problems, I strongly recommend following the cross-compilation approach, as it is much faster and is tested by our CI server.

leezu commented 4 years ago

cc @TristonC @mseth10 do you have any recommendations for @wms2537's issues on the Jetson NX device?

wms2537 commented 4 years ago

After some testing, I finally managed to build it. I updated ccache and openblas similar to

> Please ensure your system toolchain is up to date (includes https://bugzilla.redhat.com/show_bug.cgi?id=1243559 fix)
>
> You may also simply use the cross-compilation option by installing the cross-toolchain on your host system analogous to
>
> https://github.com/apache/incubator-mxnet/blob/95f5cc60904a2d88d4861fff0f6dbad15f8cdbe3/ci/docker/Dockerfile.build.jetson#L41-L91

Then, I restarted the Jetson and built it with these commands

> Could you try matching the following build configuration (modulo DCMAKE_TOOLCHAIN_FILE and the CUDA version)
>
> https://github.com/apache/incubator-mxnet/blob/db171a89c5e7e0d651ae1578bd8ae8da953417cc/ci/docker/runtime_functions.sh#L140-L155
>
> Ie. our test suite builds for jetson without opencv and without lapack feature. You may also want to try ensure that you specify the cmake -DCMAKE_BUILD_TYPE=Release option when configuring the build.

I also added 8 GB of swap so that I could build with all 6 cores.
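
For anyone wanting to reproduce the swap step, a common recipe on Linux is sketched below. This is an assumption about how such a swap file is typically created, not a record of the exact commands used here: `swap_setup_cmds` is a hypothetical helper, `/swapfile` is an assumed path, and the helper only prints the commands because they need root and modify the system.

```shell
#!/bin/sh
# Hypothetical helper: print the usual commands for creating and enabling a
# swap file of the given size at the given path.
swap_setup_cmds() {
  size="${1:-8G}"
  path="${2:-/swapfile}"
  cat <<EOF
sudo fallocate -l ${size} ${path}
sudo chmod 600 ${path}
sudo mkswap ${path}
sudo swapon ${path}
EOF
}

# Review the output before running any of it.
swap_setup_cmds 8G /swapfile
```

With the extra swap in place, a parallel build such as `ninja -j 6` is less likely to be killed for lack of memory during the compile and link steps.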

With all of the changes above, I don't know which one actually resolved the issue. Thanks @leezu for your help.

cloudlakecho commented 1 year ago

@wms2537 Thanks for sharing your tip. You listed several changes. Are they applied at the cmake step or at the make step?

Would you mind sharing your CMakeLists.txt (if you modified it) or the modified command for the make step?

I tried to build on an NVIDIA Jetson (AGX Orin) and am hitting the same error at 98% of the 'make' step:

```
additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/mxnet.dir/build.make:11134: libmxnet.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:645: CMakeFiles/mxnet.dir/all] Error 2
make: *** [Makefile:141: all] Error 2
```

Runtime environment: Ubuntu 20.04, JetPack 5.0 (R 34), CUDA 11.4
