Open lahwaacz opened 21 hours ago
if reducing more architectures, will it still happen?
Building for just one architecture works, but that does not help. The intention is to build a general binary package that can be used efficiently on any GPU architecture. Also, I've found a reduced set of archs that works for Ginkgo 1.8.0 but will lead to the same error on the next release (currently develop branch), and it is not practical to reduce architectures again and again for new releases.
If you remove all references to the NVTX library (including the header #include) from cuda/base/nvtx.cpp by emptying all functions, does the issue still appear?
It is not a problem with one specific library. Just tried to build it on a different system (without any code changes) and a different name appears in the output:
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../lib/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in lib/libginkgo_cuda.so.1.9.0
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x16): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_deregisterTMCloneTable'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x33): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x3a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in lib/libginkgo_cuda.so.1.9.0
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x57): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_registerTMCloneTable'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x76): relocation truncated to fit: R_X86_64_PC32 against `.bss'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x81): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__cxa_finalize@@GLIBC_2.2.5' defined in .text section in /usr/lib/libc.so.6
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x8e): relocation truncated to fit: R_X86_64_PC32 against symbol `__dso_handle' defined in .data.rel.local section in /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x94): additional relocation overflows omitted from the output
lib/libginkgo_cuda.so.1.9.0: PC-relative offset overflow in PLT entry for `_ZN3gko7kernels4cuda10run_kernelI17__nv_dl_wrapper_tI11__nv_dl_tagIPFvSt10shared_ptrIKNS_12CudaExecutorEEPKlPKNS_6matrix5DenseIfEEPSD_EXadL_ZNS1_5dense12symm_permuteIflEEvS8_PKT0_PKNSC_IT_EEPSP_EELj1EEJEEJRSF_RSA_RSG_EEEvS8_SO_NS_3dimILm2EmEEDpOT0_'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
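For anyone reading the log above: "relocation truncated to fit" means a 32-bit PC-relative relocation such as R_X86_64_PC32 could not hold the distance between the referencing code and its target. That would be consistent with a very large library image, though this thread does not confirm the exact layout. As a rough sketch in Python (the addresses are made up for illustration and are not taken from this build), the check amounts to asking whether S + A - P fits in a signed 32-bit field:

```python
# Sketch of the range check behind "relocation truncated to fit".
# R_X86_64_PC32 stores S + A - P in a signed 32-bit field, where
# S = symbol address, A = addend, P = address of the field being patched.

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def pc32_value(symbol_addr: int, addend: int, place_addr: int) -> int:
    """Value the linker wants to store for a PC-relative relocation."""
    return symbol_addr + addend - place_addr


def fits_pc32(symbol_addr: int, addend: int, place_addr: int) -> bool:
    """True if the value fits in the signed 32-bit relocation field."""
    value = pc32_value(symbol_addr, addend, place_addr)
    return INT32_MIN <= value <= INT32_MAX


# Hypothetical addresses, for illustration only.
place = 0x1000               # code near the start of the image
near_target = 0x4000_0000    # about 1 GiB away: fits
far_target = 0x9000_0000     # more than 2 GiB away: overflows

print(fits_pc32(near_target, -4, place))  # True
print(fits_pc32(far_target, -4, place))   # False
```

Once the distance between any such reference and its target exceeds roughly 2 GiB, the linker cannot encode it and reports an overflow like the ones above.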
While building a package for Arch Linux, I found that enabling all CUDA architectures (-DGINKGO_CUDA_ARCHITECTURES="All") leads to this error on the final link:
For 1.8.0 we worked around it by omitting a few architectures:
But building the develop branch now fails again with the same trick... Any ideas? Maybe split libginkgo_cuda.so into several smaller libs?
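To judge whether splitting would help, it may be useful to see which sections dominate the library. Below is a minimal sketch in Python that ranks sections from the text printed by `size -A` (GNU binutils, SysV format). The sample text, the file name libexample.so, and all numbers are invented for illustration; they are not measurements of libginkgo_cuda.so.

```python
# Sketch: rank ELF sections by size from `size -A` output (assumed SysV format,
# decimal radix). The sample below is invented, not real data from this build.

TWO_GIB = 2**31


def parse_size_output(text: str) -> dict[str, int]:
    """Map section name -> size in bytes from `size -A` output."""
    sections = {}
    for line in text.splitlines():
        parts = line.split()
        # Section rows look like "<name> <size> <addr>"; names start with '.'.
        if len(parts) == 3 and parts[0].startswith(".") and parts[1].isdigit():
            sections[parts[0]] = int(parts[1])
    return sections


def largest_sections(sections: dict[str, int], top: int = 3) -> list[tuple[str, int]]:
    """Return the `top` biggest sections, largest first."""
    return sorted(sections.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical output for illustration only.
SAMPLE = """\
libexample.so  :
section              size        addr
.text            52428800     1048576
.nv_fatbin     2362232012    67108864
.data             1048576  2500000000
Total          2415709388
"""

sections = parse_size_output(SAMPLE)
total = sum(sections.values())
print(largest_sections(sections, top=1))  # [('.nv_fatbin', 2362232012)]
print(total > TWO_GIB)                    # True
```

On a real build, the text could come from running `size -A` on the library and passing its standard output to parse_size_output. If one section holding the embedded GPU code accounts for most of the total, that would support the idea that fewer architectures per library, or several smaller libraries, keeps each image under the 2 GiB range.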