ginkgo-project / ginkgo

Numerical linear algebra software package
https://ginkgo-project.github.io/
BSD 3-Clause "New" or "Revised" License

Building for many CUDA archs leads to linker errors #1734

Open lahwaacz opened 21 hours ago

lahwaacz commented 21 hours ago

While building a package for Arch Linux, I found that enabling all CUDA architectures (-DGINKGO_CUDA_ARCHITECTURES="All") leads to this error on the final link:

FAILED: lib/libginkgo_cuda.so.1.9.0
: && /usr/bin/c++ -fPIC -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection         -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Wp,-D_GLIBCXX_ASSERTIONS -g -ffile-prefix-map=/build/ginkgo-hpc-git/src=/usr/src/debug/ginkgo-hpc-git -flto=auto  -Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now          -Wl,-z,pack-relative-relocs -flto=auto   -Wl,--dependency-file,cuda/CMakeFiles/ginkgo_cuda.dir/link.d -shared -Wl,-soname,libginkgo_cuda.so.1.9.0 -o lib/libginkgo_cuda.so.1.9.0 devices/cuda/CMakeFiles/ginkgo_cuda_device.dir/executor.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/device.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/exception.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/executor.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/memory.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/nvtx.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/scoped_device_id.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/stream.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/timer.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/base/version.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.0.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.1.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.2.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.3.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.4.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.5.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.6.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.7.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.8.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.9.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.10.cpp.o 
cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.11.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.12.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.13.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.14.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.15.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.16.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.17.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.18.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.19.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/csr_kernels.instantiate.20.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/fbcsr_kernels.instantiate.0.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/fbcsr_kernels.instantiate.1.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/fbcsr_kernels.instantiate.2.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/matrix/fft_kernels.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/preconditioner/batch_jacobi_kernels.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_kernels.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.0.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.1.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.2.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.3.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.4.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.5.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.6.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.7.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.8.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.9.cpp.o 
cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.10.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.0.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_bicgstab_launch.instantiate.1.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_kernels.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.0.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.1.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.2.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.3.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.4.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.5.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.6.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.0.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/batch_cg_launch.instantiate.1.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/lower_trs_kernels.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/solver/upper_trs_kernels.cu.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/base/device_matrix_data_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/base/index_set_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/components/absolute_array_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/components/fill_array_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/components/format_conversion_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/components/precision_conversion_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/components/reduce_array_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/distributed/partition_helpers_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/distributed/partition_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/coo_kernels.cpp.o 
cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/csr_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/ell_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/hybrid_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/permutation_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/scaled_permutation_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/sellp_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/sparsity_csr_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/diagonal_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/multigrid/pgm_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/preconditioner/jacobi_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/bicg_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/bicgstab_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/cg_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/cgs_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/common_gmres_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/fcg_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/gcr_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/gmres_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/solver/ir_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/unified/matrix/dense_kernels.instantiate.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/base/batch_multi_vector_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/base/device_matrix_data_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/base/index_set_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/components/prefix_sum_kernels.cpp.o 
cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/distributed/index_map_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/distributed/matrix_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/distributed/partition_helpers_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/distributed/partition_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/distributed/vector_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/cholesky_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/factorization_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/ic_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/ilu_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/lu_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ic_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ict_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ilu_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ilut_approx_filter_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ilut_filter_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ilut_select_common.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ilut_select_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ilut_spgeam_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/factorization/par_ilut_sweep_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/batch_csr_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/batch_dense_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/batch_ell_kernels.cpp.o 
cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/coo_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/dense_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/diagonal_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/ell_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/sellp_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/matrix/sparsity_csr_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/multigrid/pgm_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/isai_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/sor_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/reorder/rcm_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/solver/cb_gmres_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/solver/idr_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/solver/multigrid_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/stop/criterion_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/stop/residual_norm_kernels.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.instantiate.1.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.instantiate.1.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.instantiate.1.cpp.o 
cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.instantiate.2.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.instantiate.2.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.instantiate.2.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.instantiate.4.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.instantiate.4.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.instantiate.4.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.instantiate.8.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.instantiate.8.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.instantiate.8.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.instantiate.13.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.instantiate.13.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.instantiate.13.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.instantiate.16.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.instantiate.16.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.instantiate.16.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_generate_kernels.instantiate.32.cpp.o cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_simple_apply_kernels.instantiate.32.cpp.o 
cuda/CMakeFiles/ginkgo_cuda.dir/__/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernels.instantiate.32.cpp.o -L/opt/cuda/targets/x86_64-linux/lib/stubs   -L/opt/cuda/targets/x86_64-linux/lib   -L/usr/lib/gcc/x86_64-pc-linux-gnu/13.3.1 -Wl,-rpath,/opt/cuda/targets/x86_64-linux/lib:/build/ginkgo-hpc-git/src/build-cuda/lib:  /opt/cuda/targets/x86_64-linux/lib/libcudart.so  /opt/cuda/targets/x86_64-linux/lib/libcublas.so  /opt/cuda/targets/x86_64-linux/lib/libcusparse.so  /opt/cuda/targets/x86_64-linux/lib/libcurand.so  /opt/cuda/targets/x86_64-linux/lib/libcufft.so  lib/libginkgo_device.so.1.9.0  -ldl  -ldl  /usr/lib/librt.a  /opt/cuda/targets/x86_64-linux/lib/libcublasLt.so  /opt/cuda/targets/x86_64-linux/lib/libculibos.a  /opt/cuda/targets/x86_64-linux/lib/libnvJitLink.so  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl && :
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../lib/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/tmp/cc6UKHTo.ltrans0.ltrans.o: in function `std::_Function_handler<void (cublasContext*), gko::CudaExecutor::init_handles()::{lambda(cublasContext*)#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)':
/usr/include/c++/14.2.1/bits/std_function.h:274:(.text+0xfb): relocation truncated to fit: R_X86_64_PC32 against `.data.rel.ro'
/tmp/cc6UKHTo.ltrans0.ltrans.o: in function `std::_Function_handler<void (cusparseContext*), gko::CudaExecutor::init_handles()::{lambda(cusparseContext*)#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)':
/usr/include/c++/14.2.1/bits/std_function.h:274:(.text+0x13b): relocation truncated to fit: R_X86_64_PC32 against `.data.rel.ro'
/tmp/cc6UKHTo.ltrans0.ltrans.o: in function `nvtxEtiGetModuleFunctionTable_v3':
/opt/cuda/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxImpl.h:401:(.text+0x243): relocation truncated to fit: R_X86_64_PC32 against symbol `nvtxGlobals_v3' defined in .data.rel.local section in /tmp/cc6UKHTo.ltrans0.ltrans.o
/opt/cuda/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxImpl.h:404:(.text+0x273): relocation truncated to fit: R_X86_64_PC32 against symbol `nvtxGlobals_v3' defined in .data.rel.local section in /tmp/cc6UKHTo.ltrans0.ltrans.o
/opt/cuda/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxImpl.h:424:(.text+0x283): relocation truncated to fit: R_X86_64_PC32 against symbol `nvtxGlobals_v3' defined in .data.rel.local section in /tmp/cc6UKHTo.ltrans0.ltrans.o
/opt/cuda/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxImpl.h:412:(.text+0x293): relocation truncated to fit: R_X86_64_PC32 against symbol `nvtxGlobals_v3' defined in .data.rel.local section in /tmp/cc6UKHTo.ltrans0.ltrans.o
/opt/cuda/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxImpl.h:416:(.text+0x2a3): relocation truncated to fit: R_X86_64_PC32 against symbol `nvtxGlobals_v3' defined in .data.rel.local section in /tmp/cc6UKHTo.ltrans0.ltrans.o
/opt/cuda/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxImpl.h:420:(.text+0x2b3): relocation truncated to fit: R_X86_64_PC32 against symbol `nvtxGlobals_v3' defined in .data.rel.local section in /tmp/cc6UKHTo.ltrans0.ltrans.o
/tmp/cc6UKHTo.ltrans0.ltrans.o: in function `nvtxGetExportTable_v3':
/opt/cuda/targets/x86_64-linux/include/nvtx3/nvtxDetail/nvtxImpl.h:443:(.text+0x2d7): relocation truncated to fit: R_X86_64_PC32 against symbol `nvtxGlobals_v3' defined in .data.rel.local section in /tmp/cc6UKHTo.ltrans0.ltrans.o
/tmp/cc6UKHTo.ltrans0.ltrans.o: in function `gko::CudaExecutor::get_master()':
/usr/include/c++/14.2.1/ext/atomicity.h:52:(.text+0x332): additional relocation overflows omitted from the output
lib/libginkgo_cuda.so.1.9.0: PC-relative offset overflow in PLT entry for `_ZN3gko7kernels4cuda10run_kernelI17__nv_dl_wrapper_tI11__nv_dl_tagIPFvSt10shared_ptrIKNS_12CudaExecutorEEPKlPKNS_6matrix5DenseIfEEPSD_EXadL_ZNS1_5dense12symm_permuteIflEEvS8_PKT0_PKNSC_IT_EEPSP_EELj1EEJEEJRSF_RSA_RSG_EEEvS8_SO_NS_3dimILm2EmEEDpOT0_'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

For 1.8.0 we worked around it by omitting a few architectures:

# In general, we want to list all real archs (sm_XX) and the latest virtual arch (compute_XX) for future PTX compatibility.
# Valid values can be discovered from nvcc --help
# Compiling Ginkgo for all real architectures triggers linker limits (2 GB binary size). So let's omit 52, 53, 62, 72 from the list.
local _cuda_archs="50;60;61;70;75;80;86;87;89;90;90a;90a-virtual"

cmake -DCMAKE_CUDA_ARCHITECTURES="$_cuda_archs" ...
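
The list above can also be generated instead of maintained by hand. A minimal sketch, assuming a hypothetical helper `build_cuda_archs` (not part of Ginkgo or the packaging script) that takes the known architectures and the ones to omit. One difference from the hand-written list: CMake treats an entry without a suffix as both real and virtual, so the sketch appends `-real` to match the stated intention of shipping SASS for each architecture and PTX only for the newest one. Whether that saves enough space to avoid the linker error has not been tested here.

```shell
#!/bin/sh
# Hypothetical helper: build a CMAKE_CUDA_ARCHITECTURES value.
# $1: space-separated architectures, oldest first
# $2: space-separated architectures to omit
build_cuda_archs() {
    result=""
    last=""
    for arch in $1; do
        case " $2 " in
            *" $arch "*) continue ;;  # skip omitted architectures
        esac
        # "-real" asks CMake for SASS only (no embedded PTX) for this arch.
        result="${result:+$result;}${arch}-real"
        last=$arch
    done
    # Add PTX for the newest kept architecture for forward compatibility.
    if [ -n "$last" ]; then
        result="$result;${last}-virtual"
    fi
    printf '%s\n' "$result"
}

# Illustrative input; the valid values depend on the installed CUDA toolkit.
_cuda_archs=$(build_cuda_archs "50 52 53 60 61 62 70 72 75 80 86 87 89 90 90a" "52 53 62 72")
echo "cmake -DCMAKE_CUDA_ARCHITECTURES=\"$_cuda_archs\" ..."
```

Dropping another architecture then only requires extending the second argument.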

However, building the develop branch now fails again, even with the same trick. Any ideas? Maybe libginkgo_cuda.so could be split into several smaller libraries?

yhmtsai commented 13 hours ago

Does it still happen if you reduce the number of architectures further?

upsj commented 13 hours ago

If you remove all references to the NVTX library (including the header #include) from cuda/base/nvtx.cpp by emptying all functions, does the issue still appear?

lahwaacz commented 11 hours ago

Does it still happen if you reduce the number of architectures further?

Building for just one architecture works, but that does not help: the intention is to build a general binary package that can be used efficiently on any GPU architecture. Also, the reduced set of archs that works for Ginkgo 1.8.0 already leads to the same error on the develop branch (the next release), and it is not practical to drop more architectures with every new release.

If you remove all references to the NVTX library (including the header #include) from cuda/base/nvtx.cpp by emptying all functions, does the issue still appear?

It is not a problem with one specific library. I just tried building it on a different system (without any code changes), and a different name appears in the output:

/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../lib/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in lib/libginkgo_cuda.so.1.9.0
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x16): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_deregisterTMCloneTable'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x33): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x3a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in lib/libginkgo_cuda.so.1.9.0
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x57): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_registerTMCloneTable'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x76): relocation truncated to fit: R_X86_64_PC32 against `.bss'
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x81): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__cxa_finalize@@GLIBC_2.2.5' defined in .text section in /usr/lib/libc.so.6
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x8e): relocation truncated to fit: R_X86_64_PC32 against symbol `__dso_handle' defined in .data.rel.local section in /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o
/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/crtbeginS.o:(.text+0x94): additional relocation overflows omitted from the output
lib/libginkgo_cuda.so.1.9.0: PC-relative offset overflow in PLT entry for `_ZN3gko7kernels4cuda10run_kernelI17__nv_dl_wrapper_tI11__nv_dl_tagIPFvSt10shared_ptrIKNS_12CudaExecutorEEPKlPKNS_6matrix5DenseIfEEPSD_EXadL_ZNS1_5dense12symm_permuteIflEEvS8_PKT0_PKNSC_IT_EEPSP_EELj1EEJEEJRSF_RSA_RSG_EEEvS8_SO_NS_3dimILm2EmEEDpOT0_'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
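
For context on the `relocation truncated to fit: R_X86_64_PC32` messages in both logs: this relocation type stores a signed 32-bit PC-relative displacement, so the target must be within roughly ±2 GiB of the referencing instruction. That is consistent with the 2 GB limit mentioned in the packaging comment. A minimal sketch of the range check, where the function name `fits_pc32` and the sample sizes are illustrative and not taken from the linker:

```shell
#!/bin/sh
# Succeed if a displacement in bytes fits a signed 32-bit field,
# which is the range an R_X86_64_PC32 relocation can encode.
fits_pc32() {
    [ "$1" -ge -2147483648 ] && [ "$1" -le 2147483647 ]
}

# Illustrative distances between a code reference and its target.
for size_mib in 512 2047 2048 3000; do
    bytes=$((size_mib * 1024 * 1024))
    if fits_pc32 "$bytes"; then
        echo "${size_mib} MiB: reachable"
    else
        echo "${size_mib} MiB: relocation truncated to fit"
    fi
done
```

Under this reading, any reference that has to cross a large embedded section, such as the device code for many architectures, can overflow regardless of which library or startup object the reference happens to come from.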