EarthWorksOrg / EarthWorks


"Multiple definition" Error for GPU builds of compsets using multiple MPAS cores #36

Closed gdicker1 closed 5 months ago

gdicker1 commented 6 months ago

Building compsets that involve multiple MPAS cores (e.g. the "FullyCoupled" compset, which uses MPAS-A, MPAS-O, and MPAS-SI) with GPU flags enabled fails with symbol collisions. This points to duplicate compilation of routines in the "shared MPAS infrastructure", which is contained in the MPAS-A source code (compiled only by MPAS-A) as well as in the EarthWorksOrg/mpas-framework repository (compiled by both MPAS-O and MPAS-SI). It seems that when OpenACC compilation is enabled, a different linking process occurs (with stricter linker rules than for CPU builds), and this causes the build to fail when creating the cesm.exe file.

Example error message(s):

nvlink error   : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_3d_real_acc_8350_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error   : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_3d_real_acc_8308_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error   : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_2d_real_acc_7433_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error   : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_2d_real_acc_7393_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error   : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_1d_real_acc_6519_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink error   : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_1d_real_acc_6481_gpu' in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libocn.a:mpas_dmpar.o', first defined in '/glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/lib/libice.a:mpas_dmpar.o'
nvlink fatal   : merge_elf failed
pgacclnk: child process exit status 2: /glade/u/apps/common/23.04/spack/opt/spack/nvhpc/23.5/Linux_x86_64/23.5/compilers/bin/tools/nvdd
gmake: *** [/glade/work/gdicker/EarthWorks/EWRepo_PullRequests/2024Feb20_MPASfrwk6/cases/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/Tools/Makefile:978: /glade/derecho/scratch/gdicker/2024Feb20_113609_EWMTesting_FullyCoupled.mpasa120.derecho.nvhpc/bld/cesm.exe] Error 2
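
For anyone triaging a similar build log, a small script can group the duplicated device symbols by the archive members involved. This is only a sketch: the regular expression is modelled on the nvlink messages quoted above, the function name `summarize_duplicates` is made up for illustration, and the sample log uses shortened, hypothetical paths and symbol names rather than the real ones.

```python
import re
from collections import defaultdict

# Pattern modelled on the nvlink "Multiple definition" messages quoted above.
NVLINK_DUP = re.compile(
    r"nvlink error\s*:\s*Multiple definition of '(?P<symbol>[^']+)' "
    r"in '(?P<second>[^']+)', first defined in '(?P<first>[^']+)'"
)


def summarize_duplicates(log_text):
    """Group duplicated symbols by the (first, second) archive members."""
    groups = defaultdict(list)
    for match in NVLINK_DUP.finditer(log_text):
        # Keep only the 'libXXX.a:member.o' part of each full path.
        first = match.group("first").rsplit("/", 1)[-1]
        second = match.group("second").rsplit("/", 1)[-1]
        groups[(first, second)].append(match.group("symbol"))
    return dict(groups)


# Hypothetical, shortened sample (not the real paths or symbol names).
sample = (
    "nvlink error   : Multiple definition of 'sym_a_gpu' in "
    "'/bld/lib/libocn.a:mpas_dmpar.o', first defined in "
    "'/bld/lib/libice.a:mpas_dmpar.o'\n"
    "nvlink error   : Multiple definition of 'sym_b_gpu' in "
    "'/bld/lib/libocn.a:mpas_dmpar.o', first defined in "
    "'/bld/lib/libice.a:mpas_dmpar.o'\n"
    "nvlink fatal   : merge_elf failed\n"
)

summary = summarize_duplicates(sample)
for (first, second), symbols in summary.items():
    print(f"{first} vs {second}: {len(symbols)} duplicated symbol(s)")
```

Applied to the log above, this kind of grouping would show every collision pairing `libice.a:mpas_dmpar.o` with `libocn.a:mpas_dmpar.o`, which is consistent with the shared infrastructure being compiled once per core.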

This issue continues the discussion from #31, but focuses on the GPU/OpenACC issues that were first mentioned in this comment.

Potential solutions:

  1. Use preprocessor directives to ensure that MPAS-A, MPAS-O, and MPAS-SI use routines with unique names.
  2. Edit GNU Make rules so that the "shared MPAS infrastructure" is compiled first and used by all MPAS cores being compiled. (Non-exclusive with solution 1)
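
To make the two options concrete, here is a toy model of the link step, not the actual build system. It treats each archive as a list of symbols and reports any symbol defined in more than one archive, which is roughly what a strict linker rejects. The symbol names, the `ocn`/`ice` prefixes, and the `libmpas_framework.a` archive are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def find_collisions(archives):
    """Return the symbols that are defined by more than one archive."""
    counts = Counter(
        symbol for symbols in archives.values() for symbol in set(symbols)
    )
    return sorted(symbol for symbol, n in counts.items() if n > 1)


def prefixed(core, symbols):
    """Option 1: give each core's copy of a shared routine a unique name."""
    return [f"{core}_{symbol}" for symbol in symbols]


# Illustrative stand-ins for routines in the shared MPAS infrastructure.
SHARED = ["mpas_dmpar_exch_halo_1d", "mpas_dmpar_exch_halo_2d"]

# Current situation: each core archive carries its own copy.
duplicated = {
    "libocn.a": SHARED + ["ocn_core"],
    "libice.a": SHARED + ["ice_core"],
}

# Option 1: per-core names (e.g. produced by preprocessor directives).
option1 = {
    "libocn.a": prefixed("ocn", SHARED) + ["ocn_core"],
    "libice.a": prefixed("ice", SHARED) + ["ice_core"],
}

# Option 2: shared infrastructure compiled once and reused by every core.
option2 = {
    "libmpas_framework.a": SHARED,  # hypothetical archive name
    "libocn.a": ["ocn_core"],
    "libice.a": ["ice_core"],
}

for label, archives in [
    ("duplicated", duplicated),
    ("option 1", option1),
    ("option 2", option2),
]:
    print(label, find_collisions(archives))
```

In this model both options remove the collisions. Option 1 still compiles the shared routines once per core, while option 2 also avoids the repeated compilation.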

gdicker1 commented 6 months ago

@dazlich and @areanddee, I wanted to move the conversation about GPU builds to a separate thread so that #31 can be closed by the v2.1.001 release. Please feel free to link or copy any important text from the previous issue.

gdicker1 commented 5 months ago

Related PRs: https://github.com/EarthWorksOrg/mpas-seaice/pull/9, https://github.com/EarthWorksOrg/mpas-framework/pull/7, https://github.com/EarthWorksOrg/mpas-ocean/pull/7 (to be merged/used together)

gdicker1 commented 5 months ago

Marking this as closed since #39 should have fixed it. The multiple compilation issue is being tracked in #40.