NCAR / spack-gust

Spack production user software stack on the Gust test system

gcc build of cesm on gust #28

Closed: jedwards4b closed this issue 1 year ago

jedwards4b commented 1 year ago

I am getting a runtime error when running cesm built with gcc on gust:

[CRAYBLAS_WARNING] Application linked against multiple cray-libsci libraries

Using ldd, I see two libsci libraries:

libsci_gnu_82.so.5 => /opt/cray/pe/lib64/libsci_gnu_82.so.5 (0x00007f3dd182d000)
libsci_gnu_82_mp.so.5 => /opt/cray/pe/libsci/22.08.1.1/gnu/9.1/x86_64/lib/libsci_gnu_82_mp.so.5 (0x00007f3dc95c8000)

How do I determine what pulled in these libraries so that I can resolve the problem?

jedwards4b commented 1 year ago

Also seeing this issue with nvhpc.

jedwards4b commented 1 year ago

I am running ldd on the cesm executable: /glade/gust/scratch/jedwards/SMS.f19_g17.X.gust_gnu.20221024_085509_92swhs/bld/cesm.exe

Build log is in: /glade/gust/scratch/jedwards/SMS.f19_g17.X.gust_gnu.20221024_085509_92swhs/bld/cesm.bldlog.221024-085700.gz

Currently Loaded Modules:
  1) ncarenv/22.10 (S)
  2) craype/2.7.17 (S)
  3) gcc/12.1.0
  4) ncarcompilers/0.7.1
  5) cray-mpich/8.1.19
  6) cmake/3.23.2
  7) cray-libsci/22.08.1.1
  8) hdf5-mpi/1.12.2
  9) netcdf-mpi/4.8.1
 10) parallel-netcdf/1.12.2
 11) parallelio/2.5.9
 12) esmf/8.4.0b20

Where: S: Module is Sticky, requires --force to unload or purge

jedwards4b commented 1 year ago

I notice that the first libsci_gnu_82.so.5 is a symlink to /opt/cray/pe/libsci/22.08.1.1/GNU/91/x86_64/lib/libsci_gnu_82.so.5, so the real difference seems to be that one library is the _mp (OpenMP-threaded) variant and the other is not. Does that imply a difference in compiler flags?
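The check described above (resolve each libsci entry from ldd and compare the variants) can be sketched in a few lines of Python. This is an illustrative helper, not something from the thread: it assumes ldd's usual `soname => path (0xaddress)` format, uses `os.path.realpath` to follow symlinks like the one in /opt/cray/pe/lib64, and treats an `_mp` suffix in the soname as marking the OpenMP build of cray-libsci.

```python
import os
import re
import subprocess

# One "soname => path (0xaddress)" entry as printed by ldd.
LDD_ENTRY = re.compile(r"(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)")


def ldd_text(executable):
    """Run ldd on an executable and return its stdout."""
    return subprocess.run(["ldd", executable], capture_output=True,
                          text=True, check=True).stdout


def libsci_variants(text):
    """Map each cray-libsci soname in ldd output to its symlink-resolved path."""
    found = {}
    for soname, path in LDD_ENTRY.findall(text):
        if soname.startswith("libsci_"):
            # On the system itself realpath follows links such as
            # /opt/cray/pe/lib64/libsci_gnu_82.so.5; elsewhere it only normalizes.
            found[soname] = os.path.realpath(path)
    return found


def is_threaded(soname):
    """Assumption: the OpenMP build of libsci carries an _mp suffix in its soname."""
    return "_mp." in soname


def report(text):
    """Print one line per libsci variant and flag the case where several are linked."""
    variants = libsci_variants(text)
    for soname, path in sorted(variants.items()):
        kind = "OpenMP (_mp)" if is_threaded(soname) else "non-threaded"
        print(f"{soname:24} {kind:13} {path}")
    if len(variants) > 1:
        print("more than one cray-libsci variant is linked")
    return variants


if __name__ == "__main__":
    # The two entries quoted at the top of this issue.
    sample = (
        "libsci_gnu_82.so.5 => /opt/cray/pe/lib64/libsci_gnu_82.so.5 "
        "(0x00007f3dd182d000)\n"
        "libsci_gnu_82_mp.so.5 => /opt/cray/pe/libsci/22.08.1.1/gnu/9.1/"
        "x86_64/lib/libsci_gnu_82_mp.so.5 (0x00007f3dc95c8000)\n"
    )
    report(sample)
```

On Gust itself, `report(ldd_text(path_to_cesm_exe))` would run the same check against the real binary; the regex simply skips ldd lines without `=>` (such as linux-vdso) and `not found` entries.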

jedwards4b commented 1 year ago

I suspected the cause was that the esmf library is compiled with threading enabled while cesm is not, so I tried compiling cesm with threads. That did not solve the problem.

benkirk commented 1 year ago

Are you seeing runtime errors, or warnings? I have seen this as a warning with cray-libsci, but things ran OK, so it went on the back burner.

jedwards4b commented 1 year ago

The only lines in my output file are:

gu0001.hsn.gu.hpc.ucar.edu 78: [CRAYBLAS_WARNING] Application linked against multiple cray-libsci libraries
gu0001.hsn.gu.hpc.ucar.edu 82: [CRAYBLAS_WARNING] Application linked against multiple cray-libsci libraries
gu0001.hsn.gu.hpc.ucar.edu 87: [CRAYBLAS_WARNING] Application linked against multiple cray-libsci libraries
gu0001.hsn.gu.hpc.ucar.edu 88: [CRAYBLAS_WARNING] Application linked against multiple cray-libsci libraries
gu0001.hsn.gu.hpc.ucar.edu 113: [CRAYBLAS_WARNING] Application linked against multiple cray-libsci libraries
gu0001.hsn.gu.hpc.ucar.edu 115: [CRAYBLAS_WARNING] Application linked against multiple cray-libsci libraries
gu0001.hsn.gu.hpc.ucar.edu: rank 78 exited with code 1

So it looks to me like this is causing the crash.

vanderwb commented 1 year ago

Your ESMF suspicion does seem to be accurate, though I don't know why your attempted fix did not resolve things:

file=libsci_gnu_82.so.5 [0];  needed by /glade/gust/scratch/jedwards/SMS.f19_g17.X.gust_gnu.20221024_085509_92swhs/bld/cesm.exe
...
file=libsci_gnu_82_mp.so.5 [0];  needed by /glade/u/apps/cseg/spack/opt/spack/cray-sles15-zen3/gcc-12.1.0/esmf-8.4.0b20-groqjowaawivhdo7hinowpsgxbcexeqn/lib/libesmf.so
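The `file=... needed by ...` lines above look like the trace glibc's dynamic loader prints when the `LD_DEBUG=libs` environment variable is set. The thread does not say how they were produced; running something like `LD_DEBUG=libs ldd cesm.exe` and capturing stderr should give similar output. Assuming that format, a minimal Python sketch that answers the original question (which object pulled in each libsci library) might look like this; the PID in the sample trace is hypothetical.

```python
import re
from collections import defaultdict

# A loader trace line such as
#   file=libsci_gnu_82_mp.so.5 [0];  needed by /path/to/libesmf.so [0]
# possibly preceded by a PID; only the two names are captured.
NEEDED = re.compile(r"file=(\S+)\s+\[\d+\];\s+needed by\s+(\S+)")


def who_needs(trace_text, prefix="libsci_"):
    """Map each library whose soname starts with prefix to the objects that requested it."""
    needers = defaultdict(set)
    for soname, requester in NEEDED.findall(trace_text):
        if soname.startswith(prefix):
            needers[soname].add(requester)
    return dict(needers)


if __name__ == "__main__":
    # Paths from the comment above; the PID 1234 is made up for illustration.
    trace = (
        "     1234:\tfile=libsci_gnu_82.so.5 [0];  needed by "
        "/glade/gust/scratch/jedwards/SMS.f19_g17.X.gust_gnu.20221024_085509_92swhs/bld/cesm.exe [0]\n"
        "     1234:\tfile=libsci_gnu_82_mp.so.5 [0];  needed by "
        "/glade/u/apps/cseg/spack/opt/spack/cray-sles15-zen3/gcc-12.1.0/"
        "esmf-8.4.0b20-groqjowaawivhdo7hinowpsgxbcexeqn/lib/libesmf.so [0]\n"
    )
    for soname, requesters in sorted(who_needs(trace).items()):
        for requester in sorted(requesters):
            print(f"{soname} <- {requester}")
```

Lines such as `file=... [0];  generating link map` do not contain `needed by`, so they are ignored; only the dependency edges are kept.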
benkirk commented 1 year ago

I'm not sure that's the case: I see the same warning lines in, e.g., /glade/work/benkirk/codes/petsc_tests/ex19b_GPU.sh.o3455, and that application runs.

I think a partial answer is in 'man intro_libsci', but I haven't eliminated the warning on my side yet. I'll look into this as time allows and see what I can find.

jedwards4b commented 1 year ago

If I recompile with DEBUG enabled, I still get the warnings, but the test runs to completion, so maybe this isn't the problem.

jedwards4b commented 1 year ago

I found the problem, and it was my own: the optimized esmf library was built without MPI.