jedwards4b closed this issue 1 year ago.
Also seeing this issue with nvhpc.
I am ldd'ing cesm: /glade/gust/scratch/jedwards/SMS.f19_g17.X.gust_gnu.20221024_085509_92swhs/bld/cesm.exe
Build log is in: /glade/gust/scratch/jedwards/SMS.f19_g17.X.gust_gnu.20221024_085509_92swhs/bld/cesm.bldlog.221024-085700.gz
Currently Loaded Modules:
  1) ncarenv/22.10 (S)
  2) craype/2.7.17 (S)
  3) gcc/12.1.0
  4) ncarcompilers/0.7.1
  5) cray-mpich/8.1.19
  6) cmake/3.23.2
  7) cray-libsci/22.08.1.1
  8) hdf5-mpi/1.12.2
  9) netcdf-mpi/4.8.1
 10) parallel-netcdf/1.12.2
 11) parallelio/2.5.9
 12) esmf/8.4.0b20

  Where:
   S:  Module is Sticky, requires --force to unload or purge
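I notice that the first libsci_gnu_82.so.5 (/opt/cray/pe/lib64/libsci_gnu_82.so.5) is a symlink to /opt/cray/pe/libsci/22.08.1.1/GNU/9.1/x86_64/lib/libsci_gnu_82.so.5, so both libraries come from the same libsci release. The issue seems to be that one is the _mp (OpenMP-threaded) variant and the other is not. Does that imply a difference in compiler flags?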
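The comparison described above can be scripted. This is a minimal Python sketch, assuming `ldd` output in the usual `soname => path (address)` form and assuming Cray's naming convention in which an `_mp` suffix marks the OpenMP-threaded libsci build. The helper names `libsci_variants` and `is_mixed` are hypothetical; they list each libsci entry and flag a link that mixes threaded and non-threaded builds.

```python
import re
import subprocess
import sys

# An ldd line with a resolved path, e.g.
#   libsci_gnu_82_mp.so.5 => /opt/.../libsci_gnu_82_mp.so.5 (0x00007f3dc95c8000)
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)")

# Assumption: libsci sonames look like libsci_<compiler>_<tag>[_mp].so.<n>,
# where the optional "_mp" suffix marks the OpenMP-threaded build.
LIBSCI = re.compile(r"^libsci_(\w+?)(_mp)?\.so")


def libsci_variants(ldd_output: str) -> dict:
    """Map each libsci soname in ldd output to its path and threading flag."""
    found = {}
    for line in ldd_output.splitlines():
        m = LDD_LINE.match(line)
        if not m:
            continue  # e.g. linux-vdso or the dynamic loader, which have no "=>"
        soname, path = m.groups()
        lib = LIBSCI.match(soname)
        if lib:
            found[soname] = {"path": path, "threaded": lib.group(2) is not None}
    return found


def is_mixed(variants: dict) -> bool:
    """True if both threaded (_mp) and non-threaded libsci builds are present."""
    return len({v["threaded"] for v in variants.values()}) > 1


if __name__ == "__main__" and len(sys.argv) > 1:
    out = subprocess.run(["ldd", sys.argv[1]], capture_output=True,
                         text=True, check=True).stdout
    variants = libsci_variants(out)
    for soname, info in variants.items():
        kind = "threaded (_mp)" if info["threaded"] else "non-threaded"
        print(f"{soname}: {kind} -> {info['path']}")
    if is_mixed(variants):
        print("both threaded and non-threaded libsci are linked")
```

Run it as `python script.py /path/to/cesm.exe`; it only reports what `ldd` resolves and does not say which object requested each library.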
I suspected this was because the ESMF library is built with threading enabled while CESM is not, so I tried building CESM with threads. That did not solve the problem.
Are you seeing runtime errors or just warnings? I have seen this as a warning with cray-libsci, but things executed OK, so it fell to the back burner.
The only lines in my output file are:
gu0001.hsn.gu.hpc.ucar.edu 78: [CRAYBLAS_WARNING] Application linked against multiple cray-libsci libraries
gu0001.hsn.gu.hpc.ucar.edu 82: [CRAYBLAS_WARNING] Application linked against multiple cray-libsci libraries
gu0001.hsn.gu.hpc.ucar.edu 87: [CRAYBLAS_WARNING] Application linked against multiple cray-libsci libraries
gu0001.hsn.gu.hpc.ucar.edu 88: [CRAYBLAS_WARNING] Application linked against multiple cray-libsci libraries
gu0001.hsn.gu.hpc.ucar.edu 113: [CRAYBLAS_WARNING] Application linked against multiple cray-libsci libraries
gu0001.hsn.gu.hpc.ucar.edu 115: [CRAYBLAS_WARNING] Application linked against multiple cray-libsci libraries
gu0001.hsn.gu.hpc.ucar.edu: rank 78 exited with code 1
So it looks to me like this is causing the crash.
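To check whether the ranks that exited are the same ranks that printed the warning, the job output can be tallied. This is a hypothetical helper that assumes the two line shapes shown above; overlap between the two sets is a correlation, not proof that the warning caused the exit.

```python
import re

# Line shapes assumed from the job output above:
#   <host> <rank>: [CRAYBLAS_WARNING] Application linked against ...
#   <host>: rank <rank> exited with code <n>
WARN = re.compile(r"^\S+\s+(\d+):\s+\[CRAYBLAS_WARNING\]")
EXIT = re.compile(r"^\S+:\s+rank\s+(\d+)\s+exited with code\s+(\d+)")


def summarize(job_output: str):
    """Return (set of ranks that warned, {rank: exit code} for ranks that exited)."""
    warned, exited = set(), {}
    for line in job_output.splitlines():
        if m := WARN.match(line):
            warned.add(int(m.group(1)))
        elif m := EXIT.match(line):
            exited[int(m.group(1))] = int(m.group(2))
    return warned, exited


def exits_without_warning(warned: set, exited: dict) -> set:
    """Ranks with a non-zero exit that never printed the warning.

    A non-empty result suggests the warning is not the whole story.
    """
    return {rank for rank, code in exited.items() if code != 0} - warned
```

Here rank 78 both warned and exited with code 1, while ranks 82, 87, 88, 113 and 115 warned without any recorded exit, which is why the warning alone is weak evidence.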
Your ESMF suspicion does seem to be accurate, though I don't know why your attempted fix did not resolve things:
file=libsci_gnu_82.so.5 [0]; needed by /glade/gust/scratch/jedwards/SMS.f19_g17.X.gust_gnu.20221024_085509_92swhs/bld/cesm.exe
...
file=libsci_gnu_82_mp.so.5 [0]; needed by /glade/u/apps/cseg/spack/opt/spack/cray-sles15-zen3/gcc-12.1.0/esmf-8.4.0b20-groqjowaawivhdo7hinowpsgxbcexeqn/lib/libesmf.so
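The `file=... needed by ...` lines above have the shape of glibc's dynamic-loader trace, which is written to stderr when a program is started with `LD_DEBUG=files` set. A sketch that condenses such a trace into "library <- objects that requested it"; the helper name `who_needs` and the default `libsci` filter are illustrative assumptions.

```python
import re
import sys

# glibc's LD_DEBUG=files trace contains lines such as
#   12345:	file=libsci_gnu_82_mp.so.5 [0];  needed by /path/libesmf.so [0]
# The leading PID prefix may be absent, so search() is used rather than match().
NEEDED_BY = re.compile(r"file=(\S+)\s+\[\d+\];\s+needed by\s+(\S+)")


def who_needs(trace: str, pattern: str = "libsci") -> dict:
    """Map each library whose name contains `pattern` to the objects requesting it."""
    needers = {}
    for line in trace.splitlines():
        m = NEEDED_BY.search(line)
        if m and pattern in m.group(1):
            requesters = needers.setdefault(m.group(1), [])
            if m.group(2) not in requesters:
                requesters.append(m.group(2))
    return needers


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: LD_DEBUG=files ./cesm.exe 2> trace.txt
    #                python this_script.py trace.txt
    with open(sys.argv[1]) as fh:
        for lib, objs in who_needs(fh.read()).items():
            print(lib, "<-", ", ".join(objs))
```

Because the trace is printed while the loader resolves dependencies, even a run that fails early should produce it.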
I'm not sure that's the case; I see the same warning lines in, e.g.,
/glade/work/benkirk/codes/petsc_tests/ex19b_GPU.sh.o3455
and that application runs.
I think a partial answer is in 'man intro_libsci', but I haven't eliminated the warning on my own side yet. I'll look into this as time allows and see what I can find.
If I recompile with DEBUG enabled, I still get the warnings, but the test runs to completion, so maybe this isn't the problem.
I found the problem, and it's my own: the optimized ESMF library was built without MPI.
I am getting a runtime error when running CESM built with gcc on gust:
[CRAYBLAS_WARNING] Application linked against multiple cray-libsci libraries
Using ldd, I see both libraries:
  libsci_gnu_82.so.5 => /opt/cray/pe/lib64/libsci_gnu_82.so.5 (0x00007f3dd182d000)
  libsci_gnu_82_mp.so.5 => /opt/cray/pe/libsci/22.08.1.1/gnu/9.1/x86_64/lib/libsci_gnu_82_mp.so.5 (0x00007f3dc95c8000)
How do I determine what pulled in each of these libraries so that I can resolve the problem?
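One way to answer this without running the program is to read each object's dynamic section: `readelf -d` lists the `(NEEDED)` entries, i.e. the libraries an object asks for directly. The sketch below uses hypothetical helper names and assumes GNU binutils `readelf` and `ldd` are on `PATH`. It runs `readelf -d` on the executable and on every library `ldd` resolves, then reports which objects name a libsci library.

```python
import re
import subprocess
import sys

# `readelf -d` prints direct dependencies like:
#  0x0000000000000001 (NEEDED)             Shared library: [libsci_gnu_82_mp.so.5]
NEEDED = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")
# ldd lines with a resolved absolute path: "<soname> => <path> (0x...)"
RESOLVED = re.compile(r"=>\s+(/\S+)")


def needed_libs(readelf_output: str) -> list:
    """Return the NEEDED sonames from one object's `readelf -d` output."""
    return NEEDED.findall(readelf_output)


def direct_requesters(dynamic_sections: dict, pattern: str = "libsci") -> dict:
    """Map each matching soname to the objects that list it as NEEDED."""
    result = {}
    for obj, text in dynamic_sections.items():
        for lib in needed_libs(text):
            if pattern in lib:
                result.setdefault(lib, []).append(obj)
    return result


def readelf_d(path: str) -> str:
    return subprocess.run(["readelf", "-d", path], capture_output=True,
                          text=True, check=True).stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    exe = sys.argv[1]
    ldd = subprocess.run(["ldd", exe], capture_output=True,
                         text=True, check=True).stdout
    objects = [exe] + RESOLVED.findall(ldd)
    sections = {obj: readelf_d(obj) for obj in objects}
    for lib, objs in direct_requesters(sections).items():
        print(lib, "<-", ", ".join(objs))
```

Unlike a runtime loader trace, this only shows direct dependencies, but that is usually enough to spot a prebuilt library that brings in the `_mp` variant.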