GEOS-ESM / GEOSgcm_GridComp

Repository containing the physics and IAU code for the GEOS Earth System Model
Apache License 2.0

Certain timesteps cause the model to crash with gfortran/openmpi on scu15 #850

Open bena-nasa opened 1 year ago

bena-nasa commented 1 year ago

While tracking this issue: https://github.com/GEOS-ESM/GEOSgcm_GridComp/issues/847

I ran into something else that seems to warrant its own issue.

When you choose the single-moment physics (in this case at c90) the default timestep is 450 s, but if you choose the two-moment physics the default timestep is 1800 s.
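
For concreteness, here is roughly what the two configurations look like. This is a sketch assuming the timestep is the HEARTBEAT_DT entry in the experiment's CAP.rc (the parameter name is from typical GEOS experiment setups, not from this issue; verify against your own run directory):

```
# Hypothetical CAP.rc excerpt; only the 450 s and 1800 s values come from this report.
HEARTBEAT_DT: 450      # single-moment default at c90 -- runs with gfortran/openmpi
# HEARTBEAT_DT: 1800   # two-moment default -- segfaults in the dynamics on the first step
```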

I am finding that when you run the model with the same version specified in #847 and that longer timestep, it crashes in the dynamics with a segmentation fault on the first timestep when using gfortran and openmpi on scu17. This happens with both the release and debug builds with gfortran and is independent of which microphysics you have chosen. With the shorter 450 s timestep the model runs fine with gfortran. Unfortunately I'm not getting much useful traceback:

 TR::e90
 TR::Rn222
 TR::CH3I
 Real*8 Resource Parameter: PSDRY:98305.000000, (default value)
 Global Area=   510064471910262.25
[borga169:30008:0:30008] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xe4)
[borga169:30009:0:30009] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xe4)
[borga169:30006:0:30006] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xe4)
==== backtrace (tid:  30008) ====
 0  /usr/lib64/libucs.so.0(ucs_handle_error+0xe4) [0x2abcc7d51da4]
 1  /usr/lib64/libucs.so.0(+0x2210c) [0x2abcc7d5210c]
 2  /usr/lib64/libucs.so.0(+0x222c2) [0x2abcc7d522c2]
 3  /lib64/libpthread.so.0(+0x11ce0) [0x2abca1941ce0]
 4  /discover/swdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/openmpi/mca_pml_ob1.so(+0x1852c) [0x2abccc22452c]
 5  /discover/swdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/openmpi/mca_pml_ob1.so(+0x1ad2c) [0x2abccc226d2c]
 6  /discover/swdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x7f) [0x2abcc623396f]
 7  /discover/swdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/openmpi/mca_btl_vader.so(+0x4def) [0x2abcc6233def]
 8  /discover/swdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/libopen-pal.so.40(opal_progress+0x2c) [0x2abcbba9b16c]
 9  /gpfsm/dswdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/libmpi.so.40(ompi_request_default_wait+0x45) [0x2abcbaa88c75]
10  /gpfsm/dswdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/libmpi.so.40(PMPI_Wait+0x52) [0x2abcbaacc2c2]
11  /gpfsm/dswdev/gmao_SIteam/MPI/openmpi/4.1.3/gcc-12.1.0/lib/libmpi_mpifh.so.40(mpi_wait+0x31) [0x2abcba820a51]
12  /gpfsm/dswdev/bmauer/models/geosgcm_moistbug/GEOSgcm/install-debug-gfortran/bin/../lib/libfms_r8.so(__mpp_mod_MOD_mpp_sync_self+0x101a) [0x2abcb2ba709c]
13  /gpfsm/dswdev/bmauer/models/geosgcm_moistbug/GEOSgcm/install-debug-gfortran/bin/../lib/libfms_r8.so(__mpp_domains_mod_MOD_mpp_complete_group_update_r4+0x62bb) [0x2abcb2ea8ff0]
mathomp4 commented 1 year ago

Hmm. That backtrace does point at something.

Ben, if you have a chance, can you try a build with -DFV_PRECISION=R4? That would probably crush poor MOM6, but I wonder if having the dual r4+r8 FMS is causing issues.
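
For reference, a rebuild to test that could look roughly like the following. This is a sketch assuming a standard out-of-source CMake build of GEOSgcm; everything except -DFV_PRECISION=R4 is a placeholder to adapt to the local toolchain and install paths:

```
# Hypothetical build sketch; only -DFV_PRECISION=R4 is taken from the suggestion above.
mkdir build-r4 && cd build-r4
cmake .. -DCMAKE_Fortran_COMPILER=gfortran \
         -DFV_PRECISION=R4 \
         -DCMAKE_INSTALL_PREFIX=../install-r4
make -j6 install
```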