ESCOMP / POP2-CESM

Parallel Ocean Program (POP2) in CESM
http://www.cesm.ucar.edu/models/cesm2/ocean/

Current CESM test failures #76

Open mnlevy1981 opened 1 year ago

mnlevy1981 commented 1 year ago

(given the impending move from POP -> MOM6, I don't expect to fix these; opening an issue ticket in case I get asked about testing in the future)

Description of the issue:

Some tests are failing on cheyenne with gfortran and DEBUG=TRUE (but not all tests in that configuration). With cesm2_3_beta12, the only failing test is:

SMS_Ld2_P80_D.T62_g37.C1850ECO.cheyenne_gnu.pop-ecosys_81blocks_100x116_spacecurve
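For readers less familiar with CIME test names: each name encodes the test type and modifiers, grid, compset, machine_compiler, and testmods, so `_D` marks a DEBUG=TRUE build and `cheyenne_gnu` marks gfortran on cheyenne. The sketch below splits a name into those fields. It assumes the `TESTTYPE[_MODIFIERS].GRID.COMPSET.MACHINE_COMPILER[.TESTMODS]` layout with the machine_compiler field present, and the names `CimeTestName` and `parse_test_name` are hypothetical helpers, not part of CIME.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CimeTestName:
    testtype: str            # e.g. "SMS" (smoke test) or "ERS" (exact restart)
    modifiers: List[str]     # e.g. ["Ld2", "P80", "D"]: 2 days, 80 tasks, debug
    grid: str                # e.g. "T62_g37"
    compset: str             # e.g. "C1850ECO"
    machine: str             # e.g. "cheyenne"
    compiler: str            # e.g. "gnu"
    testmods: Optional[str]  # e.g. "pop-ecosys_box_atm_co2", or None if absent

    @property
    def is_debug(self) -> bool:
        # The "_D" modifier requests a DEBUG=TRUE build.
        return "D" in self.modifiers


def parse_test_name(name: str) -> CimeTestName:
    """Hypothetical parser; assumes the machine_compiler field is present."""
    parts = name.split(".")
    if len(parts) not in (4, 5):
        raise ValueError(f"unexpected test name layout: {name!r}")
    testtype, *modifiers = parts[0].split("_")
    # Split on the last underscore so machine names containing "_" survive.
    machine, compiler = parts[3].rsplit("_", 1)
    testmods = parts[4] if len(parts) == 5 else None
    return CimeTestName(testtype, modifiers, parts[1], parts[2],
                        machine, compiler, testmods)


if __name__ == "__main__":
    t = parse_test_name(
        "SMS_Ld2_P80_D.T62_g37.C1850ECO.cheyenne_gnu."
        "pop-ecosys_81blocks_100x116_spacecurve"
    )
    print(t.testtype, t.modifiers, t.compiler, t.is_debug)
    # prints: SMS ['Ld2', 'P80', 'D'] gnu True
```

Reading the names this way makes it easy to see that every failing test shares the same grid, compset, machine, compiler, and debug flag, and differs only in test type, task count, and testmods.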

I updated MARBL from marbl0.40.3 to marbl0.41.0 (which required small POP changes as well), and two tests failed:

ERS_Ld5_D.T62_g37.C1850ECO.cheyenne_gnu.pop-ecosys_box_atm_co2
SMS_Ld2_P80_D.T62_g37.C1850ECO.cheyenne_gnu.pop-ecosys_81blocks_100x116_spacecurve

Moving to marbl0.42.0 (which also required minor changes to POP) produced a slightly different pair of failing tests:

SMS_Ld2_D.T62_g37.C1850ECO.cheyenne_gnu.pop-ciso_daily_r4_tavg
SMS_Ld2_P80_D.T62_g37.C1850ECO.cheyenne_gnu.pop-ecosys_81blocks_100x116_spacecurve

Moving to the version of MARBL in https://github.com/marbl-ecosys/MARBL/pull/423 produced the same two failures:

SMS_Ld2_D.T62_g37.C1850ECO.cheyenne_gnu.pop-ciso_daily_r4_tavg
SMS_Ld2_P80_D.T62_g37.C1850ECO.cheyenne_gnu.pop-ecosys_81blocks_100x116_spacecurve

The traceback for each failed test is the same, pointing at line 919 of the tidal mixing module (init_tidal_mixing1 in tidal_mixing.F90), called during POP initialization:

51:
51:Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
51:
51:Backtrace for this error:
51:#0  0x2ad3855b7bff in ???
51:#1  0x67c993 in __tidal_mixing_MOD_init_tidal_mixing1
51:     at $EXEROOT/ocn/source/tidal_mixing.F90:919
51:#2  0x8e1191 in __initial_MOD_pop_init_phase1
51:     at $EXEROOT/ocn/source/initial.F90:386
51:#3  0x586e09 in initializerealize
51:     at $EXEROOT/ocn/source/ocn_comp_nuopc.F90:389
51:#4  0x2ad38001705b in _ZN5ESMCI6FTable12callVFuncPtrEPKcPNS_2VMEPi
51:     at /glade/p/cesmdata/cseg/PROGS/build/63684/esmf-8.5.0b19/src/Superstructure/Component/src/ESMCI_FTable.C:2167
51:#5  0x2ad380014198 in ESMCI_FTableCallEntryPointVMHop
51:     at /glade/p/cesmdata/cseg/PROGS/build/63684/esmf-8.5.0b19/src/Superstructure/Component/src/ESMCI_FTable.C:824
51:#6  0x2ad3803e7250 in _ZN5ESMCI3VMK5enterEPNS_7VMKPlanEPvS3_
51:     at /glade/p/cesmdata/cseg/PROGS/build/63684/esmf-8.5.0b19/src/Infrastructure/VM/src/ESMCI_VMKernel.C:2320
51:#7  0x2ad38040150c in _ZN5ESMCI2VM5enterEPNS_6VMPlanEPvS3_
51:     at /glade/p/cesmdata/cseg/PROGS/build/63684/esmf-8.5.0b19/src/Infrastructure/VM/src/ESMCI_VM.C:1216

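Running the same test on izumi (NAG compiler), however, tells a different story: the failure is a floating-point divide by zero in the mediator's atmosphere-ocean flux computation (shr_flux_mod.F90), not in POP's tidal mixing module: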

Runtime Error: *** Arithmetic exception: Floating divide by zero Runtime Error: - aborting
$SRCROOT/components/cmeps/cime_config/../cesm/flux_atmocn/shr_flux_mod.F90, line 331: Error occurred in SHR_FLUX_MOD:FLUX_ATMOCN
$SRCROOT/components/cmeps/cime_config/../mediator/med_phases_aofluxes_mod.F90, line 1047: Called by MED_PHASES_AOFLUXES_MOD:MED_AOFLUXES_UPDATE
$SRCROOT/components/cmeps/cime_config/../mediator/med_phases_aofluxes_mod.F90, line 315: Called by MED_PHASES_AOFLUXES_MOD:MED_PHASES_AOFLUXES_RUN
$SRCROOT/components/cmeps/cime_config/../cesm/driver/esmApp.F90, line 141: Called by ESMAPP
[i039.cgd.ucar.edu:mpi_rank_39][error_sighandler] Caught error: Aborted (signal 6)

Version:

Machine/Environment Description:

cheyenne (gfortran) and izumi (nag)

Any xml/namelist changes or SourceMods:

no