ESCOMP / CDEPS

Community Data Models for Earth Prediction Systems
https://escomp.github.io/CDEPS/versions/master/html/index.html

ERP_Ln9_Vnuopc.ne30_ne30_mg17.QPRCEMIP.cheyenne_intel.cam-outfrq9s test failing #182

Closed peverwhee closed 2 years ago

peverwhee commented 2 years ago

Description

@fischer-ncar came across this prebeta test failure (https://github.com/ESCOMP/CAM/issues/639). The test runs successfully with MCT.

When the test is run as in the description (no debug), the error described in https://github.com/ESCOMP/CAM/issues/639 occurs within CLUBB.

When the test is run with debug (ERP_D_Ln9_Vnuopc.ne30_ne30_mg17.QPRCEMIP.cheyenne_intel.cam-outfrq9s), a different (presumably earlier) error occurs in CMEPS:

 MPI_SGI_stacktraceback (
3:MPT:     header=header@entry=0x7ffd7ce59d10 "MPT ERROR: Rank 3(g:3) received signal SIGFPE(8).\n\tProcess ID: 54232, Host: r11i7n7, Program: /glade/scratch/courtneyp/ERP_D_Ln9_Vnuopc.ne30_ne30_mg17.QPRCEMIP.cheyenne_intel.cam-outfrq9s.20220823_212"...) at sig.c:340
3:MPT: #3  0x00002ab70dcd84ff in first_arriver_handler (signo=signo@entry=8,
3:MPT:     stack_trace_sem=stack_trace_sem@entry=0x2ab71d5e0080) at sig.c:489
3:MPT: #4  0x00002ab70dcd8793 in slave_sig_handler (signo=8, siginfo=<optimized out>,
3:MPT:     extra=<optimized out>) at sig.c:565
3:MPT: #5  <signal handler called>
3:MPT: #6  0x000000000099d525 in shr_flux_mod::flux_atmocn (logunit=6, nmax=338,
3:MPT:     zbot=..., ubot=..., vbot=..., thbot=..., qbot=..., s16o=..., shdo=...,
3:MPT:     s18o=..., rbot=..., tbot=..., us=..., vs=..., ts=..., mask=...,
3:MPT:     seq_flux_atmocn_minwind=0.5, sen=..., lat=..., lwup=..., r16o=...,
3:MPT:     rhdo=..., r18o=..., evap=..., evap_16o=..., evap_hdo=..., evap_18o=...,
3:MPT:     taux=..., tauy=..., tref=..., qref=..., ocn_surface_flux_scheme=0,
3:MPT:     duu10n=..., ustar_sv=..., re_sv=..., ssq_sv=..., missval=0)
3:MPT:     at /glade/u/home/courtneyp/Projects/CAM_2/cime/../components/cmeps/cime_config/../cesm/flux_atmocn/shr_flux_mod.F90:330
3:MPT: #7  0x000000000086982d in med_phases_aofluxes_mod::med_aofluxes_update (
3:MPT:     gcomp=..., aoflux_in=..., aoflux_out=..., rc=0)
3:MPT:     at /glade/u/home/courtneyp/Projects/CAM_2/cime/../components/cmeps/cime_config/../mediator/med_phases_aofluxes_mod.F90:930
3:MPT: #8  0x00000000008589e9 in med_phases_aofluxes_mod::med_phases_aofluxes_run (
3:MPT:     gcomp=..., rc=0)
3:MPT:     at /glade/u/home/courtneyp/Projects/CAM_2/cime/../components/cmeps/cime_config/../mediator/med_phases_aofluxes_mod.F90:301
3:MPT: #9  0x00002ab707a54432 in ESMCI::MethodElement::execute(void*, int*) const ()

The above error can be found in the CESM log here: /glade/scratch/courtneyp/ERP_D_Ln9_Vnuopc.ne30_ne30_mg17.QPRCEMIP.cheyenne_intel.cam-outfrq9s.20220823_104210_yxlmff

Additional Info

As far as I can tell, the issue (or at least one of the issues) is that the ocean temperature in the mediator is invalid (-1).
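
For illustration, here is a minimal sketch of the kind of sanity check that would flag such a value before the flux calculation is reached. It is written in Python rather than the Fortran of shr_flux_mod, and it is not code from CMEPS or CDEPS: the function name `find_invalid_sst`, its arguments, and the sample values are hypothetical. It assumes temperatures are in kelvin and that a nonzero mask marks an active ocean point.

```python
# Hypothetical diagnostic, not code from CMEPS or CDEPS.
# Assumes temperatures are in kelvin and that a nonzero mask
# value marks an active ocean point.

def find_invalid_sst(ts, mask, min_kelvin=0.0):
    """Return indices of active ocean points with a non-physical temperature.

    ts   : sequence of surface temperatures (assumed kelvin)
    mask : sequence of ints, nonzero where the ocean point is active
    """
    if len(ts) != len(mask):
        raise ValueError("ts and mask must have the same length")
    # "not (t > min_kelvin)" also catches NaN, since any comparison
    # against NaN evaluates to False.
    return [
        i
        for i, (t, m) in enumerate(zip(ts, mask))
        if m != 0 and not (t > min_kelvin)
    ]


# Made-up sample data: two points carry -1, but only one is active.
ts = [288.5, -1.0, 271.3, -1.0]
mask = [1, 1, 1, 0]
bad = find_invalid_sst(ts, mask)
print(bad)  # prints [1]
```

Only the active point is reported; the masked-out point is ignored, as it would not enter the flux calculation under the masking assumption stated above.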

mvertens commented 2 years ago

@peverwhee - I have a fix for this in the PR I just issued to CDEPS. I have verified that, with this simple fix, this test now passes.