ESCOMP / CAM

Community Atmosphere Model
77 stars 140 forks source link

Test SMS_Ld2.ne30pg3_t232.BMT1850.derecho_gnu.allactive-defaultio Fails #1116

Closed jedwards4b closed 1 month ago

jedwards4b commented 2 months ago

What happened?

20240725 160423.008 ERROR PET5344 /glade/u/home/fischer/code/cesm3_0_alpha02b/components/cmeps/cime_config/../mediator/med_methods_mod.F90:2628 ERROR: 4 nans found in Sa_pslv 20240725 160423.008 ERROR PET5344 /glade/u/home/fischer/code/cesm3_0_alpha02b/components/cmeps/cime_config/../mediator/med_methods_mod.F90:2633 ABORTING JOB

What are the steps to reproduce the bug?

Run the test

What CAM tag were you using?

cam6_4_016

What machine were you running CAM on?

CISL machine (e.g. cheyenne)

What compiler were you using?

GNU

Path to a case directory, if applicable

No response

Will you be addressing this bug yourself?

Yes

Extra info

I'll try in debug mode and see if I get any more info. Note that this test works with intel.

PeterHjortLauritzen commented 2 months ago

Some diagnostics: The surface pressure (Sa_pslv) passed to the coupler from the atmosphere has nan's in it. The pressure values are reasonable at initialization (after reading in the initial condition file) but the model fails before completing a time-step (i.e. 1st call to coupler).

jedwards4b commented 2 months ago

I think that maybe the psl field is not being initialized before the first call to atm_import_export.F90 I wonder if the solution might be to just skip the nan check on the first step?

PeterHjortLauritzen commented 2 months ago

sea-level pressure is computed in cam_diagnostics

    ! Sea level pressure
    call pbuf_get_field(pbuf, psl_idx, psl)
    call cpslec(ncol, state%pmid, state%phis, state%ps, state%t, psl, gravit, rair)
    call outfld('PSL', psl, pcols, lchnk)

This code is executed from physpkg in tphys_bc

call diag_phys_writeout(state, pbuf)

before the call that populates cam%out variables for the coupler

call cam_export (state,cam_out,pbuf)

Long story short; PSL should not have nans in it.

Could it be that PSL(ncol+1:pcols) is not initialized to zero? (I doubt that is an issue!)

@brian-eaton From the ChangeLog it looks like PSL was moved to the physics_buffer back in 2018

Tag name: cam6_0_032
Originator(s): eaton
Date: Wed Oct 31 17:14:58 MDT 2018
One-line Summary: cleanup in diagnostic output and simple model code

Do you have an idea of what is going on?

brian-eaton commented 2 months ago

@jedwards4b, @PeterHjortLauritzen. During CAM's initialization cam_init calls atm2hub_alloc which sets cam_out%psl to zero. This happens before the first call to export_fields. During the run, the sequence that Peter describes should be setting cam_out%psl to valid values. So Sa_pslv should never contain nans.

Do we know whether the nans are trapped during initialization rather than during the run phase?

Is is possible that regridding to t232 is causing the problem?

jedwards4b commented 2 months ago

It's working fine for the intel compiler. There is a newer version of the gnu compiler available, I'm testing that now. This was found with gcc/12.2.0 I am now testing gcc/13.2.0

jedwards4b commented 2 months ago

Same error with gcc/13.2.0 /glade/derecho/scratch/jedwards/SMS_Ld2.ne30pg3_t232.BMT1850.derecho_gnu.allactive-defaultio.20240807_154234_oazzge

brian-eaton commented 2 months ago

It turns out that @PeterHjortLauritzen and I made the same mistake in concluding that psl was being set in the pbuf before the call to cam_export which copies it from the pbuf to the cam_out export state. The problem is that in the cam7 physics reordering the call to diag_phys_writeout was moved from tphysbc to tphysac so that it remains after the moist adjustment processes. Consequently the first call to cam_export in tphysbc is copying from a pbuf location that hasn't been set by a call to cpslec.

In the cam6 ordering the values of psl sent to the coupler are the same as what is written to the history file. In the new ordering we need to make a choice. Either we can always compute psl before the call to cam_export so it is consistent with the rest of the state sent to the coupler, or we can leave it where it is so it will remain consistent with the state output to history files, but the values going to the coupler will lag by a timestep. In the latter case we would also need to add an extra calculation of psl to the "if first timestep" conditional in tphysbc (like the first call to the radiation calcs).

Peter, @adamrher , what do you think? I'm guessing it's best to leave it where it is since it would appear to mainly be a cam diagnostic field. I'm not sure why other components want to know the value of the surface pressure extrapolated to sea level...

jedwards4b commented 2 months ago

The MOM ocn model wants it - but it does not use the first timestep.

adamrher commented 2 months ago

@brian-eaton I think we can initilaize psl using (if first timestep) logic in the init? I'm looking at how the only other 3 pbuf vars in cam_diagnostics.F90 are dealt with. See t_u/v/ttend in diag_conv_tend_ini.

My preference is to preserve the relative ordering of all the routines in the original (not cam7) physpkg.F90. That requires diag_phys_writeout to be called just after the fast moist adjustment, and before the history output subroutine in tphysac. The model state should be diagnosed at that point in the time loop and accumulated in the history output.

However, looking at the code, I guess it never occurred to me that psl was set on the physics side and that it doesn't reside in the state% array. There's no good reason we can't give the CPL the correct psl state. I'm wondering if we should make the psl calculation its own subroutine in cam_diagnostics.F90, and call it from tphysbc before cam_export.

adamrher commented 2 months ago

OK I've made that way too complicated. I'm advocating we move the call to cpslec out of cam_diagnostics.F90 and into tphysbc before the call to cam_export. That should work, right?

brian-eaton commented 2 months ago

If we move the call to cpslec out of diag_phys_writeout and into tphysbc before the call to cam_export, the coupler will get values of psl that are consistent with the primitive state, but the values in the history file will no longer be consistent with the other state variables written from diag_phys_writeout after the moist adjustment physics. I'm thinking it would be better to add a second calculation of psl before the call to cam_export, and leave the current calculation in diag_phys_writeout alone. That way the values of psl are always consistent with the current state variables. This is how we treat other derived variables, for example the tropopause calculation is done in multiple spots to be consistent with the current state primitive variables. Sound OK?

adamrher commented 2 months ago

Good point, the pslcalculation is dependent on state%t(:,pver) which varies throughout the time loop. I would agree with keeping the cpslec in diag_phys_writeout and just putting an additional call to cpslec in tphysbc in src/physics/cam7/physpkg.F90.

I don't know if this will create greater than round off differences, so I should probably do a validation run with this mod (the ensemble consistency test would be helpful here!).

brian-eaton commented 2 months ago

Jim mentioned that MOM uses psl. I don't know about other components, but it's possible that F compsets could be BFB. I'll run the regression tests and let you know.

brian-eaton commented 2 months ago

The regression tests confirm that F compsets using cam7 are BFB with this change. The impact on the B compsets using cam7 will be due to removing the one timestep lag that was present in the psl values being sent to the coupler. The lag was the result of diag_phys_writeout being moved from tphysbc to tphysac. I will issue a PR with the fixes discussed in this thread.

adamrher commented 2 months ago

Great, I guess the land model isn't using psl. The changes to the B case are fine by me. @PeterHjortLauritzen let's keep this mod in mind when we create a new B-case baseline.