Closed jedwards4b closed 1 month ago
Some diagnostics: The surface pressure (Sa_pslv) passed to the coupler from the atmosphere has nan's in it. The pressure values are reasonable at initialization (after reading in the initial condition file) but the model fails before completing a time-step (i.e. 1st call to coupler).
I think that maybe the psl field is not being initialized before the first call to atm_import_export.F90 I wonder if the solution might be to just skip the nan check on the first step?
sea-level pressure is computed in cam_diagnostics
! Sea level pressure
call pbuf_get_field(pbuf, psl_idx, psl)
call cpslec(ncol, state%pmid, state%phis, state%ps, state%t, psl, gravit, rair)
call outfld('PSL', psl, pcols, lchnk)
This code is executed from physpkg in tphys_bc
call diag_phys_writeout(state, pbuf)
before the call that populates cam%out variables for the coupler
call cam_export (state,cam_out,pbuf)
Long story short; PSL should not have nans in it.
Could it be that PSL(ncol+1:pcols) is not initialized to zero? (I doubt that is an issue!)
@brian-eaton From the ChangeLog it looks like PSL was moved to the physics_buffer back in 2018
Tag name: cam6_0_032
Originator(s): eaton
Date: Wed Oct 31 17:14:58 MDT 2018
One-line Summary: cleanup in diagnostic output and simple model code
Do you have an idea of what is going on?
@jedwards4b, @PeterHjortLauritzen. During CAM's initialization cam_init
calls atm2hub_alloc
which sets cam_out%psl
to zero. This happens before the first call to export_fields
. During the run, the sequence that Peter describes should be setting cam_out%psl
to valid values. So Sa_pslv
should never contain nans.
Do we know whether the nans are trapped during initialization rather than during the run phase?
Is is possible that regridding to t232 is causing the problem?
It's working fine for the intel compiler. There is a newer version of the gnu compiler available, I'm testing that now. This was found with gcc/12.2.0 I am now testing gcc/13.2.0
Same error with gcc/13.2.0 /glade/derecho/scratch/jedwards/SMS_Ld2.ne30pg3_t232.BMT1850.derecho_gnu.allactive-defaultio.20240807_154234_oazzge
It turns out that @PeterHjortLauritzen and I made the same mistake in concluding that psl
was being set in the pbuf
before the call to cam_export
which copies it from the pbuf
to the cam_out
export state. The problem is that in the cam7
physics reordering the call to diag_phys_writeout
was moved from tphysbc
to tphysac
so that it remains after the moist adjustment processes. Consequently the first call to cam_export
in tphysbc
is copying from a pbuf
location that hasn't been set by a call to cpslec
.
In the cam6
ordering the values of psl
sent to the coupler are the same as what is written to the history file. In the new ordering we need to make a choice. Either we can always compute psl
before the call to cam_export
so it is consistent with the rest of the state sent to the coupler, or we can leave it where it is so it will remain consistent with the state output to history files, but the values going to the coupler will lag by a timestep. In the latter case we would also need to add an extra calculation of psl
to the "if first timestep"
conditional in tphysbc
(like the first call to the radiation calcs).
Peter, @adamrher , what do you think? I'm guessing it's best to leave it where it is since it would appear to mainly be a cam diagnostic field. I'm not sure why other components want to know the value of the surface pressure extrapolated to sea level...
The MOM ocn model wants it - but it does not use the first timestep.
@brian-eaton I think we can initilaize psl
using (if first timestep) logic in the init? I'm looking at how the only other 3 pbuf vars in cam_diagnostics.F90 are dealt with. See t_u/v/ttend in diag_conv_tend_ini
.
My preference is to preserve the relative ordering of all the routines in the original (not cam7) physpkg.F90. That requires diag_phys_writeout
to be called just after the fast moist adjustment, and before the history output subroutine in tphysac
. The model state should be diagnosed at that point in the time loop and accumulated in the history output.
However, looking at the code, I guess it never occurred to me that psl
was set on the physics side and that it doesn't reside in the state% array. There's no good reason we can't give the CPL the correct psl state. I'm wondering if we should make the psl calculation its own subroutine in cam_diagnostics.F90, and call it from tphysbc
before cam_export
.
OK I've made that way too complicated. I'm advocating we move the call to cpslec
out of cam_diagnostics.F90 and into tphysbc
before the call to cam_export
. That should work, right?
If we move the call to cpslec
out of diag_phys_writeout
and into tphysbc
before the call to cam_export
, the coupler will get values of psl
that are consistent with the primitive state, but the values in the history file will no longer be consistent with the other state variables written from diag_phys_writeout
after the moist adjustment physics. I'm thinking it would be better to add a second calculation of psl
before the call to cam_export
, and leave the current calculation in diag_phys_writeout
alone. That way the values of psl
are always consistent with the current state variables. This is how we treat other derived variables, for example the tropopause calculation is done in multiple spots to be consistent with the current state primitive variables. Sound OK?
Good point, the psl
calculation is dependent on state%t(:,pver) which varies throughout the time loop. I would agree with keeping the cpslec
in diag_phys_writeout
and just putting an additional call to cpslec
in tphysbc in src/physics/cam7/physpkg.F90.
I don't know if this will create greater than round off differences, so I should probably do a validation run with this mod (the ensemble consistency test would be helpful here!).
Jim mentioned that MOM uses psl
. I don't know about other components, but it's possible that F compsets could be BFB. I'll run the regression tests and let you know.
The regression tests confirm that F compsets using cam7 are BFB with this change. The impact on the B compsets using cam7 will be due to removing the one timestep lag that was present in the psl
values being sent to the coupler. The lag was the result of diag_phys_writeout
being moved from tphysbc
to tphysac
. I will issue a PR with the fixes discussed in this thread.
Great, I guess the land model isn't using psl
. The changes to the B case are fine by me. @PeterHjortLauritzen let's keep this mod in mind when we create a new B-case baseline.
What happened?
20240725 160423.008 ERROR PET5344 /glade/u/home/fischer/code/cesm3_0_alpha02b/components/cmeps/cime_config/../mediator/med_methods_mod.F90:2628 ERROR: 4 nans found in Sa_pslv 20240725 160423.008 ERROR PET5344 /glade/u/home/fischer/code/cesm3_0_alpha02b/components/cmeps/cime_config/../mediator/med_methods_mod.F90:2633 ABORTING JOB
What are the steps to reproduce the bug?
Run the test
What CAM tag were you using?
cam6_4_016
What machine were you running CAM on?
CISL machine (e.g. cheyenne)
What compiler were you using?
GNU
Path to a case directory, if applicable
No response
Will you be addressing this bug yourself?
Yes
Extra info
I'll try in debug mode and see if I get any more info. Note that this test works with intel.