mathomp4 commented 3 years ago

This was discovered by @sdrabenh as he was looking at the nonhydrostatic (NH) GEOS model. Turns out if you build GEOS for non-hydrostatic dynamics, the model does not start-stop regress. This usually points to an internal state issue. The question is: where?

I suppose there are two possible culprits: Dynamics and Moist. Maybe perhaps DZ and W are not being carried around correctly in Dyn? Or perhaps Moist runs differently in non-hydro?

I'm adding up the usual suspects (@atrayano, @bena-nasa, @wmputman) to this issue for opinions and thoughts.

mathomp4 commented 3 years ago

For ease of testing, I have created a branch here on the fixture that grabs all of my PRs for this:

feature/mathomp4/nonhydro-gcm

To build it, you either use ./parallel_build.csh -nonhydrostatic or you add:

-DHYDROSTATIC=NO

to the cmake command.

Then make sure you are correctly selecting nonhydrostatic in gcm_setup. One of my PR in this branch defaults it to whatever the -DHYDROSTATIC was passed in as.

ETA: The differences between this and main are:

diff --git a/components.yaml b/components.yaml
index 8ea3fd0..5cab83b 100644
--- a/components.yaml
+++ b/components.yaml
@@ -5,7 +5,7 @@ GEOSgcm:
 env:
   local: ./@env
   remote: ../ESMA_env.git
-  tag: v3.1.3
+  branch: feature/mathomp4/update-pbuild-for-hydrostatic
   develop: main

 cmake:
@@ -55,13 +55,13 @@ GEOSgcm_GridComp:
 FVdycoreCubed_GridComp:
   local: ./src/Components/@GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSsuperdyn_GridComp/@FVdycoreCubed_GridComp
   remote: ../FVdycoreCubed_GridComp.git
-  tag: v1.2.10
+  branch: feature/mathomp4/update-setup-for-hydrostatic-fv3
   develop: develop

 fvdycore:
   local: ./src/Components/@GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSsuperdyn_GridComp/@FVdycoreCubed_GridComp/@fvdycore
   remote: ../GFDL_atmos_cubed_sphere.git
-  tag: geos/v1.1.4
+  tag: geos/v1.1.5
   develop: geos/develop

 GEOSchem_GridComp:
@@ -93,7 +93,7 @@ mom6:
 GEOSgcm_App:
   local: ./src/Applications/@GEOSgcm_App
   remote: ../GEOSgcm_App.git
-  tag: v1.3.15
+  branch: feature/mathomp4/update-setup-for-hydrostatic
   develop: develop

 UMD_Etc:

mathomp4 commented 3 years ago

As an additional clue, @sdrabenh said that he saw an issue with the zero-increment replay (which I think means you turn on regular replay, but set REPLAY_T, REPLAY_Q, et al, to NO. @sdrabenh can you tell us what test you did that saw this issue? Might help point us to the issue.

mathomp4 commented 3 years ago

Note: I saw start-stop failure with a "default" experiment. I was only doing NH at C24 and nothing else exciting. (That is, no GFDL microphysics, etc.)

sdrabenh commented 3 years ago

As an additional clue, @sdrabenh said that he saw an issue with the zero-increment replay (which I think means you turn on regular replay, but set REPLAY_T, REPLAY_Q, et al, to NO. @sdrabenh can you tell us what test you did that saw this issue? Might help point us to the issue.

Correct, I ran the default 1MOM L72 C48 nonhydrostatic build for 1 day and it failed gcm_regress.j and the 0-increment test @mathomp4 mentioned which should produce results identical to AMIP mode. Furthermore, we are seeing similar issues in the GF2020 development branches. Since we haven't typically tested the NH model, the question is whether this is a new or old problem?

mathomp4 commented 3 years ago

It looks like we never get the DZ or W internal state pointers in DynCore_GridCompMod.F90:

    call MAPL_GetPointer(INTERNAL, AK, 'AK', RC=STATUS)
    call MAPL_GetPointer(INTERNAL, BK, 'BK', RC=STATUS)
    call MAPL_GetPointer(INTERNAL,UD,'U'  ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,VD,'V'  ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,PE,'PE' ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,PT,'PT' ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,PK,'PKZ',RC=STATUS)

Maybe we should if nonhydro?

mathomp4 commented 3 years ago

Correct, I ran the default 1MOM L72 C48 nonhydrostatic build for 1 day and it failed gcm_regress.j and the 0-increment test @mathomp4 mentioned which should produce results identical to AMIP mode. Furthermore, we are seeing similar issues in the GF2020 development branches. Since we haven't typically tested the NH model, the question is whether this is a new or old problem?

Ah yes. AMIP and Replay-NoInc should be identical. Thanks. I knew there was a test for that!

sdrabenh commented 3 years ago

It looks like we never get the DZ or W internal state pointers in DynCore_GridCompMod.F90:

    call MAPL_GetPointer(INTERNAL, AK, 'AK', RC=STATUS)
    call MAPL_GetPointer(INTERNAL, BK, 'BK', RC=STATUS)
    call MAPL_GetPointer(INTERNAL,UD,'U'  ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,VD,'V'  ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,PE,'PE' ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,PT,'PT' ,RC=STATUS)
    call MAPL_GetPointer(INTERNAL,PK,'PKZ',RC=STATUS)

Maybe we should if nonhydro?

Good question perhaps @wmputman can comment

mathomp4 commented 3 years ago

I forgot about FV_StateMod.F90. We do get the DZ and W internal state there. Maybe that file matters more...

bena-nasa commented 3 years ago

I'll investigate the restart

sdrabenh commented 3 years ago

Another clue, in a separate test built with the feature/wmputman/DevDYAMONDv2_Merge04nhGF2020evap3 branches the GFDL NH model does regress. Could this be in issue with the 1 MOM microphysics? Unfortunately, that failed the 0-increment replay test.

mathomp4 commented 3 years ago

I just tried GFDL with the "stock" GCM and it doesn't regress. So I guess GFDL is different in that branch!

bena-nasa commented 3 years ago

I think DZ and W are being written ok in the restart, the pointers are filled in in FV_StateMod.F90 so I'm not sure what is going on. If you do a zero-length run you get identical restarts so. I can confirm the start/stop failure. I'm wondering if the standalone FV3 would show this. Could we be missing a variable? This is bizarre.

wmputman commented 3 years ago

I think start/stop NH is due to the use of make_nh in the fv3 nml. You cannot have Make_NH = .T. for both segments, just the first segment

wmputman commented 3 years ago

Alternatively, if you regrid restarts, or make all the DZ values = 0.0, FV3 with make_nh automatically without the nml flag

mathomp4 commented 3 years ago

I think start/stop NH is due to the use of make_nh in the fv3 nml. You cannot have Make_NH = .T. for both segments, just the first segment

@wmputman Oooh. Okay. I'll work on testing this as I think I can easily make a sed rule for the second segment.

But, it does look like my (newly-regridded) restarts have all-0 W which this code bit from FV_StateMod seems to care about:

1575   │     if (fv_first_run) then
1576   │      ! Make_NH
1577   │       if ( .not. FV_Atm(1)%flagstruct%hydrostatic ) then
1578   │         if (all(FV_Atm(1)%w(isc:iec,jsc:jec,:) == 0.0)) FV_Atm(1)%flagstruct%Make_NH = .true.
1579   │         if ( FV_Atm(1)%flagstruct%Make_NH ) then
1580   │           if (mpp_pe()==0) print*, 'fv_first_run: FV3 is making Non-Hydrostatic W and DZ'
1581   │           call p_var(FV_Atm(1)%npz,         isc,         iec,       jsc,     jec,  FV_Atm(1)%ptop,     ptop_min,  &
1582   │                      FV_Atm(1)%delp, FV_Atm(1)%delz, FV_Atm(1)%pt, FV_Atm(1)%ps, FV_Atm(1)%pe,  FV_Atm(1)%peln,   &
1583   │                      FV_Atm(1)%pk,   FV_Atm(1)%pkz, kappa, FV_Atm(1)%q, FV_Atm(1)%ng, &
1584   │                      FV_Atm(1)%ncnst, FV_Atm(1)%gridstruct%area_64, FV_Atm(1)%flagstruct%dry_mass,  &
1585   │                      FV_Atm(1)%flagstruct%adjust_dry_mass,  FV_Atm(1)%flagstruct%mountain, &
1586   │                      FV_Atm(1)%flagstruct%moist_phys,  FV_Atm(1)%flagstruct%hydrostatic, &
1587   │                      FV_Atm(1)%flagstruct%nwat, FV_Atm(1)%domain, FV_Atm(1)%flagstruct%make_nh)
1588   │           FV_Atm(1)%flagstruct%Make_NH=.false.
1589   │         endif
1590   │       endif
1591   │      ! Mark FV setup complete
1592   │       fv_first_run = .false.
1593   │     endif

I guess the question is: should we have some sort of "test" in gcm_run.j that looks to see if W in fvcore_internal_rst is all zero and sets make_nh to false if so?

Or, perhaps should we do this:

Compiled with HYDROSTATIC=YES, selects run NH in gcm_setup: make_nh: .T.
Compiled with HYDROSTATIC=NO, selects run NH in gcm_setup: make_nh: .F.

A user could always set make_nh to whatever they want, but since regrid.pl seems to make all-0 W restarts, I don't think it's unreasonable that if you build for nonhydrostatic and run for nonhydrostatic, you'll have recently regridded restarts?

wmputman commented 3 years ago

@mathomp4: yes, having that variable in the nml as .F. is fine for me. And then a user can change it if they want to (at the risk of causing troubles for long runs when they forget to remove it).

bena-nasa commented 3 years ago

I take it back, this does not fix it. If I set make_nh .F. for the 2nd segment it still fails regression. Now the very weird thing, I whim I tried this: I uncommented in the AGCM.rc

USE_AEROSOL_NN: 0

It does pass regression, except for the gocart and irrad restarts as long as make_nh is false for the 2nd segment.

mathomp4 commented 3 years ago

Okay. I'll work on getting the logic right in gcm_setup, et al. Give me a bit...

mathomp4 commented 3 years ago

Okay, I pushed updates to gcm_setup for the make_nh bits.

mathomp4 commented 3 years ago

Also: weirdly, regression did work for me with my make_nh fixes in gcm_setup. I'm going to make a new clone with the branches I indicated above and be sure I don't have some non-committed fix.

wmputman commented 3 years ago

USE_AEROSOL_NN needs to change in concert with two GFDL-MP nml values: prog_ccn and use_ccn

USE_AEROSOL_NN: 0 prog_ccn: .F. use_ccn: .T.

USE_AEROSOL_NN: 1 prog_ccn: .T. use_ccn: .F.

This is only relevant for GFDL-MP, for 1MOM only USE_AEROSOL_NN matters

mathomp4 commented 3 years ago

Hmm. I was running 1MOM, and I'll try that first.

Also: I guess we should put a comment in the AGCM.rc telling folks to change those in concert?

mathomp4 commented 3 years ago

@wmputman Hmm. A question. Currently in Moist, USE_AEROSOL_NN defaults to 1, which is why you have to uncomment out the line in AGCM.rc to get it to 0.

But the default values in fvcore_layout.rc in GEOSgcm_App are:

40:  prog_ccn = .false.
64:  use_ccn = .true.

which matches your USE_AEROSOL_NN: 0 settings.

Should we change the defaults in fvcore_layout.rc to match the default in Moist? Or should GFDL be running with USE_AEROSOL_NN set to 0 by default if someone chooses it?

mathomp4 commented 3 years ago

Some testing. After I pushed the make_nh fix for gcm_setup:

Run	Regress
AMIP NH	PASS
Replay NH	PASS
ReplayNoINC NH	PASS

So that is good!

Now for the bad. As @sdrabenh reported, with NH dynamics, AMIP is not identical to Replay-NoIncrement! They are identical for hydrostatic dynamics, so I see no reason they shouldn't be for NH.

mathomp4 commented 3 years ago

I wonder, do we need a REPLAY_W: now for NH? Or maybe all those DUDT bits in AGCM/mkiau need to have a DWDT analogue? (This comment brought to you by "Matt doesn't really know how Replay works" aka "Matt asks @lltakacs or @atrayano when he has a replay question." 😄 )

wmputman commented 3 years ago

I'll have to look at add_incs for NH...

wmputman commented 3 years ago

There are never any W increments applied to FV3 (for now...) and DZ/PKZ just get adjusted based on T-Increments. If they are 0.0 there should be no change.

   if (.not. HYDROSTATIC ) then
      ! remove old T from DZ
      STATE%VARS%DZ = STATE%VARS%DZ / STATE%VARS%PT

      ! Update T
      STATE%VARS%PT =  STATE%VARS%PT                         *DPOLD
      STATE%VARS%PT = (STATE%VARS%PT + DT*TEND*(MAPL_CP/CVM))/DPNEW

      ! update DZ with new T
      STATE%VARS%DZ = STATE%VARS%DZ * STATE%VARS%PT
   else
      ! Update T
      STATE%VARS%PT =  STATE%VARS%PT                         *DPOLD
      STATE%VARS%PT = (STATE%VARS%PT + DT*TEND*(MAPL_CP/CVM))/DPNEW
   endif

And then PKZ is just recalculated if getPKZ:

if ( .not.hydrostatic ) then

!$omp parallel do default(shared) do k=1,npz do j=jsc,jec do i=isc,iec ! perfect gas law: p = density rdgas virtual_temperature ! pkz(i,j,k) = ( rdgdelp(i,j,k)pt(i,j,k)/delz(i,j,k) )*kappa pkz(i,j,k) = exp( kappalog(rdgdelp(i,j,k)temp(i,j,k) & (1.d0+zvirqv(i,j,k))/delz(i,j,k)) ) enddo enddo enddo else !$omp parallel do default(shared) do k=1,npz do j=jsc,jec do i=isc,iec pkz(i,j,k) = (pk(i,j,k+1)-pk(i,j,k)) / & (kappa*(peln(i,j,k+1)-peln(i,j,k))) enddo enddo enddo endif

GEOS-ESM / GEOSgcm

GEOSgcm NH Replay-NoIncrement is not identical to AMIP #275

USE_AEROSOL_NN: 0