mathomp4 opened 3 years ago
For ease of testing, I have created a branch here on the fixture that grabs all of my PRs for this: `feature/mathomp4/nonhydro-gcm`

To build it, you either use `./parallel_build.csh -nonhydrostatic` or you add `-DHYDROSTATIC=NO` to the CMake command.

Then make sure you are correctly selecting nonhydrostatic in `gcm_setup`. One of my PRs in this branch defaults it to whatever `-DHYDROSTATIC` was passed in as.
ETA: The differences between this and `main` are:
```diff
diff --git a/components.yaml b/components.yaml
index 8ea3fd0..5cab83b 100644
--- a/components.yaml
+++ b/components.yaml
@@ -5,7 +5,7 @@ GEOSgcm:
   env:
     local: ./@env
     remote: ../ESMA_env.git
-    tag: v3.1.3
+    branch: feature/mathomp4/update-pbuild-for-hydrostatic
     develop: main
   cmake:
@@ -55,13 +55,13 @@ GEOSgcm_GridComp:
   FVdycoreCubed_GridComp:
     local: ./src/Components/@GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSsuperdyn_GridComp/@FVdycoreCubed_GridComp
     remote: ../FVdycoreCubed_GridComp.git
-    tag: v1.2.10
+    branch: feature/mathomp4/update-setup-for-hydrostatic-fv3
     develop: develop
   fvdycore:
     local: ./src/Components/@GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSsuperdyn_GridComp/@FVdycoreCubed_GridComp/@fvdycore
     remote: ../GFDL_atmos_cubed_sphere.git
-    tag: geos/v1.1.4
+    tag: geos/v1.1.5
     develop: geos/develop
   GEOSchem_GridComp:
@@ -93,7 +93,7 @@ mom6:
   GEOSgcm_App:
     local: ./src/Applications/@GEOSgcm_App
     remote: ../GEOSgcm_App.git
-    tag: v1.3.15
+    branch: feature/mathomp4/update-setup-for-hydrostatic
     develop: develop
   UMD_Etc:
```
As an additional clue, @sdrabenh said that he saw an issue with the zero-increment replay (which I think means you turn on regular replay, but set `REPLAY_T`, `REPLAY_Q`, et al., to `NO`). @sdrabenh can you tell us what test you did that saw this issue? Might help point us to the issue.

Note: I saw the start-stop failure with a "default" experiment. I was only doing NH at C24 and nothing else exciting. (That is, no GFDL microphysics, etc.)
> As an additional clue, @sdrabenh said that he saw an issue with the zero-increment replay (which I think means you turn on regular replay, but set `REPLAY_T`, `REPLAY_Q`, et al., to `NO`). @sdrabenh can you tell us what test you did that saw this issue? Might help point us to the issue.
Correct, I ran the default 1MOM L72 C48 nonhydrostatic build for 1 day and it failed gcm_regress.j and the 0-increment test @mathomp4 mentioned which should produce results identical to AMIP mode. Furthermore, we are seeing similar issues in the GF2020 development branches. Since we haven't typically tested the NH model, the question is whether this is a new or old problem?
It looks like we never get the `DZ` or `W` internal state pointers in `DynCore_GridCompMod.F90`:

```fortran
call MAPL_GetPointer(INTERNAL, AK, 'AK', RC=STATUS)
call MAPL_GetPointer(INTERNAL, BK, 'BK', RC=STATUS)
call MAPL_GetPointer(INTERNAL,UD,'U'  ,RC=STATUS)
call MAPL_GetPointer(INTERNAL,VD,'V'  ,RC=STATUS)
call MAPL_GetPointer(INTERNAL,PE,'PE' ,RC=STATUS)
call MAPL_GetPointer(INTERNAL,PT,'PT' ,RC=STATUS)
call MAPL_GetPointer(INTERNAL,PK,'PKZ',RC=STATUS)
```

Maybe we should if nonhydro?
> Correct, I ran the default 1MOM L72 C48 nonhydrostatic build for 1 day and it failed `gcm_regress.j` and the 0-increment test @mathomp4 mentioned which should produce results identical to AMIP mode. Furthermore, we are seeing similar issues in the GF2020 development branches. Since we haven't typically tested the NH model, the question is whether this is a new or old problem?
Ah yes. AMIP and Replay-NoInc should be identical. Thanks. I knew there was a test for that!
> It looks like we never get the `DZ` or `W` internal state pointers in `DynCore_GridCompMod.F90`:
>
> ```fortran
> call MAPL_GetPointer(INTERNAL, AK, 'AK', RC=STATUS)
> call MAPL_GetPointer(INTERNAL, BK, 'BK', RC=STATUS)
> call MAPL_GetPointer(INTERNAL,UD,'U'  ,RC=STATUS)
> call MAPL_GetPointer(INTERNAL,VD,'V'  ,RC=STATUS)
> call MAPL_GetPointer(INTERNAL,PE,'PE' ,RC=STATUS)
> call MAPL_GetPointer(INTERNAL,PT,'PT' ,RC=STATUS)
> call MAPL_GetPointer(INTERNAL,PK,'PKZ',RC=STATUS)
> ```
>
> Maybe we should if nonhydro?
Good question, perhaps @wmputman can comment.
I forgot about `FV_StateMod.F90`. We do get the `DZ` and `W` internal state there. Maybe that file matters more...

I'll investigate the restart.
Another clue: in a separate test built with the `feature/wmputman/DevDYAMONDv2_Merge04nhGF2020evap3` branches, the GFDL NH model does regress. Could this be an issue with the 1MOM microphysics? Unfortunately, that branch failed the 0-increment replay test.
I just tried GFDL with the "stock" GCM and it doesn't regress. So I guess GFDL is different in that branch!
I think `DZ` and `W` are being written OK in the restart; the pointers are filled in in `FV_StateMod.F90`, so I'm not sure what is going on. If you do a zero-length run you get identical restarts. So I can confirm the start/stop failure. I'm wondering if the standalone FV3 would show this. Could we be missing a variable? This is bizarre.
I think the start/stop NH failure is due to the use of `make_nh` in the FV3 nml. You cannot have `Make_NH = .T.` for both segments, just the first segment.
Alternatively, if you regrid restarts, or make all the DZ values = 0.0, FV3 runs `make_nh` automatically without the nml flag.
> I think start/stop NH is due to the use of `make_nh` in the fv3 nml. You cannot have `Make_NH = .T.` for both segments, just the first segment
@wmputman Oooh. Okay. I'll work on testing this as I think I can easily make a sed rule for the second segment.
But, it does look like my (newly-regridded) restarts have all-0 `W`, which this code bit from `FV_StateMod` seems to care about:
```fortran
if (fv_first_run) then
   ! Make_NH
   if ( .not. FV_Atm(1)%flagstruct%hydrostatic ) then
      if (all(FV_Atm(1)%w(isc:iec,jsc:jec,:) == 0.0)) FV_Atm(1)%flagstruct%Make_NH = .true.
      if ( FV_Atm(1)%flagstruct%Make_NH ) then
         if (mpp_pe()==0) print*, 'fv_first_run: FV3 is making Non-Hydrostatic W and DZ'
         call p_var(FV_Atm(1)%npz, isc, iec, jsc, jec, FV_Atm(1)%ptop, ptop_min, &
              FV_Atm(1)%delp, FV_Atm(1)%delz, FV_Atm(1)%pt, FV_Atm(1)%ps, FV_Atm(1)%pe, FV_Atm(1)%peln, &
              FV_Atm(1)%pk, FV_Atm(1)%pkz, kappa, FV_Atm(1)%q, FV_Atm(1)%ng, &
              FV_Atm(1)%ncnst, FV_Atm(1)%gridstruct%area_64, FV_Atm(1)%flagstruct%dry_mass, &
              FV_Atm(1)%flagstruct%adjust_dry_mass, FV_Atm(1)%flagstruct%mountain, &
              FV_Atm(1)%flagstruct%moist_phys, FV_Atm(1)%flagstruct%hydrostatic, &
              FV_Atm(1)%flagstruct%nwat, FV_Atm(1)%domain, FV_Atm(1)%flagstruct%make_nh)
         FV_Atm(1)%flagstruct%Make_NH = .false.
      endif
   endif
   ! Mark FV setup complete
   fv_first_run = .false.
endif
```
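To see the trigger in isolation, here is a minimal Python sketch of that check (the function name and plain-list argument are mine, standing in for the `FV_Atm(1)%w` array):

```python
def should_make_nh(w_values, hydrostatic):
    """Mirror of the fv_first_run check above: in a nonhydrostatic run,
    an all-zero W field triggers Make_NH so FV3 rebuilds W and DZ."""
    if hydrostatic:
        return False
    return all(v == 0.0 for v in w_values)

# Freshly regridded restarts have W == 0 everywhere, so Make_NH fires:
print(should_make_nh([0.0, 0.0, 0.0], hydrostatic=False))  # True
# A restart written mid-run carries nonzero W, so it does not:
print(should_make_nh([0.1, -0.2, 0.0], hydrostatic=False))  # False
```

This is why the regridded-restart case behaves differently from a restart written partway through a run.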
I guess the question is: should we have some sort of "test" in `gcm_run.j` that looks to see if `W` in `fvcore_internal_rst` is all zero and sets `make_nh` to false if so?
Or, perhaps should we do this:

- `HYDROSTATIC=YES`, user selects running NH in `gcm_setup`: `make_nh: .T.`
- `HYDROSTATIC=NO`, user selects running NH in `gcm_setup`: `make_nh: .F.`
A user could always set `make_nh` to whatever they want, but since `regrid.pl` seems to make all-0 `W` restarts, I don't think it's unreasonable that if you build for nonhydrostatic and run for nonhydrostatic, you'll have recently regridded restarts?
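That proposed defaulting rule could be sketched as follows (a hypothetical helper, not actual `gcm_setup` code; the assumption is that a `HYDROSTATIC=NO` build starts from freshly regridded NH restarts):

```python
def default_make_nh(built_hydrostatic, run_nonhydrostatic):
    """Sketch of the proposal above: a HYDROSTATIC=YES build switched
    to an NH run needs make_nh to build W/DZ from scratch (.T.); a
    HYDROSTATIC=NO build is assumed NH-ready, so make_nh is .F."""
    if not run_nonhydrostatic:
        return '.F.'   # make_nh is moot for a hydrostatic run (assumption)
    return '.T.' if built_hydrostatic else '.F.'

print(default_make_nh(built_hydrostatic=True, run_nonhydrostatic=True))   # .T.
print(default_make_nh(built_hydrostatic=False, run_nonhydrostatic=True))  # .F.
```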
@mathomp4: yes, having that variable in the nml as .F. is fine for me. And then a user can change it if they want to (at the risk of causing troubles for long runs when they forget to remove it).
I take it back, this does not fix it. If I set `make_nh` to `.F.` for the 2nd segment it still fails regression. Now the very weird thing: on a whim I tried this: I uncommented in the `AGCM.rc`. It does pass regression, except for the gocart and irrad restarts, as long as `make_nh` is false for the 2nd segment.
Okay. I'll work on getting the logic right in `gcm_setup`, et al. Give me a bit...
Okay, I pushed updates to `gcm_setup` for the `make_nh` bits.

Also: weirdly, regression did work for me with my `make_nh` fixes in `gcm_setup`. I'm going to make a new clone with the branches I indicated above and be sure I don't have some non-committed fix.
`USE_AEROSOL_NN` needs to change in concert with two GFDL-MP nml values, `prog_ccn` and `use_ccn`:

| `USE_AEROSOL_NN` | `prog_ccn` | `use_ccn` |
|---|---|---|
| 0 | .F. | .T. |
| 1 | .T. | .F. |

This is only relevant for GFDL-MP; for 1MOM, only `USE_AEROSOL_NN` matters.
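To make the pairing concrete, here is a small sketch of a consistency check one could imagine adding to setup tooling (the pairings are from the comment above; the helper itself is hypothetical):

```python
# Valid GFDL-MP pairings, per the comment above:
#   USE_AEROSOL_NN: 0  ->  prog_ccn = .F., use_ccn = .T.
#   USE_AEROSOL_NN: 1  ->  prog_ccn = .T., use_ccn = .F.
PAIRINGS = {
    0: ('.F.', '.T.'),
    1: ('.T.', '.F.'),
}

def gfdl_mp_consistent(use_aerosol_nn, prog_ccn, use_ccn):
    """Return True when the AGCM.rc and fvcore_layout.rc values agree."""
    return PAIRINGS.get(use_aerosol_nn) == (prog_ccn, use_ccn)

print(gfdl_mp_consistent(0, '.F.', '.T.'))  # True
print(gfdl_mp_consistent(1, '.F.', '.T.'))  # False: mismatched pairing
```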
Hmm. I was running 1MOM, and I'll try that first.
Also: I guess we should put a comment in the `AGCM.rc` telling folks to change those in concert?
@wmputman Hmm. A question. Currently in Moist, `USE_AEROSOL_NN` defaults to 1, which is why you have to uncomment the line in `AGCM.rc` to get it to 0.

But the default values in `fvcore_layout.rc` in GEOSgcm_App are:

```
40: prog_ccn = .false.
64: use_ccn = .true.
```

which matches your `USE_AEROSOL_NN: 0` settings.

Should we change the defaults in `fvcore_layout.rc` to match the default in Moist? Or should GFDL be running with `USE_AEROSOL_NN` set to 0 by default if someone chooses it?
Some testing. After I pushed the `make_nh` fix for `gcm_setup`:

| Run | Regress |
|---|---|
| AMIP NH | PASS |
| Replay NH | PASS |
| ReplayNoINC NH | PASS |

So that is good!
Now for the bad. As @sdrabenh reported, with NH dynamics, AMIP is not identical to Replay-NoIncrement! They are identical for hydrostatic dynamics, so I see no reason they shouldn't be for NH.
I wonder, do we need a `REPLAY_W:` now for NH? Or maybe all those `DUDT` bits in AGCM/mkiau need to have a `DWDT` analogue? (This comment brought to you by "Matt doesn't really know how Replay works" aka "Matt asks @lltakacs or @atrayano when he has a replay question." 😄 )

I'll have to look at add_incs for NH...
There are never any W increments applied to FV3 (for now...) and DZ/PKZ just get adjusted based on T-Increments. If they are 0.0 there should be no change.
```fortran
if (.not. HYDROSTATIC) then
   ! remove old T from DZ
   STATE%VARS%DZ = STATE%VARS%DZ / STATE%VARS%PT
   ! Update T
   STATE%VARS%PT = STATE%VARS%PT * DPOLD
   STATE%VARS%PT = (STATE%VARS%PT + DT*TEND*(MAPL_CP/CVM))/DPNEW
   ! update DZ with new T
   STATE%VARS%DZ = STATE%VARS%DZ * STATE%VARS%PT
else
   ! Update T
   STATE%VARS%PT = STATE%VARS%PT * DPOLD
   STATE%VARS%PT = (STATE%VARS%PT + DT*TEND*(MAPL_CP/CVM))/DPNEW
endif
```
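As a sanity check on that update, here is a scalar Python transcription (variable names mirror the Fortran; the numeric values are placeholders, not GEOS constants). With a zero tendency and unchanged pressure thickness, `PT` and `DZ` come back unchanged, which is why a true 0-increment replay should be bit-identical:

```python
def apply_t_increment(pt, dz, dpold, dpnew, dt, tend, cp_over_cvm, hydrostatic):
    """Scalar transcription of the add_incs block above."""
    if not hydrostatic:
        dz = dz / pt                              # remove old T from DZ
    pt = pt * dpold                               # update T, mass-weighted
    pt = (pt + dt * tend * cp_over_cvm) / dpnew
    if not hydrostatic:
        dz = dz * pt                              # update DZ with new T
    return pt, dz

# Zero increment, unchanged layer thickness: nothing should move.
pt, dz = apply_t_increment(256.0, -128.0, 1.0, 1.0, 450.0, 0.0, 1.4, False)
print(pt, dz)  # 256.0 -128.0
```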
And then PKZ is just recalculated if `getPKZ`:

```fortran
if ( .not. hydrostatic ) then
!$omp parallel do default(shared)
   do k=1,npz
      do j=jsc,jec
         do i=isc,iec
            ! perfect gas law: p = density * rdgas * virtual_temperature
            ! pkz(i,j,k) = ( rdg*delp(i,j,k)*pt(i,j,k)/delz(i,j,k) )**kappa
            pkz(i,j,k) = exp( kappa*log(rdg*delp(i,j,k)*temp(i,j,k)* &
                         (1.d0+zvir*qv(i,j,k))/delz(i,j,k)) )
         enddo
      enddo
   enddo
else
!$omp parallel do default(shared)
   do k=1,npz
      do j=jsc,jec
         do i=isc,iec
            pkz(i,j,k) = (pk(i,j,k+1)-pk(i,j,k)) / &
                         (kappa*(peln(i,j,k+1)-peln(i,j,k)))
         enddo
      enddo
   enddo
endif
```
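For reference, the nonhydrostatic branch is just the ideal-gas layer pressure raised to kappa, computed as `exp(kappa*log(p))`. A small Python sketch (the constants are typical textbook values, not necessarily GEOS's exact ones, and `rdg = -rdgas/grav` is assumed to follow the FV3 convention where `delz` is negative):

```python
import math

KAPPA = 2.0 / 7.0   # rdgas/cp for dry air, approximate
RDGAS = 287.04      # J/(kg K), typical value
GRAV  = 9.80665     # m/s^2
ZVIR  = 0.608       # rvgas/rdgas - 1, approximate

def pkz_nonhydrostatic(delp, temp, qv, delz):
    """Sketch of the NH branch above: the perfect gas law gives the
    layer pressure from delp/T/qv/delz, then pkz = p**kappa,
    evaluated as exp(kappa*log(p))."""
    rdg = -RDGAS / GRAV   # negative because delz < 0 (assumption)
    p = rdg * delp * temp * (1.0 + ZVIR * qv) / delz
    return math.exp(KAPPA * math.log(p))

# Mid-troposphere-ish inputs give a positive, finite pkz:
pkz = pkz_nonhydrostatic(delp=1000.0, temp=280.0, qv=0.005, delz=-400.0)
```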
This was discovered by @sdrabenh as he was looking at the nonhydrostatic (NH) GEOS model. Turns out if you build GEOS for non-hydrostatic dynamics, the model does not start-stop regress. This usually points to an internal state issue. The question is: where?
I suppose there are two possible culprits: Dynamics and Moist. Perhaps `DZ` and `W` are not being carried around correctly in Dyn? Or perhaps Moist runs differently in non-hydro?

I'm adding the usual suspects (@atrayano, @bena-nasa, @wmputman) to this issue for opinions and thoughts.