geoschem / geos-chem

GEOS-Chem "Science Codebase" repository. Contains GEOS-Chem science routines, run directory generation scripts, and interface code. This repository is used as a submodule within the GCClassic and GCHP wrappers, as well as in other modeling contexts (external ESMs).
http://geos-chem.org
Other
164 stars 157 forks source link

[FEATURE REQUEST] Add diagnostic for number of times KPP is called in each grid box #77

Closed yantosca closed 4 years ago

yantosca commented 4 years ago

See: https://github.com/geoschem/gchp/issues/44.

It should be a simple matter to add this diagnostic, perhaps into GEOS-Chem 12.7.0.

JiaweiZhuang commented 4 years ago

The number of solver calls (internal ODE integrator steps) will be a good enough metric!

The most direct metric though, is the wall time for this Integrate call in each grid box:

https://github.com/geoschem/geos-chem/blob/5f25c0fc6ff27ec076d766ac178554d9b91bf636/GeosCore/flexchem_mod.F90#L988-L991

But "time to run a function" probably doesn't qualify as a diagnostic collection, as its value is not deterministic (depends on hardware).

Also I remember that other calls like UPDATE_RCONST can also take a significant amount of time, in addition to Integrate. But those are all fixed computations and shouldn't have load-balancing problems.

yantosca commented 4 years ago

@JiaweiZhuang That should be easy to do. We have the variables commented out but I can restore them. The itim ivariable would be the wall time of integrate in each grid box:

https://github.com/geoschem/geos-chem/blob/5f25c0fc6ff27ec076d766ac178554d9b91bf636/GeosCore/flexchem_mod.F90#L1007-L1021

Also KPP returns the various number of calls in ISTATUS. If you want me to add those I can too... but as you say, they may be hardware dependent and not as instructive. I can start working on this now that my other deadlines have passed.

JiaweiZhuang commented 4 years ago

The itim ivariable would be the wall time of integrate in each grid box:

Oh I didn't know this itim thing! Thanks for pointing this out!

Please just collect number of calls first, so I can make a quick plot😀The wall time will also be very interesting, for example to see whether the number of solver calls is consistent with the actual time.

yantosca commented 4 years ago

@JiaweiZhuang: The KPP solver returns several diagnostics per each grid box:

!    ISTATUS(1)  -> No. of function calls
!    ISTATUS(2)  -> No. of jacobian calls
!    ISTATUS(3)  -> No. of steps
!    ISTATUS(4)  -> No. of accepted steps
!    ISTATUS(5)  -> No. of rejected steps (except at very beginning)
!    ISTATUS(6)  -> No. of LU decompositions
!    ISTATUS(7)  -> No. of forward/backward substitutions
!    ISTATUS(8)  -> No. of singular matrix decompositions
!
!    RSTATUS(1)  -> Texit, the time corresponding to the
!                     computed Y upon return
!    RSTATUS(2)  -> Hexit, last accepted step before exit
!    RSTATUS(3)  -> Hnew, last predicted step (not yet taken)
!                   For multiple restarts, use Hnew as Hstart 
!                     in the subsequent run

Here is the plot of ISTATUS(1), which should be the number of times the solver was called in each grid box. This was for a 3hr test run:

kpp_png

Let me know if this looks like it should, or if there are any other diagnostics on that list you need. We can also print the # of times the jacobian gets updated, and the # of LU decomp calls.

JiaweiZhuang commented 4 years ago

I can see the sunrise/sunset patterns, so it makes sense! Is it 3-hour average? I expect a sharper spatial gradient for instantaneous fields (i.e. for one model step, https://github.com/geoschem/gchp/issues/44#issuecomment-537037449)

I think 1. function calls, 2. jacobian calls, 3. steps are particularly useful. For consistency, I would suggest recording all 8 numbers in ISTATUS, each is an entry in theKPP diagnostics collection.

yantosca commented 4 years ago

Yes that is a 3 hr average. You can kind of see the sunrise-sunset. Also more calls over the continents.

I'll save out all 8 fields. Should have that done by today.

yantosca commented 4 years ago

@JiaweiZhuang: I have added the new KppDiags collection with all 8 fields of ISTATUS. The commits are:

  1. https://github.com/geoschem/geos-chem/commit/5ac595c7cd8c315db4408feb58b53cc785d2c004
  2. https://github.com/geoschem/geos-chem/commit/1e767e22e044a550f0b0569782d6fa50116b41f0
  3. https://github.com/geoschem/geos-chem-unittest/commit/481213b33c828a7b74bb76debb71f0a2f5a3c852

I will a unit test and diff test on these updates. If OK, I'll merge into dev/12.7.0, so that it will ship with 12.7.0.

yantosca commented 4 years ago

A geosfp_4x5_benchmark diff test yielded identical results:

###############################################################################
### VALIDATION OF GEOS-CHEM OUTPUT FILES
### Run ID: benchmark
##@
### IDENTICAL : HEMCO_restart.201607010100.nc.{Ref,Dev}
### IDENTICAL : GEOSChem.AdvFluxVert.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.AerosolMass.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.Aerosols.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.Budget.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.CloudConvFlux.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.ConcAfterChem.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.DryDep.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.JValues.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.LevelEdgeDiags.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.ProdLoss.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.SpeciesConc.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.StateChm.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.StateMet.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.Transport.20160701_0100z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.WetLossConv.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.WetLossLS.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : HEMCO_diagnostics.201607010000.nc.{Ref,Dev}
##%
###############################################################################
JiaweiZhuang commented 4 years ago

Could you upload a 3-hour KppDiags file (instantaneous field, with all 8 entries), so I can verity all entries on my own? An 8-variable NC file should be just a few MBs and can be uploaded as an attachment here. Thanks very much!

yantosca commented 4 years ago

The Unit Test on the feature/KPPDiags branch just passed.

@JiaweiZhuang, here is the file you wanted. Let me know if it looks OK. If it's good, I'll merge up to dev/12.7.0. I had to gzip it to attach it here. GEOSChem.KppDiags.20160701_0000z.nc4.gz

JiaweiZhuang commented 4 years ago

Thanks @yantosca !!

Below is the analysis of the file, from this notebook. The result looks very reasonable and I suggest merging the branch. Thanks for the work😃!

The surface value shows hotspot over the continents and the terminator, which makes perfect sense: image

The column integrated value only highlights the terminator, consistent with my assumption and the Alvanos and Christoudias 2017 paper https://github.com/geoschem/gchp/issues/44#issuecomment-537037449 image

The histogram shows a large spread (100~600), indicating a poor load-balancing. With a perfect balancing, there will be almost no spread. image

Different diagnostics are highly correlated (just differ by a constant factor), except the "rejected time steps":

image

yantosca commented 4 years ago

That's great! I'll merge the branch shortly. I'll also test with GCHP to make sure that works.

Also -- I wonder why there are no singular matrix decompositions. Maybe the Rosenbrock solver only uses LU decompositions exclusively? We can comment that out in the HISTORY.rc since it's all zeroes.

yantosca commented 4 years ago

This is now merged into 12.7.0. I'll close this out.

JiaweiZhuang commented 4 years ago

I wonder why there are no singular matrix decompositions

Most solver options use sparse LU decomp for linear solve (Ax=b); not sure which solver uses SVD though.

JiaweiZhuang commented 4 years ago

@JiaweiZhuang, here is the file you wanted. Let me know if it looks OK. If it's good, I'll merge up to dev/12.7.0. I had to gzip it to attach it here.

Hi @yantosca -- to double check: Is this a time-average diagnostic or an instantaneous field? The sunrise/sunset looks too spread out, which makes me wonder whether it is a 3-hour average, not just a 20-minute chemical step.

yantosca commented 4 years ago

@JiaweiZhuang , you are right. I think I gave you the 3-h field.

Here is instantaneous data (2016/07/01 @ 00:00 UTC):

Total KPP integrator counts @ surface

jiawei_1

Histogram

jiawei_2

Column-integrated KPP integrator counts

jiawei_3

Column integrated histogram

jiawei_4

As you can see, the terminator is much narrower, as would be expected for a 20min timestep.

Sorry for the confusion.

JiaweiZhuang commented 4 years ago

Here is instantaneous data

Thanks! This looks right to me. Could you also upload the nc file?

yantosca commented 4 years ago

@JiaweiZhuang, here it is, saved in a zip folder: KppDiags.zip

This is also available on S3 at: https://s3.console.aws.amazon.com/s3/object/yantosca-gc-bucket/for_jiawei/GEOSChem.KppDiags.20160701_0000z.nc4?region=us-east-1&tab=overview