Closed yantosca closed 4 years ago
The number of solver calls (internal ODE integrator steps) will be a good enough metric!
The most direct metric though, is the wall time for this Integrate
call in each grid box:
But "time to run a function" probably doesn't qualify as a diagnostic collection, as its value is not deterministic (depends on hardware).
Also I remember that other calls like UPDATE_RCONST
can also take a significant amount of time, in addition to Integrate
. But those are all fixed computations and shouldn't have load-balancing problems.
@JiaweiZhuang That should be easy to do. We have the variables commented out but I can restore them. The itim ivariable would be the wall time of integrate in each grid box:
Also KPP returns the various number of calls in ISTATUS. If you want me to add those I can too... but as you say, they may be hardware dependent and not as instructive. I can start working on this now that my other deadlines have passed.
The itim ivariable would be the wall time of integrate in each grid box:
Oh I didn't know this itim
thing! Thanks for pointing this out!
Please just collect number of calls first, so I can make a quick plot😀The wall time will also be very interesting, for example to see whether the number of solver calls is consistent with the actual time.
@JiaweiZhuang: The KPP solver returns several diagnostics per each grid box:
! ISTATUS(1) -> No. of function calls
! ISTATUS(2) -> No. of jacobian calls
! ISTATUS(3) -> No. of steps
! ISTATUS(4) -> No. of accepted steps
! ISTATUS(5) -> No. of rejected steps (except at very beginning)
! ISTATUS(6) -> No. of LU decompositions
! ISTATUS(7) -> No. of forward/backward substitutions
! ISTATUS(8) -> No. of singular matrix decompositions
!
! RSTATUS(1) -> Texit, the time corresponding to the
! computed Y upon return
! RSTATUS(2) -> Hexit, last accepted step before exit
! RSTATUS(3) -> Hnew, last predicted step (not yet taken)
! For multiple restarts, use Hnew as Hstart
! in the subsequent run
Here is the plot of ISTATUS(1), which should be the number of times the solver was called in each grid box. This was for a 3hr test run:
Let me know if this looks like it should, or if there are any other diagnostics on that list you need. We can also print the # of times the jacobian gets updated, and the # of LU decomp calls.
I can see the sunrise/sunset patterns, so it makes sense! Is it 3-hour average? I expect a sharper spatial gradient for instantaneous fields (i.e. for one model step, https://github.com/geoschem/gchp/issues/44#issuecomment-537037449)
I think 1. function calls
, 2. jacobian calls
, 3. steps
are particularly useful. For consistency, I would suggest recording all 8 numbers in ISTATUS
, each is an entry in theKPP
diagnostics collection.
Yes that is a 3 hr average. You can kind of see the sunrise-sunset. Also more calls over the continents.
I'll save out all 8 fields. Should have that done by today.
@JiaweiZhuang: I have added the new KppDiags collection with all 8 fields of ISTATUS. The commits are:
I will a unit test and diff test on these updates. If OK, I'll merge into dev/12.7.0, so that it will ship with 12.7.0.
A geosfp_4x5_benchmark diff test yielded identical results:
###############################################################################
### VALIDATION OF GEOS-CHEM OUTPUT FILES
### Run ID: benchmark
##@
### IDENTICAL : HEMCO_restart.201607010100.nc.{Ref,Dev}
### IDENTICAL : GEOSChem.AdvFluxVert.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.AerosolMass.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.Aerosols.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.Budget.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.CloudConvFlux.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.ConcAfterChem.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.DryDep.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.JValues.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.LevelEdgeDiags.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.ProdLoss.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.SpeciesConc.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.StateChm.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.StateMet.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.Transport.20160701_0100z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.WetLossConv.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : GEOSChem.WetLossLS.20160701_0000z.nc4.{Ref,Dev}
### IDENTICAL : HEMCO_diagnostics.201607010000.nc.{Ref,Dev}
##%
###############################################################################
Could you upload a 3-hour KppDiags file (instantaneous field, with all 8 entries), so I can verity all entries on my own? An 8-variable NC file should be just a few MBs and can be uploaded as an attachment here. Thanks very much!
The Unit Test on the feature/KPPDiags branch just passed.
@JiaweiZhuang, here is the file you wanted. Let me know if it looks OK. If it's good, I'll merge up to dev/12.7.0. I had to gzip it to attach it here. GEOSChem.KppDiags.20160701_0000z.nc4.gz
Thanks @yantosca !!
Below is the analysis of the file, from this notebook. The result looks very reasonable and I suggest merging the branch. Thanks for the work😃!
The surface value shows hotspot over the continents and the terminator, which makes perfect sense:
The column integrated value only highlights the terminator, consistent with my assumption and the Alvanos and Christoudias 2017 paper https://github.com/geoschem/gchp/issues/44#issuecomment-537037449
The histogram shows a large spread (100~600), indicating a poor load-balancing. With a perfect balancing, there will be almost no spread.
Different diagnostics are highly correlated (just differ by a constant factor), except the "rejected time steps":
That's great! I'll merge the branch shortly. I'll also test with GCHP to make sure that works.
Also -- I wonder why there are no singular matrix decompositions. Maybe the Rosenbrock solver only uses LU decompositions exclusively? We can comment that out in the HISTORY.rc since it's all zeroes.
This is now merged into 12.7.0. I'll close this out.
I wonder why there are no singular matrix decompositions
Most solver options use sparse LU decomp for linear solve (Ax=b); not sure which solver uses SVD though.
@JiaweiZhuang, here is the file you wanted. Let me know if it looks OK. If it's good, I'll merge up to dev/12.7.0. I had to gzip it to attach it here.
Hi @yantosca -- to double check: Is this a time-average diagnostic or an instantaneous field? The sunrise/sunset looks too spread out, which makes me wonder whether it is a 3-hour average, not just a 20-minute chemical step.
@JiaweiZhuang , you are right. I think I gave you the 3-h field.
Here is instantaneous data (2016/07/01 @ 00:00 UTC):
As you can see, the terminator is much narrower, as would be expected for a 20min timestep.
Sorry for the confusion.
Here is instantaneous data
Thanks! This looks right to me. Could you also upload the nc file?
@JiaweiZhuang, here it is, saved in a zip folder: KppDiags.zip
This is also available on S3 at: https://s3.console.aws.amazon.com/s3/object/yantosca-gc-bucket/for_jiawei/GEOSChem.KppDiags.20160701_0000z.nc4?region=us-east-1&tab=overview
See: https://github.com/geoschem/gchp/issues/44.
It should be a simple matter to add this diagnostic, perhaps into GEOS-Chem 12.7.0.