[FEATURE REQUEST] Add stratospheric benchmarking routines

msulprizio commented 3 years ago

@sdeastham provided the GCST with python scripts to evaluate the stratosphere. His scripts been used to evaluate the GEOS-Chem 10-year benchmarks for 13.0.0-rc.0 (GCClassic and GCHP). Ideally the scripts should be brought into GCPy with the rest of the GEOS-Chem benchmark plotting routines.

Seb wrote:

I’ve (finally!) gotten a version of the stratospheric benchmark code working on Cannon, and wanted to start the process of handing it off to you so that it might be used for the 13.0.0 benchmarking process. The code in question is at /n/home13/seastham/Python_Benchmark. Some notes:

Requirements The main script is benchmark_strat_v7.py. To run it, you’ll need the following in addition to the standard gcpy requirements:

A working install of gcgridobj (which requires cubedsphere and xesmf in particular). Routines from gcgridobj are used to allow flexible regridding and to perform plotting;

The files sza_calc.py, read_MLS_OMI.py, and atmos_processing.py. These are some miniature packages I’ve cobbled together; sza_calc.py performs simple solar zenith angle calculations, read_MLS_OMI.py reads in OMI data, and atmos_processing.py performs one additional plot type (latitude x time).

Observation data from OMI and MLS. I’ve uploaded relevant data for a few test years (notably 2016 and 2019), but will continue to upload more test data as required. The routines used in the benchmark can make use of raw level 2 MLS data without preprocessing, but the OMI data needs to be processed first by the file grab_OMI.py to convert them into preprocessed (v7) files. Currently, I have sample data in the directory /n/holyscratch01/jacob_lab/seastham/Strat_Benchmark:

OMI/Raw: Level 2 OMI data, downloadable from NASA’s EarthData service (https://search.earthdata.nasa.gov/) by searching for “OMTO3G” and “OMNO2G”

OMI/PreProc: Processed OMI data, generated from OMI/Raw using grab_OMI.py (e.g. python3 grab_OMI.py 2016 would process year 2016 data if present in OMI/Raw, although you would first want to modify the default input_dir and targ_dir in grab_OMI.py)

MLS: Level 2 MLS data, downloadable from NASA’s EarthData service by searching for “ML2O3” (in the case of, for example, ozone). The scripts are calibrated to use v004 data. Although v005 data is available, the data filtering is based on the v004 guidelines, so using v005 data may give incorrect results.

Plots The benchmark consists of 4 sections. I’ll refer to them from here on out using the names AoA, MLS, OMI, and Poles:

AoA: an age-of-air calculation using a clock tracer;

MLS: a comparison of seasonal and annual zonal mean concentration fields to observations from the Microwave Limb Sounder (MLS) aboard Aura;

OMI: a comparison of column total ozone and NO2 to observations from the Ozone Mapping Instrument (OMI) on the same platform; and

Poles: a comparison of ozone at 82N and 82S during the local ozone hole period to MLS estimates.

Usage scenarios The benchmark script always compares run output (“Sim”) to observations (“Obs”) , but it can also compare to a reference simulation (“Ref”, e.g. output from the previous version of GEOS-Chem). The script uses gcgridobj regridding functions to allow the simulations to be run at any cubed-sphere or lat-lon resolution. The resolutions do not need to match between Sim and Ref; everything is converted to 4x5 for comparison. However, if comparing between two simulations, it is expected that the data available for the Ref output is at least as comprehensive as for the Sim output (i.e. if Sim is monthly only, then only monthly data is needed from Ref; but if Sim has daily output and that is requested, Ref must also have daily output).

AoA plot For the age of air plots, the simulations must have been run with a clock tracer; if one isn’t found, the age of air plot is not shown. If the clock tracer is found, the simulation zonal mean age of air is shown, and if a reference is provided, the difference between the simulation and reference is shown alongside it.

MLS plots If no reference is provided, the MLS plots show:

Sim

Obs

(Sim – Obs)

100 * (Sim – Obs) / Obs

If a reference is provided, the MLS plots also show:

(Sim – Ref)

100 * (Sim – Ref) / Ref The MLS plots for most species will be generated using monthly output data. Plots will be generated comparing zonal mean CH4Cl, H2O, HCl, HNO3, N2O, and ozone, between Sim, MLS, and (if available) Ref. If hourly output is available, this will be used to also provide comparison plots of NO2 and ClO, as well as providing a slightly improved comparison of ozone

OMI plots Two different plots can be generated using OMI column data. The first is ozone; this will be generated using hourly data if available, and monthly data if hourly data is not available. Please note that the monthly plot can be a bit wonky because of missing periods in the OMI data collection. The second plot is NO2. This plot is only created if hourly data is available.

Poles plots Finally, the results from Sim (and, if present, Ref) are compared to observations from MLS and OMI at 82 degrees North and 82 degrees South. This reproduces a plot from Solomon et al (2015) but requires high sampling frequency, so is only created if hourly data is available.

Simulation output data requirements Whether benchmarking a simulation against observations only or against a reference simulation, it’s currently set up to expect monthly output data, specifically in files names [exptid].SpeciesConc.YYYYmm01_0000z.nc4. These files are expected to include volumetric mixing ratios for all the target species. If they include a species called CLOCK, they will also be used to calculate the age of air (but see earlier comment). [exptid] is set through input arguments to the script (and can differ between the Sim and Ref output, based on the arguments exptid and exptid_ref respectively). Similarly, it will look for [exptid].StateMet.YYYYmmdd_0000z.nc4, and will expect that file to contain (at least) Met_PSC2WET. If the switch --nohourly is NOT supplied, the script will also look for files called [exptid].StratBM.YYYYmmdd_0000z.nc4. Each file must be output daily and must contain hourly average data for the following fields: SpeciesConc_NO2, SpeciesConc_O3, SpeciesConc_ClO, Met_PSC2WET, Met_BXHEIGHT, Met_AIRDEN, and Met_AD.

Invoking the script, and arguments The script is designed to be invoked from the command line. There are a lot of possible input options – but I expect that a typical invocation will look like the following (argument details below). Take a look at /n/home13/seastham/Python_Benchmark to see a bare-bones working directory which is capable of running this script. If you are able to reproduce that, then running the command below (and editing the entry in bold so that it writes to somewhere you have write privileges for!) should result in a comparison between v12.9.0, v12.6.0, and observational data:

python3 benchmark_strat_v6.py --year=2016 --verbose –rebuild --nostore --plot=/n/home13/seastham/strat_test --sim_name=benchmark_test_compare --sim=/n/seasasfs02/msulprizio/GC/benchmarks/1yr/12.9.0/FullChem/GCC/OutputDir --ref=/n/seasasfs02/msulprizio/GC/benchmarks/1yr/12.6.0/Run0/OutputDir --nohourly --split_djf --omi_dir=/n/holyscratch01/jacob_lab/seastham/Strat_Benchmark/OMI/PreProc --mls_dir=/n/holyscratch01/jacob_lab/seastham/Strat_Benchmark/MLS --mls_clo_csv=/n/holyscratch01/jacob_lab/seastham/Strat_Benchmark/MLS/ClO_bias_MLS_v42.csv --exptid=GEOSChem --exptid_ref=GEOSChem --sim_plot_name=”GCC 12.9.0” --ref_plot_name=”GCC 12.6.0”

--year: The simulation year to be processed. --verbose: Keep the user updated about what is happening. --rebuild: Reading the hourly data can take a long time, because the script is trying to get the simulation output for the Aura overpass time in each location. However, once that’s done, the data size can be collapsed significantly, and this data is stored (unless --nostore is selected) . Has no effect if --nohourly is given. --nostore: Prevents storage of processed hourly data. If neither --nostore nor --nohourly (see below) are provided, then a (potentially sizable) data file will be stored in the plotting directory. Has no effect if --nohourly is given. --plot: The directory in which the benchmarking output will be stored. This script anticipates that you may want to perform multiple comparisons – see the --sim argument below. --sim_name: The name to be used to identify this comparison. This name will be given to a subdirectory of that given to --plot, and all benchmark output will be stored there. --sim: Path to the output directory to be benchmarked. --ref: OPTIONAL. Path to the output directory for the reference simulation (e.g. from the previous version of GEOS-Chem). --nohourly: Do not perform any comparisons which require hourly data. Please note: without hourly data, the stratospheric benchmark is of very limited utility. --split_djf: The stratospheric benchmark for a given year, say 2016, can be performed either using 2015-12-01 to 2016-12-01, or using 2016-01-01 to 2017-01-01. The script will default to the former behavior because this avoids having to use a non-continuous time period for the winter (DJF) season. However, if --split_djf is provided, the latter behavior is used. --omi_dir: Location of the preprocessed OMI observations. --mls_dir: Location of the unprocessed, level-2, version 004 MLS observations. --mls_clo_csv: The location of a CSV file containing pressure-dependent bias corrections for the MLS ClO observations. --exptid: The experiment ID used in HISTORY.rc for Sim. For GEOS-Chem Classic I believe this is usually "GEOSChem", while for GCHP I believe it’s usually "GCHP". --exptid_ref: As above, but now for the Ref simulation output. --sim_plot_name: The name to use for Sim in plots. --ref_plot_name: The name to use for Ref in plots.

The other two possible input arguments are not used here, but are: --clock_init_ref: The reference year for the clock tracer (i.e. the year at which the clock tracer surface mixing ratio would be zero, since which point it would be expected to increase linearly at a rate of 365.25 pptv per year). --fontsize: Font size to use for plotting.

sdeastham commented 3 years ago

Thanks for raising this @msulprizio ! One minor clarification: there are a few new flags (mostly associated with getting the scripts to work with GCHP's slightly wonky timestamping for time-averaged files, which will hopefully be made obsolete before too long), and since version 7 of the scripts, they actually expect MLS v5 output rather than MLS v4.2 (in contrast with the above advice, which I wrote for version 6 of the scripts). Thanks also for all your work in fixing the various bugs in there!

msulprizio commented 3 years ago

The latest working version of these scripts can be found on Cannon in /n/home05/msulprizio/GC/benchmarks/Strat_Benchmark. That directory has its own Git repository for tracking changes to these scripts (until we bring them into GCPy).

msulprizio commented 3 years ago

The data files required for the stratospheric benchmark plots can now be found in /n/seasasfs02/Strat_Benchmark.

github-actions[bot] commented 10 months ago

Stale issue message

geoschem / gcpy

[FEATURE REQUEST] Add stratospheric benchmarking routines #106