[Draft] low latency boundary condition generation

nicholasbalasus commented 1 year ago

Name and Institution (Required)

Name: Nick Balasus Institution: Harvard University

Describe the update

This draft PR will include updates to how we generate boundary conditions (likely v2023-08 that will accompany the IMI 2.0). This includes some bug fixes but mostly changes that will allow us to generate boundary conditions with lower latency.

Changes that will impact the output:

[x] switch to 2° x 2.5° (#146)
[x] upgrade to GEOS-Chem v14.2.1 and ensure consistency with IMI (#147)
[x] sample GEOS-Chem SpeciesConcVV_CH4 and LevelEdgeDiagn instantaneously instead of time-averaged (previous bug)
[x] remove the dropping of outliers (#148)
[x] use a restart file for 1 April 2018 from Todd's simulation (from NOAA surface data)
[x] reduce the temporal smoothing of biases to -15 days (instead of +/- 15 days)
[x] drop observations over water and south of 60°S
[x] explore data density minimums (when regridding and when defining a zonal mean bias)
[x] checking that our TROPOMI operator is behaving as expected (#110)

Changes that will not impact the output:

[x] modify the default GEOS-Chem config files with sed instead of providing config files
[x] add flexibility for number of vertical layers (#155)

nicholasbalasus commented 10 months ago

I am only requesting the review from @laestrada, but I will tag others in places they might want to chime in (thanks @djvaron, @msulprizio, @toddmooring, @eastjames, @megan-he!). This should close the five issues linked above.

This update uses GCClassic 14.2.1 to generate boundary conditions at 2.0° x 2.5°. They will be more appropriately named v2023-09 (my branch name with v2023-08 was too ambitious it seems). I will generate the full record (and compare to the 4.0° x 5.0° v2023-06 boundary conditions) once this code looks good and 14.2.1 is officially released.

I have tested two things here:

(1) The default IMI test case on Cannon.

I tested this because GCClassic is now cloned as part of the setup for IMI (getting rid of the need for users to clone it themselves). This is the bash script used to test: test-default-imi.txt.

It works - results here if you're interested: visualization_notebook.html.zip

(2) The reproducibility of our boundary conditions generation.

I generated three sets of boundary conditions: test 1 of 20180401-20180630, test 2 of 20180515-20180531 (using a restart generated by test 1), test 3 of 20180615-20180630 (using a restart generated by test 2). This is the bash script used to test: test-bcs.txt. The boundary conditions from test 2 and test 3 are consistent with those generated in test 1, as is tested with this simple Python code:

import xarray as xr
import numpy as np
from datetime import date, timedelta

basedir = "/n/holyscratch01/jacob_lab/nbalasus/"

test2_dates_to_check = [(date(2018, 5, 15) + timedelta(days=i)).strftime("%Y%m%d") for i in range(17)]
test3_dates_to_check = [(date(2018, 6, 15) + timedelta(days=i)).strftime("%Y%m%d") for i in range(16)]

for date in test2_dates_to_check:
    with xr.open_dataset(basedir + f"imi-test1/blended-boundary-conditions/GEOSChem.BoundaryConditions.{date}_0000z.nc4") as ds1,\
         xr.open_dataset(basedir + f"imi-test2/blended-boundary-conditions/GEOSChem.BoundaryConditions.{date}_0000z.nc4") as ds2:
        abs_diff = 1e9*np.abs((ds1["SpeciesBC_CH4"] - ds2["SpeciesBC_CH4"]).values)
        print(f"{date} -> {np.max(abs_diff)}")
        assert np.max(abs_diff) < 0.01, f"Diff is greater than 0.01 ppb for {date}"

for date in test3_dates_to_check:
    with xr.open_dataset(basedir + f"imi-test1/blended-boundary-conditions/GEOSChem.BoundaryConditions.{date}_0000z.nc4") as ds1,\
         xr.open_dataset(basedir + f"imi-test3/blended-boundary-conditions/GEOSChem.BoundaryConditions.{date}_0000z.nc4") as ds2:
        abs_diff = 1e9*np.abs((ds1["SpeciesBC_CH4"] - ds2["SpeciesBC_CH4"]).values)
        print(f"{date} -> {np.max(abs_diff)}")
        assert np.max(abs_diff) < 0.01, f"Diff is greater than 0.01 ppb for {date}"

nicholasbalasus commented 10 months ago

Thanks, @laestrada! Everything is addressed now (except for isRegional, which will need to wait for now). Testing now.

nicholasbalasus commented 10 months ago

Thanks @laestrada! Let me know if this is ready to squash + merge.

toddmooring commented 10 months ago

Hi everyone but especially @nicholasbalasus and @laestrada . Apologies for the delayed response. I just want to make sure I understand how the boundary condition generation is supposed to work:

1) The user picks a start date for the inversion on or after 1 April 2018. 2) If the date is after 1 April 2018, global GEOS-Chem is run forward at 2x2.5/47L to the start date. The initial condition is a 1 April 2018 restart supplied by me, emissions are assumed based on some inventory. 3) From the start date on to the end date, the global 2x2.5/47L run writes boundary condition files. 4) The inversion is performed, using among other things the boundary conditions generated by the above process.

Assuming this is how it works, I have two questions/suggestions:

1) What happened to Daniel's suggestion of only doing a 1-month emissions-driven spinup after initializing with one of my restarts (see his 7/27/23 1:04 p.m. email)? Are we just now relying on the bias correction process (in which we make the GEOS-Chem state look more like TROPOMI) to mitigate biases created by emissions-driven spinups of more than 1 month?

2) Note that I actually now have monthly restarts as late as 1 May 2018 from a 2x2.5/47L simulation. This simulation was initialized (from a 2x2.5/72L restart) at 1 May 2012, with that restart coming from a 2x2.5/72L simulation started on 1 January 1985. I think you should switch to using a native 47L restart, for two reasons: a) it skips the process of interpolating from 72L to 47L and b) the stratospheric methane climate has had ~6 years to adjust to the 47L structure. Probably not hugely different from the 72L structure, but why not get rid of this source of a potential fake transient if you can?

toddmooring commented 10 months ago

Just to be clear, I realize my description of how the boundary condition generation process works glosses over the application of bias corrections to GEOS-Chem fields to make them better match TROPOMI.

nicholasbalasus commented 10 months ago

Hi Todd - thanks for looking through this!

The upshot of the code is GEOS-Chem (2x2.5/47L) is initialized with your 1 April 2018 restart file and run until present. The archived boundary conditions are corrected to look like TROPOMI (starting when the TROPOMI beings on 30 April 2018), which is what we need for nested-grid simulations for regional inversions.

I think I understand how what you are referring to in (1) would work for making a restart file for a global simulation (that is, run 1 December 2018 with emissions starting with your restart and then correct the 1 January 2019 output with TROPOMI). However, I don't see how this would be practical for generating multiple years of boundary conditions that we need for nested-grid simulations.

For (2), I will switch to the 47 L file if you point me towards it. Thanks!

toddmooring commented 10 months ago

I have conveyed the 2x2.5/47L restart to @nicholasbalasus via Slack

laestrada commented 10 months ago

@toddmooring Thanks for providing the restart file! Just to further explain Nick's explanation -- we do not expect users to generate the boundary conditions themselves. Instead, we create an archive (available from the washu data portal and aws), which the IMI downloads for the relevant user dates.

geoschem / integrated_methane_inversion

[Draft] low latency boundary condition generation #156

Name and Institution (Required)

Describe the update