geoschem / geos-chem

GEOS-Chem "Science Codebase" repository. Contains GEOS-Chem science routines, run directory generation scripts, and interface code. This repository is used as a submodule within the GCClassic and GCHP wrappers, as well as in other modeling contexts (external ESMs).
http://geos-chem.org

[BUG/ISSUE] Surface CH4 fields causing full chem simulation runs to fail if not starting on first hr of first day of month #304

Closed lizziel closed 4 years ago

lizziel commented 4 years ago

Using GEOS-Chem Classic in both 12.8.0 and dev/12.8.1 I get the following run-time error when running the benchmark simulation starting at any day and time other than YYYYMM01 000000.

=====================================================================
GEOS-Chem ERROR: Cannot get pointer to NOAA_GMD_CH4 or CMIP6_Sfc_CH4 in 
SET_CH4! Make sure the data source corresponds to your emissions year in 
HEMCO_Config.rc (NOAA GMD for 1978 and later; else CMIP6).
 -> at SET_CH4 (in module GeosCore/set_global_ch4_mod.F90)
=====================================================================

=====================================================================
GEOS-CHEM ERROR: Error encountered in call to "SET_CH4"!
STOP at  -> at GEOS-Chem (in GeosCore/main.F90)
=====================================================================

This error occurs during the very first timestep. I am looking into a fix for 12.8.1.

lizziel commented 4 years ago

It appears this behavior is a result of the time slice selection flag for NOAA_GMD_CH4 and CMIP6_Sfc_CH4 being EY. From the HEMCO users guide:

**E (exact):** Fields are only used if the time stamp on the field exactly
matches the current simulation datetime. In all other cases, data is
ignored but HEMCO does not return an error. For example, if the
source time attribute is set to 2000-2013/1-12/1-31/0 E, every time the
simulation enters a new day HEMCO will attempt to find a data field for
the current simulation date. If no such field can be found on the file, the
data is ignored (and a warning is prompted). This setting is particularly
useful for data that is highly sensitive to date and time, e.g. restart variables.

**EF (exact, forced):** same as E, but HEMCO stops with an error if no data
field can be found for the current simulation date and time.
(v1.1.011 and higher)

**EC (exact, read/query continuously)**

**ECF (exact, forced, read/query continuously)**

**EY (exact, use simulation year):** Same as E, except don't allow Emission
year setting to override year value.

Since these files are timestamped with the first time of each month, and the flag is Exact, data is not found in the first timestep for any run that does not start at midnight of the first day of a month. If data is not found for either NOAA_GMD_CH4 or CMIP6_Sfc_CH4 then the run fails with an error.
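The failure mode can be illustrated with a minimal Python sketch. This is not HEMCO code: `find_exact` and `FILE_TIMES` are hypothetical stand-ins, and the sketch assumes the file carries one time slice per month, stamped at 00:00 on the first day.

```python
from datetime import datetime

# Hypothetical monthly time stamps, one per month at 00:00 on day 1
# (an assumption about how the CH4 files are stamped, per the text above).
FILE_TIMES = [datetime(2016, month, 1, 0) for month in range(1, 13)]


def find_exact(sim_time, file_times=FILE_TIMES):
    """Return the matching time stamp, or None if there is no exact match."""
    return sim_time if sim_time in file_times else None


# A run starting at midnight on the first of the month finds a field.
start_on_first = find_exact(datetime(2016, 7, 1, 0))

# A run starting mid-month finds nothing in its first timestep.
start_mid_month = find_exact(datetime(2016, 7, 15, 0))
```

With both containers coming back empty in the first timestep, the lookup in SET_CH4 has nothing to point to, which matches the error shown above.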

Using the range flag (R) avoids this issue. This flag is described as follows:

**R (range):** Data are only considered as long as the simulation time is
within the time range specified in attribute sourceTime. The provided
range does not necessarily need to match the time stamps of the input
file. If it is outside of the range of the netCDF time stamps, the closest
available date will be used. For instance, if a file contains data for years
2003 to 2010 and the provided range is set to 2006-2010/1/1/0 R, the file
will only be considered between simulation years 2006-2010. For
simulation years 2006 through 2009, the corresponding field on the file is
used. For all years beyond 2009, data of year 2010 is used. If the simulation
date is outside the provided time range, the data is ignored but HEMCO
does not return an error - the field is simply treated as empty (a corresponding
warning is issued in the HEMCO log file). For example, if the source time
attribute is set to 2000-2002/1-12/1/0 R, the data will be used for simulation
years 2000 to 2002 and ignored for all other years.
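As a rough sketch of the range behavior as the quoted text describes it (not HEMCO's actual implementation), the year selection might look like the following. The function name `select_year_range` and its arguments are hypothetical, and the sketch works at year granularity only.

```python
def select_year_range(sim_year, range_start, range_end, file_years):
    """Pick the file year to use for sim_year, or None if the field is empty.

    Follows the quoted description only: outside [range_start, range_end]
    the data is ignored; inside, the closest available file year is used.
    """
    if not (range_start <= sim_year <= range_end):
        return None  # treated as empty; HEMCO would write a warning to its log
    return min(file_years, key=lambda year: abs(year - sim_year))


# File covers 2003-2010, range set to 2006-2010 (the example from the guide).
inside = select_year_range(2007, 2006, 2010, range(2003, 2011))
outside = select_year_range(2012, 2006, 2010, range(2003, 2011))

# Range extends past the last time stamp on the file: closest year is used.
past_file_end = select_year_range(2010, 2006, 2010, range(2003, 2010))
```

The important point for this issue is the first branch: a simulation date inside the range always yields a field, regardless of the day or hour at which the run starts.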

Another option is to use background values if the exact date is not found.

@tsherwen , @sdeastham , @ltmurray : do any of you have comments on the intended behavior? We are using the exact flag since that is how the update was submitted. However, it seems the range flag is more appropriate. Thoughts?

sdeastham commented 4 years ago

The range behavior seems like it would be fine to me, but I'd say @tsherwen and @ltmurray would be the real arbiters on this one. That having been said, I'd love to unify the handling of CH4 with the other long-lived species (https://github.com/geoschem/geos-chem/issues/287), so ideally we'd have the same behavior for all of them.

ltmurray commented 4 years ago

The R flag is not appropriate for methane (less problematic for the other long-lived gases).

The prescribed surface methane concentrations were submitted as EF to force a graceful stop of the model if the user were to attempt to simulate outside the period for which observational constraints exist. This was a problem in earlier versions of the code, when methane concentrations weren’t updated after methane began rising in the atmosphere again ca. 2007, impacting ozone and OH concentrations. Once the model stops, users are then able to choose how they want to proceed, whether to change the temporal behavior in HEMCO to C or to create a new NetCDF file with more appropriate concentrations.

The time stamp doesn’t need to be exact as per the EF definition, but the default behavior should be for the model to stop if the present time is outside the temporal period of the input file, so users are aware.
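A minimal sketch of the "stop if outside the period" behavior requested here, in Python rather than HEMCO's own code. `OutsideDataRangeError` and `select_year_forced` are hypothetical names, and the 2000-2002 range is borrowed from the user guide example, not from the actual CH4 files.

```python
class OutsideDataRangeError(RuntimeError):
    """Hypothetical error standing in for HEMCO stopping the run."""


def select_year_forced(sim_year, first_year, last_year):
    """Return sim_year if it lies within the data period, else stop."""
    if not (first_year <= sim_year <= last_year):
        raise OutsideDataRangeError(
            f"Simulation year {sim_year} is outside the data period "
            f"{first_year}-{last_year}; choose another data source."
        )
    return sim_year


# Inside the period, any start day or hour is fine.
year_used = select_year_forced(2001, 2000, 2002)
```

Unlike an exact match on the time stamp, the check depends only on whether the simulation falls inside the data period, so a run starting mid-month would not trip it.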

sdeastham commented 4 years ago

That makes sense to me. It's true for the other long-lived source gases too, although to a much lesser extent, that having it recycle the last year might not yield great results. I'd suggest then RF for all of them. From the HEMCO manual

EDIT: Removed RY - realized it doesn't have the requisite forcing behavior.

ltmurray commented 4 years ago

Yeah, RF sounds like what we want.

tsherwen commented 4 years ago

I am OK with this for now too.

However, I think that explicitly saying in HEMCO to use emission dataset X for certain years and Y for others would be preferable. For CH4, that would allow NOAA to override CMIP6 for the years in which the data is available. @christophkeller and I talked about this when I implemented the current setup for CH4, but I think I remember that this would require changes to HEMCO?

I think what you suggested in issue #287 is a good step. Maybe we can add the year-based selection in HEMCO as part of this?

lizziel commented 4 years ago

We have been using EY, not EF, so that the emission year setting does not override the year value. It therefore seems like RY is the better choice, unless you think we need a new flag equivalent to RYF.

The implementation of checking which source to use has changed since the original submission due to issues reliably accessing the HEMCO clock (https://github.com/geoschem/geos-chem/issues/250). The new implementation always checks to see if NOAA_GMD_CH4 data is available in HEMCO. If it is not, then CMIP6_Sfc_CH4 data retrieval is attempted. If that is also not found, then the model exits with an error message. This is in line with our work towards reducing GEOS-Chem dependencies on HEMCO internals.
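The lookup order described above can be sketched in Python as follows. This is an illustration only, not the Fortran in set_global_ch4_mod.F90: `get_surface_ch4` is a hypothetical name, and the dictionary stands in for whatever HEMCO returns for each container (with `None` meaning "not found").

```python
def get_surface_ch4(containers):
    """Return (name, field) for the first container that holds data.

    containers maps a container name to its field, or to None when
    HEMCO found no data for the current simulation time.
    """
    for name in ("NOAA_GMD_CH4", "CMIP6_Sfc_CH4"):
        field = containers.get(name)
        if field is not None:
            return name, field
    raise RuntimeError(
        "Cannot get pointer to NOAA_GMD_CH4 or CMIP6_Sfc_CH4 in SET_CH4!"
    )


# NOAA data unavailable for this date, so the CMIP6 field is used.
name, field = get_surface_ch4({"NOAA_GMD_CH4": None, "CMIP6_Sfc_CH4": [1.0]})
```

Because the first non-empty container wins, this ordering relies on only one of the two being populated for any given simulation date.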

This method works because there is no overlap in years between the two datasets. Will that change in the future?

sdeastham commented 4 years ago

It would be fantastic if HEMCO had a way of seamlessly switching between datasets depending on the year (this will cause problems with MAPL/ExtData in GEOS and GCHP, but that's a topic for another time). I wanted to mention that we also already have at least one more data source as an option for CH4 surface VMRs - specifically, the WMO 2018 projections. These overlap with both the NOAA and CMIP6 projections. For my part, I will also be running with additional CH4 estimates and predictions which overlap, such as the various RCPs.

If there's a way to transparently handle different sources of data for different times to fill the same item, that seems ideal to me. But for the average GEOS-Chem user, it seems like having some kind of RFY option be standard for the NOAA dataset would be reasonable.

lizziel commented 4 years ago

@msulprizio told me that you can use the hierarchies in HEMCO to prioritize data application both spatially and temporally. This could therefore be used in theory to handle datasets that overlap in time.
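One way to picture hierarchy-based selection for datasets that overlap in time is the sketch below. It assumes that the highest hierarchy value wins among the sources covering the simulation year; the `Source` type, `pick_source`, the dataset names, and the year ranges are all hypothetical and are not taken from HEMCO_Config.rc.

```python
from typing import NamedTuple, Optional


class Source(NamedTuple):
    name: str
    hierarchy: int   # assumed: larger value takes precedence
    first_year: int
    last_year: int


def pick_source(sim_year, sources) -> Optional[Source]:
    """Return the highest-hierarchy source covering sim_year, if any."""
    covering = [s for s in sources if s.first_year <= sim_year <= s.last_year]
    return max(covering, key=lambda s: s.hierarchy, default=None)


# Two made-up datasets whose periods overlap in 2000-2010.
SOURCES = [
    Source("dataset_A", hierarchy=1, first_year=1900, last_year=2010),
    Source("dataset_B", hierarchy=2, first_year=2000, last_year=2020),
]

overlap_choice = pick_source(2005, SOURCES)   # both cover 2005
early_choice = pick_source(1950, SOURCES)     # only dataset_A covers 1950
no_choice = pick_source(2050, SOURCES)        # nothing covers 2050
```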

Regarding the immediate fix to the issue of not being able to start beyond the first of the month, I want to get a quick fix into 12.8.1 which will be released very soon. I can put in a feature request for RYF functionality, but in the meantime we should pick one of the available options, RY or RF.

One thing that confuses me is why we would want the model to stop if the data is outside of the specified range (RF). I thought we wanted the model to keep going and try the other dataset to see if the simulation year is applicable to it instead. My understanding of the R flag is that outside of the specified range the HEMCO data container is empty, so using RY would work for this.

@christophkeller , is my understanding of this flag correct? I think we need to update the language in the HEMCO user's manual on this since it seems to give conflicting information, specifically the two sentences below in bold:

R (range): Data are only considered as long as the simulation time is within the time range specified in attribute sourceTime. The provided range does not necessarily need to match the time stamps of the input file. If it is outside of the range of the netCDF time stamps, the closest available date will be used. For instance, if a file contains data for years 2003 to 2010 and the provided range is set to 2006-2010/1/1/0 R, the file will only be considered between simulation years 2006-2010. For simulation years 2006 through 2009, the corresponding field on the file is used. For all years beyond 2009, data of year 2010 is used. If the simulation date is outside the provided time range, the data is ignored but HEMCO does not return an error - the field is simply treated as empty (a corresponding warning is issued in the HEMCO log file). For example, if the source time attribute is set to 2000-2002/1-12/1/0 R, the data will be used for simulation years 2000 to 2002 and ignored for all other years.

ltmurray commented 4 years ago

When the specified methane was implemented, there were no alternative datasets, so if the model were to continue without stopping, it would do so with incorrect methane concentrations.

Now, there are alternatives that have been implemented, but these are generally inconsistent with the observationally-derived monthly spatial distributions. For example, the WMO2018 dataset was designed for forecasting stratospheric ozone loss/recovery, and therefore does not include any tropospheric spatial or seasonal variability, just secular shifts, and is not based on observational constraints for the past few years. Most of our users are focused on the troposphere, and there could therefore be a sudden step-change in methane abundances in the middle of their simulation period that could lead to spurious changes in tropospheric photochemistry.

If a user were to simulate beyond the observational NOAA GMD constraint, I would therefore encourage them to run entirely with the WMO2018 dataset for consistency. But most users will want to use the NOAA GMD observations in the recent past. So it is best to have the model gracefully stop to allow them to make the decision that is most appropriate for their science. An alternative would be to try to make a harmonized dataset that gradually transitions from WMO or CMIP into NOAA GMD then back into WMO or CMIP, but that would require a lot of maintenance.

lizziel commented 4 years ago

It sounds like there needs to be a new issue beyond the one I created here. How about I use the RF flag for now so that we can release 12.8.1 without this bug? I will then create a new issue calling for a recommendation on the default handling of surface CH4, since the current handling, both before and after this bug fix, is not necessarily what we want the default to be.

(Aside: If you get this in an email please click the link at the bottom to go to github before responding if possible. Comments added via email include the last comment in the new comment which clutters up the issue page.)

lizziel commented 4 years ago

After discussion offline we are going to use the RY flag for NOAA_GMD_CH4 in 12.8.1 as a temporary solution to fix this bug in GEOS-Chem Classic. The new behavior is to default to the nearest NOAA_GMD_CH4 year data outside of the range specified in HEMCO_Config.rc. If the simulation date is outside of the range, a warning will be printed to HEMCO.log that this is happening.
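The out-of-range behavior described here can be sketched in Python as below. This only illustrates "use the nearest year and warn"; it does not model the part of RY that keeps the emission year setting from overriding the simulation year. `nearest_year`, the logger name, and the 2000-2002 range are hypothetical.

```python
import logging

logger = logging.getLogger("hemco_sketch")  # hypothetical logger name


def nearest_year(sim_year, first_year, last_year):
    """Clamp sim_year into [first_year, last_year], warning if it moved."""
    clamped = min(max(sim_year, first_year), last_year)
    if clamped != sim_year:
        logger.warning(
            "Simulation year %d is outside %d-%d; using data for %d",
            sim_year, first_year, last_year, clamped,
        )
    return clamped


after_range = nearest_year(2025, 2000, 2002)   # falls back to the last year
before_range = nearest_year(1990, 2000, 2002)  # falls back to the first year
within_range = nearest_year(2001, 2000, 2002)  # unchanged, no warning
```

The run continues in all three cases, so the log message is the only signal that a different year's CH4 is in use.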

The warning printed to the log is buried amongst many other messages and warnings. It is therefore unlikely users will note that a surface CH4 year other than their simulation year is being used. For this reason there will be further discussion of this issue with a likely update to change the behavior again in a future version. This will be documented in a separate GitHub issue which I will link to here once it is created.

tsherwen commented 4 years ago

@lizziel I've been running some historical runs in v12.9.1 and I noted that this update was only applied to the NOAA_GMD_CH4 collection. Please could we use the RF flag for the CMIP6_Sfc_CH4 data collection too?

This dataset is primarily for historical runs outside of the available meteorology, when NOAA data would not be used in preference. So limiting the year used for this collection to the simulation year may not be the best choice, as I would expect users to use the HEMCO Emission year variable to select this dataset for a specific historical year.

lizziel commented 4 years ago

Hi @tsherwen, I think Melissa will be making updates in 13.0 to restrict years of certain input datasets. @msulprizio, are any changes going to be made to the year rules for the CMIP6_Sfc_CH4 inventory?

lizziel commented 4 years ago

Also, looks like I dropped the ball on creating a new issue for this. Apologies!

msulprizio commented 4 years ago

Yes, I can change the time cycle flag to RF for CMIP6_Sfc_CH4 too. I will do this as part of a larger cleanup in HEMCO_Config.rc to ensure users know when they are attempting to use data that is outside of the available time range. I just added a new issue (https://github.com/geoschem/geos-chem/issues/475) to track the progress of that update.

tsherwen commented 4 years ago

Excellent, this sounds like the right route forward. Thanks @lizziel & @msulprizio