dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0

Enhance ASCII2NC to read ISMN point observations of soil moisture and temperature #2701

Closed JohnHalleyGotway closed 10 months ago

JohnHalleyGotway commented 1 year ago

Describe the New Feature

This issue is to enhance the MET ASCII2NC tool to ingest point observations of soil moisture and temperature.

Sample data can be retrieved from https://ismn.bafg.de/en/data/data-download.

As described on the ISMN formats page, data is available in two formats:

  1. Following the CEOP conventions (Coordinated Energy and water cycle Observations Project).
  2. Stored in a Header+values format.

Adding support for either of these is acceptable. @anewman89, please advise on which is preferable.

Note that in both cases, the variable name is embedded only in the filename:

CSE_Network_Station_Variablename_depthfrom_depthto_sensorname_startdate_enddate.ext

The Variablename element defines the data type and separate variables are provided in separate files.
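
Since the variable name and depth range live only in the file name, a reader has to split the name before it can interpret the data. Below is a minimal sketch of that step, assuming exactly nine underscore-separated fields in the order shown above and no underscores inside any single field. The `IsmnFileName` type, the `parse_ismn_filename` function, and the sample file name are hypothetical and are not part of MET.

```python
from typing import NamedTuple


class IsmnFileName(NamedTuple):
    cse: str
    network: str
    station: str
    variable: str
    depth_from: float
    depth_to: float
    sensor: str
    start_date: str
    end_date: str


def parse_ismn_filename(filename: str) -> IsmnFileName:
    """Split an ISMN-style file name into its nine underscore-separated fields."""
    # Strip the extension at the LAST dot, since the depth fields contain dots too.
    stem, dot, _ext = filename.rpartition(".")
    if not dot:
        raise ValueError(f"no extension found in {filename!r}")
    parts = stem.split("_")
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)} in {filename!r}")
    cse, network, station, variable, d_from, d_to, sensor, start, end = parts
    return IsmnFileName(cse, network, station, variable,
                        float(d_from), float(d_to), sensor, start, end)


# Hypothetical file name, made up for illustration only.
info = parse_ismn_filename(
    "CSE_NetworkA_Station1_sm_0.050000_0.100000_SensorX_20200101_20201231.stm"
)
print(info.variable, info.depth_from, info.depth_to)
```

Station or sensor names that themselves contain underscores would break the fixed nine-field assumption and need a more careful parser.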

Be sure that ASCII2NC can handle multiple input files in each run.

Acceptance Testing

Add at least one new ascii2nc unit test to demonstrate this functionality. Coordinate on the corresponding METplus use case.

Time Estimate

3 days (?)

Sub-Issues

Consider breaking the new feature down into sub-issues. No sub-issues needed.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

7790111 (CLASP)

anewman89 commented 1 year ago

@JohnHalleyGotway I think using the CEOP standard files will be good to start with. There may be other data that follow that general convention as it was defined through GEWEX.

DanielAdriaansen commented 1 year ago

@anewman89 is this issue a pre-requisite for either dtcenter/METplus#2388 or dtcenter/METplus#2390?

anewman89 commented 1 year ago

@DanielAdriaansen This is not a prerequisite for either of those issues. dtcenter/METplus#2390 does not need soil moisture data. While dtcenter/METplus#2388 does need soil moisture data, that will come from the FLUXNET2015 data file, which is distinct from this data stream.

JohnHalleyGotway commented 11 months ago

@anewman89 listed below are some detailed code comments I wrote up to define how the ISMN obs will be processed, including input/output units, necessary unit conversions, and naming conventions.

Here are 3 questions that remain to be answered:

  1. Does the mapping of obs to existing GRIB conventions make sense? Did I get any wrong?
  2. If we do NOT convert soil-moisture units to match GRIB, what should we name those observations? We should NOT call them SOILM because that would imply the GRIB units. But we will be writing obs named PRATE, SNOD, WEASD, TMP, TSOIL, BARET. Writing "sm" for soil-moisture doesn't look consistent in that context.
  3. What are the units for soil-suction? And what should we call it in the output? Again, "su" doesn't seem like a very good name. Or do you want me to make ASCII2NC explicitly ignore observations of soil-suction?

    //    - Variablename: Name of the variable in the file (e.g., Soil-Moisture)
    //      - p: precipitation in mm/h
    //        - Store as GRIB Code 59 (PRATE in kg/m^2/s)
    //        - Convert from mm/h to kg/m^2/s
    //      - sd: snow depth in mm
    //        - Store as GRIB Code 66 (SNOD in m)
    //        - Convert from mm to m
    //      - sm: soil moisture in m^3/m^3
    //        - DO NOT store as GRIB code 86 (SOILM in kg/m^2)
    //          to preserve the existing units
    //      - su: soil suction in unknown units
    //        - Store as GRIB Code -1 (not defined)
    //      - sweq: snow water equivalent in mm
    //        - Store as GRIB Code 65 (WEASD in kg/m^2)
    //        - No conversion needed (1 mm of water equals 1 kg/m^2)
    //      - ta: air temperature in C
    //        - Store as GRIB Code 11 (TMP in K)
    //        - Convert from C to K
    //      - ts: soil temperature in C
    //        - Store as GRIB Code 85 (TSOIL in K)
    //        - Convert from C to K
    //      - tsf: surface temperature in C
    //        - Store as GRIB Code 147 (BARET in K)
    //        - Convert from C to K
anewman89 commented 11 months ago

@JohnHalleyGotway Thanks for this summary.

The mappings look good apart from sm, su, and perhaps tsf. I have more comments on sm and su to answer questions 2 and 3.

For sm, we can use GRIB code 144 - SOILW. This is the fractional volumetric soil moisture content. In the GRIB code it's unitless instead of carrying the m^3/m^3 units, which I think is fine.

For su, you're correct that there is no GRIB code. The units are typically kPa, and the value should always be negative as it's a suction. I would expect the CEOP standard to handle this, but it may be good to check. For a name, could we use SMP (Soil Matric Potential) or SMS (Soil Matric Suction)?

For tsf, it's unclear to me if the observed tsf would always be GRIB Code 147 - BARET. It could be more closely related to an average skin temperature, or the skin temperature of a vegetated surface, such as grass. So we could consider GRIB Code 148 - AVSFT. Regardless, is there a way to note that the user should be aware of what the observed skin temperature may represent, and also what the model may be giving? BARET and AVSFT will be different in some output.

JohnHalleyGotway commented 11 months ago

Thanks for the guidance @anewman89. Another question is about the "height" of the observations... and by height I really mean depth below ground. The ISMN format specifies the depth as a range (depth from, depth to). For MET, we need to pick a single value.

I pulled the full archive of 23,341 ISMN files, and ran a script to check the depth values.

All depths are >= 0 with the exception of 36 outliers in the XMS-CAT and IPE networks with depth ranges of -2.0000 -2.5000 and -1.5000 -2.0000.
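
A check along these lines can be sketched as follows. This is not the script that was actually run; it assumes the two depth fields are the first pair of adjacent decimal numbers in the file name, and `depth_range`, `negative_depth_files`, and the sample file names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Two adjacent decimal fields delimited by underscores, e.g. "_0.000000_0.050000_".
_DEPTHS = re.compile(r"_(-?\d+\.\d+)_(-?\d+\.\d+)_")


def depth_range(filename: str) -> Optional[Tuple[float, float]]:
    """Return (depth_from, depth_to) parsed from a file name, or None."""
    match = _DEPTHS.search(filename)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def negative_depth_files(filenames: List[str]) -> List[Tuple[str, Tuple[float, float]]]:
    """List the files whose depth range includes a negative value."""
    flagged = []
    for name in filenames:
        depths = depth_range(name)
        if depths is not None and min(depths) < 0:
            flagged.append((name, depths))
    return flagged


# Hypothetical file names, made up for illustration only.
files = [
    "CSE_NetA_Stn1_sm_0.000000_0.050000_SensorX_20100101_20101231.stm",
    "CSE_NetB_Stn2_ts_-2.000000_-2.500000_SensorY_20100101_20101231.stm",
]
for name, (d_from, d_to) in negative_depth_files(files):
    print(f"{name}: depth_from={d_from}, depth_to={d_to}")
```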

I propose that we just store the depth in MET as the maximum of the "depth from" and "depth to" values. Seems nice and simple and easy to document.

Any objections to that simple approach? Or would you prefer something different?

JohnHalleyGotway commented 11 months ago

After this work in ASCII2NC is completed, @JohnHalleyGotway remember to create related issues in the METplus repository to:

  1. Enhance the ASCII2NC wrapper to handle this new data source.
  2. Update the Verification Datasets Guide to describe the utility of this data source for verification (likely @anewman89)

This was discussed during the METplus Engineering meeting on 12/5/23.

anewman89 commented 11 months ago

@JohnHalleyGotway Thanks for checking the measurement 'height' or depth. I think I slightly prefer using the middle point of the range if we had to pick one. Really as long as we document it clearly it probably doesn't matter, but from a measurement perspective, I think the midpoint is slightly more valid.
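
For comparison, the two candidate rules (maximum of the range versus midpoint) differ only by one line. A small sketch, assuming depths are given as plain numbers in the same units as the file name; `single_depth` is a hypothetical helper and does not indicate which rule ASCII2NC ends up using.

```python
def single_depth(depth_from: float, depth_to: float, method: str = "midpoint") -> float:
    """Collapse a (depth_from, depth_to) range into one value."""
    if method == "max":
        return max(depth_from, depth_to)
    if method == "midpoint":
        return 0.5 * (depth_from + depth_to)
    raise ValueError(f"unknown method: {method!r}")


# A typical range and one of the negative outliers mentioned earlier in the thread.
for d_from, d_to in [(0.0, 0.05), (-2.0, -2.5)]:
    print(d_from, d_to,
          single_depth(d_from, d_to, "max"),
          single_depth(d_from, d_to, "midpoint"))
```

Note that for the negative outliers the "max" rule picks the value closer to zero (-2.0 for the range -2.0 to -2.5), which is worth stating in the documentation whichever rule is chosen.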

Following this, is it possible to carry both the starting and ending depths in metadata somewhere? That may be useful for human interpretation of results to keep that information.

JohnHalleyGotway commented 11 months ago

@anewman89, no, there isn't really a way to store the range of "height" values for a single observation. MET assumes that individual obs are recorded at a single height. And there isn't really a good way to indicate a range of values... even just for human inspection. I could have ascii2nc keep track of unique combinations of message type, variable names, and level ranges it encountered and print summary log messages about it... if you think that's worthwhile.
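
The summary-logging idea could look roughly like this: count observations per unique combination of message type, variable name, and level range, then print one line per combination at the end of the run. This is a sketch of the concept only; `LevelRangeSummary` and the message type string "ADPSFC" used in the example are assumptions, not taken from the ASCII2NC code.

```python
from collections import Counter
from typing import List


class LevelRangeSummary:
    """Count observations per (message type, variable, depth range)."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, message_type: str, variable: str,
               depth_from: float, depth_to: float) -> None:
        self._counts[(message_type, variable, depth_from, depth_to)] += 1

    def lines(self) -> List[str]:
        """One summary line per unique combination, in sorted order."""
        return [
            f"{msg} {var} depth {d_from:g} to {d_to:g}: {count} obs"
            for (msg, var, d_from, d_to), count in sorted(self._counts.items())
        ]


summary = LevelRangeSummary()
summary.record("ADPSFC", "sm", 0.0, 0.05)   # "ADPSFC" is a placeholder message type
summary.record("ADPSFC", "sm", 0.0, 0.05)
summary.record("ADPSFC", "ts", 0.0, 0.05)
for line in summary.lines():
    print(line)
```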

anewman89 commented 11 months ago

Thanks @JohnHalleyGotway. That's fine, I don't think it's critical to keep that information in this data stream. The user should be aware of the initial ASCII data and be able to track ranges from that.