dtcenter / METplus

Python scripting infrastructure for MET tools.
https://metplus.readthedocs.io
Apache License 2.0
97 stars 37 forks source link

Add new use case of python embedding for point observations with NRL innovation files. #635

Open JohnHalleyGotway opened 3 years ago

JohnHalleyGotway commented 3 years ago

Describe the New Feature

This task is to create a python script to read NRL innovation files and use python embedding to pass them into the ascii2nc tool. Once that script works well, collaborate with NRL staff to develop a new METplus use case which calls ascii2nc to prepare the point observations and then either Point-Stat or Ensemble-Stat to verify model output against them. Need to get direction from Liz Satterfield for direction on these details.

See details about the formatting of the NRL innovation files as a comment on this issue.

Consider breaking down the multiple steps for this task into sub-issues: (1) Write and test a python embedding script to process the input file with ascii2nc. (2) See comments for a description of how Liz would like (1) to be configurable. Should filtering logic be added to the python-embedding script or to the ascii2nc tool? (3) Gather sample model data which should be verified using these point observations and develop of METplus use case for these steps. (4) This data may also be suitable for python embedding in Stat-Analysis to compute statistics directly using the obs and innovation values. Explore this option and potentially include it in the use case as well.

Acceptance Testing

Input NRL innovations files can be found in kiowa:/d1/projects/METplus/METplus_Development/feature_635/innovation_data Contact Liz Satterfield for the model data to be used in use case development.

Time Estimate

Estimate the amount of work required here. Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the new feature down into sub-issues.

Relevant Deadlines

METplus-4.0 release

Funding Source

2700021

Define the Metadata

Assignee

Labels

Projects and Milestone

Define Related Issue(s)

Consider the impact to the other METplus components.

New Feature Checklist

See the METplus Workflow for details.

JohnHalleyGotway commented 3 years ago

Email from Liz on 9/23/2020: The innovation files are ascii files with a 75 line header. The column description is below. If you all could provide a reader to get these files into MET that would be great! I'm happy to provide a sample data set and any additional information that you may need.

  1. n: Observation number for indexing obs.
  2. ob: Observation value.
  3. bk_ob: Background (short-term forecast) value, interpolated in space and time to the observation location/time.
  4. t_bk_ob: Supplemental information required for assimilating the observation.
  5. xiv_ob: Innovation, or difference between the observation and the background value (e.g., ob - bk_ob)
  6. err_ob: Observation error standard deviation
  7. etc_ob: Supplemental information required for assimilating the observation.
  8. lat_ob: latitude of the observation.
  9. lon_ob: longitude of the observation.
  10. p_ob: pressure level of observation; except for radiances.
  11. jvarty: Jvarty_ob in the code, this is the variable type code.
  12. insty: Insty_ob in the code; instrument type code.
  13. nvp: Number of observations in the "profile"
  14. ichk: Quality indicator.
  15. idt: Time difference in seconds from the center of the time window. For a 6-hr time window, the values range from -10800 to +10800.
  16. c_pf_ob: Observation platform identifier (16 characters). For radiances, this includes sensor name (AMSU-A), channel number, ascending/descending flag and assimilate flag (A, M, R - for assimilate, monitor or reject). For radiosonde, it includes the block/station number.
  17. c_db_ob: More platform identifiers (10 characters). For radiances, this is includes the satellite name (NOAA15), and land/sea/ice flag. For radiosondes, it includes the retransmission status.
  18. idp: NAVDAS computes the pressure change over the assimilation window and stores here.
  19. lsi: Land/sea/ice flag
  20. rej: Reject flag
  21. bkerr: background error standard deviation, in observation space.
  22. cob: analysis solution, in observation space prior to being projected back into model space
  23. resid: residual in observation space, or H(analysis - background). A read statement would look like:
    C=textscan(fileID,
    '%*7d%8f%*1c%8f%*1c%*8f%*1c%8f%9f%*1c%*9f%*1c%9f%*1c%9f%*1c%11f%*1c%2d%*1c%3
    d%*1c%5d%*1c%*4d%*1c%7d%*2c%16c%*2c%10c%*1c%*4d%*1c%*2d%*1c%3d%*1c%7f%*1c%*7
    f%*1c%9f%*1c%9f%*[^\n]','headerlines',75,'whitespace','','delimiter','/n');
    fortran   = '7s 9s 1x 8s 1x 8s 1x 8s 9s 1x 9s 1x 9s 1x 9s 1x 11s ' +\
                 '1x 2s 1x 3s 1x 5s 1x 4s 1x 7s 2x 16s 2x 10s 1x 4s 1x 2s '
    +\
                '1x 3s 1x 7s 1x 7s 1x 9s 1x 9s'

    We now output h5 files instead of ascii. I put both formats and the python converter on the ftp directory. It would be great to have something like pb2nc, where we could choose what data types to output to the nc file via a config. Also having the innovation and residual information will be helpful for DA diagnostics.

Putting the obs values into ascii2nc will get us to the verification task and is where we likely need to start. Having matched pairs would also be useful.

There is additional information in the file that would be helpful to retain (you could treat it as an additional observation type) , in particular:

  1. xiv: the innovation value
  2. resid: residual in observation space
  3. the prescribed errors oberr and bkerr
  4. bk_ob the background in observation space
  5. the QC flags

If that information is available along with the ob value, this enables us to implement some DA diagnostics fairly easily.

JohnHalleyGotway commented 3 years ago

On 10/1/2020, Tara noted that the funding to pay for this must be spent by the end of 2020.

DanielAdriaansen commented 3 years ago

The prototype for this work exists in branch feature_635_nrlinnov. I decided to add the filtering capability in the Python embedding for now, via pandas. The pandas filters are configurable via [user_env_vars] in METplus, as is the mapping between NRL column names and MET 11-column names. Processing is somewhat slow, but these are very large files. The ASCII format takes ~ 3 minutes per file while the HDF5 files are about half of that or less (1-1.5 minutes per file) thus I would recommend using HDF5 input files whenever possible.