Add new use case of python embedding for point observations with NRL innovation files.

Describe the New Feature

This task is to create a Python script to read NRL innovation files and use Python embedding to pass them to the ascii2nc tool. Once that script works well, collaborate with NRL staff to develop a new METplus use case that calls ascii2nc to prepare the point observations and then either Point-Stat or Ensemble-Stat to verify model output against them. Liz Satterfield needs to provide direction on these details.
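A minimal sketch of what such an embedding script might look like. For point observations, ascii2nc's Python embedding expects the script to define a list named point_data whose rows follow MET's 11-column point observation format. Everything else here is an assumption for illustration: the helper name to_point_row, the message type "NRL_INNOV", the variable name "OBS", using c_pf_ob as the station ID and ichk as the QC string, and supplying the center of the time window separately.

```python
from datetime import datetime, timedelta

MISSING = -9999.0  # MET's missing-data value


def to_point_row(rec, window_center, message_type="NRL_INNOV", var_name="OBS"):
    """Build one MET 11-column row from a dict keyed by NRL column name."""
    # idt is the offset in seconds from the center of the time window.
    valid = window_center + timedelta(seconds=int(rec["idt"]))
    return [
        message_type,                     # 1. message type (hypothetical label)
        rec["c_pf_ob"].strip(),           # 2. station ID (assumed: platform identifier)
        valid.strftime("%Y%m%d_%H%M%S"),  # 3. valid time
        float(rec["lat_ob"]),             # 4. latitude
        float(rec["lon_ob"]),             # 5. longitude
        MISSING,                          # 6. elevation (not in the column list)
        var_name,                         # 7. variable name (hypothetical)
        float(rec["p_ob"]),               # 8. level (assumed: p_ob is usable as is)
        MISSING,                          # 9. height
        str(rec["ichk"]),                 # 10. QC string (assumed: ichk)
        float(rec["ob"]),                 # 11. observation value
    ]


# Illustrative record only; a real script would read records from the
# input file named on its command line.
sample_records = [
    {"ob": "273.15", "lat_ob": "35.0", "lon_ob": "-120.5", "p_ob": "850.0",
     "ichk": "0", "idt": "-60", "c_pf_ob": "72469           "},
]
window_center = datetime(2020, 9, 23, 12)  # hypothetical value

# ascii2nc picks up this variable after running the script.
point_data = [to_point_row(r, window_center) for r in sample_records]
```

The units and meaning of p_ob for each observation type, and the right message type and variable names, would need to be confirmed with NRL before relying on this mapping.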

See details about the formatting of the NRL innovation files as a comment on this issue.

Consider breaking down the multiple steps for this task into sub-issues: (1) Write and test a Python embedding script to process the input file with ascii2nc. (2) See comments for a description of how Liz would like (1) to be configurable. Should filtering logic be added to the Python embedding script or to the ascii2nc tool? (3) Gather sample model data that should be verified using these point observations and develop a METplus use case for these steps. (4) This data may also be suitable for Python embedding in Stat-Analysis to compute statistics directly using the obs and innovation values. Explore this option and potentially include it in the use case as well.

Acceptance Testing

Input NRL innovation files can be found in kiowa:/d1/projects/METplus/METplus_Development/feature_635/innovation_data. Contact Liz Satterfield for the model data to be used in use case development.

Time Estimate

Estimate the amount of work required here. Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the new feature down into sub-issues.

[ ] Add a checkbox for each sub-issue here.

Relevant Deadlines

METplus-4.0 release

Funding Source

2700021

Define the Metadata

Assignee

[ ] Select engineer(s) or no engineer required
[ ] Select scientist(s) or no scientist required

Labels

[X] Select component(s)
[X] Select priority
[X] Select requestor(s)

Projects and Milestone

[X] Review projects and select relevant Repository and Organization ones or add "alert:NEED PROJECT ASSIGNMENT" label
[X] Select milestone to next major version milestone or "Future Versions"

Define Related Issue(s)

Consider the impact to the other METplus components.

[X] METplus, MET, METdatadb, METviewer, METexpress, METcalcpy, METplotpy No known impacts.

New Feature Checklist

See the METplus Workflow for details.

[ ] Complete the issue definition above, including the Time Estimate and Funding source.
[ ] Fork this repository or create a branch of develop. Branch name: feature_<Issue Number>_<Description>
[ ] Complete the development and test your changes.
[ ] Add/update log messages for easier debugging.
[ ] Add/update unit tests.
[ ] Add/update documentation.
[ ] Push local changes to GitHub.
[ ] Submit a pull request to merge into develop. Pull request: feature <Issue Number> <Description>
[ ] Define the pull request metadata, as permissions allow. Select: Reviewer(s), Project(s), Milestone, and Linked issues
[ ] Iterate until the reviewer(s) accept and merge your changes.
[ ] Delete your fork or branch.
[ ] Close this issue.

Email from Liz on 9/23/2020: The innovation files are ASCII files with a 75-line header. The column description is below. If you all could provide a reader to get these files into MET that would be great! I'm happy to provide a sample data set and any additional information that you may need.

n: Observation number for indexing obs.
ob: Observation value.
bk_ob: Background (short-term forecast) value, interpolated in space and time to the observation location/time.
t_bk_ob: Supplemental information required for assimilating the observation.
xiv_ob: Innovation, or difference between the observation and the background value (e.g., ob - bk_ob)
err_ob: Observation error standard deviation
etc_ob: Supplemental information required for assimilating the observation.
lat_ob: latitude of the observation.
lon_ob: longitude of the observation.
p_ob: pressure level of observation; except for radiances.
jvarty: Jvarty_ob in the code, this is the variable type code.
insty: Insty_ob in the code; instrument type code.
nvp: Number of observations in the "profile"
ichk: Quality indicator.
idt: Time difference in seconds from the center of the time window. For a 6-hr time window, the values range from -10800 to +10800.
c_pf_ob: Observation platform identifier (16 characters). For radiances, this includes sensor name (AMSU-A), channel number, ascending/descending flag and assimilate flag (A, M, R - for assimilate, monitor or reject). For radiosonde, it includes the block/station number.
c_db_ob: More platform identifiers (10 characters). For radiances, this includes the satellite name (NOAA15) and land/sea/ice flag. For radiosondes, it includes the retransmission status.
idp: NAVDAS computes the pressure change over the assimilation window and stores here.
lsi: Land/sea/ice flag
rej: Reject flag
bkerr: background error standard deviation, in observation space.
cob: analysis solution, in observation space prior to being projected back into model space

resid: residual in observation space, or H(analysis - background).

A read statement would look like:

C=textscan(fileID,'%*7d%8f%*1c%8f%*1c%*8f%*1c%8f%9f%*1c%*9f%*1c%9f%*1c%9f%*1c%11f%*1c%2d%*1c%3d%*1c%5d%*1c%*4d%*1c%7d%*2c%16c%*2c%10c%*1c%*4d%*1c%*2d%*1c%3d%*1c%7f%*1c%*7f%*1c%9f%*1c%9f%*[^\n]','headerlines',75,'whitespace','','delimiter','/n');

fortran   = '7s 9s 1x 8s 1x 8s 1x 8s 9s 1x 9s 1x 9s 1x 9s 1x 11s ' +\
            '1x 2s 1x 3s 1x 5s 1x 4s 1x 7s 2x 16s 2x 10s 1x 4s 1x 2s ' +\
            '1x 3s 1x 7s 1x 7s 1x 9s 1x 9s'
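
The fixed-width layout string above uses the same notation as Python's struct module (Ns for an N-byte field, Nx for N skipped bytes), so one way to split a data line is sketched below. The helper names split_record and read_innovation_ascii are hypothetical, the field widths are taken from the string exactly as given and have not been checked against a real file, and no column names are attached to the resulting fields.

```python
import struct

# Layout string copied from the email; whitespace between codes is ignored by struct.
FORTRAN = ('7s 9s 1x 8s 1x 8s 1x 8s 9s 1x 9s 1x 9s 1x 9s 1x 11s '
           '1x 2s 1x 3s 1x 5s 1x 4s 1x 7s 2x 16s 2x 10s 1x 4s 1x 2s '
           '1x 3s 1x 7s 1x 7s 1x 9s 1x 9s')
RECORD = struct.Struct(FORTRAN)
HEADER_LINES = 75  # header length stated in the email


def split_record(line):
    """Split one fixed-width data line into a list of stripped strings."""
    raw = line.rstrip('\n').encode('ascii')
    # Pad short lines and drop any trailing text so the length matches the layout.
    raw = raw.ljust(RECORD.size)[:RECORD.size]
    return [field.decode('ascii').strip() for field in RECORD.unpack(raw)]


def read_innovation_ascii(path):
    """Yield the split fields for every non-blank data line after the header."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index < HEADER_LINES or not line.strip():
                continue
            yield split_record(line)
```

Each yielded list holds the raw text of every field in file order; converting fields to numbers and matching them to the column descriptions is left to the caller.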

We now output h5 files instead of ascii. I put both formats and the python converter on the ftp directory. It would be great to have something like pb2nc, where we could choose what data types to output to the nc file via a config. Also having the innovation and residual information will be helpful for DA diagnostics.

Putting the obs values into ascii2nc will get us to the verification task and is where we likely need to start. Having matched pairs would also be useful.

There is additional information in the file that would be helpful to retain (you could treat it as an additional observation type), in particular:

xiv: the innovation value
resid: residual in observation space
the prescribed errors oberr and bkerr
bk_ob the background in observation space
the QC flags

If that information is available along with the ob value, this enables us to implement some DA diagnostics fairly easily.
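
One way to retain these quantities, following the suggestion above to treat each one as an additional observation type, is to emit one 11-column row per quantity under its own variable name. This is only a sketch: the function name expand_record, the message type, the output variable names in EXTRA_VARS, and the use of c_pf_ob and ichk for the station ID and QC string are all assumptions, and the valid time is assumed to have been computed already.

```python
MISSING = -9999.0  # MET's missing-data value

# NRL column name -> variable name written to the 11-column output (hypothetical names).
EXTRA_VARS = {
    "ob": "OB",
    "bk_ob": "BK_OB",
    "xiv_ob": "XIV",
    "resid": "RESID",
    "err_ob": "ERR_OB",
    "bkerr": "BKERR",
}


def expand_record(rec, valid_time, message_type="NRL_INNOV"):
    """Return one 11-column row for each retained quantity present in rec."""
    # Columns 1-6 are shared by every row built from the same record.
    common = [
        message_type,
        rec["c_pf_ob"].strip(),
        valid_time,
        float(rec["lat_ob"]),
        float(rec["lon_ob"]),
        MISSING,  # elevation
    ]
    rows = []
    for column, var_name in EXTRA_VARS.items():
        if column not in rec:
            continue  # skip quantities the record does not carry
        rows.append(common + [
            var_name,
            float(rec["p_ob"]),  # level
            MISSING,             # height
            str(rec["ichk"]),    # QC string
            float(rec[column]),  # value of this quantity
        ])
    return rows
```

Keeping every quantity at the same location, time, and level means a downstream tool can filter by variable name to pick out the innovation, the residual, or the prescribed errors.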

On 10/1/2020, Tara noted that the funding to pay for this must be spent by the end of 2020.

The prototype for this work exists in branch feature_635_nrlinnov. I decided to add the filtering capability in the Python embedding for now, via pandas. The pandas filters are configurable via [user_env_vars] in METplus, as is the mapping between NRL column names and MET 11-column names. Processing is somewhat slow, but these are very large files. The ASCII format takes about 3 minutes per file, while the HDF5 files take about half of that or less (1-1.5 minutes per file), so I recommend using HDF5 input files whenever possible.
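
To illustrate the general idea of environment-driven pandas filtering, here is a rough sketch. It is not the code in the feature_635_nrlinnov branch: the environment variable names NRL_FILTER and NRL_COLUMN_MAP, the JSON encoding of the column mapping, and the function name apply_env_config are all assumptions.

```python
import json
import os

import pandas as pd


def apply_env_config(df, environ=os.environ):
    """Filter rows and rename columns based on environment settings."""
    # Hypothetical variable holding a pandas query expression, e.g. "insty == 1".
    query = environ.get("NRL_FILTER", "")
    if query:
        df = df.query(query)
    # Hypothetical variable holding a JSON object of {NRL name: MET name}.
    mapping = json.loads(environ.get("NRL_COLUMN_MAP", "{}"))
    return df.rename(columns=mapping)


# Usage with a tiny made-up table and an explicit settings dict.
example = pd.DataFrame({
    "ob": [1.0, 2.0, 3.0],
    "lat_ob": [10.0, -5.0, 20.0],
    "insty": [1, 1, 2],
})
settings = {
    "NRL_FILTER": "lat_ob > 0 and insty == 1",
    "NRL_COLUMN_MAP": '{"ob": "obs_value", "lat_ob": "lat"}',
}
filtered = apply_env_config(example, settings)
```

Passing the settings as a mapping keeps the function easy to test, while the default of os.environ matches how values set under [user_env_vars] would reach a script at run time.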
