dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0

Enhance TC-Pairs to read hurricane model diagnostic files (e.g. SHIPS) and TC-Stat to filter the new data #392

Closed dwfncar closed 2 years ago

dwfncar commented 10 years ago

Describe the Enhancement

This task is requested by Paul Kucera and will be done to support the HFIP evaluations.

Currently, the MET-TC tools only handle hurricane ATCF files. This task is to add support for parsing and using hurricane model diagnostic files. The diagnostic files are ASCII and can be produced for each model/storm/initialization time. A diagnostic file contains track information for each lead time, just like the ATCF files, but also includes many derived quantities, like shear and upper-level quantities averaged over the storm area.

Enhance TC-Pairs to read TC diagnostics data in two different file formats:

ACTION: After parsing this diagnostics data and finding a match for the model track currently being evaluated, we need to decide how to write the diagnostics to the output.

Consider either appending the diagnostics data to the end of the existing TCMPR line type, or writing a new TCDIAG line type containing only the diagnostics values. This decision should be driven by the downstream needs of METdatadb and METviewer. What makes the most sense, given that the number and type of diagnostic values can change? We wouldn't have a fixed set of output columns; they would vary based on the user configuration of the TC-Diag tool.

Time Estimate

1 week

Sub-Issues

Consider breaking the enhancement down into sub-issues.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

2770043

Define the Metadata

Assignee

Labels

Projects and Milestone

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.





Define new classes in the vx_tc_util library for reading this diagnostic data. Suggest including the ability to write the diagnostic data to a file. This could be a new line type, like TCMPR but maybe TCDIAG, for tropical cyclone diagnostics. The header columns would need to be the same as those in TCMPR.


The wrinkle here is that the contents of the diagnostic file can vary based on how the diagnostics were generated. So we wouldn't have a fixed set of output columns. Instead, after the header columns, the column names would indicate the contents.


(1) Suggest writing a new tool that just reads in a bunch of diagnostic files and writes them out to a single file. This would be a reformatting step and make it easier to process the diagnostic data.


(2) Suggest enhancing the tc_pairs tool to also read in the diagnostic files. For each ATCF track, look for a matching diagnostic track. Look in the tc_pairs config file at the user-specified list of diagnostic quantities to be included, and tack those diagnostic quantities onto the end of the line. Include support for shortcuts like "ALL" and partial string matching, like "p500" for 500-mb quantities.
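As a hypothetical sketch, that user-specified list in the tc_pairs config file might look like the following; the shortcut and partial-match syntax shown here are illustrative assumptions, not a finalized interface:

```
// Hypothetical tc_pairs config sketch -- shortcut and matching rules are illustrative.
diag_name = [ "SHRD", "RHLO", "p500" ];  // "p500" as a partial match for 500-mb quantities
// Or, as a shortcut, request everything:
// diag_name = [ "ALL" ];
```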


(3) Suggest enhancing the tc_stat tool to read the TCDIAG line type and filter the lines. Also enhance tc_stat to read the additional variable columns from the TC_MPR lines.

[MET-392] created by johnhg

dwfncar commented 6 years ago

This work was requested a second time by Paul Kucera and Mark DeMaria at NHC. Funding will be allocated in 2018 for this. Here are some selections from the emails about it...


From Paul...
The HFIP office said they have some funding to further develop MET-TC tools. In particular, NHC would like this feature added: stratify the track and intensity errors by vertical wind shear (and other variables like low-level humidity). Would you be able to give me a ball-park estimate of cost to add that feature? I will add that to my HFIP budget for this year.


From John...


Processing the gridded data files ourselves would open up a lot of flexibility... but would require that those gridded data files actually be present for each ATCF track ID being processed. It may also be slow to process gridded data for each forecast hour.


Using the diagnostic output would be faster in MET but would then require that the diagnostic tool be run on the gridded data sets. Perhaps we'd need to support that diagnostic tool for the broader community?


The real problem here is that the ATCF tracks do not contain enough diagnostic info... like wind shear. And there are a few ways we could address that... from updating the output of the GFDL vortex tracker itself, to reading the existing diagnostic files, to generating those diagnostics directly in MET.


My first inclination would be to reimplement the diagnostic functionality as a new tool in MET (tc_diag)... to make it available and supportable. And then update tc_stat to read the output of both tc_pairs and tc_diag when doing its filtering. But let’s talk more about it next week.


Also, it would be really great to update METViewer to read/plot this track data as well, and replace the Rscripts we currently run!


From Paul...
I heard back from Mark. He suggested that we first use Kate's diagnostic files and/or the output from the operational SHIPS model. I am assuming that will be a bit more straightforward.

by johnhg

dwfncar commented 5 years ago

Paul Kucera has identified funding for this work and would like it included after the met-8.1 release.
The funding for this ends July 2019, and the work could be coordinated with the tc-rmw and tc-gen enhancements.
Paul reiterated this on 2019/03/20.

by johnhg

TaraJensen commented 3 years ago

Moved this into MET 10.1.0 development cycle during discussion with HAFS developers. They see this as a high priority.

jvigh commented 3 years ago

@TaraJensen @JohnHalleyGotway

Christopher Rozoff has now developed a more comprehensive suite of ensemble diagnostics which are similar to the SHIPS diagnostics. We are in the process of finalizing these. They will be in NetCDF format. We'd like to work with you to figure out what format they need to be in to be easily read by MET-TC (e.g., is CF-compliant sufficient?).

Paul's funding for his EnsRI/NHCvx projects will be ending Sep 30 (hard deadline). It might be possible to devote a bit of time to MET development -- we'd need to discuss with Paul.

JohnHalleyGotway commented 3 years ago

Had some discussions over email with @jvigh about this work. They are summarized below.

From Jonathan: The format is being finalized right now by Chris Rozoff. Our goal is to make it as close to CF-compliant as possible, noting that many of the quantities may not have standard names or units that are included in the CF standards.

From John HG: While MET does currently read NetCDF files following the gridded CF-convention, it does not currently read any point observations following CF point convention. While CF is in wide usage for gridded data, it's less widely used for point data, to my knowledge. But there still may be some code that's reusable... like the parsing of date/time strings.

The TC-Pairs tool reads ATCF forecast track data and ATCF Best track data, and for each storm matches up their common track points. For each fcst/Best track point pair, it computes along/cross/total track error and writes that to the output. The TC-Stat tool reads the TC-Pairs output, filters it in whatever way the user specifies, and then performs some analysis job on that data.

From Jonathan: They are not gridded data. All of the diagnostic values are "storm-centric", meaning they apply to the storm environment or some aspect of the storm structure. For example, a key parameter would be environmental vertical wind shear. There could actually be several of these, but the idea is the same -- they are values that represent the diagnostic quantity at a point in time (just like track position or intensity). So the existing logic of TC-Pairs should work well for this. It may need to be enhanced, say, to allow one to stratify verification results by weak, moderate, or strong vertical shear. Another example would be the Maximum Potential Intensity (MPI) of a storm, etc.
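As a sketch of the stratification idea above, track errors could be binned by a shear diagnostic. The 10 and 20 kt category boundaries below are illustrative assumptions for this example, not an official definition:

```python
# Illustrative sketch: stratify track errors by vertical wind shear category.
# The 10 and 20 kt thresholds are hypothetical choices for this example.

def shear_category(shrd_kt):
    """Bin a vertical wind shear value (kt) into weak/moderate/strong."""
    if shrd_kt < 10.0:
        return "weak"
    if shrd_kt < 20.0:
        return "moderate"
    return "strong"

def stratify_errors(pairs):
    """pairs: iterable of (track_error, shear_kt) -> dict of error lists by category."""
    out = {"weak": [], "moderate": [], "strong": []}
    for err, shear in pairs:
        out[shear_category(shear)].append(err)
    return out
```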

From John HG: Great, thanks for clarifying. In my mind, we'd set it up so that...

I think the main work will be writing code to parse the diagnostic info from NetCDF files into memory.

From Jonathan: I'm pretty sure there won't be 80-100 hours available for this. But there could possibly be 30-60 h, if Paul wants to go this route. If Tara had other funds to leverage, it might work (if you think it's really an 80+ h task).

JohnHalleyGotway commented 3 years ago

Please find sample diagnostic data from Chris attached. ida2021082700.tar.gz

JohnHalleyGotway commented 2 years ago

See related #2168 issue for the creation of the TC-Diag tool.

JohnHalleyGotway commented 2 years ago

On 10/3/22, Jonathan, Kathryn, and John HG met to discuss these details.

See sample lsdiag output on the NHC ftp site. The lsdiag output DOES NOT contain an ATCF TECH to define the model to which it corresponds. We need to define how to match diagnostics data to track data. We presume that NHC's lsdiag data corresponds to GFS output (i.e., AVNO? and not AVNI?... or GFSO/GFSI in MET output).

Need to configure TC-Pairs to read the right diagnostics data files for each model.

JohnHalleyGotway commented 2 years ago

Working on the feature_392_tcdiag_line_type branch, I propose adding a new TCDIAG output line type which immediately follows the TCMPR line to which it corresponds.

For example, this sample lsdiag file contains a whopping 112 types of diagnostics.

Let TC-Pairs be configured to request as many of these as we'd like. Here are all 112:

diag_name = [
   "CSST", "DTL",  "RSST", "DSST", "DSTA", "CD20", "CD26", "COHC", "XDST", "XNST",
   "XOHC", "NSST", "NSTA", "NTMX", "NDTX", "NDML", "ND30", "ND28", "ND26", "ND24",
   "ND22", "ND20", "ND18", "ND16", "NDFR", "NTFR", "NOHC", "NO20", "IR00", "IRM1",
   "IRM3", "PC00", "PCM1", "PCM3", "U200", "U20C", "V20C", "E000", "EPOS", "ENEG",
   "EPSS", "ENSS", "RHLO", "RHMD", "RHHI", "PSLV", "Z850", "D200", "REFC", "PEFC",
   "T000", "R000", "Z000", "TLAT", "TLON", "TWAC", "TWXC", "G150", "G200", "G250",
   "V000", "V850", "V500", "V300", "TGRD", "TADV", "PENC", "SHDC", "SDDC", "SHGC",
   "DIVC", "T150", "T200", "T250", "SHRD", "SHTD", "SHRS", "SHTS", "SHRG", "PENV",
   "VMPI", "VVAV", "VMFX", "VVAC", "HE07", "HE05", "O500", "O700", "CFLX", "MTPW",
   "PW01", "PW02", "PW03", "PW04", "PW05", "PW06", "PW07", "PW08", "PW09", "PW10",
   "PW11", "PW12", "PW13", "PW14", "PW15", "PW16", "PW17", "PW18", "PW19", "PW20",
   "PW21", "IRXX" ];

That'd produce this sample output file: alal2010.tcst.txt. Note that the diag values are all NA since I'm working backwards... deciding what the output looks like before worrying about the inputs.

Each TCDIAG line contains 241 columns... some header columns followed by the requested diagnostic name/value pairs. Of course, the user could configure TC-Pairs to include as many or as few of these diagnostics in the output as they'd like.

@jvigh and @KathrynNewman, is this the design you had in mind... reporting diag name, diag value pairs?

KathrynNewman commented 2 years ago

@JohnHalleyGotway - thanks for providing a sample output file. I think this design would work. Starting with the 'level' column, you'd list the number of output diagnostics followed by alternating diag_name and value entries. So, there is no specific header information for the TCDIAG line type?

@KathrynNewman - basically yes, we wouldn't write a separate header line for the TCDIAG line type, at least in the .tcst file written by the TC-Pairs tool. I think it's much more advantageous to interleave the TCMPR and corresponding TCDIAG lines together to make them easier to understand. While the alternative of writing the TCDIAG lines to a separate file would allow the inclusion of a header line, we'd lose the benefit of having the track and diagnostics data nicely paired.

JohnHalleyGotway commented 2 years ago

Progress on TC-Pairs in the feature_392_tcdiag_line_type branch. It's able to parse the diagnostics data specified using the new -lsdiag or -tcdiag command line options, match the diagnostic values to the track data, and write them to the output.

Here's a sample output file: feature_392_test.tcst.txt

I configured it to write all ~150 diagnostics from the sample sal092021_avno_doper_2021082712_diag.dat data file. That makes for a very long config file!

diag_name = [ <list of 150+ strings> ];

Elsewhere in MET we use an empty list to indicate that ALL available data should be used. Is that what we want here as well? Or does that produce too much output by default? We could also let diag_name contain regular expressions rather than doing exact string matching. That would give us more flexibility to be concise but may also confuse users.
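A minimal sketch of how those two conventions could combine; this illustrates the proposal under discussion, not MET's actual implementation. An empty list selects everything, and otherwise each entry is treated as a regular expression matched in full against the available diagnostic names:

```python
import re

# Sketch of the proposed diag_name selection semantics; not MET's actual code.

def select_diags(requested, available):
    """Return the available diagnostic names selected by the request list.

    An empty request list means ALL, matching the convention used elsewhere
    in MET; otherwise entries are treated as full-match regular expressions.
    """
    if not requested:
        return list(available)
    return [name for name in available
            if any(re.fullmatch(pat, name) for pat in requested)]
```

Exact names like "SHRD" still work under this scheme, since a literal string is a regular expression that matches only itself.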

Remaining tasks are:

jvigh commented 2 years ago

@JohnHalleyGotway,

I've been digging into the lsdiag file contents much more deeply, as I have to write a NetCDF converter for another project (it's in NLC). I just wanted to provide you with the new web page for the SHIPS Dataset (since there is still an old one floating around). The new one is: https://rammb2.cira.colostate.edu/research/tropical-cyclones/ships/

Note that there is both a 5-day version and a 7-day version. While supporting the lsdiag/SHIPS Development Dataset is not a primary deliverable in the funding we have, we should definitely aim to support the 7-day version. I note, however, that CIRA only provides 5-day files for the WPAC, North Indian Ocean, and Southern Hemisphere basins, so this could be an extra layer of complexity.

I'm guessing that users may want to use both the real-time lsdiag files, as well as the SHIPS Development Dataset. The two differ in that the development dataset includes quite a few more variables than the real-time lsdiag files.

Here's a Google Sheet comparing which parameters are included in which (ignore the columns after the 2nd column -- I am using this to track my progress in updating my converter). https://docs.google.com/spreadsheets/d/1e8MgQbx-xGK_rwTtwZN7ulJzdh5xvz_r30z552pyZ7U/edit?pli=1#gid=0

In case you're wondering what these all are, here's the documentation: https://rammb.cira.colostate.edu/research/tropical_cyclones/ships/data/ships_predictor_file_2022.pdf

jvigh commented 2 years ago

Alright, so if you unpack all the variables (they store a bunch of non-time-related parameters in the time rows), there are actually 288 variables. A bit much! I've added all of these subvariables into the spreadsheet now.

JohnHalleyGotway commented 2 years ago

@jvigh and @KathrynNewman, I'm working through the logic of TC-Stat filtering data by the diagnostics. I have a few questions:

  1. I'm adding a -diag_thresh job command option similar to the -column_thresh option but for the diagnostics values. How should the -diag_thresh option work? Obviously, if a track point HAS diag data, we should only keep points where the diag value meets all specified threshold criteria. But what about track points that do NOT have diag data? Should those be kept or discarded?

  2. TC-Stat also has the -init_thresh job command option. This is the same as -column_thresh but is applied to the track point for forecast hour 0. If the criteria met, the whole track is retained. If not, the whole track is discarded. Do we need a corresponding -init_diag_thresh job command option? Or would you not need to keep/discard entire tracks based on the diagnostic value at the initialization time?
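The choice in question 1 can be captured by a single flag. The sketch below is an illustration of the two behaviors under discussion, not actual TC-Stat code, and the function and parameter names are hypothetical:

```python
# Sketch of -diag_thresh filtering; discard_missing captures the open question
# of what to do with track points that have no diagnostics data.

def keep_track_point(diags, criteria, discard_missing=True):
    """diags: {name: value}; criteria: {name: predicate on the value}."""
    for name, passes in criteria.items():
        if name not in diags:
            if discard_missing:
                return False
            continue  # alternative behavior: skip this criterion for the point
        if not passes(diags[name]):
            return False
    return True
```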

KathrynNewman commented 2 years ago

@JohnHalleyGotway, here are my thoughts:

  1. I'm leaning toward discard. If you want to apply the threshold to the diagnostic values, I think you would only want to include track points that include that information.
  2. -init_diag_thresh job command option could be useful. I'd vote yes for this option. @jvigh, what do you think?
jvigh commented 2 years ago

@JohnHalleyGotway, I'm not sure if I understand enough about the normal behavior of TC-Stat to know for sure how to apply it for this new feature, but I'll try. For 1, I think there could be cases in which it could go either way, but the normal use would likely involve discarding points with no diag data.

Will there be the option to filter all the other diagnostics based on the value of one or more key diagnostics? If so, this could be really useful for people doing studies of rapid intensification. While MET-TC already provides logic for analyzing case-based RI (e.g., a 30 kt intensity increase in 24 h), some researchers are now using an event-based framing, in which the RI has a starting and stopping point that can be much longer than 24 h. For that, it'd be useful to let people define their own custom diagnostic and then filter out the forecasts/tracks based on these.

  1. I vote yes as well.
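The case-based RI definition mentioned above (e.g., a 30 kt intensity increase in 24 h) can be sketched as follows; this is an illustrative helper, not part of MET:

```python
def ri_onsets(times_h, vmax_kt, window_h=24, thresh_kt=30):
    """Return lead times (h) at which the intensity increases by at least
    thresh_kt over the following window_h hours (case-based RI)."""
    v = dict(zip(times_h, vmax_kt))
    return [t for t in times_h
            if t + window_h in v and v[t + window_h] - v[t] >= thresh_kt]
```

An event-based framing, as described above, would instead merge consecutive onset times into a single RI event with its own start and stop points.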
JohnHalleyGotway commented 2 years ago

@KathrynNewman and @jvigh I need some help. While MET is parsing the lsdiag data, I'm comparing the LSDIAG reported lat/lon locations to the corresponding SHIP track data.

Based on this README file...

--> lsdiag
               DESCRIPTION: input data file for the SHIPS regression model.
               This data file does not provide any intensity information, it is an
               input file only.

I assume that the lsdiag data corresponds to the SHIP track data.

For example... LSDIAG file: https://ftp.nhc.noaa.gov/atcf/lsdiag/22091618AL0722_lsdiag.dat Has data for locations:

LEAD    0    6   12   18   24   30   36   42   48   54   60   66   72   78   84   90   96  102  108  114  120
LAT   163  165  166  169  171  173  174  176  178  181  184  188  192  197  202  208  214  219  224  228  232
LON   604  615  627  638  650  660  669  676  684  690  695  700  704  708  713  716  720  723  726  728  729

Extracting the SHIP track data from the corresponding ADeck file: https://ftp.nhc.noaa.gov/atcf/aid_public/aal072022.dat.gz I see:

AL, 07, 2022091618, 03, SHIP, 0, 163N, 603W,
AL, 07, 2022091618, 03, SHIP, 12, 166N, 626W,
AL, 07, 2022091618, 03, SHIP, 24, 171N, 650W,
AL, 07, 2022091618, 03, SHIP, 36, 174N, 668W,
AL, 07, 2022091618, 03, SHIP, 48, 178N, 683W,
AL, 07, 2022091618, 03, SHIP, 60, 184N, 695W,
AL, 07, 2022091618, 03, SHIP, 72, 192N, 703W,
AL, 07, 2022091618, 03, SHIP, 84, 202N, 712W,
AL, 07, 2022091618, 03, SHIP, 96, 214N, 720W,
AL, 07, 2022091618, 03, SHIP, 108, 224N, 726W,
AL, 07, 2022091618, 03, SHIP, 120, 232N, 728W,
AL, 07, 2022091618, 03, SHIP, 132, 0N, 0W,
AL, 07, 2022091618, 03, SHIP, 144, 0N, 0W,
AL, 07, 2022091618, 03, SHIP, 156, 0N, 0W,
AL, 07, 2022091618, 03, SHIP, 168, 0N, 0W,

Looking carefully, you'll find that the latitude values DO match, but the longitude values can differ by 0.1 degrees.

Right now MET prints a warning if the locations differ:

WARNING: TrackPoint::add_diag_value() -> the diagnostic location (16.3, -60.4) does not match the track location (16.3, -60.3)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (16.6, -62.7) does not match the track location (16.6, -62.6)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (17.4, -66.9) does not match the track location (17.4, -66.8)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (17.8, -68.4) does not match the track location (17.8, -68.3)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (19.2, -70.4) does not match the track location (19.2, -70.3)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (20.2, -71.3) does not match the track location (20.2, -71.2)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (23.2, -72.9) does not match the track location (23.2, -72.8)

What sort of error checking do you recommend? Any idea why the longitudes might differ?
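For reference, here is a small sketch of tolerant matching, assuming the lsdiag file stores lat/lon as integers in tenths of a degree (with Atlantic longitudes positive west, per the sample above) and ATCF uses tokens like 163N / 604W. Allowing a 0.1-degree tolerance would accept the pairs flagged in the warnings above; the function names are hypothetical:

```python
# Illustrative parsing/matching sketch; not MET's actual error checking.

def parse_atcf_latlon(tok):
    """Convert an ATCF token like '163N' or '604W' to signed degrees."""
    val = int(tok[:-1]) / 10.0
    return -val if tok[-1] in ("S", "W") else val

def lsdiag_to_deg(tenths, west_positive=True):
    """lsdiag LAT/LON rows hold tenths of a degree; the sample above
    appears to store Atlantic longitudes as positive west (assumption)."""
    deg = tenths / 10.0
    return -deg if west_positive else deg

def locations_match(lat1, lon1, lat2, lon2, tol_deg=0.1):
    """Treat two locations as matching if each coordinate agrees within tol_deg."""
    return (abs(lat1 - lat2) <= tol_deg + 1e-9
            and abs(lon1 - lon2) <= tol_deg + 1e-9)
```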

jvigh commented 2 years ago

@JohnHalleyGotway - when you say these do not match, what do you mean? It looks like there is a match at 0 h: [screenshot]

This matches the t=0 CARQ lat/lon in the a-deck: [screenshot]

And the NHC official forecast track: [screenshot]

I think the lsdiag file lat/lons should match either the NHC official forecast track (in the case of the development dataset) or the underlying forecast track of the model it is being run on (e.g., the GFS deterministic forecast track for a GFS model diagnostics output, the HWRF forecast track for an HWRF model diagnostics output, etc.). But I'd like to get confirmation of that last point.

jvigh commented 2 years ago

@JohnHalleyGotway - Alright, I see what you're talking about. Let me send a quick email to John Knaff, Mark DeMaria, and Robert DeMaria.

JohnHalleyGotway commented 2 years ago

Additional development tasks defined while meeting 10/20/22: