dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0

Enhance TC-Pairs to read hurricane model diagnostic files (e.g. SHIPS) and TC-Stat to filter the new data #392

Closed dwfncar closed 2 years ago

dwfncar commented 10 years ago

Describe the Enhancement

This task is requested by Paul Kucera and will be done to support the HFIP evaluations.

Currently, the MET-TC tools only handle hurricane ATCF files. This task is to add support for parsing and using hurricane model diagnostic files. The diagnostic files are ASCII and can be produced for each model/storm/initialization time. A diagnostic file contains track information for each lead time, just like the ATCF files, but also includes many derived quantities, like shear and upper-level quantities averaged over the storm area.

Enhance TC-Pairs to read TC diagnostics data in two different file formats:

ACTION: After parsing this diagnostics data and finding a match for the model track currently being evaluated, we need to decide how to write the diagnostics to the output.

Consider either appending the diagnostics data to the end of the existing TCMPR line type, or writing a new TCDIAG line type containing only the diagnostics values. This decision should be driven by the downstream needs of METdatadb and METviewer. What makes the most sense, given that the number and type of diagnostic values can change? We wouldn't have a fixed set of output columns; they would vary based on the user configuration of the TC-Diag tool.

Time Estimate

1 week

Sub-Issues

Consider breaking the enhancement down into sub-issues.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

2770043

Define the Metadata

Assignee

Labels

Projects and Milestone

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.





Define new classes in the vx_tc_util library for reading this diagnostic data. Suggest including the ability to write the diagnostic data to a file. This could be a new line type, like TCMPR but maybe TCDIAG, for tropical cyclone diagnostics. The header columns would need to be the same as those in TCMPR.


The wrinkle here is that the contents of the diagnostic file can vary based on how the diagnostics were generated. So we wouldn't have a fixed set of output columns. Instead, after the header columns, the column names would indicate the contents.


(1) Suggest writing a new tool that just reads in a bunch of diagnostic files and writes them out to a single file. This would be a reformatting step and make it easier to process the diagnostic data.


(2) Suggest enhancing the tc_pairs tool to also read in the diagnostic files. For each ATCF track, look for a matching diagnostic track. Look in the tc_pairs config file at the user-specified list of diagnostic quantities to be included, and tack those diagnostic quantities onto the end of the line. Include support for shortcuts like "ALL" and partial string matching, like "p500" for 500-mb quantities.
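As a hypothetical sketch, that user-specified list in the tc_pairs config file might look like the following; the shortcut and partial-match syntax shown here are illustrative assumptions, not a finalized interface:

```
// Hypothetical tc_pairs config sketch -- shortcut and matching rules are illustrative.
diag_name = [ "SHRD", "RHLO", "p500" ];  // "p500" as a partial match for 500-mb quantities
// Or, as a shortcut, request everything:
// diag_name = [ "ALL" ];
```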


(3) Suggest enhancing the tc_stat tool to read the TCDIAG line type and filter the lines. Also enhance tc_stat to read the additional variable columns from the TC_MPR lines.

[MET-392] created by johnhg

dwfncar commented 6 years ago

This work was requested a second time by Paul Kucera and Mark DeMaria at NHC. Funding will be allocated in 2018 for this. Here are some selections from the emails about it...


From Paul...
The HFIP office said they have some funding to further develop MET-TC tools. In particular, NHC would like this feature added: stratify the track and intensity errors by vertical wind shear (and other variables like low-level humidity). Would you be able to give me a ball-park estimate of cost to add that feature? I will add that to my HFIP budget for this year.


From John...


Processing the gridded data files ourselves would open up a lot of flexibility... but would require that those gridded data files actually be present for each ATCF track ID being processed. It may also be slow to process gridded data for each forecast hour.


Using the diagnostic output would be faster in MET but would then require that the diagnostic tool be run on the gridded data sets. Perhaps we'd need to support that diagnostic tool for the broader community?


The real problem here is that the ATCF tracks do not contain enough diagnostic info... like wind shear. And there are a few ways we could address that... from updating the output of the GFDL vortex tracker itself, to reading the existing diagnostic files, to generating those diagnostics directly in MET.


My first inclination would be to reimplement the diagnostic functionality as a new tool in MET (tc_diag)... to make it available and supportable. And then update tc_stat to read the output of both tc_pairs and tc_diag when doing its filtering. But let’s talk more about it next week.


Also, it would be really great to update METViewer to read/plot this track data as well, and replace the Rscripts we currently run!


From Paul...
I heard back from Mark. He suggested that we first use Kate's diagnostic files and/or the output from the operational SHIPS model. I am assuming that will be a bit more straightforward.

by johnhg

dwfncar commented 5 years ago

Paul Kucera has identified funding for this work and would like it included after the met-8.1 release.
The funding for this ends July 2019, and the work could be coordinated with the tc-rmw and tc-gen enhancements.
Paul reiterated this on 2019/03/20.

by johnhg

TaraJensen commented 3 years ago

Moved this into MET 10.1.0 development cycle during discussion with HAFS developers. They see this as a high priority.

jvigh commented 3 years ago

@TaraJensen @JohnHalleyGotway

Christopher Rozoff has now developed a more comprehensive suite of ensemble diagnostics which are similar to the SHIPS diagnostics. We are in the process of finalizing these. They will be in NetCDF format. We'd like to work with you to figure out what format they need to be in to be easily read by MET-TC (e.g., is CF-compliant sufficient?).

Paul's funding for his EnsRI/NHCvx projects will be ending Sep 30 (hard deadline). It might be possible to devote a bit of time to MET development -- we'd need to discuss with Paul.

JohnHalleyGotway commented 3 years ago

Had some discussions over email with @jvigh about this work. They are summarized below.

From Jonathan: The format is being finalized right now by Chris Rozoff. Our goal is to make it as close to CF-compliant as possible, noting that many of the quantities may not have standard names or units that are included in the CF standards.

From John HG: While MET does currently read NetCDF files following the gridded CF-convention, it does not currently read any point observations following CF point convention. While CF is in wide usage for gridded data, it's less widely used for point data, to my knowledge. But there still may be some code that's reusable... like the parsing of date/time strings.

The TC-Pairs tool reads ATCF forecast track data and ATCF Best track data, and for each storm matches up their common track points. For each fcst/Best track point pair, it computes along/cross/total track error and writes that to the output. The TC-Stat tool reads the TC-Pairs output, filters it in whatever way the user specifies, and then performs some analysis job on that data.

From Jonathan: They are not gridded data. All of the diagnostic values are "storm-centric", meaning they apply to the storm environment or some aspect of the storm structure. For example, a key parameter would be environmental vertical wind shear. There could actually be several of these, but the idea is the same -- they are values that represent the diagnostic quantity at a point in time (just like track position or intensity). So the existing logic of TC-Pairs should work well for this. It may need to be enhanced, say, to allow one to stratify verification results by weak, moderate, or strong vertical shear. Another example would be the Maximum Potential Intensity (MPI) of a storm, etc.
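As a sketch of the stratification idea above, track errors could be binned by a shear diagnostic. The 10 and 20 kt category boundaries below are illustrative assumptions for this example, not an official definition:

```python
# Illustrative sketch: stratify track errors by vertical wind shear category.
# The 10 and 20 kt thresholds are hypothetical choices for this example.

def shear_category(shrd_kt):
    """Bin a vertical wind shear value (kt) into weak/moderate/strong."""
    if shrd_kt < 10.0:
        return "weak"
    if shrd_kt < 20.0:
        return "moderate"
    return "strong"

def stratify_errors(pairs):
    """pairs: iterable of (track_error, shear_kt) -> dict of error lists by category."""
    out = {"weak": [], "moderate": [], "strong": []}
    for err, shear in pairs:
        out[shear_category(shear)].append(err)
    return out
```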

From John HG: Great, thanks for clarifying. In my mind, we'd set it up so that...

I think the main work will be writing code to parse the diagnostic info from NetCDF files into memory.

From Jonathan: I'm pretty sure there won't be 80-100 hours available for this. But there could possibly be 30-60 h, if Paul wants to go this route. If Tara had other funds to leverage, it might work (if you think it's really an 80+ h task).

JohnHalleyGotway commented 3 years ago

Please find sample diagnostic data from Chris attached. ida2021082700.tar.gz

JohnHalleyGotway commented 2 years ago

See related #2168 issue for the creation of the TC-Diag tool.

JohnHalleyGotway commented 2 years ago

On 10/3/22, Jonathan, Kathryn, and John HG met to discuss these details.

See sample lsdiag output on the NHC ftp site. The lsdiag output DOES NOT contain an ATCF TECH to define the model to which it corresponds. We need to define how to match diagnostics data to track data. We presume that NHC's lsdiag data corresponds to GFS output (i.e., AVNO? and not AVNI?... or GFSO/GFSI in MET output).

Need to configure TC-Pairs to read the right diagnostics data files for each model.

JohnHalleyGotway commented 2 years ago

Working on the feature_392_tcdiag_line_type branch, I propose adding a new TCDIAG output line type which immediately follows the TCMPR line to which it corresponds.

For example, this sample lsdiag file contains a whopping 112 types of diagnostics.

Let TC-Pairs be configured to request as many of these as we'd like. Here are all 112:

diag_name = [
   "CSST", "DTL",  "RSST", "DSST", "DSTA", "CD20", "CD26", "COHC", "XDST", "XNST",
   "XOHC", "NSST", "NSTA", "NTMX", "NDTX", "NDML", "ND30", "ND28", "ND26", "ND24",
   "ND22", "ND20", "ND18", "ND16", "NDFR", "NTFR", "NOHC", "NO20", "IR00", "IRM1",
   "IRM3", "PC00", "PCM1", "PCM3", "U200", "U20C", "V20C", "E000", "EPOS", "ENEG",
   "EPSS", "ENSS", "RHLO", "RHMD", "RHHI", "PSLV", "Z850", "D200", "REFC", "PEFC",
   "T000", "R000", "Z000", "TLAT", "TLON", "TWAC", "TWXC", "G150", "G200", "G250",
   "V000", "V850", "V500", "V300", "TGRD", "TADV", "PENC", "SHDC", "SDDC", "SHGC",
   "DIVC", "T150", "T200", "T250", "SHRD", "SHTD", "SHRS", "SHTS", "SHRG", "PENV",
   "VMPI", "VVAV", "VMFX", "VVAC", "HE07", "HE05", "O500", "O700", "CFLX", "MTPW",
   "PW01", "PW02", "PW03", "PW04", "PW05", "PW06", "PW07", "PW08", "PW09", "PW10",
   "PW11", "PW12", "PW13", "PW14", "PW15", "PW16", "PW17", "PW18", "PW19", "PW20",
   "PW21", "IRXX" ];

That'd produce this sample output file: alal2010.tcst.txt. Note that the diag values are all NA since I'm working backwards... deciding what the output looks like before worrying about the inputs.

Each TCDIAG line contains 241 columns... some header columns followed by the requested diagnostic name/value pairs. Of course, the user could configure TC-Pairs to include as many or as few of these diagnostics in the output as they'd like.

@jvigh and @KathrynNewman, is this the design you had in mind... reporting diag name, diag value pairs?

KathrynNewman commented 2 years ago

@JohnHalleyGotway - thanks for providing a sample output file. I think this design would work. Starting with the 'level' column, you'd list the number of output diagnostics followed by alternating diag_name and value entries. So, there is no specific header information for the TCDIAG line type?

@KathrynNewman - basically yes, we wouldn't write a separate header line for the TCDIAG line type, at least in the .tcst file written by the TC-Pairs tool. I think it's much more advantageous to interleave the TCMPR and corresponding TCDIAG lines together to make them easier to understand. While the alternative of writing the TCDIAG lines to a separate file would allow the inclusion of a header line, we'd lose the benefit of having the track and diagnostics data nicely paired.

JohnHalleyGotway commented 2 years ago

Progress on TC-Pairs in the feature_392_tcdiag_line_type branch. It's able to parse the diagnostics data specified using the new -lsdiag or -tcdiag command line options, match the diagnostic values to the track data, and write them to the output.

Here's a sample output file: feature_392_test.tcst.txt

I configured it to write all ~150 diagnostics from the sample sal092021_avno_doper_2021082712_diag.dat data file. That makes for a very long config file!

diag_name = [ <list of 150+ strings> ];

Elsewhere in MET we use an empty list to indicate that ALL available data should be used. Is that what we want here as well? Or does that produce too much output by default? We could also let diag_name contain regular expressions rather than doing exact string matching. That would give us more flexibility to be concise but may also confuse users.
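A minimal sketch of how those two conventions could combine; this illustrates the proposal under discussion, not MET's actual implementation. An empty list selects everything, and otherwise each entry is treated as a regular expression matched in full against the available diagnostic names:

```python
import re

# Sketch of the proposed diag_name selection semantics; not MET's actual code.

def select_diags(requested, available):
    """Return the available diagnostic names selected by the request list.

    An empty request list means ALL, matching the convention used elsewhere
    in MET; otherwise entries are treated as full-match regular expressions.
    """
    if not requested:
        return list(available)
    return [name for name in available
            if any(re.fullmatch(pat, name) for pat in requested)]
```

Exact names like "SHRD" still work under this scheme, since a literal string is a regular expression that matches only itself.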

Remaining tasks are:

jvigh commented 2 years ago

@JohnHalleyGotway,

I've been digging into the lsdiag file contents much more deeply, as I have to write a NetCDF converter for another project (it's in NLC). I just wanted to provide you with the new web page for the SHIPS Dataset (since there is still an old one floating around). The new one is: https://rammb2.cira.colostate.edu/research/tropical-cyclones/ships/

Note that there is both a 5-day version and a 7-day version. While supporting the lsdiag/SHIPS Development Dataset is not a primary deliverable in the funding we have, we should definitely aim to support the 7-day version. I note, however, that CIRA only provides 5-day files for the WPAC, North Indian Ocean, and Southern Hemisphere basins, so this could be an extra layer of complexity.

I'm guessing that users may want to use both the real-time lsdiag files, as well as the SHIPS Development Dataset. The two differ in that the development dataset includes quite a few more variables than the real-time lsdiag files.

Here's a Google Sheet comparing which parameters are included in which (ignore the columns after the 2nd column -- I am using this to track my progress in updating my converter). https://docs.google.com/spreadsheets/d/1e8MgQbx-xGK_rwTtwZN7ulJzdh5xvz_r30z552pyZ7U/edit?pli=1#gid=0

In case you're wondering what these all are, here's the documentation: https://rammb.cira.colostate.edu/research/tropical_cyclones/ships/data/ships_predictor_file_2022.pdf

jvigh commented 2 years ago

Alright, so if you unpack all the variables (they store a bunch of non-time-related parameters in the time rows), there are actually 288 variables. A bit much! I've added all of these subvariables into the spreadsheet now.

JohnHalleyGotway commented 2 years ago

@jvigh and @KathrynNewman, I'm working through the logic of TC-Stat filtering data by the diagnostics. I have a few questions:

  1. I'm adding a -diag_thresh job command option similar to the -column_thresh option but for the diagnostics values. How should the -diag_thresh option work? Obviously, if a track point HAS diag data, we should only keep points where the diag value meets all specified threshold criteria. But what about track points that do NOT have diag data? Should those be kept or discarded?

  2. TC-Stat also has the -init_thresh job command option. This is the same as -column_thresh but is applied to the track point for forecast hour 0. If the criteria met, the whole track is retained. If not, the whole track is discarded. Do we need a corresponding -init_diag_thresh job command option? Or would you not need to keep/discard entire tracks based on the diagnostic value at the initialization time?
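The choice in question 1 can be captured by a single flag. The sketch below is an illustration of the two behaviors under discussion, not actual TC-Stat code, and the function and parameter names are hypothetical:

```python
# Sketch of -diag_thresh filtering; discard_missing captures the open question
# of what to do with track points that have no diagnostics data.

def keep_track_point(diags, criteria, discard_missing=True):
    """diags: {name: value}; criteria: {name: predicate on the value}."""
    for name, passes in criteria.items():
        if name not in diags:
            if discard_missing:
                return False
            continue  # alternative behavior: skip this criterion for the point
        if not passes(diags[name]):
            return False
    return True
```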

KathrynNewman commented 2 years ago

@JohnHalleyGotway, here are my thoughts:

  1. I'm leaning toward discard. If you want to apply the threshold to the diagnostic values, I think you would only want to include track points that include that information.
  2. -init_diag_thresh job command option could be useful. I'd vote yes for this option. @jvigh, what do you think?
jvigh commented 2 years ago

@JohnHalleyGotway, I'm not sure if I understand enough about the normal behavior of TC-Stat to know for sure how to apply it for this new feature, but I'll try. For 1, I think there could be cases in which it could go either way, but the normal use would likely involve discarding points with no diag data.

Will there be the option to filter all the other diagnostics based on the value of one or more key diagnostics? If so, this could be really useful for people doing studies of rapid intensification. While MET-TC already provides logic for analyzing case-based RI (e.g., a 30 kt intensity increase in 24 h), some researchers are now using an event-based framing, in which the RI has a starting and stopping point that can be much longer than 24 h. For that, it'd be useful to let people define their own custom diagnostic and then filter out the forecasts/tracks based on these.

  1. I vote yes as well.
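The case-based RI definition mentioned above (e.g., a 30 kt intensity increase in 24 h) can be sketched as follows; this is an illustrative helper, not part of MET:

```python
def ri_onsets(times_h, vmax_kt, window_h=24, thresh_kt=30):
    """Return lead times (h) at which the intensity increases by at least
    thresh_kt over the following window_h hours (case-based RI)."""
    v = dict(zip(times_h, vmax_kt))
    return [t for t in times_h
            if t + window_h in v and v[t + window_h] - v[t] >= thresh_kt]
```

An event-based framing, as described above, would instead merge consecutive onset times into a single RI event with its own start and stop points.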
JohnHalleyGotway commented 2 years ago

@KathrynNewman and @jvigh I need some help. While MET is parsing the lsdiag data, I'm comparing the LSDIAG reported lat/lon locations to the corresponding SHIP track data.

Based on this README file...

--> lsdiag
               DESCRIPTION: input data file for the SHIPS regression model.
               This data file does not provide any intensity information, it is an
               input file only.

I assume that the lsdiag data corresponds to the SHIP track data.

For example... LSDIAG file: https://ftp.nhc.noaa.gov/atcf/lsdiag/22091618AL0722_lsdiag.dat Has data for locations:

LEAD    0    6   12   18   24   30   36   42   48   54   60   66   72   78   84   90   96  102  108  114  120
LAT   163  165  166  169  171  173  174  176  178  181  184  188  192  197  202  208  214  219  224  228  232
LON   604  615  627  638  650  660  669  676  684  690  695  700  704  708  713  716  720  723  726  728  729

Extracting the SHIP track data from the corresponding ADeck file: https://ftp.nhc.noaa.gov/atcf/aid_public/aal072022.dat.gz I see:

AL, 07, 2022091618, 03, SHIP, 0, 163N, 603W,
AL, 07, 2022091618, 03, SHIP, 12, 166N, 626W,
AL, 07, 2022091618, 03, SHIP, 24, 171N, 650W,
AL, 07, 2022091618, 03, SHIP, 36, 174N, 668W,
AL, 07, 2022091618, 03, SHIP, 48, 178N, 683W,
AL, 07, 2022091618, 03, SHIP, 60, 184N, 695W,
AL, 07, 2022091618, 03, SHIP, 72, 192N, 703W,
AL, 07, 2022091618, 03, SHIP, 84, 202N, 712W,
AL, 07, 2022091618, 03, SHIP, 96, 214N, 720W,
AL, 07, 2022091618, 03, SHIP, 108, 224N, 726W,
AL, 07, 2022091618, 03, SHIP, 120, 232N, 728W,
AL, 07, 2022091618, 03, SHIP, 132, 0N, 0W,
AL, 07, 2022091618, 03, SHIP, 144, 0N, 0W,
AL, 07, 2022091618, 03, SHIP, 156, 0N, 0W,
AL, 07, 2022091618, 03, SHIP, 168, 0N, 0W,

Looking carefully, you'll find that the latitude values DO match, but the longitude values can differ by 0.1 degrees.

Right now MET prints a warning if the locations differ:

WARNING: TrackPoint::add_diag_value() -> the diagnostic location (16.3, -60.4) does not match the track location (16.3, -60.3)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (16.6, -62.7) does not match the track location (16.6, -62.6)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (17.4, -66.9) does not match the track location (17.4, -66.8)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (17.8, -68.4) does not match the track location (17.8, -68.3)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (19.2, -70.4) does not match the track location (19.2, -70.3)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (20.2, -71.3) does not match the track location (20.2, -71.2)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (23.2, -72.9) does not match the track location (23.2, -72.8)

What sort of error checking do you recommend? Any idea why the longitudes might differ?
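For reference, here is a small sketch of tolerant matching, assuming the lsdiag file stores lat/lon as integers in tenths of a degree (with Atlantic longitudes positive west, per the sample above) and ATCF uses tokens like 163N / 604W. Allowing a 0.1-degree tolerance would accept the pairs flagged in the warnings above; the function names are hypothetical:

```python
# Illustrative parsing/matching sketch; not MET's actual error checking.

def parse_atcf_latlon(tok):
    """Convert an ATCF token like '163N' or '604W' to signed degrees."""
    val = int(tok[:-1]) / 10.0
    return -val if tok[-1] in ("S", "W") else val

def lsdiag_to_deg(tenths, west_positive=True):
    """lsdiag LAT/LON rows hold tenths of a degree; the sample above
    appears to store Atlantic longitudes as positive west (assumption)."""
    deg = tenths / 10.0
    return -deg if west_positive else deg

def locations_match(lat1, lon1, lat2, lon2, tol_deg=0.1):
    """Treat two locations as matching if each coordinate agrees within tol_deg."""
    return (abs(lat1 - lat2) <= tol_deg + 1e-9
            and abs(lon1 - lon2) <= tol_deg + 1e-9)
```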

jvigh commented 2 years ago

@JohnHalleyGotway - when you say these do not match, what do you mean? It looks like there is a match at 0 h: [screenshot]

This matches the t=0 CARQ lat/lon in the a-deck: [screenshot]

And the NHC official forecast track: [screenshot]

I think the lsdiag file lat/lons should match either the NHC official forecast track (in the case of the development dataset) or the underlying forecast track of the model it is being run on (e.g., the GFS deterministic forecast track for a GFS model diagnostics output, the HWRF forecast track for an HWRF model diagnostics output, etc.). But I'd like to get confirmation of that last point.

jvigh commented 2 years ago

@JohnHalleyGotway - Alright, I see what you're talking about. Let me send a quick email to John Knaff, Mark DeMaria, and Robert DeMaria.

JohnHalleyGotway commented 2 years ago

Additional development tasks defined while meeting 10/20/22: