This work was requested a second time by Paul Kucera and Mark DeMaria at NHC. Funding will be allocated in 2018 for this. Here are some selections from the emails about it...
From Paul...
The HFIP office said they have some funding to further develop MET-TC tools. In particular, NHC would like this feature added: stratify the track and intensity errors by vertical wind shear (and other variables like low-level humidity). Would you be able to give me a ball-park estimate of cost to add that feature? I will add that to my HFIP budget for this year.
From John...
Processing the gridded data files ourselves would open up a lot of flexibility... but would require that those gridded data files actually be present for each ATCF track ID being processed. It may also be slow to process gridded data for each forecast hour.
Using the diagnostic output would be faster in MET but would then require that the diagnostic tool be run on the gridded data sets. Perhaps we'd need to support that diagnostic tool for the broader community?
The real problem here is that the ATCF tracks do not contain enough diagnostic info... like wind shear. And there are a few ways we could address that... from updating the output of the GFDL vortex tracker itself, to reading the existing diagnostic files, to generating those diagnostics directly in MET.
My first inclination would be to reimplement the diagnostic functionality as a new tool in MET (tc_diag)... to make it available and supportable. And then update tc_stat to read the output of both tc_pairs and tc_diag when doing its filtering. But let’s talk more about it next week.
Also, it would be really great to update METViewer to read/plot this track data as well, and replace the Rscripts we currently run!
From Paul...
I heard back from Mark. He suggested that we first use Kate's diagnostic files and/or the output from the operational SHIPS model. I am assuming that will be a bit more straightforward.
by johnhg
Paul Kucera has identified funding for this work and would like it included after the met-8.1 release.
The funding for this ends July 2019, and the work could be coordinated with the tc-rmw and tc-gen enhancements.
Paul reiterated this on 2019/03/20.
by johnhg
Moved this into MET 10.1.0 development cycle during discussion with HAFS developers. They see this as a high priority.
@TaraJensen @JohnHalleyGotway
Christopher Rozoff has now developed a more comprehensive suite of ensemble diagnostics which are similar to the SHIPS diagnostics. We are in the process of finalizing these. They will be in NetCDF format. We'd like to work with you to figure out what format they need to be in to be easily read by MET-TC (e.g., is CF-compliant sufficient?).
Paul's funding for his EnsRI/NHCvx projects will be ending Sep 30 (hard deadline). It might be possible to devote a bit of time to MET development -- we'd need to discuss with Paul.
Had some discussions over email with @jvigh about this work. They are summarized below.
From Jonathan: The format is being finalized right now by Chris Rozoff. Our goal is to make it as close to CF-compliant as possible, noting that many of the quantities may not have standard names or units that are included in the CF standards.
From John HG: While MET does currently read NetCDF files following the gridded CF-convention, it does not currently read any point observations following CF point convention. While CF is in wide usage for gridded data, it's less widely used for point data, to my knowledge. But there still may be some code that's reusable... like the parsing of date/time strings.
The TC-Pairs tool reads ATCF forecast track data and ATCF Best track data and, for each storm, matches up their common track points. For each fcst/Best track point pair, it computes the along/cross/total track errors and writes them to the output. The TC-Stat tool reads the TC-Pairs output, filters it however the user specifies, and then performs some analysis job on that data.
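To make that pairing concrete, here is a minimal sketch of the along/cross/total track error decomposition. It uses a flat-earth approximation and hypothetical coordinates; MET's actual implementation (e.g., great-circle geometry and sign conventions) may differ.

#include <cmath>
#include <cstdio>

// Hypothetical illustration: decompose the total track error into
// along-track and cross-track components relative to the storm motion.
struct Point { double lat, lon; };

// Approximate displacement (km) from a to b, flat-earth.
void displacement(const Point &a, const Point &b, double &dx, double &dy) {
   const double km_per_deg = 111.32;
   const double pi = 3.14159265358979;
   dy = (b.lat - a.lat) * km_per_deg;
   dx = (b.lon - a.lon) * km_per_deg * std::cos(0.5 * (a.lat + b.lat) * pi / 180.0);
}

int main() {
   Point best_prev = { 16.0, -60.0 }; // prior Best track point (defines motion)
   Point best      = { 16.3, -60.3 }; // Best track point at the valid time
   Point fcst      = { 16.3, -60.4 }; // forecast point at the same valid time

   double mx, my, ex, ey;
   displacement(best_prev, best, mx, my); // storm motion vector
   displacement(best, fcst, ex, ey);      // forecast error vector

   double mlen  = std::hypot(mx, my);
   double total = std::hypot(ex, ey);               // total track error
   double along = (ex * mx + ey * my) / mlen;       // projection onto motion
   double cross = (ex * my - ey * mx) / mlen;       // signed perpendicular part

   std::printf("TK_ERR=%.1f ALTK_ERR=%.1f CRTK_ERR=%.1f km\n", total, along, cross);
   return 0;
}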
From Jonathan: They are not gridded data. All of the diagnostic values are "storm-centric", meaning they apply to the storm environment or some aspect of the storm structure. For example, a key parameter would be environmental vertical wind shear. There could actually be several of these, but the idea is the same -- they are values that represent the diagnostic quantity at a point in time (just like track position or intensity). So the existing logic of TC-Pairs should work well for this. It may need to be enhanced, say, to allow one to stratify verification results by weak, moderate, or strong vertical shear. Another example would be the Maximum Potential Intensity (MPI) of a storm, etc.
From John HG: Great, thanks for clarifying. In my mind, we'd set it up so that...
I think the main work will be writing code to parse the diagnostic info from NetCDF files into memory.
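As a rough illustration of that parsing step, here is a minimal sketch using the NetCDF C library. The file name, dimension name, and variable name are hypothetical placeholders, since the actual format is still being finalized.

#include <netcdf.h>
#include <cstdio>
#include <vector>

// Sketch: read one storm-centric diagnostic time series from a NetCDF file.
// "storm_diag.nc", "time", and "wind_shear" are hypothetical names.
int main() {
   int ncid, dimid, varid;
   size_t n_times;
   if (nc_open("storm_diag.nc", NC_NOWRITE, &ncid) != NC_NOERR) return 1;
   nc_inq_dimid(ncid, "time", &dimid);
   nc_inq_dimlen(ncid, dimid, &n_times);
   std::vector<double> shear(n_times);
   nc_inq_varid(ncid, "wind_shear", &varid);
   nc_get_var_double(ncid, varid, shear.data());
   nc_close(ncid);
   for (size_t i = 0; i < n_times; i++)
      std::printf("time index %zu: shear = %.1f\n", i, shear[i]);
   return 0;
}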
From Jonathan: I'm pretty sure there won't be 80-100 hours available for this. But there could possibly be 30-60 h, if Paul wants to go this route. If Tara had other funds to leverage, it might work (if you think it's really an 80+ h task).
Please find sample diagnostic data from Chris attached. ida2021082700.tar.gz
See related #2168 issue for the creation of the TC-Diag tool.
On 10/3/22, Jonathan, Kathryn, and John HG met to discuss these details.
See sample lsdiag output on the NHC ftp site. The lsdiag output DOES NOT contain an ATCF TECH to define the model to which it corresponds. We need to define how to match diagnostics data to track data. We presume that NHC's lsdiag data corresponds to GFS output (i.e., AVNO? and not AVNI?... or GFSO/GFSI in MET output).
Need to configure TC-Pairs to read the right diagnostics data files for each model.
Working on the feature_392_tcdiag_line_type branch, I propose adding a new TCDIAG output line type which immediately follows the TCMPR line to which it corresponds.
For example, this sample lsdiag file contains a whopping 112 types of diagnostics.
Let TC-Pairs be configured to request as many of these as we'd like. Here's all 112:
diag_name = [
"CSST", "DTL", "RSST", "DSST", "DSTA", "CD20", "CD26", "COHC", "XDST", "XNST",
"XOHC", "NSST", "NSTA", "NTMX", "NDTX", "NDML", "ND30", "ND28", "ND26", "ND24",
"ND22", "ND20", "ND18", "ND16", "NDFR", "NTFR", "NOHC", "NO20", "IR00", "IRM1",
"IRM3", "PC00", "PCM1", "PCM3", "U200", "U20C", "V20C", "E000", "EPOS", "ENEG",
"EPSS", "ENSS", "RHLO", "RHMD", "RHHI", "PSLV", "Z850", "D200", "REFC", "PEFC",
"T000", "R000", "Z000", "TLAT", "TLON", "TWAC", "TWXC", "G150", "G200", "G250",
"V000", "V850", "V500", "V300", "TGRD", "TADV", "PENC", "SHDC", "SDDC", "SHGC",
"DIVC", "T150", "T200", "T250", "SHRD", "SHTD", "SHRS", "SHTS", "SHRG", "PENV",
"VMPI", "VVAV", "VMFX", "VVAC", "HE07", "HE05", "O500", "O700", "CFLX", "MTPW",
"PW01", "PW02", "PW03", "PW04", "PW05", "PW06", "PW07", "PW08", "PW09", "PW10",
"PW11", "PW12", "PW13", "PW14", "PW15", "PW16", "PW17", "PW18", "PW19", "PW20",
"PW21", "IRXX" ];
That'd produce this sample output file: alal2010.tcst.txt. Note that the diag values are all NA since I'm working backwards... deciding what the output looks like before worrying about the inputs.
Each TCDIAG line contains 241 columns... some header columns followed by the requested diagnostic name/value pairs. Of course, the user could configure TC-Pairs to include as many/few of these diagnostics in the output as they'd like.
@jvigh and @KathrynNewman, is this the design you had in mind... reporting diag name, diag value pairs?
@JohnHalleyGotway - thanks for providing a sample output file. I think this design would work. Starting with the 'level' column you'd list the number of output diagnostics followed by alternating diag_name and value. So, there is no specific header information for TCDIAG line type?
@KathrynNewman - basically yes, we wouldn't write a separate header line for the TCDIAG line type, at least in the .tcst file written by the TC-Pairs tool. I think it's much more advantageous to interleave the TCMPR and corresponding TCDIAG lines to make them easier to understand. While the alternative of writing the TCDIAG lines to a separate file would allow the inclusion of a header line, we'd lose the benefit of having the track and diagnostics data nicely paired.
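Purely for illustration (the column names and diagnostic values below are hypothetical), the interleaving would look something like this, with the TCDIAG line repeating the TCMPR header columns and then listing a count followed by alternating name/value pairs:

AMODEL BMODEL STORM_ID INIT          LEAD   ... LINE_TYPE
AVNO   BEST   AL092021 20210827_0000 000000 ... TCMPR  <track pair columns>
AVNO   BEST   AL092021 20210827_0000 000000 ... TCDIAG 2 SHRD 10.5 RHLO 65.0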
Progress on TC-Pairs in the feature_392_tcdiag_line_type branch. It's able to parse the diagnostics data specified using the new -lsdiag or -tcdiag command line options, match the diagnostics values to the track data, and write it to the output.
Here's a sample output file: feature_392_test.tcst.txt
I configured it to write all ~150 diagnostics from the sample sal092021_avno_doper_2021082712_diag.dat data file. That makes for a very long config file!
diag_name = [ <list of 150+ strings> ];
Elsewhere in MET we use an empty list to indicate that ALL available data should be used. Is that what we want here as well? Or does that produce too much output by default?
We could also let diag_name contain regular expressions rather than doing exact string matching. That would give us more flexibility to be concise but may also confuse users.
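For example, both ideas might look like this in the config file (hypothetical, since neither behavior is implemented yet):

diag_name = [];                 // empty list meaning ALL available diagnostics
diag_name = [ "PW.*", "SHRD" ]; // regex matching all 21 PW entries concisely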
Remaining tasks are:
@JohnHalleyGotway,
I've been digging into the lsdiag file contents much more deeply, as I have to write a NetCDF converter for another project (it's in NLC). I just wanted to provide you with the new web page for the SHIPS Dataset (since there is still an old one floating around). The new one is: https://rammb2.cira.colostate.edu/research/tropical-cyclones/ships/
Note that there is both a 5-day version and a 7-day version. While supporting the lsdiag/SHIPS Development Dataset is not a primary deliverable in the funding we have, we should definitely aim to support the 7-day version. I note, however, that CIRA only provides 5-day files for the WPAC, North Indian Ocean, and Southern Hemisphere basins, so this could be an extra layer of complexity.
I'm guessing that users may want to use both the real-time lsdiag files, as well as the SHIPS Development Dataset. The two differ in that the development dataset includes quite a few more variables than the real-time lsdiag files.
Here's a Google Sheet comparing which parameters are included in which (ignore the columns after the 2nd column -- I am using this to track my progress in updating my converter). https://docs.google.com/spreadsheets/d/1e8MgQbx-xGK_rwTtwZN7ulJzdh5xvz_r30z552pyZ7U/edit?pli=1#gid=0
In case you're wondering what these all are, here's the documentation: https://rammb.cira.colostate.edu/research/tropical_cyclones/ships/data/ships_predictor_file_2022.pdf
Alright, so if you unpack all the variables where they store a bunch of non-time-related parameters in the time rows, there are actually 288 variables. A bit much! I've added all of these subvariables into the spreadsheet now.
@jvigh and @KathrynNewman, I'm working through the logic of TC-Stat filtering data by the diagnostics. I have a few questions:
I'm adding a -diag_thresh job command option similar to the -column_thresh option but for the diagnostics values. How should the -diag_thresh option work? Obviously, if a track point HAS diag data, we should only keep points where the diag value meets all specified threshold criteria. But what about track points that do NOT have diag data? Should those be kept or discarded?
TC-Stat also has the -init_thresh job command option. This is the same as -column_thresh but is applied to the track point for forecast hour 0. If the criteria are met, the whole track is retained. If not, the whole track is discarded. Do we need a corresponding -init_diag_thresh job command option? Or would you not need to keep/discard entire tracks based on the diagnostic value at the initialization time?
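For context, here is how the proposed option might look in practice, following the existing -column_thresh syntax. The -diag_thresh option comes from this discussion, but the file names and thresholds are hypothetical; for example, stratifying tracks into weak and strong shear bins:

tc_stat -lookin tc_pairs_output.tcst \
        -job filter -diag_thresh SHRD lt10 \
        -dump_row weak_shear.tcst

tc_stat -lookin tc_pairs_output.tcst \
        -job filter -diag_thresh SHRD ge20 \
        -dump_row strong_shear.tcst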
@JohnHalleyGotway, here are my thoughts:
@JohnHalleyGotway, I'm not sure if I understand enough about the normal behavior of TC-Stat to know for sure how to apply it for this new tool, but I'll try. For (1), I think there could be cases in which it could go either way. But the normal uses would likely involve discarding points with no diag data.
Will there be the option to filter all the other diagnostics based on the value of one or more key diagnostics? If so, this could be really useful for people doing studies of rapid intensification. While MET-TC already provides logic for analyzing case-based RI (e.g., a 30 kt intensity increase in 24 h), some researchers are now using an event-based framing, in which an RI event has a starting and stopping point and can last much longer than 24 h. For that, it'd be useful to let people define their own custom diagnostic and then filter the forecasts/tracks based on it.
@KathrynNewman and @jvigh I need some help. While MET is parsing the lsdiag data, I'm comparing the LSDIAG reported lat/lon locations to the corresponding SHIP track data.
Based on this README file...
--> lsdiag
DESCRIPTION: input data file for the SHIPS regression model. This data file does not provide any intensity information; it is an input file only.
I assume that the lsdiag data corresponds to the SHIP track data.
For example, this LSDIAG file: https://ftp.nhc.noaa.gov/atcf/lsdiag/22091618AL0722_lsdiag.dat has data for these locations:
LEAD 0 6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96 102 108 114 120
LAT 163 165 166 169 171 173 174 176 178 181 184 188 192 197 202 208 214 219 224 228 232
LON 604 615 627 638 650 660 669 676 684 690 695 700 704 708 713 716 720 723 726 728 729
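For reference, the LAT/LON rows appear to be stored in tenths of degrees, with longitude given as degrees west. A minimal decoding sketch, assuming that convention:

#include <cstdio>

// Hypothetical decoding of lsdiag LAT/LON values, which appear to be
// stored in tenths of degrees with longitude given as degrees west.
int main() {
   int lat_raw = 163, lon_raw = 604;   // values from the LEAD=0 column
   double lat =  lat_raw / 10.0;       // 16.3 (degrees north)
   double lon = -(lon_raw / 10.0);     // -60.4 (degrees east, i.e. 60.4W)
   std::printf("lat = %.1f, lon = %.1f\n", lat, lon);
   return 0;
}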
Extracting the SHIP track data from the corresponding ADeck file: https://ftp.nhc.noaa.gov/atcf/aid_public/aal072022.dat.gz I see:
AL, 07, 2022091618, 03, SHIP, 0, 163N, 603W,
AL, 07, 2022091618, 03, SHIP, 12, 166N, 626W,
AL, 07, 2022091618, 03, SHIP, 24, 171N, 650W,
AL, 07, 2022091618, 03, SHIP, 36, 174N, 668W,
AL, 07, 2022091618, 03, SHIP, 48, 178N, 683W,
AL, 07, 2022091618, 03, SHIP, 60, 184N, 695W,
AL, 07, 2022091618, 03, SHIP, 72, 192N, 703W,
AL, 07, 2022091618, 03, SHIP, 84, 202N, 712W,
AL, 07, 2022091618, 03, SHIP, 96, 214N, 720W,
AL, 07, 2022091618, 03, SHIP, 108, 224N, 726W,
AL, 07, 2022091618, 03, SHIP, 120, 232N, 728W,
AL, 07, 2022091618, 03, SHIP, 132, 0N, 0W,
AL, 07, 2022091618, 03, SHIP, 144, 0N, 0W,
AL, 07, 2022091618, 03, SHIP, 156, 0N, 0W,
AL, 07, 2022091618, 03, SHIP, 168, 0N, 0W,
Looking carefully, you'll find that the latitude values DO match, but the longitude values can differ by 0.1 degrees.
Right now MET prints a warning if the locations differ:
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (16.3, -60.4) does not match the track location (16.3, -60.3)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (16.6, -62.7) does not match the track location (16.6, -62.6)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (17.4, -66.9) does not match the track location (17.4, -66.8)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (17.8, -68.4) does not match the track location (17.8, -68.3)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (19.2, -70.4) does not match the track location (19.2, -70.3)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (20.2, -71.3) does not match the track location (20.2, -71.2)
WARNING: TrackPoint::add_diag_value() -> the diagnostic location (23.2, -72.9) does not match the track location (23.2, -72.8)
What sort of error checking do you recommend? Any idea why the longitudes might differ?
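One possible check, purely as a sketch: since both sources appear to round positions to tenths of a degree, the comparison could tolerate differences up to one rounding unit before warning. The is_close() helper below is hypothetical, not MET's actual logic.

#include <cmath>
#include <cstdio>

// Hypothetical tolerance check: treat lat/lon differences within one
// rounding unit (0.1 degrees) as a match and only warn beyond that.
bool is_close(double a, double b, double tol = 0.1 + 1.0e-5) {
   return std::fabs(a - b) <= tol;
}

int main() {
   std::printf("%s\n", is_close(-60.4, -60.3) ? "match" : "mismatch"); // match
   std::printf("%s\n", is_close(-60.5, -60.3) ? "match" : "mismatch"); // mismatch
   return 0;
}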
@JohnHalleyGotway - when you say these do not match, what do you mean? It looks like there is a match at 0 h, which matches the t=0 CARQ lat/lon in the a-deck and the NHC official forecast track (screenshots omitted).
I think the lsdiag file lat/lons should match either the NHC official forecast track (in the case of the development dataset) or the underlying forecast track of the model it is being run on (e.g., the GFS deterministic forecast track for a GFS model diagnostics output, the HWRF forecast track for an HWRF model diagnostics output, etc.). But I'd like to get confirmation of that last point.
@JohnHalleyGotway - Alright, I see what you're talking about. Let me send a quick email to John Knaff, Mark DeMaria, and Robert DeMaria.
Additional development tasks defined while meeting 10/20/22:
[x] Add tcdiag_convert_map and lsdiag_convert_map config entries to the TC-Pairs config file.
[x] When the diag (lat, lon) and adeck (lat, lon) locations do not match, a warning message is printed. But printing one for each diagnostic value is way too many. Refine the logic to only print one warning per track point.
[x] For the -tcdiag and -lsdiag command line options, support setting model to a comma-separated list of ATCF IDs (model=ATCFID1,ATCFID2,ATCFID3). Don't stop searching after the first match... keep searching. For example, -lsdiag path model=OFCL,SHIP,GFSO writes TCDIAG lines for all 3 tracks rather than just one.
[x] Running tc_stat with -diag_thresh BADNAME gt0 finds and thresholds bad data instead. Revise that logic. If a track point DOES contain diagnostics, but the requested diagnostic name is not present, decide what to do. Error out? Or just skip applying that threshold?
[x] Print a warning for any diagnostic name requested via -diag_thresh or -init_diag_thresh that is not actually defined for at least one TCDIAG line. Keep a running list of the warnings printed and don't warn about the same diagnostic name multiple times. For example:
WARNING: TCStatJob::get_diag_double() -> diagnostic "MAXWIND" not defined for TrackPoint.
[x] Refine logic to handle the 30 lsdiag rows mentioned above, skipping these:
static const char lsdiag_skip_str[] = "MTPW,IR00,IRM1,IRM3,PC00,PCM1,PCM3,PSLV,IRXX";
[x] Refine the tests in unit_tc_pairs.xml and unit_tc_stat.xml. Use both -tcdiag and -lsdiag in the same call to tc_pairs, and have tc_stat process that single output file. The tc_stat jobs are pretty slow for such little input? Could profile the code to see what's taking so long.
[ ] Determine the ATCFID(s) to which the lsdiag values should be attached (currently using SHIPS, but that results in lots of warning messages).
[ ] Consider adding a use_diag_track = TRUE/FALSE; config file option. If true, use the diag lat/lon location to define the track rather than matching to an existing ADECK track. This idea requires more thought. Would this be a benefit or just cause confusion? We could add logic to bool TrackInfoArray::add_diag_data(DiagFile &diag_file, const StringArray &diag_name) for this.
Met on 10/25/22 and decided the following...
[x] John HG: Consolidate -tcdiag and -lsdiag down into a single -diag SOURCE ... command line option and support SOURCE = TCDIAG, LSDIAG_DEV, LSDIAG_RT. This will make it easier to add support for additional formats in the future (e.g. ENCORE and @jvigh's NetCDF version). Done: tc_pairs supports -diag SOURCE PATH for SOURCE = TCDIAG, LSDIAG_RT, LSDIAG_DEV, although while LSDIAG_DEV is allowed, it currently produces a runtime error downstream stating that it's not supported yet.
[x] John HG: Replace the tcdiag_convert_map and lsdiag_convert_map configuration options with a single diag_convert_map option to handle all input sources.
[x] John HG: Add a SOURCE column to the TCDIAG output line type. tc_pairs writes it, and tc_stat reads it and can write it back out.
reads it and can write it back out.[ ] John HG: Longer term, figure out how to efficiently process LSDIAG_DEV data files which have millions of lines for 20 years of data! Currently, tc_pairs parses/stores all the diag data and then tries to match it up to track data. That won't work! Instead, we'll need to loop over the tracks and search for matching LSDIAG_DEV data.
[ ] Jonathan: Coordinate with Mark to clarify the track which corresponds to the LAT/LON rows of data in the LSDIAG_DEV and LSDIAG_RT data sources.
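Here is the track-driven lookup sketch mentioned above. It assumes the LSDIAG_DEV file can be indexed once by (storm id, init time), so that each track triggers a small targeted read instead of parsing millions of lines into memory; the names and structure below are hypothetical.

#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index: byte offset of each (storm id, init time) block in a
// large LSDIAG_DEV file, built in one sequential pass over the file.
using DiagKey   = std::pair<std::string, std::string>; // (storm_id, init_time)
using DiagIndex = std::map<DiagKey, long>;             // key -> file offset

// Inverted matching loop: iterate over the tracks and seek to matching
// diagnostics, rather than loading all diagnostics up front.
void match_tracks(const std::vector<DiagKey> &tracks, const DiagIndex &index) {
   for (const DiagKey &key : tracks) {
      auto it = index.find(key);
      if (it == index.end()) continue; // no diagnostics for this track
      // fseek() to it->second and parse only this storm/init block...
      std::printf("seek to offset %ld for %s %s\n",
                  it->second, key.first.c_str(), key.second.c_str());
   }
}

int main() {
   DiagIndex index;
   index[{ "AL092021", "2021082712" }] = 0L; // hypothetical offset
   std::vector<DiagKey> tracks = { { "AL092021", "2021082712" } };
   match_tracks(tracks, index);
   return 0;
}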
Describe the Enhancement
This task is requested by Paul Kucera and will be done to support the HFIP evaluations.
Currently, the MET-TC tools only handle hurricane ATCF files. This task is to add support for parsing/using hurricane model diagnostic files. The diagnostic files are ASCII and can be produced for each model/storm/initialization time. Each diagnostic file contains track information for each lead time, just like the ATCF files, but also includes many derived quantities, like shear and upper-level quantities averaged over the storm area.
Enhance TC-Pairs to read TC diagnostics data in two different file formats:
ACTION: After parsing this diagnostics data and finding a match for the model track currently being evaluated, we need to decide how to write the diagnostics to the output.
Consider either tacking the diagnostics data onto the end of the existing TCMPR line type, or writing a new TCDIAG line type containing only the diagnostics values. This decision should be driven by the downstream needs of METdatadb and METviewer. What makes the most sense, given that the number and type of diagnostic values can change? We wouldn't have a fixed set of output columns; they would vary based on the user configuration of the TC-Diag tool.
Time Estimate
1 week
Funding Source
2770043
Define new classes in the vx_tc_util library for reading this diagnostic data. Suggest including the ability to write the diagnostic data to a file. This could be a new line type like TCMPR, perhaps named TCDIAG, for tropical cyclone diagnostics. The header columns would need to be the same as those in TCMPR.
The wrinkle here is that the contents of the diagnostic file can vary based on how the diagnostics were generated. So we wouldn't have a fixed set of output columns. Instead, after the header columns, the column names would indicate the contents.
(1) Suggest writing a new tool that just reads in a bunch of diagnostic files and writes them out to a single file. This would be a reformatting step and make it easier to process the diagnostic data.
(2) Suggest enhancing the tc_pairs tool to also read in the diagnostic files. For each ATCF track, look for a matching diagnostic track. Look in the tc_pairs config file at the user-specified list of diagnostic quantities to be included. Then tack those diagnostic quantities onto the end of the line. Include support for shortcuts like "ALL" and partial string matching, like "p500" for 500mb quantities (see the hypothetical config snippet after item (3)).
(3) Suggest enhancing the tc_stat tool to read the TCDIAG line type and filter the lines. Also enhance tc_stat to read the additional variable columns from the TCMPR lines.
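As a hypothetical illustration of the shortcuts suggested in (2), neither of which is implemented yet:

diag_name = [ "ALL" ];           // shortcut requesting every available diagnostic
diag_name = [ "SHRD", "p500" ];  // exact name plus a partial match for 500mb quantities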
[MET-392] created by johnhg