dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0

Enhance TC-Gen to verify NHC tropical weather outlook shapefiles. Refine logic to prevent rounding shapefile points to the nearest grid point. #1810

Closed JohnHalleyGotway closed 2 years ago

JohnHalleyGotway commented 3 years ago

Describe the New Feature

tc_gen_probabilistic_algorithm_v2.pdf

Note: During development, an issue with the handling of the (lat, lon) shapefile points was discovered. When applying them to a grid, after converting from (lat, lon) to grid (x, y), the (x, y) points were rounded to the nearest grid point. There is no specific need to do this rounding and it has been removed in this feature branch. This results in changes to the output of shapefile masking in gen_vx_mask. Skipping the rounding step produces a more accurate result.
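As a rough illustration of why the rounding matters, the sketch below masks a small grid with the same polygon twice, once with its fractional (x, y) vertices and once with the vertices rounded to the nearest grid point. The polygon, the grid size, and the ray-casting test are all assumptions made for this example; it is not the gen_vx_mask implementation.

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mask_points(poly, nx, ny):
    """Return the set of grid points (x, y) that fall inside the polygon."""
    return {(x, y) for x in range(nx) for y in range(ny)
            if point_in_polygon(x, y, poly)}


# Hypothetical polygon in grid units, as obtained after converting (lat, lon) to (x, y).
exact = [(0.6, 0.6), (3.4, 0.6), (3.4, 3.4), (0.6, 3.4)]
# The same polygon with each vertex rounded to the nearest grid point.
rounded = [(round(x), round(y)) for x, y in exact]

exact_mask = mask_points(exact, 6, 6)
rounded_mask = mask_points(rounded, 6, 6)

print(len(exact_mask), len(rounded_mask))
print(sorted(exact_mask - rounded_mask))
```

In this toy case the rounded polygon shrinks, so grid points that sit strictly inside the original shape end up on or outside the rounded boundary and drop out of the mask.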

Please see the attached slides, which illustrate the 2 main changes required for TC-Genesis verification. This issue describes the second of those 2 enhancements: enhance MET to verify the NHC tropical weather outlook files.

The logic for this verification is described in the attached PDF. The task is to read a custom ASCII file which summarizes probability forecasts through time associated with each disturbance. If the disturbance did eventually develop into a storm, then the corresponding BEST track id is listed for that storm. If not, a sequence of 9's replaces that BEST track id.

For each line, the columns indicate the probability of storm development within 48, 120, and 168 hours, although those timesteps are hard-coded and NOT actually included in the metadata anywhere. Note that earlier versions of these files only had columns for 48 and 120 hours. So the tools should support 1, 2, or 3 numeric columns of probabilities, followed by a column for forecaster initials. The result should be probabilistic contingency table counts and statistics.
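To make the requested output concrete, here is a minimal sketch of how probabilistic contingency table counts could be accumulated from (probability, developed) pairs. The function name, the bin edges, and the input layout are assumptions for illustration only; they are not taken from MET or from the outlook file format.

```python
from bisect import bisect_right


def pct_counts(forecasts, thresholds):
    """Count events and non-events in each probability bin.

    forecasts  -- iterable of (probability in [0, 1], developed: bool)
    thresholds -- ascending bin edges starting at 0.0 and ending at 1.0
    Returns a list of [n_event, n_nonevent] pairs, one per bin.
    """
    n_bins = len(thresholds) - 1
    table = [[0, 0] for _ in range(n_bins)]
    for prob, developed in forecasts:
        # Bin i covers [thresholds[i], thresholds[i + 1]); 1.0 goes in the last bin.
        i = min(bisect_right(thresholds, prob) - 1, n_bins - 1)
        table[i][0 if developed else 1] += 1
    return table


# Hypothetical 48-hour forecasts: (probability, did the disturbance develop?)
sample = [(0.1, False), (0.5, True), (0.5, False), (0.9, True), (1.0, True)]
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
table = pct_counts(sample, edges)
for lo, hi, (n_event, n_nonevent) in zip(edges, edges[1:], table):
    print(f"{lo:.2f}-{hi:.2f}: events={n_event} non-events={n_nonevent}")
```

Under the convention described above, a disturbance whose BEST track id is a sequence of 9's would enter this table with developed set to False.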

@halperin-erau has provided some sample data containing these tropical weather outlook summary files, which are only used internally within NHC. There are 2 reasonable implementation options: enhance tc_gen to process these files via a new command line option, or create an entirely new tool for this novel data format.

The advantage of the former is reusing many config options that would be needed. The advantage of the latter is avoiding confusion for users outside of NHC, who won't have access to data in this format anyway.

However, as of May 2021, their format is still under development. In the existing and historical versions of these files, both the (lat, lon) location and the valid timestamps are absent. If either of those columns is missing from the input data, tc_gen should print a warning message and ignore that input.

Be sure to subset output by basin, time window, and perhaps forecaster initials.

Relevant Deadlines

The project for 7790901 technically ends in August 2021. However, @halperin-erau plans to request a 6-month no-cost extension.

Funding Source

7790901

JohnHalleyGotway commented 3 years ago

From @halperin-erau on 6/7/21:

I contacted NHC about verifying the GIS shapefiles or the text files for the TWO probabilistic genesis forecasts. They were interested in verifying the TWO genesis forecasts from the GIS shapefiles. It sounds like that would be a new capability for them to use if we add it to TC-Gen.

So, I propose that we verify the GIS shapefiles using the following logic:

Read each "areas" shape/layer in the shapefile. For each area:
(1) Read the forecast genesis probability information.
(2) Determine in which basin the area occurs.
(3) Use the BEST/b-decks to determine whether the BEST genesis point (user-defined as first "TD", "TS", etc.) is within the TWO area AND whether the BEST genesis time is within X hours of the TWO issuance time. If yes, the forecast is a HIT. If no, the forecast is a FALSE ALARM.

John, I think you already downloaded a .zip file from NHC with the shapefile information. Additional sample data are available in the TWO archive:

https://www.nhc.noaa.gov/archive/xgtwo/gtwo_archive_list.php?basin=atl

I'm happy to have another telecon to discuss if it would be helpful. I indicated to NHC that if we verify the shapefiles, we may drop support for the TWO text files (i.e., component #2 in the PowerPoint we went over during the last telecon).
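The HIT / FALSE ALARM logic proposed in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and data layout are hypothetical, "within X hours" is interpreted as a window that starts at the issuance time and extends X hours forward, and the point-in-polygon test is a plain ray-casting check in (lon, lat) space rather than anything taken from TC-Gen.

```python
from datetime import datetime, timedelta


def point_in_polygon(lon, lat, poly):
    """Ray-casting test: True if (lon, lat) lies inside the polygon."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def classify_area(area_poly, issue_time, lead_hours, best_genesis):
    """Return "HIT" if any BEST genesis event lies inside the area and
    occurs no later than lead_hours after issuance, else "FALSE ALARM".

    best_genesis -- iterable of (lon, lat, genesis_time) tuples (hypothetical layout)
    """
    window_end = issue_time + timedelta(hours=lead_hours)
    for lon, lat, gen_time in best_genesis:
        in_time = issue_time <= gen_time <= window_end
        if in_time and point_in_polygon(lon, lat, area_poly):
            return "HIT"
    return "FALSE ALARM"


# Hypothetical TWO area and BEST genesis event, 72 hours after issuance.
area = [(-60.0, 10.0), (-50.0, 10.0), (-50.0, 20.0), (-60.0, 20.0)]
issued = datetime(2021, 9, 3, 0, 0)
genesis = [(-55.0, 15.0, datetime(2021, 9, 6, 0, 0))]

print(classify_area(area, issued, 48, genesis))
print(classify_area(area, issued, 120, genesis))
```

The same area is classified separately for each lead time, so one genesis event can verify the longer-range probability while the shorter-range one is scored as a false alarm.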

JohnHalleyGotway commented 2 years ago

Here's a screenshot from: https://www.nhc.noaa.gov/archive/xgtwo/gtwo_archive.php?current_issuance=202109030233&basin=atlc&fdays=5

[Image: Screen Shot 2021-11-22 at 9.38.03 AM]

The corresponding shapefiles found in: https://www.nhc.noaa.gov/archive/xgtwo/atl/202109030233/gtwo_shapefiles.zip Contain:

JohnHalleyGotway commented 2 years ago

Per @halperin-erau, if only 1 probability is listed, assume it's for 2 days. The second probability is for 5 days, and the third is for 7 days.
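A minimal sketch of that rule, assuming the probabilities have already been read into a list in the order they appear (the function and constant names are hypothetical):

```python
# Lead times implied by position: 2, 5, and 7 days.
LEAD_HOURS = (48, 120, 168)


def assign_lead_hours(probs):
    """Map 1, 2, or 3 probabilities, in order, to 48, 120, and 168 hours."""
    if not 1 <= len(probs) <= len(LEAD_HOURS):
        raise ValueError(f"expected 1 to 3 probabilities, got {len(probs)}")
    return dict(zip(LEAD_HOURS, probs))


print(assign_lead_hours([40]))
print(assign_lead_hours([40, 70]))
print(assign_lead_hours([40, 70, 80]))
```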

JohnHalleyGotway commented 2 years ago

From @halperin-erau: I was thinking more about the verification logic for the gtwo_area*.shp files, and I think it will be simpler than the logic we used for the DEV/OPS methods.

Unlike the deterministic forecasts and the probabilistic forecasts in e-deck format, the gtwo_area*.shp files do not have a specific forecast genesis point location or valid time. Therefore, we cannot use the same matching logic that we employed for the DEV/OPS methods. Instead, I suggest the following:

[Image: unnamed]

Let me know if you have any questions or concerns about this logic.

JohnHalleyGotway commented 2 years ago

@halperin-erau I did some more digging and think that my logic is not yet quite sufficient. But I'd like you to confirm.

I have 2 questions.

(1) Should I add logic to get rid of apparent "duplicates"?

Running with all NHC 2021 shapefiles from the atl, cpac, and epac basins, tc_gen processes a total of 3913 shapes. However, I believe that only 1164 of them are unique. By way of example, here's some log output showing the following shape appearing in 5 different files (see below for more details):

Atlantic basin shape 1 has 129 points with latitudes from 35.5587 to 45.7149 and longitudes from -54.7273 to -38.81.

I don't know what triggers NHC to publish (and re-publish) these shapes, but the data suggests that all shapes ACTIVE at that time are included. So I recommend that I add logic to avoid these duplicates, making sure not to score them more than once.

(2) The current logic rounds the hours and minutes UP to the next hour and interprets that as the "issue" time. So when looking for a BEST track genesis match, we round up to the next hour and then look for genesis events within 48 and 120 hours of that time. Is this good logic or should I be using the actual hours and minutes listed without any rounding?
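For reference, the rounding described in question (2) can be sketched as below. The function name is hypothetical, and leaving a timestamp that already falls exactly on the hour unchanged is an assumption; the comment does not say how that case is handled.

```python
from datetime import datetime, timedelta


def round_up_to_hour(t):
    """Round a timestamp up to the next whole hour; exact hours are left unchanged."""
    floored = t.replace(minute=0, second=0, microsecond=0)
    return floored if floored == t else floored + timedelta(hours=1)


issue = round_up_to_hour(datetime(2021, 11, 10, 23, 53))
print(issue.strftime("%Y%m%d_%H%M%S"))

# Genesis search windows measured from the rounded issue time.
for hours in (48, 120):
    print(hours, (issue + timedelta(hours=hours)).strftime("%Y%m%d_%H%M%S"))
```

Using the unrounded time instead would shift both window edges earlier by up to an hour, which only matters for genesis events that fall very close to a window boundary.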

Thanks, John

FYI: Here's the complete log showing the same shape in 5 files (2 in atl, 2 in epac, and 1 in cpac). Note that 4 of them have the same rounded issue time of 20211111_000000 and the last one is for 6 hours later (20211111_060000):

DEBUG 4: [File 752 of 2372]: Found 1 records with issue time 20211111_000000 in file "/Volumes/d1/projects/MET/MET_unit_test/MET_test_input/tc_data/genesis/shapes/atl/202111102341/gtwo_areas_202111102340.shp".
DEBUG 5:   Atlantic basin shape 1 has 129 points with latitudes from 35.5587 to 45.7149 and longitudes from -54.7273 to -38.81.
DEBUG 5:     50% probability of 48-hour genesis has NO MATCH in the BEST track.
DEBUG 5:     50% probability of 120-hour genesis has NO MATCH in the BEST track.
--
--
DEBUG 4: [File 753 of 2372]: Found 1 records with issue time 20211111_000000 in file "/Volumes/d1/projects/MET/MET_unit_test/MET_test_input/tc_data/genesis/shapes/atl/202111102353/gtwo_areas_202111102353.shp".
DEBUG 5:   Atlantic basin shape 1 has 129 points with latitudes from 35.5587 to 45.7149 and longitudes from -54.7273 to -38.81.
DEBUG 5:     50% probability of 48-hour genesis has NO MATCH in the BEST track.
DEBUG 5:     50% probability of 120-hour genesis has NO MATCH in the BEST track.
--
--
DEBUG 4: [File 1477 of 2372]: Found 1 records with issue time 20211111_000000 in file "/Volumes/d1/projects/MET/MET_unit_test/MET_test_input/tc_data/genesis/shapes/cpac/202111110018/gtwo_areas_202111102353.shp".
DEBUG 5:   Atlantic basin shape 1 has 129 points with latitudes from 35.5587 to 45.7149 and longitudes from -54.7273 to -38.81.
DEBUG 5:     50% probability of 48-hour genesis has NO MATCH in the BEST track.
DEBUG 5:     50% probability of 120-hour genesis has NO MATCH in the BEST track.
--
--
DEBUG 4: [File 2291 of 2372]: Found 1 records with issue time 20211111_000000 in file "/Volumes/d1/projects/MET/MET_unit_test/MET_test_input/tc_data/genesis/shapes/epac/202111110430/gtwo_areas_202111102353.shp".
DEBUG 5:   Atlantic basin shape 1 has 129 points with latitudes from 35.5587 to 45.7149 and longitudes from -54.7273 to -38.81.
DEBUG 5:     50% probability of 48-hour genesis has NO MATCH in the BEST track.
DEBUG 5:     50% probability of 120-hour genesis has NO MATCH in the BEST track.
--
--
DEBUG 4: [File 2292 of 2372]: Found 1 records with issue time 20211111_060000 in file "/Volumes/d1/projects/MET/MET_unit_test/MET_test_input/tc_data/genesis/shapes/epac/202111110512/gtwo_areas_202111110511.shp".
DEBUG 5:   Atlantic basin shape 1 has 129 points with latitudes from 35.5587 to 45.7149 and longitudes from -54.7273 to -38.81.
DEBUG 5:     50% probability of 48-hour genesis has NO MATCH in the BEST track.
DEBUG 5:     50% probability of 120-hour genesis has NO MATCH in the BEST track.
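
One possible way to avoid scoring such repeats more than once is sketched below. The dictionary layout and the choice of key (basin, rounded issue time, and the exact vertex list) are assumptions for illustration; with this key, the four records above that share issue time 20211111_000000 collapse to one, while the record issued 6 hours later is kept as a separate forecast.

```python
def dedupe_shapes(shapes):
    """Keep the first occurrence of each shape, keyed on basin, issue time, and vertices."""
    seen = set()
    unique = []
    for shape in shapes:
        key = (shape["basin"], shape["issue_time"], tuple(shape["points"]))
        if key not in seen:
            seen.add(key)
            unique.append(shape)
    return unique


# Hypothetical records mimicking the log: one shape read from 5 different files.
points = [(-54.7273, 35.5587), (-38.81, 45.7149), (-45.0, 40.0)]
records = [
    {"basin": "AL", "issue_time": "20211111_000000", "points": points, "file": n}
    for n in (752, 753, 1477, 2291)
]
records.append(
    {"basin": "AL", "issue_time": "20211111_060000", "points": points, "file": 2292}
)

unique = dedupe_shapes(records)
print(len(records), len(unique))
```

Dropping the issue time from the key would collapse all five records into one, so which key is appropriate depends on whether a re-published shape at a later issuance should count as a new forecast.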