New Use Case: construct use case verifying GFS cloud forecasts vs. GOES-16 cloud products

DanielAdriaansen commented 1 month ago

Describe the New Use Case

This use case will demonstrate verifying forecasts of cloud information on the GFS global 0.25 degree grid, using GOES-16 level 2 cloud products for truth over the GOES-16 domain using GridStat.

The fields to verify will be cloud fields we identify in GFS output such as:

Cloud Top Height

We will need to search for and identify the proper field names and levels in the GFS files.

The measures of skill that should be included are:

Gilbert Skill Score
Equitable Threat Score
Fraction Skill Score
False Alarm Rate
Hit Rate
Bias

The end goal is for the user to be able to substitute the GFS forecasts with a separate GFS-based AI/ML cloud forecast product on the same GFS 0.25 degree grid, to compare with the GOES-16 products. This framework is to support them to be able to do this. We may get some sample data of their truth, however due to restrictions on releasing the data we may need to leave the GFS forecast data in place.

The user would also like to be able to stratify forecast performance based on categories of cloud types. These cloud types will be provided later on, but we should brainstorm another type of stratification we can perform using an external classification (maybe weather regimes? precipitation type?), or, implement some simple post-processing, for example stratify performance by all clouds e.g. >= 8 km ("high clouds").

Checklist to get working:

[ ] Identify source of GFS global forecasts and analyses
[ ] Identify available cloud variables in GFS forecasts
[ ] Identify source of GOES-16 L2 products
[ ] Review existing cloud use cases for overlap and examples we may model from
[ ] Begin configuration file development
[ ] Begin documentation
[ ] Begin testing GridStat w/METplus Wrappers
[ ] Collaborate with #2743, #2745

Use Case Name and Category

`model_applications/clouds/GridStat_fcstGFS_obsGOES16_cloudTop

Input Data

FCST: GFS 0.25 degree forecasts and analyses. OBS: GOES-16 Level 2 cloud top height

Acceptance Testing

Describe tests required for new functionality. As use case develops, provide a run time here

Time Estimate

Estimate the amount of work required here. Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the new feature down into sub-issues.

[ ] Add a checkbox for each sub-issue here.

Relevant Deadlines

Must be completed by 12/31/2024

Funding Source

7730022

Define the Metadata

Assignee

[ ] Select engineer(s) or no engineer required
[ ] Select scientist(s) or no scientist required

Labels

[ ] Review default alert labels
[ ] Select component(s)
[ ] Select priority
[ ] Select requestor(s)
[ ] Select privacy

Milestone and Projects

[ ] Select Milestone as a METplus-Wrappers-X.Y.Z version, Consider for Next Release, or Backlog of Development Ideas
[ ] For a METplus-Wrappers-X.Y.Z version, select the METplus-Wrappers-X.Y.Z Development project

Define Related Issue(s)

Consider the impact to the other METplus components.

[ ] METplus, MET, METdataio, METviewer, METexpress, METcalcpy, METplotpy

New Use Case Checklist

See the METplus Workflow for details.

[ ] Complete the issue definition above, including the Time Estimate and Funding source.
[ ] Fork this repository or create a branch of develop. Branch name: feature_<Issue Number>_<Description>
[ ] Complete the development and test your changes.
[ ] Add/update log messages for easier debugging.
[ ] Add/update unit tests.
[ ] Add/update documentation.
[ ] Add any new Python packages to the METplus Components Python Requirements table.
[ ] For any new datasets, an entry to the METplus Verification Datasets Guide.
[ ] Push local changes to GitHub.
[ ] Submit a pull request to merge into develop. Pull request: feature <Issue Number> <Description>
[ ] Define the pull request metadata, as permissions allow. Select: Reviewer(s) and Development issue Select: Milestone as the next official version Select: METplus-Wrappers-X.Y.Z Development project for development toward the next official release
[ ] Iterate until the reviewer(s) accept your changes. Merge branch into develop.
[ ] Create a second pull request to merge develop into develop-ref, following the same steps for the first pull request.
[ ] Delete your fork or branch.
[ ] Close this issue.

DanielAdriaansen commented 3 weeks ago

GFS data are here: /d1/projects/METplus/METplus_Data/development/nrl/cloud/GFS_0.25

Using this command:

wgrib2 -v gfs.0p25.2024030700.f012.grib2 | grep cloud

I see relevant variables:

634:453577593:d=2024030700:HCDC High Cloud Cover [%]:high cloud layer:12 hour fcst:
635:454309341:d=2024030700:HCDC High Cloud Cover [%]:high cloud layer:6-12 hour ave fcst:
643:462935144:d=2024030700:PRES Pressure [Pa]:convective cloud top level:12 hour fcst:
646:466180239:d=2024030700:PRES Pressure [Pa]:high cloud top level:6-12 hour ave fcst:
649:469628288:d=2024030700:TMP Temperature [K]:high cloud top level:6-12 hour ave fcst:

GOES satellite cloud top height products are here:

/d1/projects/METplus/METplus_Data/development/nrl/cloud/noaa-goes16
/d1/projects/METplus/METplus_Data/development/nrl/cloud/noaa-goes18

j-opatz commented 3 weeks ago

@briannen I think this might be a good use case to start for you.

Here's a link to an existing use case that utilizes GFS forecasts of cloud fractions at two heights (low and high): https://metplus.readthedocs.io/en/develop/generated/model_applications/clouds/GridStat_fcstGFS_obsERA5_lowAndTotalCloudFrac.html#sphx-glr-generated-model-applications-clouds-gridstat-fcstgfs-obsera5-lowandtotalcloudfrac-py

Additionally, here a link to another use case using GFS forecasts, but this time for cloud pressure and temperature heights: https://metplus.readthedocs.io/en/develop/generated/model_applications/clouds/GridStat_fcstGFS_obsSATCORPS_cloudTopPressAndTemp.html#sphx-glr-generated-model-applications-clouds-gridstat-fcstgfs-obssatcorps-cloudtoppressandtemp-py

Can you take a look at the two of these and come up with a "best variable field" to use? The cloud products Dan listed in his last comment should help guide which field to use. We should also consider keeping some (probably not all) of the accompanying thresholds; they've been used by the navy in the past and appear to be of interest.

For now, I think we want to focus on GridStat applications. If you think we should push into MODE applications (especially if we pursue cloud fractions), we'll pull in @CPKalb for additional guidance.

briannen commented 2 weeks ago

@j-opatz I've looked at the GOES data files, and I'm not finding anything other than cloud top height. There's also a variable called cloud_pixels which is the number of cloudy or probably cloudy pixels. Link to nc file header

It would be nice to see if we could do cloud fraction because we have all these different levels

630:449397943:d=2024030700:LCDC:low cloud layer:6 hour fcst:
631:450219107:d=2024030700:LCDC:low cloud layer:0-6 hour ave fcst:
632:451121655:d=2024030700:MCDC:middle cloud layer:6 hour fcst:
633:451732741:d=2024030700:MCDC:middle cloud layer:0-6 hour ave fcst:
634:452443872:d=2024030700:HCDC:high cloud layer:6 hour fcst:
635:453188066:d=2024030700:HCDC:high cloud layer:0-6 hour ave fcst:
638:455885075:d=2024030700:HGT:cloud ceiling:6 hour fcst:
639:457076434:d=2024030700:PRES:convective cloud bottom level:6 hour fcst:
640:457615219:d=2024030700:PRES:low cloud bottom level:0-6 hour ave fcst:
641:459029138:d=2024030700:PRES:middle cloud bottom level:0-6 hour ave fcst:
642:460243738:d=2024030700:PRES:high cloud bottom level:0-6 hour ave fcst:
643:461769213:d=2024030700:PRES:convective cloud top level:6 hour fcst:
644:462358939:d=2024030700:PRES:low cloud top level:0-6 hour ave fcst:
645:463801361:d=2024030700:PRES:middle cloud top level:0-6 hour ave fcst:
646:464981957:d=2024030700:PRES:high cloud top level:0-6 hour ave fcst:
647:466513941:d=2024030700:TMP:low cloud top level:0-6 hour ave fcst:
648:467521876:d=2024030700:TMP:middle cloud top level:0-6 hour ave fcst:
649:468418765:d=2024030700:TMP:high cloud top level:0-6 hour ave fcst:
650:469629089:d=2024030700:TCDC:convective cloud layer:6 hour fcst:
651:470338169:d=2024030700:TCDC:boundary layer cloud layer:0-6 hour ave fcst:

I also know that when I worked with NRL before, they were always super interested in cloud fraction.

j-opatz commented 2 weeks ago

Thanks for looking, @briannen.

Have you viewed what the cloud_pixels field looks like? Is it just a count of the cloudy/probably cloudy pixels, or does it actually categorize the various pixels of cloudy/not-cloudy? If it's the latter, we could (theoretically) get that data onto a grid, with each gridpoint then having a cloudy/not cloudy value and have a rough estimate of cloud fraction for comparison. Thoughts? I do agree that NRL enjoys a good cloud fraction field.

If we don't have that luxury, it seems like this use case will need to compare the heights. So a workflow would be point2grid (for GOES) -> GridStat (for heights).

I've secured a copy of Ho-Chun's Point2Grid config file, which demonstrates some of the settings he used to analyze GOES imagery. I've placed it in /d1/projects/METplus/METplus_Data/development/nrl/cloud/Point2Grid on Seneca. There's also a log file since many of the settings are accomplished via shell scripting: to get to where he's testing, do a search for the string VDATE=20241105. That should have most of the env variables in question.

briannen commented 2 weeks ago

@j-opatz It looks like the cloud_pixels variable is just the number of cloudy pixels in the image. When I do an ncdump on a file, I get this:

cloud_pixels = 675287 ;

I think the goes files that we have are just for cloud top height per the metadata:

:title = "ABI L2 Cloud Top Height" ;

I'm not sure which variable in the GFS data matches up to cloud top height, maybe @DanielAdriaansen has a better idea? I also wonder if we could get cloud fraction data from GOES pretty easily.

j-opatz commented 2 weeks ago

Thanks for the check, @briannen. It's a shame that we don't have more information re: clouds, but I suspect you're right; the file we have will only have the cloud top height, and other files out there may have more detailed cloud information from GOES.

However, given the timeframe to turn this around into a viable use case (with more use cases potentially in the pipeline), let's get started on a use case of GFS heights to GOES-EAST and GOES-WEST heights. It seems like message number 638 will be the one from the GFS grib2 file to use:

638:455885075:d=2024030700:HGT:cloud ceiling:6 hour fcst:

Tina and I will work on cloud fractions for our GFS to GFS comparison so that field verification is exhibited somewhere.

Can you please start by taking a look at the config file I've added to Seneca that details the point2grid conversion for GOES data? Once you can successfully get a GOES file to be read in via METplus wrappers (a quick Plot-Data-Plane of the resulting netCDF and comparing with an ncview of the original GOES file should be sufficient), let me know and you and I can work to set up some height thresholds in a GridStat call, along with which statistics are best to request.

One thing we'll have to get some direction on is if we want just EAST, WEST, or both. It would require a a second run for both tools, but wouldn't be much more complex to configure.

DanielAdriaansen commented 2 weeks ago

@briannen @j-opatz just one note- the GFS message here:

638:455885075:d=2024030700:HGT:cloud ceiling:6 hour fcst:

Is for cloud base height, but the GOES satellite gives cloud top height. I am working with @georgemccabe to obtain cloud top pressure which would match GFS here:

646:466180239:d=2024030700:PRES Pressure [Pa]:high cloud top level:6-12 hour ave fcst:

and also cloud top temperature which the GFS also appears to have:

649:469628288:d=2024030700:TMP Temperature [K]:high cloud top level:6-12 hour ave fcst:

I think @j-opatz instructions for getting point2grid working still apply though, and should be the work right now while I obtain other GOES data. Then we should be able to just swap out the file and variable name.

j-opatz commented 2 weeks ago

Thanks for the correction @DanielAdriaansen, my mind slipped for a minute there when looking at the HGT field and thinking it was similar to the PRES fields that have bottom and top pressures represented.

With this new direction for field verification, should this use case seek to validate a selected cloud height top pressure and a selected cloud height top temperature? It's also important to note that the GFS values are averages for the 0-6 hour forecast window, while GOES values will be instantaneous. Probably not as bad a comparison as, say, a 0-6 hour average temperature, but something to be aware of.

DanielAdriaansen commented 2 weeks ago

No problem.

The cloud top pressure GOES data are here on Seneca ABI-L2-CTPC

In the corresponding goes-16 and goes-18 directories that the cloud top height files were in.

I will reply when we have cloud top temperature data.

Regarding values, good question. I think pressure and temperature can still be related to height so maybe we look at a couple thresholds? Good point about the 0-6hr mean. I wonder if we want to do averaging of the GOES data then? But I think this is more about the infrastructure and workflow right now, so let's just use the single GOES time.

Let me know if we should get together and brainstorm thresholds and levels.

DanielAdriaansen commented 1 week ago

The cloud top temperature data are here on Seneca ABI-L2-ACHTF

In the corresponding goes-16 and goes-18 directories that the cloud top height files were in.

Thanks @georgemccabe for gathering the data!

j-opatz commented 1 week ago

Thanks @DanielAdriaansen and @georgemccabe for getting a hold of this data! We should be ready to dive into the configuration file testing and creation phase.

To establish the pressure and temperature thresholding, it might be best to do a little exploratory work: select 1 GFS cloud level (low-mid-high) for one of the variable fields of interest, and record the range. Rinse and repeat until we have a range for each cloud level for both variables. Then select 4-5 thresholds for a single cloud type of a given variable field and run it against the GOES data. If it seems like reasonable output, we can go that route.

@briannen can you take a running start at this? Start with high clouds (greater error in GFS forecasting ability that may be of interest, statistically speaking) and look at the pressure and temperature ranges of those forecasts. Hopefully they lend themselves to an easy division for testing against the GOES pressure and temperature observations. Don't forget the catch alls of >0 and <[reasonable TMP or PRES]! Once you have some thresholds for those ranges, try running them against GOES-EAST and see what you think of the statistical results and tweak them as needed. You and I can review the results when they're ready.

I'll add that we do have a bit of a hint of what they might like to see re: thresholding. Take a look at this use case, where high top cloud pressures and temperatures were being verified with NRL direction. While the values are more standardized (0, 180, 190, 200, etc. for TMP Kelvin and 0, 10, 20, 30, etc. for mb), we may not need to be too exact in our thresholding approach, other than knowing GFS low-mid-high cloud definitions (which I think are <6,500ft, 6,500ft - ~20,000ft, >20,000 ft) and planning accordingly for whatever cloud height we end up looking at.

A final note: we could complete 6 different variable field verifications: 3 different GFS cloud height definitions with each height having a pressure and temperature field. Not overly complicated, but 1 GFS cloud height with a pressure and temperature field verification would accomplish nearly the same thing. The only thing lost would be the other cloud levels thresholds. @DanielAdriaansen feel free to make the final call on if we should stick to 1 cloud height level, or go with all 3 cloud heights.

briannen commented 1 week ago

@j-opatz @DanielAdriaansen See the following examples from plot_data_plane for both PRES and TEMP after running Point2Grid. They both look good to me.

goes16_pres.pdf goes16_temp.pdf

Currently, I have the entire file names hard-coded in the conf files because I'm not sure how the times work in the filenames. There are several files in each directory, so will we also need to aggregate them together? For example:

OR_ABI-L2-ACHTF-M6_G16_s20240670600205_e20240670609513_c20240670613081.nc
OR_ABI-L2-ACHTF-M6_G16_s20240670630205_e20240670639513_c20240670642565.nc
OR_ABI-L2-ACHTF-M6_G16_s20240670610205_e20240670619513_c20240670624499.nc
OR_ABI-L2-ACHTF-M6_G16_s20240670640205_e20240670649513_c20240670652525.nc
OR_ABI-L2-ACHTF-M6_G16_s20240670620205_e20240670629513_c20240670633007.nc
OR_ABI-L2-ACHTF-M6_G16_s20240670650205_e20240670659513_c20240670702592.nc

j-opatz commented 1 week ago

Good question about the file names, @briannen. I was able to do a little digging and found this webpage. At the very bottom, it discusses the naming scheme. Read it for more details, but I'll copy the relevant timing parts here:

s20171671145342: is start of scan time 4 digit year 3 digit day of year 2 digit hour 2 digit minute 2 digit second 1 digit tenth of second

e20171671156109: is end of scan time c20171671156144: is netCDF4 file creation time

Thanks for taking a look at the GOES data and getting point2grid to work! Data fields look as expected; figuring that the higher temps and pressure values are the surface, it seems we have a bit of leeway to find thresholds that work. Did you try to set any thresholds for the GFS data from what you saw in the GOES file(s)? For example, from the PDF you provided I might say that >0, <210, >220, >230, >240, >250, >260, >270, 280 would be appropriate for CTC, CTS, CNT line types.

I think for this use case, we'll only want to use point2grid (for conversion) and GridStat, so aggregation shouldn't be necessary. Given the GOES-16 files are all for 6:00 to 6:50 we want to make sure the times closest to the GFS valid times (6z, 12z, and 18z) are used. In this case, we should stick with the first file from your list, which shows a start time of 6:00Z and end of 6:09Z.

If there are additional GOES files for the 12Z and 18Z, use a similar closest time methodology to identify the correct observation file.

dtcenter / METplus