dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0

Enhance Series-Analysis to read its own output and incrementally update output statistics over time #1371

Closed JohnHalleyGotway closed 1 month ago

JohnHalleyGotway commented 4 years ago

Describe the New Feature

This is a feature that was requested by the UK Met Office via met-help: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=95578

They would like to be able to create gridded statistics over a longer time period than they can hold their model output and analyses on disk. To enable this, we'd need to enhance Series-Analysis to read its own output and aggregate stats over a longer time period.

After discussing details with the Met Office on July 24, 2024, we decided to handle this as described below:

  1. Update Series-Analysis to make it easier to configure to write "ALL" of the CTC, PCT, SL1L2, and SAL1L2 columns.
  2. Add support for a new (-aggr) command line option to provide output from a previous run of Series-Analysis.
  3. When the -aggr option is provided, read the previously generated counts and partial sums. Prior to computing the output statistics, aggregate the previously generated counts and partial sums with the newly generated ones. Compute the output statistics from those aggregated values.
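To make step 3 concrete, here is a minimal Python sketch of the aggregation arithmetic at a single grid point. It is not the MET implementation (MET is written in C++), and every class and function name below is hypothetical. It assumes that SL1L2 stores means (FBAR, OBAR, FOBAR, FFBAR, OOBAR) alongside a TOTAL count, so two runs combine as a TOTAL-weighted average, and that CTC cells are plain counts that simply add.

```python
import math
from dataclasses import dataclass


@dataclass
class SL1L2:
    """Scalar partial sums at one grid point (hypothetical container)."""
    total: int
    fbar: float   # mean forecast
    obar: float   # mean observation
    fobar: float  # mean of forecast * observation
    ffbar: float  # mean of forecast squared
    oobar: float  # mean of observation squared


def sl1l2_from_pairs(fcst, obs):
    """Build partial sums from matched forecast/observation values."""
    n = len(fcst)
    if n == 0:
        return SL1L2(0, 0.0, 0.0, 0.0, 0.0, 0.0)
    return SL1L2(
        total=n,
        fbar=sum(fcst) / n,
        obar=sum(obs) / n,
        fobar=sum(f * o for f, o in zip(fcst, obs)) / n,
        ffbar=sum(f * f for f in fcst) / n,
        oobar=sum(o * o for o in obs) / n,
    )


def aggregate_sl1l2(prev, new):
    """Combine a previous run with new data using TOTAL-weighted means."""
    n = prev.total + new.total
    if n == 0:
        return SL1L2(0, 0.0, 0.0, 0.0, 0.0, 0.0)

    def wavg(a, b):
        return (prev.total * a + new.total * b) / n

    return SL1L2(
        total=n,
        fbar=wavg(prev.fbar, new.fbar),
        obar=wavg(prev.obar, new.obar),
        fobar=wavg(prev.fobar, new.fobar),
        ffbar=wavg(prev.ffbar, new.ffbar),
        oobar=wavg(prev.oobar, new.oobar),
    )


def aggregate_ctc(prev, new):
    """Contingency table cells are counts, so aggregation is a sum per cell."""
    return {key: prev[key] + new[key] for key in prev}


def rmse(s):
    """RMSE computed only from the (aggregated) partial sums."""
    mse = s.ffbar - 2.0 * s.fobar + s.oobar
    return math.sqrt(max(mse, 0.0))


# Previous run covered two times, the new run adds a third.
previous = sl1l2_from_pairs([1.0, 2.0], [1.5, 2.5])
current = sl1l2_from_pairs([3.0], [2.0])
combined = aggregate_sl1l2(previous, current)
print(combined.total, rmse(combined))
```

The point of the design is that the statistics are derived from the aggregated partial sums rather than by averaging statistics from each run, so the result matches what a single run over the full time period would produce.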

Some details...

Acceptance Testing

List input data types and sources. Describe tests required for new functionality.

Time Estimate

3 days?

Sub-Issues

Consider breaking the new feature down into sub-issues. None needed.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

Split between MetOffice (2799991) and NOAA (2792543) and 2783544

Define the Metadata

Assignee

Labels

Projects and Milestone

Define Related Issue(s)

Consider the impact to the other METplus components.

New Feature Checklist

See the METplus Workflow for details.

JohnHalleyGotway commented 4 years ago

John Wagner, via met-help, indicated that this feature would also be useful for NOAA/MDL in their use of Series-Analysis: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=95583

JohnHalleyGotway commented 4 months ago

This is a Met Office deliverable due November 2024.

JohnHalleyGotway commented 2 months ago

This issue was discussed during the Met Office NGVER meeting on July 24, 2024.

The functionality needed here is similar to how the Gen-Vx-Mask tool works. When gen_vx_mask is given its own output as input, it initializes values using the previously defined mask.

For Series-Analysis, the logic needed is described below:

JohnHalleyGotway commented 2 months ago

Working on feature_1371_series_analysis branch. Added -input command line argument to define the output from previous Series-Analysis runs. Also added support for "ALL" being specified for the CTC, MCTC, PCT, and SL1L2 line types.

Still need to work on reading data from the -input file to aggregate prior results with the current data.

JohnHalleyGotway commented 2 months ago

@KathrynNewman advised that the -input command line argument name is confusing. Will switch to using -aggregate instead to be consistent with the Stat-Analysis -job aggregate terminology.

JohnHalleyGotway commented 1 month ago

TODO: MET #1371

Series-Analysis is definitely a tool that would benefit from being parallelized. While I'm focussing on the enhancements described in this issue, we should use a separate issue/feature branch to optimize it.