dtcenter / METplus

Python scripting infrastructure for MET tools.
https://metplus.readthedocs.io
Apache License 2.0

Enhancement: Add option to check for missing input and skip wrapper runs if threshold is not met #2524

Open georgemccabe opened 4 months ago

georgemccabe commented 4 months ago

#2460 was completed to add a check for missing inputs and a configurable threshold that prevents errors when some inputs are unavailable. Shannon Shields from NOAA/EMC tested it and noted that, to match the behavior currently implemented for EVS, the missing-input check and the comparison to the configurable threshold should be done before running any wrappers, and no wrapper calls should be executed if the threshold is not met.

Describe the Enhancement

Modify the logic to loop through all times and gather a report of available inputs to determine whether any wrappers should be called. Consider only doing this if METplus is configured to perform the missing input check.

More thought is needed to determine how to implement this change. For use cases with multiple items in the PROCESS_LIST, it may not be clear which inputs to a wrapper are actual inputs to the use case or output from another wrapper in the PROCESS_LIST. Perhaps a separate wrapper could be added where users can define the filename templates of the inputs to the use case to check for availability. This could allow different thresholds of missing/available input for each input dataset.
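The "loop through all times and gather a report" step could look roughly like the sketch below. This is only an illustration, not METplus code: `run_times`, `availability_report`, and `fraction_available` are hypothetical names, and the template uses plain `strftime` codes rather than METplus filename template syntax.

```python
from datetime import datetime, timedelta
from pathlib import Path


def run_times(start, end, step):
    """Yield each run time from start to end (inclusive)."""
    current = start
    while current <= end:
        yield current
        current += step


def availability_report(template, times, exists=lambda p: Path(p).exists()):
    """Map each run time to True/False depending on whether its input exists.

    `template` uses strftime codes here (an assumption for this sketch),
    not the METplus {valid?fmt=...} syntax.
    """
    return {t: exists(t.strftime(template)) for t in times}


def fraction_available(report):
    """Return the fraction of run times whose input was found."""
    return sum(report.values()) / len(report) if report else 0.0
```

The report would be built once, before any wrapper runs, and the resulting fraction compared to the configured threshold. The `exists` argument is injectable so the check can be exercised without touching the filesystem.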

Time Estimate

~3 days

Relevant Deadlines

EVS 2.0 if possible, but scripting is already in place to handle this before the call to METplus wrappers.

AliciaBentley-NOAA commented 2 months ago

@JohnHalleyGotway @georgemccabe I'm following up with Shannon/others to see if this is critical for EVS v2.0 or not. I will let you know if it is. Thanks!

georgemccabe commented 1 month ago

We discussed the details of this issue in the METplus/NOAA Telecon on 6/10/2024. It was reiterated that it is not critical to add these changes to the next release for inclusion in EVS v2.0. Shannon has scripting logic to handle this situation.

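An InputCheck wrapper would satisfy this requirement. The wrapper would support the configuration variable types used in the example below: a runtime frequency, a filename template (with an optional directory), a threshold, a log level to use on failure, and an action to take on failure.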

Note that we need to consider how to handle multiple ensembles. Per Shannon, all members must be present for each init/lead time in order for the runtime to be counted as valid. Need to flesh out these details about ensemble member handling more.

George suggested using a comma-separated list in the _TEMPLATE variable to list multiple ensembles that correspond to the same time. We would like to be able to easily configure this to process multiple ensembles whose names may be differentiated by a string (rather than a number) that could be tedious to list out. Maybe the CUSTOM_LOOP_LIST behavior would help here?
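To make the ensemble rule concrete, here is a minimal sketch of "all members must be present for the run time to count as valid", with member names given as strings. The `{member}` placeholder, the function names, and the `strftime`-style date codes are all assumptions for illustration; none of this is existing METplus syntax or API.

```python
from datetime import datetime
from pathlib import Path


def member_paths(template, members, valid_time):
    """Expand one template into a path per ensemble member.

    `{member}` is a hypothetical placeholder; dates are filled via strftime.
    """
    dated = valid_time.strftime(template)
    return [dated.format(member=name) for name in members]


def runtime_is_valid(template, members, valid_time,
                     exists=lambda p: Path(p).exists()):
    """A run time counts as valid only if every member's file is present."""
    return all(exists(p) for p in member_paths(template, members, valid_time))
```

With this shape, the list of member names plays the role that a custom loop list might play in the configuration: one template, many string-named members, and a single pass/fail result per run time.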

Example:

[config]

PROCESS_LIST = InputCheck, InputCheck(abc), InputCheck(xyz), PB2NC, PointStat

INPUT_CHECK_RUNTIME_FREQ = RUN_ONCE_PER_INIT_OR_VALID

INPUT_CHECK_TEMPLATE = /my/dir/to/pb/pb_{valid?fmt=%Y%m%d_%H}.nc
INPUT_CHECK_THRESH = 0.9
INPUT_CHECK_FAIL_LOG_LEVEL = WARN
INPUT_CHECK_FAIL_ACTION = CONTINUE

[abc]

INPUT_CHECK_RUNTIME_FREQ = RUN_ONCE_FOR_EACH

INPUT_CHECK_DIR = /my/dir/to/abc
INPUT_CHECK_TEMPLATE = abc_{valid?fmt=%Y%m%d_%H}.abc
INPUT_CHECK_THRESH = 1.0
INPUT_CHECK_FAIL_LOG_LEVEL = ERROR
INPUT_CHECK_FAIL_ACTION = EXIT

[xyz]

INPUT_CHECK_RUNTIME_FREQ = RUN_ONCE_FOR_EACH
INPUT_CHECK_TEMPLATE = /my/dir/to/xyz/xyz_{valid?fmt=%Y%m%d_%H}.xyz
INPUT_CHECK_THRESH = 0.5
INPUT_CHECK_FAIL_LOG_LEVEL = INFO
INPUT_CHECK_FAIL_ACTION = EXIT

[config]

…

In this example, there are 3 input checks -- config (no instance ID), abc, and xyz.

1st check: Checks for files matching /my/dir/to/pb/pb_{valid?fmt=%Y%m%d_%H}.nc for each INIT or VALID time (depending on LOOP_BY). If fewer than 90% of run times have all input files needed, output a WARNING message and continue running the subsequent items in the PROCESS_LIST.

2nd check: Checks for files matching /my/dir/to/abc/abc_{valid?fmt=%Y%m%d_%H}.abc for each run time (init/valid and forecast leads). If any run times do not have all input files needed (<100%), output an ERROR message and exit without running the subsequent items in the PROCESS_LIST.

3rd check: Checks for files matching /my/dir/to/xyz/xyz_{valid?fmt=%Y%m%d_%H}.xyz for each run time (init/valid and forecast leads). If fewer than 50% of run times have all input files needed, output an INFO message and exit without running the subsequent items in the PROCESS_LIST.
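The threshold and fail-action behavior described in the three checks above might be sketched as follows. `apply_input_check` is a hypothetical helper, not part of METplus; the level and action strings mirror the proposed INPUT_CHECK_FAIL_LOG_LEVEL and INPUT_CHECK_FAIL_ACTION values, and the sketch assumes a check passes when the available fraction is greater than or equal to the threshold.

```python
import logging

logger = logging.getLogger("input_check_sketch")

# Proposed config strings mapped to standard logging levels (assumed mapping).
LOG_LEVELS = {"INFO": logging.INFO, "WARN": logging.WARNING, "ERROR": logging.ERROR}


def apply_input_check(fraction, thresh, fail_log_level="WARN", fail_action="CONTINUE"):
    """Return True if the remaining PROCESS_LIST items should still run.

    The check passes when `fraction` meets or exceeds `thresh`. On failure a
    message is logged at the configured level, and the fail action decides
    whether processing continues (CONTINUE) or stops (EXIT).
    """
    if fraction >= thresh:
        return True
    logger.log(
        LOG_LEVELS[fail_log_level],
        "Only %.0f%% of run times have all inputs (threshold %.0f%%)",
        fraction * 100,
        thresh * 100,
    )
    return fail_action == "CONTINUE"
```

For instance, an availability of 0.85 against the first check (0.9, WARN, CONTINUE) would log a warning and keep going, while the same availability against the second check (1.0, ERROR, EXIT) would stop before PB2NC and PointStat.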