Closed JohnHalleyGotway closed 2 weeks ago
I was torn as to whether we should introduce a single new fcst_climo_mean
config entry… or just use the existing climo_mean
and climo_stdev
ones and change the context from which we parse them. Right now they’re both parsed from the top-level config context but we could switch to parsing them from inside the fcst
and obs
dictionaries instead… very similar to what we do when parsing fcst.field
and obs.field
for example.
Since @GwenChen-NOAA confirmed via email on June 24, 2024 that she’ll want both a forecast climo_mean and climo_stdev, I think we should just do that latter. Although it isn’t immediately clear to me if/how we’d actually use the forecast climo_stdev data yet. But hopefully @GwenChen-NOAA and @j-opatz will figure that out.
On July 5, 2024, @JohnHalleyGotway met with @TaraJensen to clarify the desired extent of these changes. @TaraJensen confirmed that we want to make the support for forecast climo data as thorough and complete as our existing support for "obs" climo data.
So that includes adding new MPR and ORANK mean/stdev/cdf columns for this data, and also adding support for "FCDP" and "OCDP" threshold type for forecast/observation climo distribution percentile thresholds. Note that use of existing "CDP" threshold types should still be supported for backward compatibility but should be interpreted as being set as "OCDP" thresholds.
I added backward compatibility so that if old column names are requested, values for the new column names will be returned. But a message will be logged directing users to switch to the new names. Here's a Stat-Analysis example:
stat_analysis -lookin mpr.txt -job summary -line_type MPR -column CLIMO_MEAN,CLIMO_STDEV,CLIMO_CDF
...
DEBUG 2: The "CLIMO_MEAN" column in the MPR line type has been renamed as "OBS_CLIMO_MEAN". Please switch to using METV12.0.0 column names.
DEBUG 2: The "CLIMO_STDEV" column in the MPR line type has been renamed as "OBS_CLIMO_STDEV". Please switch to using METV12.0.0 column names.
DEBUG 2: The "CLIMO_CDF" column in the MPR line type has been renamed as "OBS_CLIMO_CDF". Please switch to using METV12.0.0 column names.
Will this mean that I should update the write_mpr program as part of METcalcpy to use the new format for MET 12?
@CPKalb, yes, thanks for mentioning that. I updated the checklist in the body of the issue to note that we'll need a new METplus use case (ideally), new METplus config options, and to update METplus-Analysis to handle the new columns being added to MPR and ORANK lines for MET v12.0.0.
However, those columns are still somewhat in flux. In fact, yesterday, we decided NOT to write out FCST_CLIMO_CDF
. So once I'm 100% confident on the changes, I'll write up METplus-Analysis issues.
As discussed with @j-opatz on Slack, allow for SCP and CDP to be used in the input and interpreted as being SOCP and OCDP percentile threshold types.
However, print a single debug message about recommending they switch to SOCP and OCDP instead.
And DO NOT write the old SCP and CDP threshold strings to the output. Only use the new ones.
Per @j-opatz, recommend that we include a snapshot that shows how a user might set up the fcst and obs dictionaries to access the new features. It wouldn't need to be reproducible by users (or really even runable for us internally) but I think the visualization might help users understand the new descriptions.
This could be added in the 2nd PR for this issue.
@j-opatz, I have a science question about the SL1L2 and SAL1L2 line types. Currently the SL1L2 MAE column and SAL1L2 MAE column report the exact same number.
Seems like the SL1L2 MAE should be mean(abs(f-o))
while the SAL1L2 MAE should be mean(abs((f-c)-(o-c)))
… although thinking about it, I see that mean(abs((f-c)-(o-c))) = mean(abs(f-o))
since -c - (-c) = 0
.
But with the changes for MET #2924, that's no longer the case, where the forecast climo mean != obs climo mean. So it seems to me we should keep track of separate SL1L2 MAE and SAL1L2 MAE scores now, where the latter is computed as:
mean(abs((f-fc)-(o-oc)))
since the fc
forecast climo means and oc
observation climo means are not necessarily the same value.
Does this make sense to you? If so, I'll submit a 3rd PR for this issue with these changes.
And if so, should I rename the existing SAL1L2 column from MAE
to MAAE
for "Mean-Absolute-Anomalous-Error" to make the difference more abundantly clear? Or is it not worth the effort of breaking the METdataio schema for a 1-letter change?
Updating the computation of SAL1L2 MAE seems to be the most accurate option for including the new functionality of climatologies. I agree with this approach and a 3rd PR. Does the 2nd PR need to be approved before the 3rd can be approved, or can they be completed out of order?
I think I would avoid changing the column name and instead make reference to the updated calculation (relative anomalies for each dataset) in the Appendix C entry for SL1L2 and SAL1L2. I see that as it currently stands, neither entry actually provides an equation for MAE, opting for blanket statements of
Some of the other statistics for continuous forecasts (e.g., RMSE) can be derived from these moments
and
Computation of these statistics requires a climatological value, c.
During this review I also saw that the entry for Anomaly Correlation Coefficient has some language that should be clarified, given the new climatology source available to users. This language,
The anomaly is the difference between the individual forecast or observation and the typical situation, as measured by a climatology (c) of some variety
seems vague and may benefit from a tightened description that mentions the required climatology data will be taken from its respective field source, if available.
@georgemccabe (and tagging @j-opatz to loop him in), as we discussed on Aug 27, 2024, I did some additional testing for the parsing on climo settings from MET config files.
The question is whether the climo_mean
and climo_stdev
dictionaries in the config file could contain ONLY file_name
and field
, for example, but NOT regrid
, time_interp_method
, day_interval
, and hour_interval
. If not, we'd want to parse the default values defined in the top-level climo_mean
and climo_stdev
dictionaries.
My guess is that this will not work and we'll get a config parsing error due to the missing time_interp_method
, day_interval
, and hour_interval
settings.
I tested this as the met_test
user by running the seneca:/d1/projects/MET/MET_pull_requests/met-12.0.0/beta6/test_fcst_climo_config_parsing/unit_climatology_mixed.sh
script. This the test run by unit_climatology_mixed.xml
but I commented out the extra settings inside climo_mean
and climo_stdev
specified in the Grid-Stat config file. That produces this runtime error:
ERROR :
ERROR : Dictionary::lookup_int() -> lookup failed for name "time_interp_method"
ERROR :
So it is NOT getting those default settings from the default climo_mean
and climo_stdev
defined at the top-level config file context in the default configuration file.
Next, I reran after modifying the default Grid-Stat configuration file to move climo_mean
and climo_stdev
inside the fcst
and obs
dictionaries. And that DOES enable the same command to run without error.
@j-opatz how should we address this situation? I'd recommend that we just modify the default MET configuration files to move the climo_mean
and climo_stdev
dictionaries inside the fcst
and obs
dictionaries since that's the context from which those dictionaries are now being parsed. That's a simple solution that provides transparency and makes the logic of METplus smooth.
The alternative is devising more complex special handling logic in MET which is likely to be tricky and confusing.
As discussed via Slack on 8/28/24, the plan is to update the logic for parsing the climo_mean
and climo_stdev
configuration entries. This parsing logic is defined in vx_statistics/read_climo.cc
.
If we were to move these into the fcst
and obs
dictionaries in the DEFAULT config files, then every run of Grid-Stat (for example) will find it defined there and the user’s changes in the top-level settings wouldn't override those defaults as they previously did. That would cause confusion and frustration for many users for a feature we’re adding only for a few users.
Each of these dictionaries consists of 6 elements:
Right now, it’s all or nothing. They’re all parsed from the same config file context. And it errors out if one of them is missing. The desired change is to parse them individually rather than as a group. For each of the 6 items:
fcst.climo_mean.time_interp_method
(for example) exists, use it.climo_mean.time_interp_method
(for example) from the top-level context.Reopening this issue because a downstream METplus Use Case unexpectedly failed in this Testing Workflow Run.
In the Use Case Tests (s2s:4) job, series_analysis errored out with:
ERROR : check_climo_n_vx() -> The number of climatology mean fields in "climo_mean.field" must be zero or match the number (29) in "fcst.field".
This is caused by the change to series_analysis on this line:
// Check for consistent number of climatology fields
check_climo_n_vx(fdict, n_fcst);
check_climo_n_vx(odict, n_obs);
This use case provides 1 forecast field and 29 observation fields. Only one field is provided for the climo mean and standard deviation. Prior to MET#2924, we were only checking that climo and forecast field are consistent:
// Check climatology fields
check_climo_n_vx(&conf, n_fcst);
Need to update the logic and log messages in check_climo_n_vx()
. Supplying 0 fields, 1 field, or a list of fields matching number of corresponding input field, should all work.
Note that @georgemccabe recommend adding a unit test to cover this configuration wrinkle.
Describe the Enhancement
The
climo_mean
andclimo_stdev
configuration file options define climatology input files for the MET tools. As of MET version 11.1.0, they are parsed from the top-level configuration file context, meaning that they can only be defined once. So when computing anomaly statistics, the same climo value is subtracted from both the forecast and observation data.This issue is to enhance MET to support separate climatology inputs for the forecast and observation datasets, as discussed during the NOAA METplus User Telecon on June 24, 2024 and requested previously in this GitHub issue comment for MET#2308.
This work requires the following changes:
climo_mean
andclimo_stdev
config options separately within the forecast and observation dictionaries. Note that we'll need a METplus issue to update the wrappers accordingly to make this configurable via METplus. This impacts 5 MET tools: Point-Stat, Grid-Stat, Ensemble-Stat, Series-Analysis. Note that they are also present in Gen-Ens-Prod, but that tool does NOT process observation data.CLIMO_MEAN
,CLIMO_STDEV
, andCLIMO_CDF
(only in MPR). How should these change? Assume that we need extra columns to report the forecast climo values?Additional questions:
climo_stdev
data is used to define climatology-based thresholds (e.g. >CDP75) and to apply binned climatology logic. Do we need the flexibility to use the forecast climatology data in a similar way? Or do we really only need to define the forecast climatology mean so that it can be subtracted from forecast value when computing anomaly statistics?Stages of development:
Time Estimate
7 days?
Sub-Issues
Consider breaking the enhancement down into sub-issues.
Relevant Deadlines
List relevant project deadlines here or state NONE.
Funding Source
2700044
Define the Metadata
Assignee
Labels
Milestone and Projects
Define Related Issue(s)
Consider the impact to the other METplus components.
Enhancement Checklist
See the METplus Workflow for details.
feature_<Issue Number>_<Description>
feature <Issue Number> <Description>