NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service

As a user, I would like to compare verification metrics across two or more modeling scenarios (e.g. from two separate models) #188

Open epag opened 3 weeks ago

epag commented 3 weeks ago

Author Name: James (James)
Original Redmine Issue: 51673, https://vlab.noaa.gov/redmine/issues/51673
Original Date: 2018-06-18


Expected behavior:

As a user, I would like to compare verification results from two or more modeling scenarios (whether forecasts or simulations) in the same context (outputs, visualizations). Here, a modeling scenario might include (among other things):

A modeling scenario is encapsulated by a `right` source in the `inputs` configuration.
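For orientation, the `inputs` declaration has roughly the shape sketched below; the element names are illustrative rather than schema-exact, and each `right` block corresponds to one modeling scenario:

```xml
<!-- Illustrative sketch only; element names are abbreviated and may not
     match the project declaration schema exactly. -->
<project>
    <inputs>
        <left>
            <source>observations.csv</source>
        </left>
        <right>
            <source>scenario_forecasts/</source>
        </right>
        <baseline>
            <source>reference_forecasts/</source>
        </baseline>
    </inputs>
    <!-- pairing, metrics and output declarations follow -->
</project>
```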

In terms of metrics, any comparisons of the modeling scenarios should accommodate:

  1. Skill scores where the numerator determines the scenario for which skill is being evaluated; and
  2. Metrics that are not skill scores, such as fractional bias, correlation, reliability diagrams and so on.

For example, in terms of the latter, it should be possible to display the correlation coefficient for two or more modeling scenarios in the same visualization.
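To make the first case concrete, a typical skill score has the form below, where the error score in the numerator belongs to the scenario whose skill is being evaluated and the denominator belongs to the reference or baseline (mean square error is used here purely as an example error score):

$$\mathrm{SS} = 1 - \frac{\mathrm{MSE}_{\mathrm{scenario}}}{\mathrm{MSE}_{\mathrm{reference}}}$$

A multi-scenario comparison of the first kind would therefore compute one such score per candidate scenario against a common reference, whereas the second kind would display the same raw metric for each scenario side by side.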

Current behavior:

As of v1.0, the graphical and numerical outputs of the WRES can only compare verification results for a single modeling scenario or `right` source against a baseline (i.e. compute skill for one scenario). Comparing two or more scenarios requires a user to assemble their own outputs from two or more separate configurations.

Implementation notes:

This requirement could be achieved by allowing two or more `right` sources to be configured together, or by requiring that individual modeling scenarios are configured separately.
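As a purely hypothetical sketch of the first option (this is not supported syntax in any current schema), the `inputs` declaration might accept multiple `right` sources, each labelled so that downstream outputs can distinguish the scenarios:

```xml
<!-- Hypothetical only: multiple right sources in one declaration. -->
<inputs>
    <left>
        <source>observations.csv</source>
    </left>
    <right label="model A">
        <source>model_a_forecasts/</source>
    </right>
    <right label="model B">
        <source>model_b_forecasts/</source>
    </right>
    <baseline>
        <source>reference_forecasts/</source>
    </baseline>
</inputs>
```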

There is an argument that such functionality is not core to the WRES, but rather a requirement for some external visualization/output service, whereby a visualization/output tool assembles cached results from multiple modeling scenarios and generates a display for the combined output. On that basis, I am open to this feature request being rejected following discussion.

However, the choice of implementation may have direct implications for the comparison itself. For example, by configuring modeling scenarios separately and aggregating results in a downstream visualization tool, there is presumably no scope to ensure that an apples-to-apples comparison is made between different modeling scenarios (i.e. that the same pairs, in time and space, are used across all scenarios). A user would need to be mindful of this, and would have no way to enforce it via configuration, since each configuration is independent of the others.

This is likely to involve a substantial effort, since it has implications for configuration, storage, visualization and other things.


Redmine related issue(s): 102287


epag commented 3 weeks ago

Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2018-09-26T18:49:52Z


I think it's a fair feature to afford.

I also think it would be best to keep the WRES core pipeline clean. One way to afford this feature is to wrap WRES on both ends such that multiple evaluations occur to fulfill the feature goals. In other words, we create yet another caller/consumer/client of WRES. We can write caller software in a separate module that runs two evaluations.
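As a rough sketch of that wrapper idea, a caller could run one evaluation per scenario declaration and leave the joint visualization to a downstream step; the command name, arguments and file names below are assumptions for illustration, not the actual WRES interface:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical caller that runs one WRES evaluation per modeling scenario. */
public class MultiScenarioCaller
{
    public static void main( String[] args ) throws IOException, InterruptedException
    {
        // One project declaration per modeling scenario (file names are illustrative).
        List<Path> declarations = List.of( Path.of( "scenario_a.xml" ),
                                           Path.of( "scenario_b.xml" ) );

        for ( Path declaration : declarations )
        {
            // "wres execute" is an assumed command line, not necessarily the real one.
            Process process = new ProcessBuilder( "wres", "execute", declaration.toString() )
                    .inheritIO()
                    .start();

            if ( process.waitFor() != 0 )
            {
                throw new IllegalStateException( "Evaluation failed for " + declaration );
            }
        }

        // A downstream consumer would then assemble the cached outputs from both
        // evaluations into a combined visualization.
    }
}
```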

The change to WRES could be as extreme as ripping baseline stuff out of core WRES. Or if the baseline stuff can be cleanly modified to fulfill the feature (and avoid writing the outside wrappers), so be it.