NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service

As a user, I would like to compare verification metrics across two or more modeling scenarios (e.g. from two separate models) #188

Open epag opened 3 weeks ago

epag commented 3 weeks ago

Author Name: James (James)
Original Redmine Issue: 51673, https://vlab.noaa.gov/redmine/issues/51673
Original Date: 2018-06-18


Expected behavior:

As a user, I would like to compare verification results from two or more modeling scenarios (whether forecasts or simulations) in the same context (outputs, visualizations). Here, a modeling scenario might include (among other things):

A modeling scenario is encapsulated by a `right` source in the `inputs` configuration.
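For orientation, the `inputs` declaration has roughly the shape sketched below; the element names are illustrative rather than schema-exact, and each `right` block corresponds to one modeling scenario:

```xml
<!-- Illustrative sketch only; element names are abbreviated and may not
     match the project declaration schema exactly. -->
<project>
    <inputs>
        <left>
            <source>observations.csv</source>
        </left>
        <right>
            <source>scenario_forecasts/</source>
        </right>
        <baseline>
            <source>reference_forecasts/</source>
        </baseline>
    </inputs>
    <!-- pairing, metrics and output declarations follow -->
</project>
```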

In terms of metrics, any comparisons of the modeling scenarios should accommodate:

  1. Skill scores where the numerator determines the scenario for which skill is being evaluated; and
  2. Metrics that are not skill scores, such as fractional bias, correlation, reliability diagrams and so on.

For example, in terms of the latter, it should be possible to display the correlation coefficient for two or more modeling scenarios in the same visualization.
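To make the first case concrete, a typical skill score has the form below, where the error score in the numerator belongs to the scenario whose skill is being evaluated and the denominator belongs to the reference or baseline (mean square error is used here purely as an example error score):

$$\mathrm{SS} = 1 - \frac{\mathrm{MSE}_{\mathrm{scenario}}}{\mathrm{MSE}_{\mathrm{reference}}}$$

A multi-scenario comparison of the first kind would therefore compute one such score per candidate scenario against a common reference, whereas the second kind would display the same raw metric for each scenario side by side.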

Current behavior:

As of v1.0, the graphical and numerical outputs of the WRES can only compare verification results for a single modeling scenario or `right` source against a baseline (i.e. compute skill for one scenario). Comparing two or more scenarios requires a user to assemble their own outputs from two or more separate configurations.

Implementation notes:

This requirement could be achieved by allowing two or more `right` sources to be configured together, or by requiring that individual modeling scenarios are configured separately.
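As a purely hypothetical sketch of the first option (this is not supported syntax in any current schema), the `inputs` declaration might accept multiple `right` sources, each labelled so that downstream outputs can distinguish the scenarios:

```xml
<!-- Hypothetical only: multiple right sources in one declaration. -->
<inputs>
    <left>
        <source>observations.csv</source>
    </left>
    <right label="model A">
        <source>model_a_forecasts/</source>
    </right>
    <right label="model B">
        <source>model_b_forecasts/</source>
    </right>
    <baseline>
        <source>reference_forecasts/</source>
    </baseline>
</inputs>
```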

There is an argument that such functionality is not core to the WRES, but rather a requirement for some external visualization/output service, whereby a visualization/output tool assembles cached results from multiple modeling scenarios and generates a display for the combined output. On that basis, I am open to this feature request being rejected following discussion.

However, the choice of implementation may have direct implications for the comparison itself. For example, by configuring modeling scenarios separately and aggregating results in a downstream visualization tool, there is presumably no scope to ensure that an apples-to-apples comparison is made between different modeling scenarios (i.e. that the same pairs, in time and space, are used across all scenarios). A user would need to be mindful of this, and would have no way to enforce it via configuration, since each configuration is independent of the others.

This is likely to involve a substantial effort, since it has implications for configuration, storage, visualization and other things.


Redmine related issue(s): 102287


epag commented 3 weeks ago

Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2018-09-26T18:49:52Z


I think it's a fair feature to afford.

I also think it would be best to keep the WRES core pipeline clean. One way to afford this feature is to wrap WRES on both ends such that multiple evaluations occur to fulfill the feature goals. In other words, we create yet another caller/consumer/client of WRES. We can write caller software in a separate module that runs two evaluations.
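As a rough sketch of that wrapper idea, a caller could run one evaluation per scenario declaration and leave the joint visualization to a downstream step; the command name, arguments and file names below are assumptions for illustration, not the actual WRES interface:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical caller that runs one WRES evaluation per modeling scenario. */
public class MultiScenarioCaller
{
    public static void main( String[] args ) throws IOException, InterruptedException
    {
        // One project declaration per modeling scenario (file names are illustrative).
        List<Path> declarations = List.of( Path.of( "scenario_a.xml" ),
                                           Path.of( "scenario_b.xml" ) );

        for ( Path declaration : declarations )
        {
            // "wres execute" is an assumed command line, not necessarily the real one.
            Process process = new ProcessBuilder( "wres", "execute", declaration.toString() )
                    .inheritIO()
                    .start();

            if ( process.waitFor() != 0 )
            {
                throw new IllegalStateException( "Evaluation failed for " + declaration );
            }
        }

        // A downstream consumer would then assemble the cached outputs from both
        // evaluations into a combined visualization.
    }
}
```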

The change to WRES could be as extreme as ripping baseline stuff out of core WRES. Or if the baseline stuff can be cleanly modified to fulfill the feature (and avoid writing the outside wrappers), so be it.