As a user, I would like to be able to compare forecasts from two different periods of record

epag commented 2 months ago

Author Name: James (James) Original Redmine Issue: 80830, https://vlab.noaa.gov/redmine/issues/80830 Original Date: 2020-07-15

Given two sets of forecasts from different periods of record
When I want to compare the quality of those two sets of forecasts
Then I should be able to do this with the WRES

(Presently, I think this is not possible or, at least, not straightforward. It may be possible for "one big pool" (i.e., no explicit pools defined), but it would not be possible with explicit pools, which are assumed to apply equally to both sets of forecasts.)

Use case arose from a question during the WRES training on 15 July 2020.

epag commented 2 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2020-07-15T18:07:06Z

Added Seann, who asked the question.

epag commented 2 months ago

Original Redmine Comment Author Name: Seann (Seann) Original Date: 2020-07-16T23:56:24Z

Hi,

I'm not sure that I stated my question very well.

My initial use case was to be able to routinely update the statistics for stations as our period of record grows. For example, we might want to know the RMSE of all forecasts above flood stage since we started issuing forecasts (say ten years) and update that information each month moving forward (I think I used 6 months as my example on the call). I think someone on the call mentioned that my real issue is with the performance, which is probably true. If it is fast to recalculate the statistics for 10 or 20 years, that solves my main problem.

I was wondering if the expensive part of the calculation was downloading the data and forming the pairs, so the idea of saving 10 years of pairs and then only having to regenerate one month of pairs was part of my question.
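The idea of keeping old pairs and regenerating only the newest month could be sketched roughly as follows. This is an illustration only, not how the WRES works: `PairCache`, `build_pairs`, and the month keys are hypothetical names, and the expensive download-and-pair step is stood in for by whatever function is passed to the cache.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[float, float]  # (forecast, observation)


class PairCache:
    """Hypothetical cache that builds the pairs for a month at most once."""

    def __init__(self, build_pairs: Callable[[str], List[Pair]]):
        # build_pairs stands in for the expensive step (download and pairing).
        self._build_pairs = build_pairs
        self._by_month: Dict[str, List[Pair]] = {}

    def pairs_for(self, month: str) -> List[Pair]:
        # Only call the expensive builder for months not seen before.
        if month not in self._by_month:
            self._by_month[month] = self._build_pairs(month)
        return self._by_month[month]

    def all_pairs(self, months: Iterable[str]) -> List[Pair]:
        # Concatenate cached and newly built pairs in the order requested.
        return [pair for month in months for pair in self.pairs_for(month)]
```

With a cache like this, a monthly update over a ten-year record would only invoke the builder for the one new month, provided the cached pairs are still valid.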

While not my initial thought, comparing forecasts for two different periods of record would also be useful to understand things like model changes, land use changes, or climate changes.

Thanks. Seann

epag commented 2 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2020-07-17T10:13:32Z

I see, thanks for clarifying, Seann.

We're working through a similar user story for WPOD, who also want to compute statistics for rolling periods that are updated regularly, albeit for much shorter periods in their case. The COWRES is designed to provide evaluations "on demand" and, while it does archive data internally for re-use, it is not designed to provide a long-term archive of time-series data. So I agree with you that the aim should be to provide quicker evaluations in general. That way, the cost of calling the service many times and asking for new evaluations is not too high.

Providing a common storage space (e.g., an object store) "near" to the COWRES would allow for time-series to be updated incrementally for ingest, which might reduce the overall time spent if you are otherwise shipping all data to the COWRES every time. We could also consider maintaining the internal archive of pairs for longer (again supporting re-use, but with the cost of slowing down evaluations in general). However, that does not avoid the need to acquire and (re)read the data (to determine whether it has already been ingested). Another option would be to provide instances (or database instances) for particular users/groups, although that has downsides too.

As an aside, there are ways to compute statistics incrementally by decomposing them into additive components. For example, the sum of squared errors has one additive component, so the sum of squared errors across multiple evaluations is simply the sum of the sums of squared errors from each of them. For more complex statistics, one-pass algorithms generally don't work, but the statistics can always be re-written in terms of several additive components. In principle, it would be possible to generate these additive components as statistics in their own right and then to post-process/aggregate them across multiple evaluations without loss of accuracy (I mean, not in a way that is visible to users, but via some sort of service API where it is seamless). But that is a lot of effort and I think we should focus on improving performance for all users in simple ways before we consider complicating the design.
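To make the additive-components idea concrete, here is a minimal sketch. It is not WRES code: the `Components` class and the helper functions are hypothetical names, and pairs are assumed to be simple (forecast, observation) tuples. Each evaluation produces a handful of sums, the sums from separate evaluations are added together, and the RMSE and the Pearson correlation are then derived from the totals.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Components:
    """Hypothetical additive components for one evaluation."""
    n: int = 0
    sum_sq_err: float = 0.0  # sum of (forecast - observation)^2
    sum_f: float = 0.0       # sum of forecasts
    sum_o: float = 0.0       # sum of observations
    sum_ff: float = 0.0      # sum of forecast^2
    sum_oo: float = 0.0      # sum of observation^2
    sum_fo: float = 0.0      # sum of forecast * observation

    def __add__(self, other: "Components") -> "Components":
        # Every field is a plain sum, so aggregation is field-wise addition.
        return Components(
            self.n + other.n,
            self.sum_sq_err + other.sum_sq_err,
            self.sum_f + other.sum_f,
            self.sum_o + other.sum_o,
            self.sum_ff + other.sum_ff,
            self.sum_oo + other.sum_oo,
            self.sum_fo + other.sum_fo,
        )


def components(pairs: Iterable[Tuple[float, float]]) -> Components:
    """Compute the additive components for one set of pairs."""
    total = Components()
    for f, o in pairs:
        total = total + Components(1, (f - o) ** 2, f, o, f * f, o * o, f * o)
    return total


def rmse(c: Components) -> float:
    # One additive component (plus the sample size) is enough here.
    return sqrt(c.sum_sq_err / c.n)


def pearson(c: Components) -> float:
    # A more complex statistic, written in terms of several additive sums.
    cov = c.n * c.sum_fo - c.sum_f * c.sum_o
    var_f = c.n * c.sum_ff - c.sum_f ** 2
    var_o = c.n * c.sum_oo - c.sum_o ** 2
    return cov / sqrt(var_f * var_o)


# Two separate evaluations (e.g., an old period and a new month), then merged.
old = components([(1.0, 2.0), (3.0, 3.0)])
new = components([(5.0, 4.0), (7.0, 9.0)])
merged = old + new
```

Here `rmse(merged)` and `pearson(merged)` match what a single evaluation over all four pairs would give, up to floating-point rounding. One caveat: the raw-sums form of the variance can lose precision when values are large relative to their spread, so a real implementation might prefer a more numerically stable set of components.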

NOAA-OWP / wres

As a user, I would like to be able to compare forecasts from two different periods of record #231