NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service

As a user, I would like to compute time-to-peak errors for a subset of single-valued pairs that are identified as containing "peaks" #178

Open epag opened 3 months ago

epag commented 3 months ago

Author Name: James (James) Original Redmine Issue: 46360, https://vlab.noaa.gov/redmine/issues/46360 Original Date: 2018-02-08


Given a set of paired time-series (of predictions and verifying observations), timing error metrics may be applied to the unconditional set of time-series or to a subset of time-series that meet specific conditions. A subset may be defined in a variety of ways.

This feature request is concerned with identifying paired time-series that contain a "peak" or a "peak above a threshold".

An example use case involves the calculation of time-to-peak errors, in order to evaluate the quality of a hydrologic forecast model at predicting the timing of flood peaks. Currently, the time-to-peak errors are computed for the unconditional set of paired time-series (i.e. all data). While this may provide some indication of timing errors in general, it will not specifically focus on time-series that contain peaks above a flood threshold. Thus, for this use case, a better indication of timing errors is likely to be obtained from the subset of paired time-series that contain a peak above the flood threshold.

Identifying a "local peak" or "high-frequency peak" is an objective activity, since a local peak represents a stationary point or maximal turning point in some time-varying variable, such as streamflow. In simple terms, a local peak occurs where the value on either side of the peak is lower than the value at the peak.

An example of local peak detection is provided by the `findpeaks` method in Matlab.

https://uk.mathworks.com/help/signal/ref/findpeaks.html
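
For illustration only, here is a minimal Python sketch of local peak detection using `scipy.signal.find_peaks`, a rough analog of Matlab's `findpeaks`; the synthetic series and the prominence value are assumptions made for the example:

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical noisy "streamflow": a slow wave plus high-frequency noise
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 500)
flow = 100 + 50 * np.sin(t) + rng.normal(0, 5, t.size)

# A local peak is any sample whose neighbours are both lower
local, _ = find_peaks(flow)
print(f"{local.size} local peaks")  # a very large number for noisy data

# Requiring a minimum prominence discards most of the noise
prominent, _ = find_peaks(flow, prominence=25)
print(f"{prominent.size} prominent peaks")
```

Even here, the prominence value is a subjective choice, which anticipates the filtering discussion below.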

However, local peak detection is likely to have only limited value in this context, since environmental variables, such as streamflow, are noisy. In other words, they typically contain a very large number of local peaks, depending on the scale of the data. For example, instantaneous data is inherently noisier than daily aggregated data and will contain a large number of local peaks. For hydrologists, low-frequency peaks are typically more interesting, because they correspond to hydrologically relevant behavior, such as a flood wave.

"Non-local" peak detection is fundamentally more challenging than local peak detection for several reasons. In particular, when a time-series is truncated (e.g. by a forecast horizon), it will often be unclear whether that time-series contains a non-local peak. In the following example, @x@ might reasonably be identified as a non-local peak in scenario @(c)@, but not in scenarios @(a)@ and @(b)@.

(Image: Peak.PNG, showing truncated time-series scenarios (a), (b) and (c))

Unless the time-horizon is augmented by additional data (e.g. by looking at observations or simulations across a larger time horizon), a non-local peak can only be defined in the context of the time horizon available.

The first question that arises, therefore, is whether non-local peaks should be identified from the paired data or from some other data, such as observations or hydrologic simulations that cover a longer period (including the period of the paired data).

A second question that arises concerns the definition of "non-local". Non-local could mean "global", for example, or "regional" or something else. A global peak is not simply the local peak with the maximum value of all local peaks. Rather, a "global peak" is concerned with "looking through" local peaks in the data to find a low-frequency pattern that can be identified as something more important (e.g. like detecting a "flood peak" from a noisy time-series of instantaneous streamflow).

Central to the definition of "non-local" is signal processing or signal detection theory, for which an enormous body of literature exists.

One possibility for identifying "non-local" peaks would be to apply a low-pass filter to the original data, or to transform the signal into the frequency domain (e.g. via a Fast Fourier Transform) and remove the high-frequency components before reconstructing the signal. Once filtered, a local peak detection algorithm could be applied. The degree and type of filtering would need to be determined and would be necessarily subjective, at least to some degree.
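
As a sketch of that pipeline (the filter order, the cutoff frequency and the FFT variant's `keep_fraction` are all assumptions, and exactly the subjective choices mentioned above):

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def lowpass_then_detect(flow, samples_per_day=24, cutoff_per_day=0.5, order=4):
    """Butterworth low-pass filter followed by local peak detection.
    filtfilt is zero-phase, so smoothing does not shift the peak times,
    which matters when the end goal is a timing error."""
    b, a = butter(order, cutoff_per_day, btype="low", fs=samples_per_day)
    smoothed = filtfilt(b, a, flow)
    peaks, _ = find_peaks(smoothed)
    return peaks, smoothed

def fft_lowpass(flow, keep_fraction=0.05):
    """Alternative: zero out the high-frequency components in the
    frequency domain before reconstructing the signal."""
    spectrum = np.fft.rfft(flow)
    spectrum[int(keep_fraction * spectrum.size):] = 0
    return np.fft.irfft(spectrum, n=flow.size)
```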

To make progress on this, we will probably need to identify a simple approach that can be augmented with better approaches in future. My preliminary recommendations are that we:

  1. Consider whether we want to use the paired data or some other data to identify "peaks". This could be the left source or an additional source (e.g. of observations or hydrologic simulations). The purpose would be to identify the range of times at which specific peaks occur. Once these times are known, it should be possible to select only those paired time-series that contain the qualifying time ranges or "peaks". In this way, a rising limb that ends with a local peak (cases (a) and (b) in the above diagram) would not be confused for a relevant peak. My preliminary recommendation is that we do not use the paired data directly, but instead consider the left source (or a simulated source) for the full period of record covered by all pairs. A sketch of this selection step follows the list.
  2. Consider what we mean by a "non-local" peak. Put differently, consider the type of low-pass filter or signal detection methodology that we want to apply to the data.
  3. Review existing methods within the NWS for peak detection, as well as the broader signal processing literature (including from hydrology). We should not identify candidate techniques simply from among those that the NWS has applied to date (whether for IOEP or something else), but we should certainly consider them. By way of example, I am attaching a spreadsheet from (I believe) Julie Meyer at MBRFC, which describes their approach. This description will probably require additional explanation from Alex.
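
To make the selection step in item 1 concrete, here is a minimal Python sketch, assuming the peak times have already been identified from the left (or simulated) source; all names are hypothetical and this is not a proposal for the WRES itself:

```python
from datetime import datetime
from typing import List, Tuple

# A paired time-series, reduced to its valid times and values
TimeSeries = List[Tuple[datetime, float]]  # (valid time, value)

def contains_peak(series: TimeSeries, peak_times: List[datetime]) -> bool:
    """True if an identified peak time falls strictly inside the valid-time
    range of the series. Excluding the endpoints means a rising limb that
    merely ends at a peak (cases (a) and (b) above) does not qualify."""
    if not series:
        return False
    start, end = series[0][0], series[-1][0]
    return any(start < t < end for t in peak_times)

# Keep only the paired time-series that contain a qualifying peak
# qualifying = [s for s in paired_series if contains_peak(s, peak_times)]
```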

This is a high-priority task that will need an initial implementation for SR1.

I am adding Xia and Yuqiong as watchers, as they are likely to have valuable input on a possible first approach for selecting time-series that contain relevant "peaks" for verification. We may want to add other watchers in due course (e.g. IOEP folks).


Redmine related issue(s): 44781, 44801, 70527


epag commented 3 months ago

Original Redmine Comment Author Name: Xia (Xia) Original Date: 2018-03-06T20:00:33Z


James:

Thanks for putting out a nice description of peak analysis.

Regarding your recommendations in the numbered bullets:

  1. I concur with this suggestion, that is, to use all data to identify the peak and to focus on the paired flow time-series that cover the peak range.
  2. Do you have a specific low-pass filter in mind for detecting non-local peaks? Or is one available in the WRES?
  3. A review of peak detection methods would be very helpful. Have you searched the relevant literature, and if so, could you please share it with us? I also talked with Alex about the peak derivation method used by the MBRFC folks shown in the attached Excel file. The general concept is similar to what you mentioned: subset the paired time-series and obtain threshold-based peaks where the flow exceeds a threshold stage, such as the forecast issuance, minor-, moderate- and major-flood and record stages. For each threshold level, the times-to-peak of the forecasted AHPS locations within select watersheds, stratified in terms of fast, moderate and slow response time, are averaged to obtain one value representative of each of the three categories of basins with different response times.

Best regards,

Xia

epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-03-07T20:38:23Z


Xia.Feng wrote:

James:

Thanks for putting out a nice description of peak analysis.

Regarding your recommendations in the numbered bullets:

  1. I concur with this suggestion, that is, to use all data to identify the peak and to focus on the paired flow time-series that cover the peak range.
  2. Do you have a specific low-pass filter in mind for detecting non-local peaks? Or is one available in the WRES?
  3. A review of peak detection methods would be very helpful. Have you searched the relevant literature, and if so, could you please share it with us? I also talked with Alex about the peak derivation method used by the MBRFC folks shown in the attached Excel file. The general concept is similar to what you mentioned: subset the paired time-series and obtain threshold-based peaks where the flow exceeds a threshold stage, such as the forecast issuance, minor-, moderate- and major-flood and record stages. For each threshold level, the times-to-peak of the forecasted AHPS locations within select watersheds, stratified in terms of fast, moderate and slow response time, are averaged to obtain one value representative of each of the three categories of basins with different response times.

Best regards,

Xia

Thanks, Xia!

(1) Sounds good. (2) No, not available in the WRES, and I don't have a particular preference, at least until I've engaged with the literature more... (3) Agreed.

I guess I'm looking for input from you and others about the methodology we should implement, initially. I think you're saying that, currently, you don't have a preferred methodology for filtering/detecting peaks, so we'll probably have to do some further reading and iterate our thoughts together.

However, time is rather limited for our upcoming release. Two things the WRES will definitely allow for Spiral Release 1 are the following:

  1. Calculation of time-to-peak errors that are defined "unconditionally", i.e. simply with reference to the time of the maximum value within the forecast horizon. This would be conducted for each paired time-series. The summary statistics would be unconditional, i.e. derived from the peak errors generated by *all* paired time-series (which could be constrained elsewhere in the project configuration, such as by using a start and an end date, etc.). A sketch follows this list.
  2. The ability for a user to provide only a limited set of sources to ingest. For example, if you have a particular sample of data in which the peaks have already been detected, these could be ingested as normal, and they would cover only the periods in which peaks occur. This basically leaves peak detection to the user.
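
A minimal sketch of the time-to-peak error in item 1, computed for a single paired time-series; the names are hypothetical, and taking the first maximum in the event of ties is itself an implementation choice:

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Pair = Tuple[datetime, float, float]  # (valid time, observed, predicted)

def time_to_peak_error(pairs: List[Pair]) -> timedelta:
    """Time of the predicted maximum minus time of the observed maximum;
    positive means the prediction peaks too late. max() returns the
    first maximum when there are ties."""
    observed_peak = max(pairs, key=lambda p: p[1])[0]
    predicted_peak = max(pairs, key=lambda p: p[2])[0]
    return predicted_peak - observed_peak

# The unconditional summary statistics would then be derived from the
# errors across all paired time-series, e.g. a mean or median timedelta.
```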

Overall, this isn't ideal. It would be nice if we could agree on an initial methodology for the system to detect peaks, because there will be a few different moving parts to getting this implemented. Time is definitely running out. One part is ingesting the data that we want to use for peak detection (i.e. not the paired data). Another is implementing the low-pass filter or other methodology we choose to define the discrete dates/times at which "peaks" occur and the period of time for which they last. The last part is the tricky one...

epag commented 3 months ago

Original Redmine Comment Author Name: Xia (Xia) Original Date: 2018-03-07T20:52:17Z


James:

Thanks for your reply.

I can conduct a quick survey of peak detection methodologies (low-pass filters, etc.) if that is fine with you. Or, if you have some reference papers, I can start my search with those.

Xia


epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-03-07T22:19:33Z


Sounds good, thanks! I haven't compiled a reading list, so I think you'd be making a start.

epag commented 3 months ago

Original Redmine Comment Author Name: Xia (Xia) Original Date: 2018-03-07T22:28:14Z


James: I will let you know how it goes. -Xia


epag commented 3 months ago

Original Redmine Comment Author Name: alexander.maestre (alexander.maestre) Original Date: 2018-03-08T13:41:47Z


Xia - At the same time you look for papers, could you please try to replicate the methodology included in the spreadsheet for another event (maybe Hurricane Harvey)? We can contact Julie to make sure we follow the process. I think it is very useful, because based on how fast a watershed responds, we can have estimated times to peak. I think the methodology used in the spreadsheet is very useful for emergency management personnel.

I am sending an e-mail to Julie to check if she knows about any papers about it.

epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-03-08T14:07:38Z


alexander.maestre wrote:

Xia - At the same time you look for papers, could you please try to replicate the methodology included in the spreadsheet for another event (maybe Hurricane Harvey)? We can contact Julie to make sure we follow the process. I think it is very useful, because based on how fast a watershed responds, we can have estimated times to peak. I think the methodology used in the spreadsheet is very useful for emergency management personnel.

I am sending an e-mail to Julie to check if she knows about any papers about it.

Alex,

I think we need to decompose this spreadsheet a little, because it's doing several different things. I agree it's useful.

The first thing it does is classify basins. This is not related to timing errors, but is useful, of course. It would need to be reflected in the database and the WRES project configuration would need to be able to select on the basis of basin response class. Again, though, this is not related to timing errors, specifically.

The second thing it does is consider specific events. However, I don't believe it proposes a methodology for identifying time-series that contain peaks, right? This is the really challenging problem we need to address. According to this spreadsheet, the time-series have been identified upfront as time-series of interest. That doesn't help us much. Am I misreading something?

The third thing is that it uses a particular type of metric, different from the one we currently have in the WRES. The WRES "time-to-peak error" measures the difference in time between the maximum values reached in the predicted versus observed time-series. The spreadsheet from Julie appears to use a "time-to-first-occurrence error", where "first occurrence" is the first time at which an event occurs, such as exceedance of the forecast issuance stage or the major flood stage. I think this is a useful metric too. We should add it to the WRES. Again, there will be tons of useful timing metrics. I think we can do this for SR1. Agreed?

There may be other things too. I think we need to decompose the stuff that is unrelated to timing errors but nevertheless useful (basin response time classification) from the stuff that is related to timing errors.

epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-03-08T14:20:16Z


Although I will also add that "time to first occurrence" may have some issues in terms of propriety and hedging.

For example, what is the "time-to-first-occurrence" error when there is no occurrence in one or both of the predicted and observed time-series? I think it's infinite, otherwise the metric could be hedged too easily (never predict an occurrence and the timing error is zero). However, this would also mean infinite values of the metric most of the time.

You would probably need to evaluate the metric for only those pairs of time-series where both left and right have at least one occurrence.
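
A sketch of that restriction, with hypothetical names: the error is defined only when both sides contain at least one occurrence, and the pair is otherwise excluded (rather than assigned an infinite penalty):

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Pair = Tuple[datetime, float, float]  # (valid time, observed, predicted)

def time_to_first_occurrence_error(pairs: List[Pair],
                                   threshold: float) -> Optional[timedelta]:
    """Time of the first predicted exceedance minus time of the first
    observed exceedance. Returns None when either side never exceeds the
    threshold, i.e. the pair is excluded from scoring. This avoids the
    zero penalty that would invite hedging, at the cost of conditioning
    on pairs where the forecast is categorically correct."""
    observed_first = next((t for t, obs, _ in pairs if obs > threshold), None)
    predicted_first = next((t for t, _, pred in pairs if pred > threshold), None)
    if observed_first is None or predicted_first is None:
        return None
    return predicted_first - observed_first
```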

epag commented 3 months ago

Original Redmine Comment Author Name: alexander.maestre (alexander.maestre) Original Date: 2018-03-08T14:29:27Z


I think this cannot be done for SR1, of course, but I think it has a lot of value, because it is a product that we produce as an organization. Maybe we do not need to have it in Java. I want to get a grasp of how the methodology works and provide this tool to the RFCs. My understanding is that, in this ticket, the time to peak is not related to a single forecast. It seems to be related to multiple forecasts issued at multiple locations for a specific event.

We just need input from Xia to confirm that this is something useful for the evaluation services. My initial response is yes, because it is needed by all the RFCs every time there is a big storm that we know is going to occur ahead of time.

epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-03-08T14:43:39Z


alexander.maestre wrote:

I think this cannot be done for SR1, of course, but I think it has a lot of value, because it is a product that we produce as an organization. Maybe we do not need to have it in Java. I want to get a grasp of how the methodology works and provide this tool to the RFCs. My understanding is that, in this ticket, the time to peak is not related to a single forecast. It seems to be related to multiple forecasts issued at multiple locations for a specific event.

We just need input from Xia to confirm that this is something useful for the evaluation services. My initial response is yes, because it is needed by all the RFCs every time there is a big storm that we know is going to occur ahead of time.

Yes, I think they compute an average error. But that average comes from somewhere, i.e. from the errors of the individual paired time-series.

I agree that we need to understand the methodology better. For example, I want to know how they're treating time-series in which one or both (observed or forecast) do not contain an occurrence.

In evaluating forecast quality, it's important to use measures that tell a story worth knowing. In other words, "propriety" is super important. There are many papers that show the consequences of not using proper metrics to evaluate forecast quality, especially for extreme events, like flooding. I guess the most famous case is the Finley tornado forecasts:

https://www.nssl.noaa.gov/users/brooks/public_html/feda/papers/murphy96.pdf

Anyway, we need to understand it a little more before implementing it.

epag commented 3 months ago

Original Redmine Comment Author Name: Xia (Xia) Original Date: 2018-03-08T15:18:09Z


Hey, Alex and James:

Thanks for the useful discussion of the approach outlined in the Excel file Julie provided. I also noted that the spreadsheet doesn't explain how the peak was derived, and the "time-to-peak" was not compared with the observations. The method is definitely worth experimenting with, but a better understanding is needed. Alex already emailed Julie (cc'ed to me as well) about further documentation of the method.

In the meantime, I will search the literature on streamflow peak detection methodology, which is a critical procedure for the WRES peak error analysis.

Best regards,

Xia


epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-03-13T16:54:51Z


Adding a note from Xia sent via e-mail:

Xia.Feng wrote:

James and Alex:

I did a review of methods for obtaining peak flow and summarized my findings at

https://drive.google.com/drive/folders/1EQt9d-aaQziocWAyjsoYNILgCARpmrXT?usp=sharing

There is a plethora of approaches to estimating peak flow for gauged and ungauged streams, using regression, the rational formula, the Manning equation, Peaks Over Threshold, etc. However, I was unable to find a study that uses low-pass filtering to obtain the peak flow.

Anyway, I have listed several methods for noise filtering and for low-pass filtering of baseflow.

If such a paper can't be found, I think we need to come up with a methodology tailored for us to smooth the flow time-series and obtain peaks. There are some functions in R and MATLAB that can achieve this goal; they are listed on the final slide. Not sure if we could explore them further.

Best regards,

Xia

epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-03-13T16:58:21Z


Thanks for the above Xia, I will take a look shortly :)

One thing to note about "low-pass filtering": this is a rather generic term and may not be mentioned explicitly in a lot of papers. It covers a vast array of smoothing approaches. For example, a simple moving average is a low-pass filter. I believe Sam Lamont was using a moving average in his methodology, for example (although perhaps I am recalling incorrectly).
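
For example, a centred moving average is only a few lines; the window length is an assumption here and controls how aggressively the series is smoothed:

```python
import numpy as np

def moving_average(flow, window=24):
    """Centred moving average, i.e. a simple low-pass filter. A window of
    24 samples corresponds to one day of hourly data (an assumption)."""
    kernel = np.ones(window) / window
    return np.convolve(flow, kernel, mode="same")
```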

epag commented 3 months ago

Original Redmine Comment Author Name: Xia (Xia) Original Date: 2018-03-15T17:32:09Z


Alex and I had a meeting with Julie yesterday to go through the spreadsheet Julie created. She estimated the lead times used for issuing flood warnings at five flood categories and the crest stage over fifty-three AHPS forecast locations during the period of 28 April to 6 May.

This is an a posteriori analysis, i.e., conducted after 6 May 2017. The five thresholds are obtained from historical observational records and include the forecast issuance stage, minor flood stage, moderate flood stage, major flood stage and record stage. The lead time was defined as the time difference between the basis time of the first forecast indicating that the threshold stage was exceeded (in any of the valid times, according to the Integrated Hydrologic Forecast System (IHFS)) and the first observation that exceeded the same threshold value. Additionally, the lead times of the forecast points, stratified in terms of slow, medium and fast response time, were averaged. The statistical measures of MAE and ME were calculated for the averaged lead times of the three response-time groups. In essence, this approach identifies the time duration between the first forecast and the first observation overshooting the reference level. The peak flow, the timing of the peak flow, and their comparisons with the observations were not covered.


epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-03-15T18:14:42Z


Xia.Feng wrote:

Alex and I had a meeting with Julie yesterday to go through the spreadsheet Julie created. She estimated the lead times used for issuing flood warnings at five flood categories and the crest stage over fifty-three AHPS forecast locations during the period of 28 April to 6 May.

This is an a posteriori analysis, i.e., conducted after 6 May 2017. The five thresholds are obtained from historical observational records and include the forecast issuance stage, minor flood stage, moderate flood stage, major flood stage and record stage. The lead time was defined as the time difference between the basis time of the first forecast indicating that the threshold stage was exceeded (in any of the valid times, according to the Integrated Hydrologic Forecast System (IHFS)) and the first observation that exceeded the same threshold value. Additionally, the lead times of the forecast points, stratified in terms of slow, medium and fast response time, were averaged. The statistical measures of MAE and ME were calculated for the averaged lead times of the three response-time groups. In essence, this approach identifies the time duration between the first forecast and the first observation overshooting the reference level. The peak flow, the timing of the peak flow, and their comparisons with the observations were not covered.

Thank you, Xia, for the very useful summary.

I think this essentially confirms our expectations about what they did. The important part is that it's a post-event analysis, so the event was predefined. There's no methodology that helps us with peak selection. Right?

However, there are elements of the methodology that we probably do need to implement in the WRES. These include the ability to ingest information about basin response times/classes and condition the verification results on those classes. This applies not just to timing errors but to all cases. We should address that separately.

The other element of the methodology is the metric(s) they use, which you've usefully described. As you say, they are measuring the timing error associated with the first occurrence above a threshold: the "time-to-first-occurrence error".

I think the difficulty with using this information is how to account for cases where the predictions and observations differ about whether there *is* a (first) occurrence. I think you'd need an infinite penalty for making a wrong prediction, i.e. predicting an exceedance when the observations always remain below, or vice versa. If you didn't do this, any scoring rule would be "improper", i.e. very easy to hedge. This is extremely important, because hedging isn't just a theoretical thing; it can happen inadvertently (e.g. when a model is biased).

For example, if the penalty were zero for not predicting above the threshold when the observation exceeded the threshold (or vice versa), the best possible score would be obtained by always predicting below the threshold. This is nonsense, of course. I think it really only works for cases where both the predictions and observations are above the threshold. However, the latter is also misleading, because it filters the results to cases where the model is categorically correct (i.e. correctly predicts the exceedance).

Right?

Separately, we still need to work out how best to filter the peaks from a longer dataset. I'm going to read your summary in the next day or so and I'll then respond to that part.

Thanks again.

James

epag commented 3 months ago

Original Redmine Comment Author Name: Xia (Xia) Original Date: 2018-03-15T21:00:14Z


James:

Alex also contributed to the summary. Indeed, there are useful messages coming out of the spreadsheet.

It's a good idea to split the peak-flow-related accuracy measures according to the response time of the upstream catchments. This kind of performance information is critical for issuing early and effective flood warnings and is thus helpful for flood preparedness and hazard mitigation.

The methodology treats the forecast as a "false alarm" if it wrongly predicts the flow exceedance. Another possibility is that the forecast could miss the occurrence of an exceedance, as you mentioned. In these two special situations, the approach cannot estimate a "lead time" or "time-to-first-occurrence error". Basically, it applies to the cases where both the forecasted and observed flow exceed the specified threshold level.

I guess we need to work up a solution for our purpose.

Xia


epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-03-15T21:10:38Z


OK - thanks to Alex, also!

I agree. I think we need to work up a solution to the specific problem of peak flow detection. More on that soon...

I also agree about the need for the basin response time classes. I will create a ticket for that and cc you and Alex. I'm not sure about the extent to which we can obtain this information from external sources.

I can add a "time-to-first-occurrence error" quite easily, but I think we agree that it would only apply to cases where occurrences are found in both the predictions and observations. In such circumstances, it would need to be interpreted very carefully, because there could be large timing errors where the condition is not met (i.e. where the predictions or observations do not meet the condition).

James

epag commented 3 months ago

Original Redmine Comment Author Name: alexander.maestre (alexander.maestre) Original Date: 2018-03-15T21:35:15Z


You are welcome! The information that Julie is using is in the NRLDB or IHFS database, which is the database we are using in IOEP. So, in short, I have those tables, or we can get access if we ask Chip.

The information in the spreadsheet is used to calculate the lead time emergency managers had to respond to an event or to a certain flooding stage. I think it is super useful, and I can see how this could easily be programmed using the IOEP database. The spreadsheet is for a posteriori verification of the lead time provided by the forecasting system to the public.

I agree with your comment about the "time-to-first-occurrence error"; that is how we can check whether there were false positives or false negatives based on the River Forecasts (RVF).

Alex

epag commented 3 months ago

Original Redmine Comment Author Name: Xia (Xia) Original Date: 2018-03-16T15:27:08Z


Thanks James and Alex for the further thoughts.

I also inserted several representative reference articles and links in the review slides (#2):

https://drive.google.com/drive/folders/1EQt9d-aaQziocWAyjsoYNILgCARpmrXT?usp=sharing

Most of the approaches estimate peak flow for gauged/ungauged streams rather than detecting the peak value from an existing flow series. We have Sam Lamont's and Julie's methods for reference. In addition, there are plenty of low-pass filtering and peak detection techniques in use. We will need to formulate the most appropriate methodology for us.

Xia

epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-03-16T15:27:45Z


As above, I've added a new feature request to filter/qualify outputs by class of feature, such as basin response class, #47886.

epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-04-26T11:12:32Z


See post #44781-25 for an update on progress.

I'm going to place this on hold and set to backlog until some concrete decisions have been made about any direction for more sophisticated approaches. Before implementing any of these approaches, it would be good to get some feedback about the SR1 options. I read the PPT from Xia with interest (thanks, Xia :)), and I think more follow-up will be required. Some of these techniques require statistical modeling of the time-series data and go beyond traditional verification (which aims to measure, rather than model). I need to read some of the references provided.

Also, see #44801, including the PPT from Alex. This is more akin to traditional verification and something along these lines is a likely next step, although probably with some modifications.

epag commented 3 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-09-26T12:04:05Z


Making this a separate task to move forward, not a subtask of #44781. Task #44781 will provide closure for the first implementation of timing error metrics.