
EFAS skill assessment

Analysis of the skill of EFAS (European Flood Awareness System) formal flood notifications since the deployment of EFAS v4 (October 2020).

1 Structure of the repository

The repository contains seven folders:

2 Data

The analysis is limited to the EFAS fixed reporting points with a catchment area larger than 500 km² (2357 points).

The original datasets used for the study are:

3 Methods

The whole analysis consists of four major steps (five if the optional selection of reporting points is included):

3.1 Preprocess the discharge reanalysis

This step is carried out in this notebook. The "observed" discharge time series for each reporting point is compared against the discharge associated with its defined return period ($Q_{rp}$) to produce a time series of exceedance over threshold. In principle, the time series of exceedance should be binary (0, non-exceedance; 1, exceedance); however, to allow for minor deviations between "observed" and forecasted discharge, a reducing factor ($\lambda$) can be used to create a ternary time series of exceedance: non-exceedance, exceedance of the reduced threshold ($\lambda \cdot Q_{rp}$) only, and exceedance of the full threshold ($Q_{rp}$).
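As a minimal sketch of this step, assuming the reanalysis is loaded as an xarray DataArray and using an illustrative 0/0.5/1 encoding for the ternary case (the function name and encoding are not the repository's actual implementation):

```python
import xarray as xr

def exceedance_series(discharge: xr.DataArray, q_rp: float,
                      reducing_factor: float | None = None) -> xr.DataArray:
    """Turn a discharge time series into an exceedance-over-threshold series.

    Without a reducing factor the result is binary (0 non-exceedance,
    1 exceedance of Q_rp). With a reducing factor lambda, an intermediate
    value of 0.5 (illustrative encoding) marks discharges between
    lambda * Q_rp and Q_rp.
    """
    over_qrp = (discharge >= q_rp).astype(float)
    if reducing_factor is None:
        return over_qrp
    over_reduced = (discharge >= reducing_factor * q_rp).astype(float)
    # 0 below lambda*Q_rp, 0.5 between the two thresholds, 1 above Q_rp
    return 0.5 * (over_qrp + over_reduced)
```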

Parameters in the configuration file specifically involved in this step:

3.2 Preprocess the forecast discharge

This notebook preprocesses the discharge forecasts. The objective is the same as in the previous step, i.e., to create a dataset of exceedances over threshold, but in this case for the forecasts. The procedure is, however, somewhat more complex, since it involves overlapping forecasts from 4 numerical weather prediction (NWP) models, some of which include several runs (members) in every forecast.

As in the reanalysis, the output of the forecast preprocessing is a set of NetCDF files with the time series of exceedance over threshold. Depending on whether the reducing_factor is enabled or not, the NetCDF files will contain one or two variables: the exceedance over the discharge threshold ($Q_{rp}$) and, if applicable, the exceedance over the reduced discharge threshold ($\lambda \cdot Q_{rp}$). In either case, the dataset contains values in the range 0-1 with the proportion of model runs (members) that exceeded the specific discharge threshold. For the deterministic NWPs (DWD and ECMWF-HRES), values can only be 0 or 1.
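A hedged sketch of how the proportion of members above a threshold could be computed with xarray; the dimension name `member` and the function name are assumptions, not the repository's API:

```python
import xarray as xr

def member_exceedance_fraction(forecast: xr.DataArray, threshold: float,
                               member_dim: str = "member") -> xr.DataArray:
    """Fraction of NWP runs (members) whose forecasted discharge exceeds a threshold.

    For deterministic NWPs the member dimension has length 1, so the
    result collapses to either 0 or 1.
    """
    return (forecast >= threshold).mean(dim=member_dim)
```

Such a helper would be called once per threshold ($Q_{rp}$ and, if the reducing factor is enabled, $\lambda \cdot Q_{rp}$) to populate the one or two NetCDF variables.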

Parameters in the configuration file specifically involved in this step:

3.3 Confusion matrix

This notebook compares the exceedance over threshold for the reanalysis (observation) and the forecasts, and computes the entries of the confusion matrix (hits, misses, false alarms) that will later be used to compute skill.

Figure 1. Confusion matrix for an imbalanced classification, such as that of flood forecasting.

The first step in this section is to reshape the forecast exceedance matrix. Originally, for each station and NWP model, this matrix has the dimensions forecast (issue date and time, with a frequency of 12 hours) and leadtime (in hours, with a frequency of 6 hours). These dimensions cannot be directly compared with the datetime dimension of the reanalysis dataset (date and time, with a frequency of 6 hours). Hence, the forecast dataset needs to be reshaped into two new dimensions: datetime (same units and frequency as datetime in the reanalysis data) and leadtime (in hours, but with a frequency of 12 h instead of the original 6 h). A thorough explanation of this step can be found in this document.
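An illustrative sketch of this rearrangement (not the repository's implementation), assuming `forecast` is a datetime coordinate of issue times and `leadtime` an integer number of hours:

```python
import pandas as pd
import xarray as xr

def to_valid_time(exceed: xr.DataArray) -> xr.DataArray:
    """Rearrange a (forecast, leadtime) array onto (datetime, leadtime),
    where datetime = forecast issue time + lead time (the valid time).

    Because forecasts are issued every 12 h while lead times step every 6 h,
    a given valid time is only covered by lead times spaced 12 h apart; the
    remaining combinations are left as missing values.
    """
    slices = []
    for lt in exceed["leadtime"].values:
        da = exceed.sel(leadtime=lt, drop=True)
        # shift the issue-time axis to the valid (target) time
        da = da.assign_coords(forecast=da["forecast"] + pd.Timedelta(hours=int(lt)))
        slices.append(da.rename({"forecast": "datetime"}))
    # outer join on datetime aligns all slices on a common 6-hourly axis
    out = xr.concat(slices, dim="leadtime", join="outer")
    return out.assign_coords(leadtime=exceed["leadtime"].values)
```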

If the exceedance datasets are ternary (see Sections 3.1 and 3.2), the second step in this section is to recompute the exceedance to convert these ternary datasets into binary ones. The combination of two ternary time series has 9 possible outcomes. In a nutshell, only two cases are of interest: when one of the time series is over the discharge threshold ($Q_{rp}$) and the other one is only over the reduced discharge threshold ($\lambda \cdot Q_{rp}$). These two cases would be either a miss or a false alarm in a binary analysis; in the ternary analysis, instead, they are both considered hits.
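A possible implementation of this recombination rule, assuming both the "observed" and forecast series are available in a deterministic ternary encoding (0, 0.5, 1; the actual encoding in the repository may differ):

```python
import xarray as xr

def ternary_to_binary(obs: xr.DataArray, fcst: xr.DataArray):
    """Collapse two ternary exceedance series (0, 0.5, 1) into binary ones.

    The two mixed cases -- one series above Q_rp (value 1) and the other at
    least above lambda * Q_rp (value >= 0.5) -- are promoted to exceedance in
    both series, so that they count as hits instead of misses/false alarms.
    """
    mixed_hit = ((obs == 1) & (fcst >= 0.5)) | ((fcst == 1) & (obs >= 0.5))
    obs_bin = ((obs == 1) | mixed_hit).astype(int)
    fcst_bin = ((fcst == 1) | mixed_hit).astype(int)
    return obs_bin, fcst_bin
```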

The third step is to compute the total exceedance probability from the probabilities of each of the 4 NWPs. Four approaches are tested:

Forecasted events (i.e., notifications) are computed by comparing the total exceedance probability matrix against a vector of possible probability thresholds. It is in this step that persistence is included as a notification criterion. The forecasted events are calculated for the series of persistence values specified in the configuration file.
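One possible reading of the persistence criterion, sketched below under the assumption that persistence means a number of consecutive steps above the probability threshold (the dimension and function names are illustrative):

```python
import xarray as xr

def forecasted_events(total_prob: xr.DataArray, prob_threshold: float,
                      persistence: int, dim: str = "datetime") -> xr.DataArray:
    """Flag a forecasted event (notification) when the total exceedance
    probability reaches `prob_threshold` in `persistence` consecutive steps
    along `dim`.
    """
    above = (total_prob >= prob_threshold).astype(int)
    # an event requires the whole rolling window to be above the threshold
    persistent = above.rolling({dim: persistence}).sum() == persistence
    return persistent.astype(int)
```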

Finally, the hits, misses, and false alarms are computed from the comparison between the "observed" and the forecasted events. The results are saved as NetCDF files, one for each reporting point. Every NetCDF file contains three matrices ($TP$ for true positives or hits, $FN$ for false negatives or misses, $FP$ for false positives or false alarms) with 4 dimensions (approach, probability, persistence, leadtime).
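A minimal sketch of the counting, assuming binary observed and forecasted event series and an illustrative `datetime` dimension name:

```python
import xarray as xr

def confusion_entries(obs: xr.DataArray, fcst: xr.DataArray,
                      dim: str = "datetime") -> xr.Dataset:
    """Hits (TP), misses (FN) and false alarms (FP) from binary observed and
    forecasted event series; true negatives are not stored, as usual for a
    highly imbalanced classification such as flood forecasting."""
    tp = ((obs == 1) & (fcst == 1)).sum(dim=dim)
    fn = ((obs == 1) & (fcst == 0)).sum(dim=dim)
    fp = ((obs == 0) & (fcst == 1)).sum(dim=dim)
    return xr.Dataset({"TP": tp, "FN": fn, "FP": fp})
```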

Parameters in the configuration file specifically involved in this step:

3.4 Selection of reporting points

In a first attempt, we tried to remove the spatial collinearity between reporting points. The idea was that reporting points in the same catchment might be highly correlated, so including all of them in the skill analysis would not be correct. With that idea in mind, there is a notebook that analyses the reporting points on a catchment basis and filters out highly correlated points.

In the end, this step has been removed from the pipeline due to the limited amount of data available, which would be reduced even further if reporting points were removed.

This filter could be applied so as to keep either the smaller or the larger catchments; in either case, it would have hindered the skill analysis based on catchment area that is part of the final results.

3.5 Skill assessment

This is the notebook in which we analyse the skill of EFAS notifications over the last 2 years and derive ways of changing the notification criteria in order to optimize skill. The outcome of this process is a set of plots and a few datasets, including the optimized criteria and a table of reporting points with their skill under the optimal criteria.

In this section we compute skill from the hits, misses and false alarms derived in the previous section. Skill is measured in three different ways: $recall$, $precision$ and a combination of the two, the $f_{\beta}$ score. The $\beta$ coefficient of the $f_{\beta}$ score is one of the parameters to be set in the configuration file. The default value is 1, which gives the same importance to $precision$ and $recall$. If $precision$ is deemed more important, $\beta$ should be lower than 1, and higher than 1 if $recall$ is more important.

$$recall = \frac{TP}{TP + FN}$$

$$precision = \frac{TP}{TP + FP}$$

$$f_{\beta} = \frac{(1 + \beta^2) \cdot TP}{(1 + \beta^2) \cdot TP + \beta^2 \cdot FN + FP}$$
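The three metrics, written as a small plain-Python helper that follows the formulas above (the function name is illustrative):

```python
def skill_scores(tp: int, fn: int, fp: int, beta: float = 1.0) -> dict:
    """Recall, precision and f-beta computed from the confusion-matrix entries."""
    recall = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    precision = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    f_beta = (1 + beta**2) * tp / denom if denom > 0 else float("nan")
    return {"recall": recall, "precision": precision, "f_beta": f_beta}

# example: beta = 1 weighs precision and recall equally
print(skill_scores(tp=40, fn=10, fp=20, beta=1.0))  # recall 0.8, precision ~0.67, f_1 ~0.73
```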

Two plots are generated that show, respectively, the evolution of the hits and of the skill with persistence, lead time, probability threshold and approach. A third plot specifically shows the evolution of skill for the fixed lead time (default 60 h) and catchment area (default 2000 km²) that will be used in the optimization.

After this exploration, the criteria are optimized for a fixed lead time and catchment area. A new set of criteria is derived for each of the approaches used to compute the total exceedance probability (see Section 3.3). With these new sets of criteria, maps and line plots are generated to show the results and the improvements compared to the current notification criteria.

Finally, we analyse the behaviour of the skill with varying catchment area (for a fixed lead time) and varying lead time (for a fixed catchment area). Not only do we compare the new optimal criteria against the current ones, but we also rerun an optimization that looks for the optimal probability threshold for each catchment area/lead time value. The objective of this second optimization is purely exploratory: to check whether there is room for improving the skill of the system with more complex notification criteria.

Parameters in the configuration file specifically involved in this step:

There is a final notebook that imports the datasets of hits and the optimized criteria and exports a table that summarizes the results of the analysis.

3.6 Extras

There are 2 extra notebooks that were used to explain the whole procedure and generate plots regarding specific events.

4 Results

This Confluence page is a report of the complete study, including the analysis of the results. A PDF version can be found in the folder docs.