TheDroplets / ensverif

A toolbox of metrics to assess the performance of ensemble simulations or forecasts
GNU General Public License v3.0
2 stars 1 forks source link

crps decomposition not computed correctly in the presence of outliers #1

Open vinfort opened 10 months ago

vinfort commented 10 months ago

In the presence of outliers, the function crps_hersbach_decomposition does not implement the method described in Hersbach (2000) in order to correctly compute the reliability and potential of CRPS. More precisely, equation (33) of the paper is not implemented. After correcting this, I've been able to match the results obtained using the function crpsDecomposition in the R package verification on CRAN. I have sent a patch to @QueenMABhydro by email (it's way too late for me to properly fork the code and submit a pull request). The proposed patch follows, for the record.

vinfort commented 10 months ago
diff --git a/ensverif.py b/ensverif.py
index 045433d..2d2e2bf 100644
--- a/ensverif.py
+++ b/ensverif.py
@@ -180,6 +180,9 @@ def crps_hersbach_decomposition(ens, obs):
     alpha = np.zeros((rows,columns+1))
     beta = np.zeros((rows,columns+1))

+    n_outliers_left = 0
+    n_outliers_right = 0
+
     for i in range(rows):
         # if the observation does not exist, no ens for alpha and beta
         if ~np.isnan(obs[i]):
@@ -189,6 +192,7 @@ def crps_hersbach_decomposition(ens, obs):
                     if obs[i] < ensemble_sort[0]:
                         alpha[i,k] = 0
                         beta[i,k] = ensemble_sort[0] - obs[i]
+                        n_outliers_left = n_outliers_left + 1
                     else:
                         alpha[i,k] = 0
                         beta[i,k] = 0
@@ -196,6 +200,7 @@ def crps_hersbach_decomposition(ens, obs):
                     if obs[i] > ensemble_sort[columns-1]:
                         alpha[i,k] = obs[i] - ensemble_sort[columns-1]
                         beta[i,k] = 0
+                        n_outliers_right = n_outliers_right + 1
                     else:
                         alpha[i,k] = 0
                         beta[i,k] = 0
@@ -221,8 +226,14 @@ def crps_hersbach_decomposition(ens, obs):
     beta1 = np.nanmean(beta, axis=0)

     g_component = alpha1 + beta1
+    g_component[g_component==0] = np.nan
     o_component = beta1 / g_component

+    o_component[0] = n_outliers_left / rows
+    g_component[0] = beta1[0] / o_component[0]
+    o_component[columns] = n_outliers_right / rows
+    g_component[columns] = alpha1[columns] / (1. - o_component[columns])
+
     weight = np.arange(columns+1) / columns
     reliability_component = np.nansum(g_component * np.power(o_component - weight, 2))
     potential_component = np.nansum(g_component * o_component * (1 - o_component))
vinfort commented 10 months ago

By the way I think that it should not be necessary to use nanmean to compute alpha1 and beta1 because there should be no nans in the alpha and beta arrays. Because of the use of nanmean the code was silently failing and the bug went unnoticed. I suspect the nanmean might have been added at some point because the code was actually failing!

vinfort commented 10 months ago

There's actually a problem with the solution I proposed: it fails when o_component[0] == 0 i.e. when there are not left outliers! In this case we obtain g_component[0] == Inf. Because we later multiply g_component by o_component to compute the two components of CRPS, it really does not matter what value we assign to g_component[0] when o_component[0] equals zero, but it should not be infinity! Zero is as good a value as any. Here's an updated patch:

vinfort commented 10 months ago
diff --git a/ensverif.py b/ensverif.py
index 045433d..324af2b 100644
--- a/ensverif.py
+++ b/ensverif.py
@@ -180,6 +180,9 @@ def crps_hersbach_decomposition(ens, obs):
     alpha = np.zeros((rows,columns+1))
     beta = np.zeros((rows,columns+1))

+    n_outliers_left = 0
+    n_outliers_right = 0
+
     for i in range(rows):
         # if the observation does not exist, no ens for alpha and beta
         if ~np.isnan(obs[i]):
@@ -189,6 +192,7 @@ def crps_hersbach_decomposition(ens, obs):
                     if obs[i] < ensemble_sort[0]:
                         alpha[i,k] = 0
                         beta[i,k] = ensemble_sort[0] - obs[i]
+                        n_outliers_left = n_outliers_left + 1
                     else:
                         alpha[i,k] = 0
                         beta[i,k] = 0
@@ -196,6 +200,7 @@ def crps_hersbach_decomposition(ens, obs):
                     if obs[i] > ensemble_sort[columns-1]:
                         alpha[i,k] = obs[i] - ensemble_sort[columns-1]
                         beta[i,k] = 0
+                        n_outliers_right = n_outliers_right + 1
                     else:
                         alpha[i,k] = 0
                         beta[i,k] = 0
@@ -221,8 +226,14 @@ def crps_hersbach_decomposition(ens, obs):
     beta1 = np.nanmean(beta, axis=0)

     g_component = alpha1 + beta1
+    g_component[g_component==0] = np.nan
     o_component = beta1 / g_component

+    o_component[0] = n_outliers_left / rows
+    g_component[0] = 0 if o_component[0] == 0 else beta1[0] / o_component[0]
+    o_component[columns] = n_outliers_right / rows
+    g_component[columns] = alpha1[columns] / (1. - o_component[columns])
+
     weight = np.arange(columns+1) / columns
     reliability_component = np.nansum(g_component * np.power(o_component - weight, 2))
     potential_component = np.nansum(g_component * o_component * (1 - o_component))
vinfort commented 10 months ago

Only only line of code has been changed this time:

diff --git a/ensverif.py b/ensverif.py
index 2d2e2bf..324af2b 100644
--- a/ensverif.py
+++ b/ensverif.py
@@ -230,7 +230,7 @@ def crps_hersbach_decomposition(ens, obs):
     o_component = beta1 / g_component

     o_component[0] = n_outliers_left / rows
-    g_component[0] = beta1[0] / o_component[0]
+    g_component[0] = 0 if o_component[0] == 0 else beta1[0] / o_component[0]
     o_component[columns] = n_outliers_right / rows
     g_component[columns] = alpha1[columns] / (1. - o_component[columns])
vinfort commented 10 months ago

Never trust a patch generated at 3am!

QueenMABhydro commented 10 months ago

Hi Vincent, Thanks for raising this issue, and for your efforts in solving it. I will look at all the patches and update ensverif to solve this issue with outliers. However, I also want to update the Matlab version and would prefer to release both at the same time. So unless there is a dramatically urgent need, I will only push the updated Python version when the Matlab version is also ready and when both have been verified.