Open vinfort opened 1 year ago
diff --git a/ensverif.py b/ensverif.py
index 045433d..2d2e2bf 100644
--- a/ensverif.py
+++ b/ensverif.py
@@ -180,6 +180,9 @@ def crps_hersbach_decomposition(ens, obs):
alpha = np.zeros((rows,columns+1))
beta = np.zeros((rows,columns+1))
+ n_outliers_left = 0
+ n_outliers_right = 0
+
for i in range(rows):
# if the observation does not exist, no ens for alpha and beta
if ~np.isnan(obs[i]):
@@ -189,6 +192,7 @@ def crps_hersbach_decomposition(ens, obs):
if obs[i] < ensemble_sort[0]:
alpha[i,k] = 0
beta[i,k] = ensemble_sort[0] - obs[i]
+ n_outliers_left = n_outliers_left + 1
else:
alpha[i,k] = 0
beta[i,k] = 0
@@ -196,6 +200,7 @@ def crps_hersbach_decomposition(ens, obs):
if obs[i] > ensemble_sort[columns-1]:
alpha[i,k] = obs[i] - ensemble_sort[columns-1]
beta[i,k] = 0
+ n_outliers_right = n_outliers_right + 1
else:
alpha[i,k] = 0
beta[i,k] = 0
@@ -221,8 +226,14 @@ def crps_hersbach_decomposition(ens, obs):
beta1 = np.nanmean(beta, axis=0)
g_component = alpha1 + beta1
+ g_component[g_component==0] = np.nan
o_component = beta1 / g_component
+ o_component[0] = n_outliers_left / rows
+ g_component[0] = beta1[0] / o_component[0]
+ o_component[columns] = n_outliers_right / rows
+ g_component[columns] = alpha1[columns] / (1. - o_component[columns])
+
weight = np.arange(columns+1) / columns
reliability_component = np.nansum(g_component * np.power(o_component - weight, 2))
potential_component = np.nansum(g_component * o_component * (1 - o_component))
By the way I think that it should not be necessary to use nanmean to compute alpha1 and beta1 because there should be no nans in the alpha and beta arrays. Because of the use of nanmean the code was silently failing and the bug went unnoticed. I suspect the nanmean might have been added at some point because the code was actually failing!
There's actually a problem with the solution I proposed: it fails when o_component[0] == 0
i.e. when there are not left outliers! In this case we obtain g_component[0] == Inf
. Because we later multiply g_component by o_component to compute the two components of CRPS, it really does not matter what value we assign to g_component[0] when o_component[0] equals zero, but it should not be infinity! Zero is as good a value as any. Here's an updated patch:
diff --git a/ensverif.py b/ensverif.py
index 045433d..324af2b 100644
--- a/ensverif.py
+++ b/ensverif.py
@@ -180,6 +180,9 @@ def crps_hersbach_decomposition(ens, obs):
alpha = np.zeros((rows,columns+1))
beta = np.zeros((rows,columns+1))
+ n_outliers_left = 0
+ n_outliers_right = 0
+
for i in range(rows):
# if the observation does not exist, no ens for alpha and beta
if ~np.isnan(obs[i]):
@@ -189,6 +192,7 @@ def crps_hersbach_decomposition(ens, obs):
if obs[i] < ensemble_sort[0]:
alpha[i,k] = 0
beta[i,k] = ensemble_sort[0] - obs[i]
+ n_outliers_left = n_outliers_left + 1
else:
alpha[i,k] = 0
beta[i,k] = 0
@@ -196,6 +200,7 @@ def crps_hersbach_decomposition(ens, obs):
if obs[i] > ensemble_sort[columns-1]:
alpha[i,k] = obs[i] - ensemble_sort[columns-1]
beta[i,k] = 0
+ n_outliers_right = n_outliers_right + 1
else:
alpha[i,k] = 0
beta[i,k] = 0
@@ -221,8 +226,14 @@ def crps_hersbach_decomposition(ens, obs):
beta1 = np.nanmean(beta, axis=0)
g_component = alpha1 + beta1
+ g_component[g_component==0] = np.nan
o_component = beta1 / g_component
+ o_component[0] = n_outliers_left / rows
+ g_component[0] = 0 if o_component[0] == 0 else beta1[0] / o_component[0]
+ o_component[columns] = n_outliers_right / rows
+ g_component[columns] = alpha1[columns] / (1. - o_component[columns])
+
weight = np.arange(columns+1) / columns
reliability_component = np.nansum(g_component * np.power(o_component - weight, 2))
potential_component = np.nansum(g_component * o_component * (1 - o_component))
Only only line of code has been changed this time:
diff --git a/ensverif.py b/ensverif.py
index 2d2e2bf..324af2b 100644
--- a/ensverif.py
+++ b/ensverif.py
@@ -230,7 +230,7 @@ def crps_hersbach_decomposition(ens, obs):
o_component = beta1 / g_component
o_component[0] = n_outliers_left / rows
- g_component[0] = beta1[0] / o_component[0]
+ g_component[0] = 0 if o_component[0] == 0 else beta1[0] / o_component[0]
o_component[columns] = n_outliers_right / rows
g_component[columns] = alpha1[columns] / (1. - o_component[columns])
Never trust a patch generated at 3am!
Hi Vincent, Thanks for raising this issue, and for your efforts in solving it. I will look at all the patches and update ensverif to solve this issue with outliers. However, I also want to update the Matlab version and would prefer to release both at the same time. So unless there is a dramatically urgent need, I will only push the updated Python version when the Matlab version is also ready and when both have been verified.
In the presence of outliers, the function crps_hersbach_decomposition does not implement the method described in Hersbach (2000) in order to correctly compute the reliability and potential of CRPS. More precisely, equation (33) of the paper is not implemented. After correcting this, I've been able to match the results obtained using the function crpsDecomposition in the R package verification on CRAN. I have sent a patch to @QueenMABhydro by email (it's way too late for me to properly fork the code and submit a pull request). The proposed patch follows, for the record.