equinor / ert

ERT - Ensemble based Reservoir Tool - is designed for running ensembles of dynamical models such as reservoir models, in order to do sensitivity analysis and data assimilation. ERT supports data assimilation using the Ensemble Smoother (ES), Ensemble Smoother with Multiple Data Assimilation (ES-MDA) and Iterative Ensemble Smoother (IES).
https://ert.readthedocs.io/en/latest/
GNU General Public License v3.0

Implement Cross-Validation for Stability Assessment of P50, P10, and P90 Estimates Using Model Realizations #5135

Open dafeda opened 1 year ago

dafeda commented 1 year ago

The ensemble of model realizations is commonly used to estimate statistics such as P50, P90, and P10. To assess the reliability of these estimates, I propose implementing a cross-validation method that uses different subsets of realizations. This method can provide insight into the stability of the estimates and help determine if additional realizations are needed.

Here's how the cross-validation method can be implemented:

  1. Split the ensemble of model realizations into several subsets. You can do this by random sampling without replacement, making sure that each subset has a sufficient number of realizations. Note that the subset may have to be of size 1 (i.e., leave-one-out cross-validation) if the number of realizations is limited.
  2. For each subset of realizations, calculate the P50, P90, and P10 estimates.
  3. Compare the P50, P90, and P10 estimates across the different subsets. Calculate the variation in these estimates using measures like the coefficient of variation (CV) or standard deviation (SD). Low variation in the estimates across subsets indicates stable estimates and suggests that you have a sufficient number of realizations.
  4. If the variation in the P50, P90, and P10 estimates is high, consider increasing the number of realizations in your ensemble. If the variation is low, you can be more confident that you have a sufficient number of realizations to achieve stable estimates.
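
The steps above could be sketched roughly as follows (a minimal illustration with numpy, not a proposed ERT API; the function name, input shape, and subset count are assumptions):

```python
import numpy as np


def subset_stability(realizations, n_subsets=5, seed=0):
    """Estimate P10/P50/P90 on disjoint subsets of realizations and
    report the spread (SD and coefficient of variation) of each
    quantile across subsets.

    `realizations` is assumed to be a 1-D array holding one scalar
    response per model realization.
    """
    rng = np.random.default_rng(seed)
    # Step 1: split into subsets by random sampling without replacement.
    idx = rng.permutation(len(realizations))
    subsets = np.array_split(realizations[idx], n_subsets)
    # Step 2: quantile estimates per subset. Note: here the 10th/50th/90th
    # percentiles are used directly; the P90/P10 labeling convention
    # (which tail is which) is left to the caller.
    quantiles = np.array(
        [np.percentile(s, [10, 50, 90]) for s in subsets]
    )  # shape (n_subsets, 3)
    # Steps 3-4: variation across subsets; a high CV suggests that
    # more realizations are needed.
    mean = quantiles.mean(axis=0)
    sd = quantiles.std(axis=0, ddof=1)
    cv = sd / np.abs(mean)
    return {"mean": mean, "sd": sd, "cv": cv}
```

With a limited ensemble, `n_subsets=len(realizations)` degenerates toward the leave-one-out variant mentioned in step 1.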

By using cross-validation to assess the stability of P50, P90, and P10 estimates, you can gain insights into the reliability of the ensemble and ensure that you have sufficient realizations to achieve stable estimates.

Blunde1 commented 1 year ago

Having access to uncertainty measures for these statistics would be valuable, as points 3 and 4 above illustrate.

When the goal is to understand the uncertainty of a statistic, I would default to the bootstrap (sampling with replacement, equivalent to sampling from the empirical distribution) over cross-validation. This assumes that computational power is not a problem, which it should not be here, and that the statistic is neither computed in some opaque way (quantiles are fine) nor very high-dimensional.
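
A percentile-bootstrap interval for a single quantile could look like this (an illustrative sketch with numpy; the function name and defaults are assumptions, not an existing ERT routine):

```python
import numpy as np


def bootstrap_quantile_ci(realizations, q=50, n_boot=2000, alpha=0.1, seed=0):
    """Percentile-bootstrap confidence interval for the q-th percentile.

    Resamples the realizations with replacement (i.e. samples from the
    empirical distribution), recomputes the quantile on each resample,
    and returns the (alpha/2, 1 - alpha/2) interval of those estimates.
    """
    rng = np.random.default_rng(seed)
    n = len(realizations)
    # One row per bootstrap resample, each of the original size n.
    resamples = rng.choice(realizations, size=(n_boot, n), replace=True)
    stats = np.percentile(resamples, q, axis=1)
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```

A wide interval would signal, much like point 4 above, that more realizations are needed for a stable estimate.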

If the statistic is computed on a time series, I think whole realizations should be resampled (with replacement) rather than individual time points; otherwise smooth confidence bands on the time-series statistic will not be obtained.
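
Concretely, resampling whole realizations might look like this (a hypothetical sketch, assuming an ensemble array of shape `(n_realizations, n_timesteps)`):

```python
import numpy as np


def bootstrap_timeseries_band(ensemble, q=50, n_boot=1000, alpha=0.1, seed=0):
    """Bootstrap confidence band for a per-timestep quantile curve.

    Entire realizations (rows of `ensemble`) are resampled with
    replacement, so the temporal correlation within each realization
    is preserved and the resulting band stays smooth, instead of
    resampling time points independently.
    """
    rng = np.random.default_rng(seed)
    n = ensemble.shape[0]
    stats = np.empty((n_boot, ensemble.shape[1]))
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)  # indices of resampled realizations
        stats[b] = np.percentile(ensemble[rows], q, axis=0)
    lower, upper = np.percentile(
        stats, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0
    )
    return lower, upper
```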

dafeda commented 1 year ago

Good points @Blunde1 👍