equinor / iterative_ensemble_smoother

Algorithms for data assimilation using ensemble methods.
GNU General Public License v3.0
17 stars 12 forks source link

A scaling of the equations should always be done to ensure a well-conditioned inverse #117

Closed tommyod closed 11 months ago

tommyod commented 1 year ago

In this line we scale the covariance matrix to a correlation matrix:

https://github.com/equinor/iterative_ensemble_smoother/blob/24d771149134299102f8770c397b670c1d1efbf8/src/iterative_ensemble_smoother/utils.py#L120

And here we scale D and E:

https://github.com/equinor/iterative_ensemble_smoother/blob/24d771149134299102f8770c397b670c1d1efbf8/src/iterative_ensemble_smoother/_iterative_ensemble_smoother.py#L124

H is the right hand side of equation (42), but for the equation to hold, the same scaling must be done on both sides of the equation. Is this really the case here?


Back to the sentence. I don't understand "A scaling of the equations should always be done to ensure a well-conditioned inverse". What type of scaling? Does it always improve the conditioning? How so?

Consider solving $(S^T S + D)x = S^T$ as in eqn (51), where $D$ is diagonal. We could scale it to $(D^{-1/2} S^T S D^{-1/2} + I) (D^{1/2} x) = D^{-1/2} S^T$, solve that for $y = D^{1/2} x$ and then solve for $x$. Is this what is meant? I'm not sure.

Anyway, the condition number of the left hand side matrix does not always improve. Here is a counterexample:

# Create matrices S.T and D
import numpy as np
sT = np.arange(3) + 1
D = np.diag([3, 2, 1])

print(np.linalg.cond(np.outer(sT, sT) + D)) # 9.299360

# Create D^{-1/2}
D_inv_sq = np.diag(1/np.sqrt(np.diag(D)))
print(np.linalg.cond(D_inv_sq @ np.outer(sT, sT) @ D_inv_sq + np.eye(3))) # 12.33333

In the 2012 Emerick paper he proposes rescaling using the cholesky factorization in appendix A1. But this is different from (1) scaling by dividing by standard deviations and (2) scaling by transforming covariance matrices to correlation matrices. The reason for his proposal is to not truncate small singular values in the SVD when measurements are of different orders of magnitude. Which seems like a different reason than "ensuring a well conditioned inverse".

Thoughts @dafeda ? I think you mentioned something about this once.

If not I will investigate this in the future.

tommyod commented 11 months ago

I will close this.

In #136, the scaling has been documented in detail in the code. It's still part of the code, but much more understandable now IMO. Closing this, since much of the initial confusion is gone.