Why do we do this? Is it because of the sentence "A scaling of the equations should always be done to ensure a well-conditioned inverse" just below Eqn (51)? Is there some other reason?
Do we have control over whether this does what we want? We scale D before passing it to create_coefficient_matrix, where H is defined as:

```
H := D + S * W
   = scale(E + observation_values - response_ensemble) + S * W
   = scale(center(E) + observation_values - response_ensemble) + S * W
```
H is the right hand side of equation (42), but for the equation to hold, the same scaling must be done on both sides of the equation. Is this really the case here?
I want to investigate this.
If it checks out, it should be rewritten so all scaling happens in one place. A property that should hold is `solve(scale(arg1), scale(arg2)) = solve(arg1, arg2)`. This can and should be tested; see the sketch below.
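Here is a minimal sketch of what such a property test could look like. The `scale` and `solve` helpers are hypothetical stand-ins (row-wise division by observation standard deviations and a plain linear solve), not the library's actual functions:

```python
import numpy as np

def scale(matrix, obs_std):
    # Hypothetical scaling: divide each row (one row per observation) by its std. dev.
    return matrix / obs_std[:, None]

def solve(lhs, rhs):
    # Hypothetical stand-in for the solve inside create_coefficient_matrix
    return np.linalg.solve(lhs, rhs)

def test_solve_is_invariant_under_scaling():
    rng = np.random.default_rng(42)
    lhs = rng.normal(size=(5, 5))
    rhs = rng.normal(size=(5, 3))
    obs_std = rng.uniform(0.5, 2.0, size=5)
    # Scaling both sides of the equation by the same row scaling must not change the solution.
    np.testing.assert_allclose(
        solve(scale(lhs, obs_std), scale(rhs, obs_std)),
        solve(lhs, rhs),
        rtol=1e-6,
    )
```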
Back to the sentence. I don't understand "A scaling of the equations should always be done to ensure a well-conditioned inverse". What type of scaling? Does it always improve the conditioning? How so?
Consider solving $(S^T S + D)x = S^T$ as in eqn (51), where $D$ is diagonal. We could scale it to $(D^{-1/2} S^T S D^{-1/2} + I) (D^{1/2} x) = D^{-1/2} S^T$, solve that for $y = D^{1/2} x$, and then recover $x = D^{-1/2} y$. Is this what is meant? I'm not sure.
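To convince myself the algebra above is right, here is a quick numerical check (using the same toy $S^T$ and $D$ as in the counterexample below) that the scaled system recovers the same $x$:

```python
import numpy as np

# Toy problem: treat S^T as a vector and D as a diagonal matrix
sT = np.array([1.0, 2.0, 3.0])
D = np.diag([3.0, 2.0, 1.0])

# Unscaled: solve (S^T S + D) x = S^T
x = np.linalg.solve(np.outer(sT, sT) + D, sT)

# Scaled: solve (D^{-1/2} S^T S D^{-1/2} + I) y = D^{-1/2} S^T, then recover x = D^{-1/2} y
D_inv_sqrt = np.diag(1 / np.sqrt(np.diag(D)))
y = np.linalg.solve(D_inv_sqrt @ np.outer(sT, sT) @ D_inv_sqrt + np.eye(3), D_inv_sqrt @ sT)
x_from_scaled = D_inv_sqrt @ y

print(np.allclose(x, x_from_scaled))  # True
```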
Anyway, the condition number of the left hand side matrix does not always improve. Here is a counterexample:
```python
import numpy as np

# Create S^T (as a vector) and the diagonal matrix D
sT = np.arange(3) + 1
D = np.diag([3, 2, 1])

# Condition number of the unscaled left-hand side S^T S + D
print(np.linalg.cond(np.outer(sT, sT) + D))  # 9.299360

# Condition number after scaling with D^{-1/2}
D_inv_sq = np.diag(1 / np.sqrt(np.diag(D)))
print(np.linalg.cond(D_inv_sq @ np.outer(sT, sT) @ D_inv_sq + np.eye(3)))  # 12.33333
```
In his 2012 paper, Emerick proposes rescaling using the Cholesky factorization (Appendix A1). But this is different from both (1) scaling by dividing by standard deviations and (2) scaling by transforming covariance matrices to correlation matrices. The reason for his proposal is to avoid truncating small singular values in the SVD when measurements are of different orders of magnitude, which seems like a different reason than "ensuring a well-conditioned inverse".
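To illustrate that point with a toy sketch (made-up numbers, a diagonal error covariance C_D, and not the exact procedure from Appendix A1): when measurements differ by orders of magnitude, so do the singular values of S, and an energy-based truncation throws away the information in the small-magnitude measurements. Whitening with the inverse Cholesky factor of C_D makes the singular values comparable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two groups of "measurements" with very different magnitudes,
# e.g. pressures ~1e7 and water cuts ~1e-1 (illustrative numbers only)
S = np.vstack([
    1e7 * rng.normal(size=(5, 20)),
    1e-1 * rng.normal(size=(5, 20)),
])

# Diagonal error covariance with standard deviations proportional to the magnitudes
C_D = np.diag(np.concatenate([np.full(5, 1e6**2), np.full(5, 1e-2**2)]))

print(np.linalg.svd(S, compute_uv=False))  # singular values span ~8 orders of magnitude

# Whiten with the inverse Cholesky factor of C_D (just 1/std here, since C_D is diagonal)
L = np.linalg.cholesky(C_D)
print(np.linalg.svd(np.linalg.solve(L, S), compute_uv=False))  # comparable magnitudes
```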
Thoughts @dafeda? I think you mentioned something about this once. If not, I will investigate this in the future.
Scaling the equations by transforming the covariance matrix to a correlation matrix seems to improve the conditioning.
There's no guarantee that it does. One can easily construct counterexamples.
There are two scalings that can be considered: scaling the covariance matrix to a correlation matrix, or scaling it to the identity matrix (using Cholesky factors).
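For concreteness, a toy sketch of the two scalings applied to a generic covariance matrix (not the library's actual code):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
C = A @ A.T + 4 * np.eye(4)  # a symmetric positive definite "covariance" matrix

# (1) Scale to a correlation matrix: divide rows and columns by the standard deviations
std = np.sqrt(np.diag(C))
corr = C / np.outer(std, std)
print(np.diag(corr))  # all ones; off-diagonals are correlations

# (2) Scale to the identity using the Cholesky factor L (C = L L^T): L^{-1} C L^{-T} = I
L = np.linalg.cholesky(C)
identity = np.linalg.solve(L, np.linalg.solve(L, C).T).T
print(np.allclose(identity, np.eye(4)))  # True
```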
In #136, the scaling has been documented in detail in the code. It's still part of the code, but much more understandable now IMO. Closing this, since much of the initial confusion is gone.
In this line we scale the covariance matrix to a correlation matrix:
https://github.com/equinor/iterative_ensemble_smoother/blob/24d771149134299102f8770c397b670c1d1efbf8/src/iterative_ensemble_smoother/utils.py#L120
And here we scale D and E:
https://github.com/equinor/iterative_ensemble_smoother/blob/24d771149134299102f8770c397b670c1d1efbf8/src/iterative_ensemble_smoother/_iterative_ensemble_smoother.py#L124