Open garrettwrong opened 3 years ago
Picture was generated by running tutorials/examples/cov2d_simulation.py
after setting shrinker=None and noise_var=0`.
Yunpeng suggested he was able to get considerably better results for this problem using least squares instead of solve
(gesv). He should be sending me some code to consider for a PR so we can close it out with more success.
That suggests there is some ill-conditioning here. Using LS might be a too blunt of a tool. Just to be clear, we're talking about this line https://github.com/ComputationalCryoEM/ASPIRE-Python/blob/28ebc69b7d54088a64d3569fe1b4fc893138074d/src/aspire/covariance/covar2d.py#L443 correct?
I think so. The patch line he sent me is just changing the underlying call for all block diagonal solves, which is what will be used in that line (sig_noise_covar_coeff
is block diag instance).
What I will try to do later this morning is change the solve
to admit a solver
and just focus on overriding the one suspected call. (Changing solve
to least squares wholesale is not a fix).
I don't know if that's a good idea. It would be better to get to the bottom of why this is happening in the first place. Like I said the issue is likely due to some eigenvalues of the matrix being close to zero. If so, it should be a matter of tracking it to where those zero eigenvalues are coming from and finding a way around them.
Ideally, we'd want to keep the interface as close to np.linalg.solve
, which doesn't do more than apply the inverse (if I understand correctly). If we'd want to add a pseudo-inverse (which is what I assume the LS solve would do), it would be better to separate it into a different function. The first thing should be to find out what's causing this, though.
Okay, I understand. So the idea was that somebody else was already spending time doing that (finding out what's causing it and providing solution). Unfortunately they not involved in this discussion or actually contributing code. I'm going to drop off this issue and try to clarify some things later. We'll get back to it.
Placing this email with the actual discussion. The first section categorizes the prior work presented at our lunch meeting and the second restates the proposed implementation above.
Hi Joakim,
I hope you are doing well. I saw your recent comment on the issue of ill-conditioned covariance matrix in ASPIRE. It seems to me that the problem comes after the covariance estimation. For example, I tested SNR=10^20, and I use ground truth covariance matrix for the wiener filtering function, and the NRMSE is >0.7 (when using pseudo-inverse the error reduces to <0.05). This indicates that even when using the oracle covariance matrix, there is some numerical instability. The reason is that the ground truth covariance is approximately low rank (which makes sense, as clean images are projections of a 3D volume), or in other words, ill-conditioned. Thus, if you estimate the covariance matrix accurately (under high SNR), you are expected to get an ill-conditioned matrix. Note that this is not an issue for lower SNR, due to the regularization term \sigma^2 * I in the expression of Hi in (16) of https://arxiv.org/pdf/1602.06632.pdf. However, when we take smaller and smaller parameter sigma for the wiener filtering, there starts to be an issue, and pseudo-inverse has to be used. Note that even if we use pseudo-inverse, there would still be an error (<0.05) due to discretization. This error will be magnified if $A\Sigma A^T + \sigma^2 I$ is ill-conditioned and ordinary inverse is applied.
It looks to me that a reasonable solution is to let "get_cwf_coeffs" call a separate function that uses the pseudo-inverse. Or have an "if" statement that checks the condition number of the covariance matrix (or even simply check SNR), and have two cases that respectively use the inverse and pseudo-inverse. Let me know what you think.
Best, Yunpeng
EDIT: Reformatted quote.
Ok. I think we agree on the cause of the problem: a non-invertible covariance matrix. Where we differ is how to solve this. I don't think constructing a pseudo-inverse here is a good idea for two reasons. First, what we actually want here is a minimum-norm solution, if I understand correctly. If so, we want to write this in a way that's clear. While calculating a pseudo-inverse would work, it would obscure what the actual problem is and would overkill for what we want to achieve. Second, a pseudo-inverse is more expensive and will be unnecessary most of the time (when the resulting covariance matrix is full rank), especially since we only apply it once.
There are two ways to get around the pseudo-inverse. First, this could simply be an issue of small but non-zero values in the noise_var
parameter. In other words, we currently check it against zero here https://github.com/ComputationalCryoEM/ASPIRE-Python/blob/28ebc69b7d54088a64d3569fe1b4fc893138074d/src/aspire/covariance/covar2d.py#L437 but maybe the appropriate thing to do is test it against machine epsilon (your SNR of 10^20 will probably give a very small noise variance that is not zero). So one simple fix would be to replace noise_var == 0
with something like noise_var < tol
, where tol
may be a parameter or set to np.finfo(noise_var.dtype).eps
.
The second (potentially more stable) approach is to compute an eigendecomposition of the clean covariance matrix (covar_coeff
in the code), identify the non-trivial eigenvectors and the subspace they span. The CTF matrices (ctf_fb_k
) can then be projected into that subspace along with the noisy coefficients (coeff_k
) and we can do all the work in there. This does however, assume that the CTF matrices are always full rank (which I suspect is not always the case and may be playing a role here, but that remains to be verified).
Neither of these will work if the CTF matrix is singular, though. It would be good to verify if this is the case before proceeding. If we do have a singular CTF matrix, we could do an eigendecomposition of that, but then we're getting into the same level of computational effort as for the pseudo-inverse.
We should revisit this now that some bugs in cov2d have been fixed just to be sure we aren't chasing a red herring here.
In #420 and #421 some small changes were made to run cov2d with small/zero noise_var. Previously it would crash with div0 or later Singular matrix etc etc. Most of those changes relate to
shrinker
, butget_cwf_coeffs
also required a change.It appears I may not have implemented the correction for CTF correctly because we yield a higher error than for the noisy case, and the error looks sort of like CTF.
_Originally posted by @janden in https://github.com/ComputationalCryoEM/ASPIRE-Python/pull/421#discussion_r642618429_