Use of general eigenvalues solver in weyl_coordinates sometimes throws convergence error

levbishop commented 2 years ago

It looks like the optimizations in #6896 can sometimes cause problems with convergence. @albertzhu01 found several cases that triggered this problem, although there seems to be some environment-dependence that makes it hard to reproduce across machines - the resulting almost-identity-matrix up to 1.E-16 scale deviations can cause error in lapack geev.

Any thoughts @jakelishman ?

---- from qiskit-experiments issue https://github.com/Qiskit/qiskit-experiments/issues/846#issuecomment-1205772386_

The following test case reliably reproduces the error:

import numpy as np
from scipy import linalg as la
mat = np.array(
    [
        [0.9999999999999998j, 0j, -3.469446951953614e-18j, 0j], 
        [0j, 0.9999999999999998j, (8.673617379884035e-19+2.6020852139652106e-18j), -6.938893903907228e-18j], 
        [-3.469446951953614e-18j, (8.673617379884035e-19+2.6020852139652106e-18j), 0.9999999999999998j, 0j], 
        [0j, -6.938893903907228e-18j, 0j, 0.9999999999999998j]
    ]
)
la.eigvals(mat)

Output:

LinAlgError                               Traceback (most recent call last)
Input In [17], in <cell line: 9>()
      1 from scipy import linalg as la
      2 mat = np.array(
      3     [
      4         [0.9999999999999998j, 0j, -3.469446951953614e-18j, 0j], 
   (...)
      7         [0j, -6.938893903907228e-18j, 0j, 0.9999999999999998j]]
      8 )
----> 9 la.eigvals(mat)

File ~/opt/anaconda3/envs/pygsti2/lib/python3.8/site-packages/scipy/linalg/_decomp.py:880, in eigvals(a, b, overwrite_a, check_finite, homogeneous_eigvals)
    811 def eigvals(a, b=None, overwrite_a=False, check_finite=True,
    812             homogeneous_eigvals=False):
    813     """
    814     Compute eigenvalues from an ordinary or generalized eigenvalue problem.
    815 
   (...)
    878 
    879     """
--> 880     return eig(a, b=b, left=0, right=0, overwrite_a=overwrite_a,
    881                check_finite=check_finite,
    882                homogeneous_eigvals=homogeneous_eigvals)

File ~/opt/anaconda3/envs/pygsti2/lib/python3.8/site-packages/scipy/linalg/_decomp.py:247, in eig(a, b, left, right, overwrite_a, overwrite_b, check_finite, homogeneous_eigvals)
    244     w = wr + _I * wi
    245     w = _make_eigvals(w, None, homogeneous_eigvals)
--> 247 _check_info(info, 'eig algorithm (geev)',
    248             positive='did not converge (only eigenvalues '
    249                      'with order >= %d have converged)')
    251 only_real = numpy.all(w.imag == 0.0)
    252 if not (geev.typecode in 'cz' or only_real):

File ~/opt/anaconda3/envs/pygsti2/lib/python3.8/site-packages/scipy/linalg/_decomp.py:1356, in _check_info(info, driver, positive)
   1353     raise ValueError('illegal value in argument %d of internal %s'
   1354                      % (-info, driver))
   1355 if info > 0 and positive:
-> 1356     raise LinAlgError(("%s " + positive) % (driver, info,))

LinAlgError: eig algorithm (geev) did not converge (only eigenvalues with order >= 2 have converged)

Originally posted by @albertzhu01 in https://github.com/Qiskit/qiskit-experiments/issues/846#issuecomment-1205772386

jlapeyre commented 2 years ago

A crude hack is: trap the exception and check if the code returned (info) satisfies info>0. If so, add normally distributed random numbers with standard deviation 1e-20 to the matrix and try to compute the eigenvalues again. That hack works in this case.

It could be that it's not just that the matrix is almost the identity. The degeneracy in the eigenvalues, or the diagonal, is necessary to cause the error... in this case at least. Changing one of the diagonal entries so that it ends in 9 rather than 8 is enough to see success.

In this case, multiplying the matrix by $\sqrt{-1}$ also avoids the error.

levbishop commented 2 years ago

A crude hack is: trap the exception and check if the code returned (info) satisfies info>0. If so, add normally distributed random numbers with standard deviation 1e-20 to the matrix and try to compute the eigenvalues again. That hack works in this case.

The changes in #6896 were intended to avoid the randomized algorithm in TwoQubitWeylDecomposition, which after a lot of battle-hardening seems finally 🤞 immune to such pathologies. I guess rather than trapping the error and ad-hoc randomizing, I'd rather trap the error and fall back to the TwoQubitWeylDecomposition alogorithm. Although having two paths like this worries me, and if the general eigensolver is unreliable maybe its better to simply use the TwoQubitWeylDecomposition algorithm everywhere unless the performance hit is substantial.

jlapeyre commented 2 years ago

I think the use of randomization is quite different. And pretty small in terms of code and complexity. But, I understand that introducing another piece of randomness might result in a series of tweaks as failing cases trickle in under general use.

If there are understandable characteristics of offending matrices, we could make some kind of test suite in order to be more confident (if not perfectly) that a solution will work. I suppose a small subset of these would go in the big test suite.

EDIT: There already is something like this

The terra testsuite exercises a bunch of the pathological cases. It would be interesting to see if the testsuite passes on your setup. Something like python -m unittest test.python.quantum_info.test_synthesis -cb should run the relevant pieces

jlapeyre commented 2 years ago

Apropos environment-specific features: The example above succeeds when using MKL (on my machine). I'm not 100% sure because I used Julia (I didnt look up yet how to use mkl with scipy.linalg). But, it looks like python and julia are calling the same lapack routines. Furthermore, with openblas, Julia fails with the same error as numpy.

mtreinish commented 11 months ago

It's definitely a function of the environment beyond just the specific blas implementation. On my linux system I do not encounter any failures with openblas, but on my m1 mac also using openblas it does reliably fail.

mtreinish commented 3 days ago

Just to follow up here, we've moved most of the two qubit synthesis code to rust and the eigensolver in faer seems seems to be immune from this issue. I have generally observed that running on arm64 mac numpy/openblas has a lot of these numeric issues, besides this specific error we've seen a lot of weird numeric stability issues and failures on arm64 (I suspect it might be an apple clang issue, but lack the time and knowledge to really dig into it) and moving the numerics to rust has been a pretty blunt tool to avoid them. That being said @jlapeyre wrote the weyl_coordinates function in rust already, but we're not leveraging it from python yet, although the only remaining use of it is for tests. I'll push up a small PR to remove the scipy implementation and just expose the rust function to python and have that close this issue.

jlapeyre commented 3 days ago

It looks like we could move the stuff in local_invariance.py to Rust. Then test this from Rust rather than Python. But exposing weyl_coordinates from Python is reasonable in the meantime.

Qiskit / qiskit

Use of general eigenvalues solver in weyl_coordinates sometimes throws convergence error #8459