Qiskit / qiskit

Qiskit is an open-source SDK for working with quantum computers at the level of extended quantum circuits, operators, and primitives.
https://www.ibm.com/quantum/qiskit
Apache License 2.0

SparsePauliOp.to_matrix() exhibits numerical instabilities #13413

Open donsano33 opened 2 weeks ago

donsano33 commented 2 weeks ago

Environment

Qiskit version: 1.2.4 (the issue does not occur with qiskit<1.0)

What is happening?

When calculating the matrix representation of a SparsePauliOp consisting of multiple terms via to_matrix(), numbers that should be exactly zero come out as small non-zero values on the order of 1e-18. This does not happen with qiskit<1.0.

How can we reproduce the issue?

from qiskit.quantum_info import SparsePauliOp

factor = 0.025
# Each term expands factor * (I - Z_i)(I - Z_j), so every diagonal entry of
# the sum should be exactly 0, 0.1, or 0.2.
terms = [
    factor * SparsePauliOp(data=["IIII", "IIZI", "IIIZ", "IIZZ"], coeffs=[1, -1, -1, 1]),
    factor * SparsePauliOp(data=["IIII", "IZII", "IIZI", "IZZI"], coeffs=[1, -1, -1, 1]),
]
op = SparsePauliOp.sum(terms)
op_matrix = op.to_matrix(sparse=True)
print(op_matrix.diagonal())

This also appears when using sparse=False.

What should happen?

In qiskit<1.0 the output looks like this:

[0.  0.  0.  0.1
0.  0.  0.1 0.2
0.  0.  0.  0.1
0.  0.  0.1 0.2]

With qiskit>=1.0 (e.g. qiskit==1.2.4, which I use), the output looks somewhat like this:

[ 6.9388939e-18  6.9388939e-18 -6.9388939e-18  1.0000000e-01
  0.0000000e+00  0.0000000e+00  1.0000000e-01  2.0000000e-01
  6.9388939e-18  6.9388939e-18 -6.9388939e-18  1.0000000e-01
  0.0000000e+00  0.0000000e+00  1.0000000e-01  2.0000000e-01]

Interestingly, the positions of the small numbers change when I rerun the same script as a fresh invocation. If I run the statements in a loop within one process, the positions of the small numbers stay constant.

Any suggestions?

Just a first guess from looking at the implementation of SparsePauliOp.to_matrix(): the issue might be in this function

https://github.com/Qiskit/qiskit/blob/cd26d498c9b74475753b9179ac2b7f84122ffe81/qiskit/quantum_info/operators/symplectic/sparse_pauli_op.py#L956

or rather in the accelerated to_matrix_sparse()

https://github.com/Qiskit/qiskit/blob/cd26d498c9b74475753b9179ac2b7f84122ffe81/qiskit/quantum_info/operators/symplectic/sparse_pauli_op.py#L965

and to_matrix_dense() functions that it imports.

https://github.com/Qiskit/qiskit/blob/cd26d498c9b74475753b9179ac2b7f84122ffe81/qiskit/quantum_info/operators/symplectic/sparse_pauli_op.py#L968

jakelishman commented 2 weeks ago

The number 0.025 isn't exactly representable in floating-point arithmetic - the closest value to it is actually the huge fraction $\frac{3602879701896397}{144115188075855872}$. That means the "correct" result of the matrix calculation of a sum isn't necessarily exactly what you expect, and floating-point arithmetic is such that the order the additions are done in can affect things.

The fact that there aren't exact zeros isn't a bug. A side-effect of the new algorithm for calculating the matrices (which is several hundred times faster than the previous method, especially for large sparse matrices) is that we first simplify the operator before doing the sum, to reduce the amount of work we have to do (though that doesn't affect your case), and I think that simplification can cause the terms to turn up in a non-deterministic order that's fixed at the time of import qiskit. The other possibility is that the distribution of the matrix calculation onto threads might be shifting the order of operations - if you set force_serial=True in the to_matrix call, that might also stabilise it (but I don't think so in this case).
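
For illustration, the exact value stored for the float 0.025 can be inspected in plain Python with the standard-library fractions module:

from fractions import Fraction

# The float 0.025 stores exactly this fraction (denominator 2**57):
print(Fraction(0.025))
# 3602879701896397/144115188075855872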

Assuming the non-determinism is what I suggested, we can look to fix the order of evaluation there. If I'm right, it should be as simple as swapping out the HashMap in MatrixCompressedPaulis::combine for an indexmap::IndexMap. That could be handled as a bug fix.
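
As a loose Python analogue of that kind of non-determinism (a sketch of the general phenomenon, not Qiskit's actual code path - Python seeds its string hashing per process, much like Rust's default HashMap hasher):

# Run this twice as separate processes: the iteration order of the set -
# and hence any order-sensitive accumulation over it - can differ between
# runs, because str hashes are seeded per process.
labels = {"IIII", "IIZI", "IIIZ", "IIZZ", "IZII", "IZZI"}
print(list(labels))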

As a feature request, we could add a "tolerance" field to to_matrix to manually set a threshold to suppress small terms.
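
Until such an argument exists, a user-side workaround with the same effect could look like the sketch below, with op taken from the reproduction above (assuming, as in current releases, that to_matrix(sparse=True) returns a SciPy CSR matrix; tol is an illustrative threshold, not an official default):

import numpy as np

tol = 1e-12

# Dense case: floor entries below the tolerance to exactly zero.
dense = op.to_matrix()
dense[np.abs(dense) < tol] = 0.0

# Sparse case: zero out small stored values, then prune the explicit zeros.
sparse = op.to_matrix(sparse=True)
sparse.data[np.abs(sparse.data) < tol] = 0.0
sparse.eliminate_zeros()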

The fact that Qiskit pre-1.0 was giving exact zeros for your matrix is most likely a bit of chance - the trouble with inexact floating-point numbers and accumulation of round-off error is fundamental to numerical computing.

donsano33 commented 2 weeks ago

Thanks for the quick reply. The number 0.025 was actually just a representative example. In our actual use case the factor is not even constant and usually has many more digits after the decimal point, but I get your point.

As you already expected, using force_serial=True does not change the behavior.

It would be really nice to be able to specify a "tolerance" below which all values are floored to exactly zero.

jakelishman commented 2 weeks ago

Sure, no worries! The reason I mentioned the inexact floating-point number is that it's not really specific to 0.025 - that's just representative of the problem. It can happen with any coefficients that aren't exactly integer multiples of powers of two; the number of digits when written in base 10 doesn't matter. If you tried it with 0.25, you should find it's entirely stable (for those two operators), because that value is exactly representable in floating point and never accumulates round-off error when added to itself. Just to illustrate what's going on, in raw Python:

>>> 0.025 + 0.025 + 0.025 - 0.025 - 0.025 - 0.025
6.938893903907228e-18
>>> 0.025 - 0.025 + 0.025 - 0.025 + 0.025 - 0.025
0.0

It so happens that 0.025 + 0.025 in double-precision floating point is also represented by the closest floating-point number to 0.05, but both floating-point numbers are truly ever so slightly higher (by about $1.39\times10^{-18}$ and $2.78\times10^{-18}$ respectively). Then 0.05 + 0.025 in double-precision floating point ends up being ever so slightly closer to $0.07500000000000001$ than to $0.075$ (or at least the closest floats to those two values). So if you add together three 0.025s before subtracting one, you get a round-off accumulation, but if you do 0.025 - 0.025 summed three times, you keep cancelling it out - the order of operations matters numerically, when in "real" maths it doesn't.
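
To make that concrete in plain Python: the decimal module shows the exact stored values, and reordering the same six numbers changes the result.

import math
from decimal import Decimal

# Both floats sit very slightly above their decimal values:
print(Decimal(0.025))          # 0.0250000000000000013877787807814456755...
print(Decimal(0.025 + 0.025))  # 0.0500000000000000027755575615628913510...

# The same six numbers, summed in different orders:
values = [0.025, 0.025, 0.025, -0.025, -0.025, -0.025]
print(sum(values))          # 6.938893903907228e-18 (left to right)
print(sum(sorted(values)))  # -6.938893903907228e-18 (negatives first)
print(math.fsum(values))    # 0.0 (the correctly rounded sum is exactly zero)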

The tolerance argument is a feature, and we can do that in a future release (probably Qiskit 2.0 now, since 1.3 closed for new features two weeks ago). Fixing the stability of the order of operations is something we can do in a patch release, though - that's the IndexMap thing I mentioned above.

donsano33 commented 1 week ago

Thanks again for the wonderful explanation, very insightful.

What are the next steps? The fix for the stability of the order of operations can happen in this bug ticket, right? For the feature request, shall I create a new one, or will you handle that?