`CollectCliffords` pass fails for circuit with large `PauliLindbladError`

aeddins-ibm commented 6 days ago

Environment

Qiskit version: 1.3.0rc1
Python version: 3.11
Operating system: macos

What is happening?

qiskit.transpiler.passes.CollectCliffords goes through a circuit and merges recognized clifford gates into combined Clifford operations.

However, this fails if we've appended a many-qubit PauliLindbladError (qiskit_aer.noise.errors.PauliLindbladError) to the circuit. The PauliLindbladError is represented in the circuit as a quantum_channel instruction. At some point, CollectCliffords attempts to convert this channel to a Kraus object, which involves creating a many-qubit SuperOp which naturally crashes with this error: ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.

The attempt to convert to a Kraus object apparently happens here: https://github.com/Qiskit/qiskit/blob/cd26d498c9b74475753b9179ac2b7f84122ffe81/qiskit/dagcircuit/collect_blocks.py#L168 It looks like it encounters the PauliLindbladError (as a quantum_channel) as a successor node (of something), and tries to use that quantum_channel as a dictionary key (suc). Attempting to check for equality with other dict keys apparently involves converting the quantum_channel to a many-qubit SuperOp, which fails.

How can we reproduce the issue?

from qiskit import QuantumCircuit
from qiskit.transpiler.passes import CollectCliffords
from qiskit.transpiler.passmanager import PassManager
from qiskit_aer.noise.errors import PauliLindbladError

pm = PassManager(CollectCliffords(
    do_commutative_analysis = False,
    split_blocks = False,
    split_layers = False,
    collect_from_back = False,
))

qc = QuantumCircuit(20)
qc.cx(0,1)
qc.cx(0,1)
qc.append(PauliLindbladError(generators=['X'+'I'*19],rates=[0.1]),range(20))

qc_transpiled = pm.run(qc)

What should happen?

Personally I don't want CollectCliffords to attempt to do anything with the quantum_channel instructions. It should leave those instructions unchanged, without raising any errors.

Any suggestions?

It looks like internally, the low-level collect functions use a filter_fn in order to ignore some instructions, avoiding the dictionary lookup: https://github.com/Qiskit/qiskit/blob/cd26d498c9b74475753b9179ac2b7f84122ffe81/qiskit/dagcircuit/collect_blocks.py#L163-L172

Can we modify the existing filter_fn used by CollectCliffords so that it ignores quantum_channel instructions?

alexanderivrii commented 1 day ago

Thanks @aeddins-ibm for the initial analysis. Apparently, hashing PauliLindbladError nodes in DAGCircuit requires expanding their definitions (causing the error above). This should be fixed by #13484 which changes the internal indexing data structure to use nodes' integer indices rather than nodes themselves.

However, here is a question, cc'ing @jakelishman and @mtreinish. It does not seem that the PauliLindbladError class coming from qiskit-aer inherits from Operation (please correct me if I'm am wrong). Even though the above code now works, doesn't this break the contract of what nodes can be added to a DAGCircuit? And if it does break, how do we go about it?

mtreinish commented 1 day ago

This is coming from the Rust DAGCircuit implementation. In the Rust DAGCircuit we're no longer storing a DAGNode python space object in the dag anymore, but instead generating the object on the fly from the rust data. This change would potentially break a lot of user code because prior to #12550 object identity was used for for equality. Since we're generating new DAGNode objects on each python space access that returns a node object identity checks obviously would no longer work. To offset that in #12550 I had to add a equality and hash implementation for DAGNodes so that we can continue to use equivalent nodes as keys in a dict or set members without any issue even if they're not the same exact object. Typically this isn't an issue because all the members of a DAGNode are hashable. The issue here is arguably an aer bug if the __eq__ implementation aren't valid/viable for many qubit PauliLindbladError objects. You'd have the same error if you did PauliLinbladError(...) == PauliLinbladError(...)

As for whether this is an API break, I view it as more of a grey area. I don't think we've ever formally defined that an Operation object needs to have __eq__ or be hashable. I think we make that assumption at several layers of the stack that we can compare operations. I think we'll have issues at several levels if an operation __eq__ implementation isn't valid.

Qiskit / qiskit