Add an efficient circuit synthesis for a special pattern `exp(-it (IZZ + ZZI + ZZZ))`

What should we add?

[1] proposed an efficient way to synthesize the following special pattern of two-body terms (IZZ and ZZI) and three-body term (ZZZ) (Fig. 2 [1]). It can halve the number of two-qubit gates (8 to 4). This pattern is useful to handle QAOA with cubic terms as [1] did.

It would be nice if Qiskit can handle this special pattern of synthesis.

Target (coeffs can be arbitrary)

     ┌────────────────────────────────┐
q_0: ┤0                               ├
     │                                │
q_1: ┤1 exp(-it (IZZ + ZZI + ZZZ))(x) ├
     │                                │
q_2: ┤2                               ├
     └────────────────────────────────┘
# PauliEvolutionGate(SparsePauliOp(["IZZ", "ZZI", "ZZZ"], coeffs=[1, 2, 3]), x)

Qiskit optimization_level=3

                                                        ┌───┐┌───────────┐┌───┐
q_0: ──■─────────────────■──────────────────────────────┤ X ├┤ Rz(6.0*x) ├┤ X ├─────
     ┌─┴─┐┌───────────┐┌─┴─┐                       ┌───┐└─┬─┘└───────────┘└─┬─┘┌───┐
q_1: ┤ X ├┤ Rz(2.0*x) ├┤ X ├──■─────────────────■──┤ X ├──■─────────────────■──┤ X ├
     └───┘└───────────┘└───┘┌─┴─┐┌───────────┐┌─┴─┐└─┬─┘                       └─┬─┘
q_2: ───────────────────────┤ X ├┤ Rz(4.0*x) ├┤ X ├──■───────────────────────────■──
                            └───┘└───────────┘└───┘
OrderedDict({'cx': 8, 'rz': 3})

Proposed circuit by [1]

q_0: ──■───────────────────────────────────■────────────────────
     ┌─┴─┐┌───────────┐┌───┐┌───────────┐┌─┴─┐┌───────────┐┌───┐
q_1: ┤ X ├┤ Rz(2.0*x) ├┤ X ├┤ Rz(6.0*x) ├┤ X ├┤ Rz(4.0*x) ├┤ X ├
     └───┘└───────────┘└─┬─┘└───────────┘└───┘└───────────┘└─┬─┘
q_2: ────────────────────■───────────────────────────────────■──
OrderedDict({'cx': 4, 'rz': 3})

Reference [1] Pelofske, E., Bärtschi, A., & Eidenbenz, S. (2024). Short-depth QAOA circuits and quantum annealing on higher-order ising models. Npj Quantum Information, 10(1), 30. https://doi.org/10.1038/s41534-024-00825-w

Script to reproduce

from qiskit import QuantumCircuit, generate_preset_pass_manager
from qiskit.circuit import Parameter
from qiskit.circuit.library import PauliEvolutionGate
from qiskit.quantum_info import SparsePauliOp

# PauliEvolutionGate
qc = QuantumCircuit(3)
x = Parameter("x")
op = SparsePauliOp(["IZZ", "ZZI", "ZZZ"], coeffs=[1, 2, 3])
evo = PauliEvolutionGate(op, x)
qc.append(evo, qargs=qc.qregs[0])
print("PauliEvolutionGate")
print(qc)
print("PauliEvolutionGate + decompose")
print(qc.decompose())
pm = generate_preset_pass_manager(optimization_level=3, basis_gates=["cx", "rz"])
qc = pm.run(qc)
print("PauliEvolutionGate + optimization_level=3")
print(qc)

# Hand-optimized
qc2 = QuantumCircuit(3)
qc2.rzz(2.0 * x, 0, 1)
# RZZZ(6 * x, 0, 1, 2)
qc2.cx(0, 1)
qc2.rzz(6.0 * x, 2, 1)
qc2.cx(0, 1)
#
qc2.rzz(4.0 * x, 2, 1)
print("Hand-optimized")
print(qc2)

print("Hand-optimization + optimization_level=3")
qc3 = pm.run(qc2)
print(qc3)

This seems like a very nice optimisation for this kind of class of PauliEvolutionGate. Can we think about what the most general form of this decomposition that we could implement is, too? I haven't read the paper, but just judging by the structure of the output circuit, I'm assuming that we can make a similarly valid pattern for any evolution of the form

$$ \biggl(\sum{i\in \text{qubits}} c{1,i} Pi\biggr) + \biggl(\sum{i\ne0} c_{2,i} P_0 P_i \biggr) + c_3 P_0 P_1 P_2 $$

for any single Pauli $P$ (the subscript denotes the qubit), and any permutation of the "centre" qubit (here denoted $q_0$).

As in: I feel like a similar trick would work if these were all X rotations instead of Z, it should be impervious to commuting linear terms being in the mix, and it clearly should be valid under permutation of the qubits. If we add this case, I feel like we should make sure it validly recognises all those additional places too. That might not mean putting in separate catches for the linear terms - maybe the synthesis works out cleaner if we separate those out into a prefix (since they all trivially commute with all other operators).

Going further, is it possible that for evolution of any polynomial of the same Pauli to use this cubic case as a decomposition to lift the synthesis to higher orders? It might not be immediately useful, but it feels like there might be some sort of decomposition rule possible here, since that kind of polynomial is fairly trivially detectable as "everything commutes".

If I wanted to chat further, I could just copy my reply into the LLM myself. Please don't spam issues with LLM output.

how do I start helping?

Thank you for your idea of generalization, @jakelishman. It sounds great. I tried X and it works well too. I agree with the idea to add the most general form.

PauliEvolutionGate
     ┌────────────────────────────────┐
q_0: ┤0                               ├
     │                                │
q_1: ┤1 exp(-it (IXX + XXI + XXX))(x) ├
     │                                │
q_2: ┤2                               ├
     └────────────────────────────────┘

PauliEvolutionGate + optimization_level=3
     ┌───┐                                                   ┌───┐┌───────────┐┌───┐┌───┐
q_0: ┤ H ├──■─────────────────■──────────────────────────────┤ X ├┤ Rz(6.0*x) ├┤ X ├┤ H ├─────
     ├───┤┌─┴─┐┌───────────┐┌─┴─┐                       ┌───┐└─┬─┘└───────────┘└─┬─┘├───┤┌───┐
q_1: ┤ H ├┤ X ├┤ Rz(2.0*x) ├┤ X ├──■─────────────────■──┤ X ├──■─────────────────■──┤ X ├┤ H ├
     ├───┤└───┘└───────────┘└───┘┌─┴─┐┌───────────┐┌─┴─┐└─┬─┘                       └─┬─┘├───┤
q_2: ┤ H ├───────────────────────┤ X ├┤ Rz(4.0*x) ├┤ X ├──■───────────────────────────■──┤ H ├
     └───┘                       └───┘└───────────┘└───┘                                 └───┘

Hand-optimization + optimization_level=3
     ┌───┐                                             ┌───┐
q_0: ┤ H ├──■───────────────────────────────────■──────┤ H ├──────────────
     ├───┤┌─┴─┐┌───────────┐┌───┐┌───────────┐┌─┴─┐┌───┴───┴───┐┌───┐┌───┐
q_1: ┤ H ├┤ X ├┤ Rz(2.0*x) ├┤ X ├┤ Rz(6.0*x) ├┤ X ├┤ Rz(4.0*x) ├┤ X ├┤ H ├
     ├───┤└───┘└───────────┘└─┬─┘└───────────┘└───┘└───────────┘└─┬─┘├───┤
q_2: ┤ H ├────────────────────■───────────────────────────────────■──┤ H ├
     └───┘                                                           └───┘

import numpy as np
from qiskit import QuantumCircuit, generate_preset_pass_manager
from qiskit.circuit import Parameter
from qiskit.circuit.library import PauliEvolutionGate
from qiskit.quantum_info import SparsePauliOp, Statevector, Operator

# PauliEvolutionGate
qc = QuantumCircuit(3)
x = Parameter("x")
op = SparsePauliOp(["IXX", "XXI", "XXX"], coeffs=[1, 2, 3])
evo = PauliEvolutionGate(op, x)
qc.append(evo, qargs=qc.qregs[0])
print("PauliEvolutionGate")
print(qc)
print("PauliEvolutionGate + decompose")
print(qc.decompose())
pm = generate_preset_pass_manager(optimization_level=3, basis_gates=["cx", "rz", "h"])
qc = pm.run(qc)
print("PauliEvolutionGate + optimization_level=3")
print(qc)

# Hand-optimized
qc2 = QuantumCircuit(3)
qc2.rxx(2.0 * x, 0, 1)
# RXXX(6 * x, 0, 1, 2)
qc2.h([0, 1, 2])
qc2.cx(0, 1)
qc2.rzz(6.0 * x, 2, 1)
qc2.cx(0, 1)
qc2.h([0, 1, 2])
#
qc2.rxx(4.0 * x, 2, 1)
print("Hand-optimized")
print(qc2)

print("Hand-optimization + optimization_level=3")
qc2 = pm.run(qc2)
print(qc2)

for h in np.linspace(-1, 1, 10):
    op = Operator(qc.assign_parameters([h]))
    op2 = Operator(qc2.assign_parameters([h]))
    assert np.allclose(op.to_matrix(), op2.to_matrix())
print("passed")

Ah, thanks for checking Imamichi-san. Yeah, I guess I should have applied a couple of seconds' actual thought while typing all that... It's obvious when you draw it out that we can just rotate each individual single-qubit axis round to ZZZ, do the same trick, and rotate back. We ought to be able to handle the case that each qubit is being rotated on a different axis as well, right, e.g. if the cubic term is $XYZ$, the corresponding linear terms are $Z_0$, $Y_1$, $X_2$, and the quadratics are $Z_0 Y_1$ and $Z_0 X_2$ (or whatever the "central" qubit is). Then the required evolutions would be $H_2$ and $H_1 S_1$ (or whatever the exact $Y\to Z$ map is - I never remember off the top of my head).

This is still all for CX-based synthesis, but I think recognising these transformations means that we ought to be able to formulate the synthesis neatly for $R{zx}$-based entanglers as well (though I guess that was already a fairly obvious extension for $R{zz}$-based systems stemming from the paper).

Thank you for the detailed explanation. I got how we can handle cubic term $$XYZ$$. The generalization sounds much more attractive than my original request.

I hand-optimized 4-body terms in the similar way as follows. The number of two-qubit gates become half (16 -> 8).

     ┌────────────────────────────────────────────────────────┐
q_0: ┤0                                                       ├
     │                                                        │
q_1: ┤1                                                       ├
     │  exp(-it (ZZII + IZZI + IIZZ + IZZZ + ZZZI + ZZZZ))(x) │
q_2: ┤2                                                       ├
     │                                                        │
q_3: ┤3                                                       ├
     └────────────────────────────────────────────────────────┘

PauliEvolutionGate + optimization_level=3
                                                                               ┌───┐┌───────────┐┌───┐                             ┌───┐┌────────────┐┌───┐
q_0: ────────────────────────────────────────────────■─────────────────■───────┤ X ├┤ Rz(8.0*x) ├┤ X ├─────────────────────────────┤ X ├┤ Rz(12.0*x) ├┤ X ├──────────
                                                   ┌─┴─┐┌───────────┐┌─┴─┐┌───┐└─┬─┘└───────────┘└─┬─┘┌───┐     ┌───┐┌────────────┐└─┬─┘└────────────┘└─┬─┘┌───┐
q_1: ─────────────────────────■─────────────────■──┤ X ├┤ Rz(6.0*x) ├┤ X ├┤ X ├──■─────────────────■──┤ X ├─────┤ X ├┤ Rz(10.0*x) ├──■──────────────────■──┤ X ├─────
                            ┌─┴─┐┌───────────┐┌─┴─┐└───┘└───────────┘└───┘└─┬─┘                       └─┬─┘┌───┐└─┬─┘└────────────┘                        └─┬─┘┌───┐
q_2: ──■─────────────────■──┤ X ├┤ Rz(4.0*x) ├┤ X ├─────────────────────────■───────────────────────────■──┤ X ├──■──────────────────────────────────────────■──┤ X ├
     ┌─┴─┐┌───────────┐┌─┴─┐└───┘└───────────┘└───┘                                                        └─┬─┘                                                └─┬─┘
q_3: ┤ X ├┤ Rz(2.0*x) ├┤ X ├─────────────────────────────────────────────────────────────────────────────────■────────────────────────────────────────────────────■──
     └───┘└───────────┘└───┘
OrderedDict({'cx': 16, 'rz': 6})

Hand-optimization + optimization_level=3

q_0: ──────────────────■────────────────────────────────────────────────■───────────────────────────────────
     ┌───┐┌─────────┐┌─┴─┐┌─────────┐┌───┐┌─────────┐┌───┐┌──────────┐┌─┴─┐┌──────────┐┌───┐
q_1: ┤ X ├┤ Rz(4*x) ├┤ X ├┤ Rz(8*x) ├┤ X ├┤ Rz(6*x) ├┤ X ├┤ Rz(12*x) ├┤ X ├┤ Rz(10*x) ├┤ X ├────────────────
     └─┬─┘└─────────┘└───┘└─────────┘└─┬─┘└──┬───┬──┘└─┬─┘└──────────┘└───┘└──────────┘└─┬─┘┌─────────┐┌───┐
q_2: ──■───────────────────────────────■─────┤ X ├─────■─────────────────────────────────■──┤ Rz(2*x) ├┤ X ├
                                             └─┬─┘                                          └─────────┘└─┬─┘
q_3: ──────────────────────────────────────────■─────────────────────────────────────────────────────────■──

OrderedDict({'cx': 8, 'rz': 6})

script to reproduce

import numpy as np
from qiskit import QuantumCircuit, generate_preset_pass_manager
from qiskit.circuit import Parameter
from qiskit.circuit.library import PauliEvolutionGate
from qiskit.quantum_info import SparsePauliOp, Operator

# PauliEvolutionGate
qc = QuantumCircuit(4)
x = Parameter("x")
op = SparsePauliOp(["ZZII", "IZZI", "IIZZ", "IZZZ", "ZZZI", "ZZZZ"], coeffs=[1, 2, 3, 4, 5, 6])
evo = PauliEvolutionGate(op, x)
qc.append(evo, qargs=qc.qregs[0])
print("PauliEvolutionGate")
print(qc)
print("PauliEvolutionGate + decompose")
print(qc.decompose())
pm = generate_preset_pass_manager(optimization_level=3, basis_gates=["cx", "rz", "h"])
qc = pm.run(qc)
print("PauliEvolutionGate + optimization_level=3")
print(qc)
print(qc.count_ops())

# Hand-optimized
qc2 = QuantumCircuit(4)
# IZZI
qc2.rzz(4 * x, 2, 1)
# IZZZ
qc2.cx(0, 1)
qc2.rzz(8 * x, 2, 1)
qc2.cx(0, 1)
# IIZZ
qc2.rzz(6 * x, 0, 1)
# ZZZZ
qc2.cx(3, 2)
qc2.cx(2, 1)
qc2.rzz(12 * x, 0, 1)
qc2.cx(2, 1)
qc2.cx(3, 2)
# ZZZI
qc2.cx(3, 2)
qc2.rzz(10 * x, 2, 1)
qc2.cx(3, 2)
# ZZII
qc2.rzz(2 * x, 3, 2)
print("Hand-optimized")
print(qc2)

print("Hand-optimization + optimization_level=3")
qc2 = pm.run(qc2)
print(qc2)
print(qc2.count_ops())

for h in np.linspace(-1, 1, 10):
    op = Operator(qc.assign_parameters([h]))
    op2 = Operator(qc2.assign_parameters([h]))
    assert op.equiv(op2)
print("passed")

@itoko -san taught me that circuit synthesis of only CNOT and RZ is called phase polynomial and qiskit has a heuristic for all-to-all connectivity https://docs.quantum.ibm.com/api/qiskit/synthesis#synth_cnot_phase_aam.

Here is a small head-up: @Cryoris and I are actively working on integrating Simon Martiel's Rustiq synthesis code (see https://github.com/smartiel/rustiq) natively within Qiskit. This is a clever heuristic algorithm for all-to-all connectivity, and in particular exploits existing commutativity relations between Paulis: in both examples above all the Pauli terms commute and hence can be arbitrarily rearranged during synthesis (I am not sure if this is what makes the difference but it might).

Here are the results for the two examples above:

#op = SparsePauliOp(["IXX", "XXI", "XXX"], coeffs=[1, 2, 3])
Qiskit: count_2q = 6, depth_2q = 6, ops = OrderedDict([('cx', 4), ('rzz', 2), ('rz', 1)])
Rustiq: count_2q = 5, depth_2q = 5, ops = OrderedDict([('h', 6), ('cx', 5), ('rz', 3)])
Qiskit+Transpile: count_2q = 8, depth_2q = 8, ops = OrderedDict([('cx', 8), ('rz', 3)])
Rustiq+Transpile: count_2q = 5, depth_2q = 5, ops = OrderedDict([('cx', 5), ('rz', 3)])

#op = SparsePauliOp(["ZZII", "IZZI", "IIZZ", "IZZZ", "ZZZI", "ZZZZ"], coeffs=[1, 2, 3, 4, 5, 6])
Qiskit: count_2q = 17, depth_2q = 17, ops = OrderedDict([('cx', 14), ('rzz', 3), ('rz', 3)])
Rustiq: count_2q = 9, depth_2q = 6, ops = OrderedDict([('cx', 9), ('h', 8), ('rz', 6)])
Qiskit+Transpile: count_2q = 16, depth_2q = 16, ops = OrderedDict([('cx', 16), ('rz', 6)])
Rustiq+Transpile: count_2q = 9, depth_2q = 6, ops = OrderedDict([('cx', 9), ('rz', 6)])

The hand-crafted circuits are still slightly better, but only marginally so.

Thanks, @alexanderivrii. It's a great news. It would be nice if Rustiq can include the hand-optimized pattern too.

Qiskit / qiskit

Add an efficient circuit synthesis for a special pattern `exp(-it (IZZ + ZZI + ZZZ))` #13285

What should we add?