Closed: anedumla closed this issue 1 year ago.
Yes. Optimization level 3 gives (for 15 steps):
global phase: 2π
q_0 -> 0 ─────────
┌───────┐
q_1 -> 1 ┤ Rz(2) ├
├───────┤
q_2 -> 2 ┤ Rz(2) ├
└───────┘
ancilla_0 -> 3 ─────────
ancilla_1 -> 4 ─────────
ancilla_2 -> 5 ─────────
whereas optimization level 2 gives something much longer, a small snippet of which is:
global phase: π/2
┌─────────┐ ┌────┐┌─────────┐ »
q_0 -> 0 ─┤ Rz(π/2) ├──┤ √X ├┤ Rz(π/2) ├──■────────────────────────────»
├─────────┤ ├────┤├─────────┤┌─┴─┐┌────────────────────────┐»
q_1 -> 1 ─┤ Rz(π/2) ├──┤ √X ├┤ Rz(π/2) ├┤ X ├┤ Rz(0.0666666666666667) ├»
┌┴─────────┴─┐├────┤├─────────┤└───┘└────────────────────────┘»
q_2 -> 2 ┤ Rz(1.6375) ├┤ √X ├┤ Rz(π/2) ├───────────────────────────────»
└────────────┘└────┘└─────────┘ »
It is independent of the routing method, so something else is breaking.
Do we know yet if this is due to changes in optimization_level=3 after 0.18.3? I wasn't able to directly replicate the example on 0.18.3 because it uses e.g. PauliEvolutionGate, which is new in 0.19.
Here is the raw circuit used above:
Collect2qBlocks / ConsolidateBlocks / UnitarySynthesis is greatly simplifying these circuits here.
def cb(pass_, dag, **kwargs):
    print(type(pass_).__name__, dag.count_ops())

qct3 = transpile(qc, backend, initial_layout=layout, optimization_level=3, callback=cb)
Going from 268 cx, to 31 unitary, to 0 entangling gates....
UnitarySynthesis {'h': 3, 'circuit-2402': 1, 'measure': 3}
Unroll3qOrMore {'h': 3, 'measure': 3, 'rzz': 45, 'rxx': 45, 'rz': 45, 'ryy': 45}
RemoveResetInZeroState {'h': 3, 'measure': 3, 'rzz': 45, 'rxx': 45, 'rz': 45, 'ryy': 45}
OptimizeSwapBeforeMeasure {'h': 3, 'measure': 3, 'rzz': 45, 'rxx': 45, 'rz': 45, 'ryy': 45}
RemoveDiagonalGatesBeforeMeasure {'h': 3, 'measure': 3, 'rzz': 44, 'rxx': 45, 'rz': 44, 'ryy': 45}
SetLayout {'h': 3, 'measure': 3, 'rzz': 44, 'rxx': 45, 'rz': 44, 'ryy': 45}
FullAncillaAllocation {'h': 3, 'measure': 3, 'rzz': 44, 'rxx': 45, 'rz': 44, 'ryy': 45}
EnlargeWithAncilla {'h': 3, 'measure': 3, 'rzz': 44, 'rxx': 45, 'rz': 44, 'ryy': 45}
ApplyLayout {'h': 3, 'rxx': 45, 'ryy': 45, 'rzz': 44, 'rz': 44, 'measure': 3}
CheckMap {'h': 3, 'rxx': 45, 'ryy': 45, 'rzz': 44, 'rz': 44, 'measure': 3}
UnitarySynthesis {'h': 3, 'rxx': 45, 'ryy': 45, 'rzz': 44, 'rz': 44, 'measure': 3}
UnrollCustomDefinitions {'h': 3, 'rxx': 45, 'ryy': 45, 'rzz': 44, 'rz': 44, 'measure': 3}
BasisTranslator {'rz': 1084, 'measure': 3, 'sx': 543, 'cx': 268}
RemoveResetInZeroState {'rz': 1084, 'measure': 3, 'sx': 543, 'cx': 268}
Depth {'rz': 1084, 'measure': 3, 'sx': 543, 'cx': 268}
FixedPoint {'rz': 1084, 'measure': 3, 'sx': 543, 'cx': 268}
vvvv
Collect2qBlocks {'rz': 1084, 'measure': 3, 'sx': 543, 'cx': 268}
ConsolidateBlocks {'measure': 3, 'unitary': 31}
UnitarySynthesis {'measure': 3, 'rz': 93, 'sx': 61}
^^^^
Optimize1qGatesDecomposition {'measure': 3, 'rz': 6, 'sx': 3}
CommutationAnalysis {'measure': 3, 'rz': 6, 'sx': 3}
CommutativeCancellation {'measure': 3, 'rz': 6, 'sx': 3}
UnitarySynthesis {'measure': 3, 'rz': 6, 'sx': 3}
UnrollCustomDefinitions {'measure': 3, 'rz': 6, 'sx': 3}
BasisTranslator {'measure': 3, 'rz': 6, 'sx': 3}
Depth {'measure': 3, 'rz': 6, 'sx': 3}
FixedPoint {'measure': 3, 'rz': 6, 'sx': 3}
Collect2qBlocks {'measure': 3, 'rz': 6, 'sx': 3}
ConsolidateBlocks {'measure': 3, 'rz': 6, 'sx': 3}
UnitarySynthesis {'measure': 3, 'rz': 6, 'sx': 3}
Optimize1qGatesDecomposition {'measure': 3, 'rz': 6, 'sx': 3}
CommutationAnalysis {'measure': 3, 'rz': 6, 'sx': 3}
CommutativeCancellation {'measure': 3, 'rz': 6, 'sx': 3}
UnitarySynthesis {'measure': 3, 'rz': 6, 'sx': 3}
UnrollCustomDefinitions {'measure': 3, 'rz': 6, 'sx': 3}
BasisTranslator {'measure': 3, 'rz': 6, 'sx': 3}
Depth {'measure': 3, 'rz': 6, 'sx': 3}
FixedPoint {'measure': 3, 'rz': 6, 'sx': 3}
Collect2qBlocks {'measure': 3, 'rz': 6, 'sx': 3}
ConsolidateBlocks {'measure': 3, 'rz': 6, 'sx': 3}
UnitarySynthesis {'measure': 3, 'rz': 6, 'sx': 3}
Optimize1qGatesDecomposition {'measure': 3, 'rz': 6, 'sx': 3}
CommutationAnalysis {'measure': 3, 'rz': 6, 'sx': 3}
CommutativeCancellation {'measure': 3, 'rz': 6, 'sx': 3}
UnitarySynthesis {'measure': 3, 'rz': 6, 'sx': 3}
UnrollCustomDefinitions {'measure': 3, 'rz': 6, 'sx': 3}
BasisTranslator {'measure': 3, 'rz': 6, 'sx': 3}
ContainsInstruction {'measure': 3, 'rz': 6, 'sx': 3}
I think the problem here is the basic interaction of small-step Trotterization (which makes the interaction-per-step become small) with approximate unitary synthesis (which avoids emitting 2q gates when the interaction becomes small enough). The solution is to use fewer Trotter steps or to turn off approximate synthesis.
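To make "the interaction becomes small" concrete, here is a rough numerical sketch (not from the issue itself): the process fidelity of RZZ(θ) with the identity is cos²(θ/2), so for the per-step angle visible in the snippet above (≈ 0.067) the identity is a better local approximation than an exact synthesis spending two CX gates, if one assumes a plausible ~99% CX fidelity. The 99% figure is an assumption for illustration, not a value from the report.

```python
import math

# Process fidelity of RZZ(theta) with the identity:
# Tr(RZZ(theta)) = 4*cos(theta/2), so F = |Tr/4|**2 = cos(theta/2)**2.
def identity_fidelity(theta: float) -> float:
    return math.cos(theta / 2) ** 2

theta = 0.0666666666666667         # per-step angle from the snippet above
f_drop = identity_fidelity(theta)  # fidelity if the interaction is dropped
f_exact = 0.99 ** 2                # two CX gates at an ASSUMED 99% fidelity each

print(f_drop, f_exact)  # dropping the gate "wins" locally
```

On these numbers, dropping the interaction (~0.9989) beats exact synthesis (~0.9801) at every single step, which is exactly the choice approximate synthesis makes.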
It appears that this happens only with the default approximation_degree=None. Setting 1 (min approx) or 0 (max approx) both yield the same circuit.
Thinking about this a bit more though, could it be the case that this is a sound optimization to make (in the interest of best approximating the given unitary with the available device gate fidelities), even though when simulating (on an ideal simulator) you'll end up quite far from the intended operation?
> Thinking about this a bit more though, could it be the case that this is a sound optimization to make (in the interest of best approximating the given unitary with the available device gate fidelities), even though when simulating (on an ideal simulator) you'll end up quite far from the intended operation?
This is the intention, that you do the best you can given available gate fidelities. It may end up far from the intended, but better than not approximating. Of course the decision is heuristic and local, so may not always succeed - something like trotterization where the repeated trotter steps are intended to add coherently would be an example where the heuristic based on assuming a depolarizing channel of equivalent infidelity might choose poorly.
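The "errors add coherently" point can be quantified with a small sketch (my own illustration, using the per-step angle and step count from the log above): an independent-errors model predicts a total infidelity of roughly n times the per-step infidelity, while the dropped rotations actually compound into one large missing rotation of angle n·θ.

```python
import math

def infid(theta: float) -> float:
    # 1 - process fidelity of RZZ(theta) vs the identity, i.e. 1 - cos(theta/2)**2
    return 1 - math.cos(theta / 2) ** 2

theta, n = 0.0666666666666667, 15   # per-step angle and Trotter step count

per_step = infid(theta)
assumed_total = n * per_step        # what an "independent errors" model predicts
coherent_total = infid(n * theta)   # what happens when the errors add in phase

print(per_step, assumed_total, coherent_total)
```

With these numbers the coherent total (~0.23) is more than an order of magnitude worse than the independent-errors estimate (~0.017), which is why the per-gate heuristic chooses poorly on repeated Trotter steps.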
In that case, I'd lean toward the transpiler behavior not being a bug. It may be a bug, though, that users can compile based on a set of backend_properties and then simulate without them without seeing a warning. Does that make sense to you @anedumla @nonhermitian @levbishop ?
As a user, I am actually surprised that this was enabled by default. To me at least, it seems to implicitly break the idea that the transpiler faithfully performs the unitary in question (up to certain permutations, transformations, etc.). This feels akin to the fast-math compiler option that breaks IEEE 754 floating-point rules to gain performance but occasionally really messes things up; e.g. see https://github.com/JuliaLang/julia/issues/30073#issuecomment-439707503
I think approximations like this should be explicit, and turned on by the user with the knowledge that bad things can sometimes happen.
I agree with @nonhermitian. Personally, I would not expect that the circuit returned is an approximation and would prefer to explicitly activate an option to approximate the unitary given the hardware characteristics.
Since #8595 has merged I think this has been implemented so I'm going to close this. If I'm missing something here though or misinterpreting what was needed for this issue please feel free to reopen this.
Environment

What is happening?

Using optimization_level=3 within the transpile method yields a circuit with different outcomes than the one produced either from setting optimization_level=2 or from evaluating the non-transpiled circuit with Statevector. The wrong behaviour was observed while increasing the number of Trotter steps in a Trotterized time evolution; the circuits obtained from optimization_level=3 perform as expected as long as the original circuit remains relatively shallow (see plots below).

How can we reproduce the issue?
Output:
What should happen?

In the output plots the line for optimization_level=3 should coincide with the statevector and optimization_level=2 lines.

Any suggestions?
No response