OptimizeSwapBeforeMeasure pass drops Swap gate (even if there is NO measure after it)

Environment

Qiskit Terra version: 0.19.1
Python version: 3.8
Operating system: Ubuntu 18.04.6 LTS

What is happening?

I define a sub-circuit without any measurement (it is intended as a sub-component) that ends with a swap gate. I then optimize the sub-circuit (optimization_level=3) before using it as a building block for a larger circuit/algorithm. The optimization drops my final swap gate and leaves me with a sub-circuit that is semantically different from the one I wrote. In this context, the optimization changes the semantics of the circuit.

How can we reproduce the issue?

I transpile with level 0 and I get the expected result:

from qiskit import QuantumCircuit, transpile, Aer, execute
from qiskit.circuit.library.standard_gates import *
qc = QuantumCircuit(3)
qc.x(0)
qc.swap(0,1)
qc = transpile(qc, optimization_level=0)
qc.measure_all()
qc.draw(fold=-1)
counts = execute(qc, backend=Aer.get_backend('qasm_simulator'), shots=1024).result().get_counts(qc)
        ┌───┐    ░ ┌─┐      
   q_0: ┤ X ├─X──░─┤M├──────
        └───┘ │  ░ └╥┘┌─┐   
   q_1: ──────X──░──╫─┤M├───
                 ░  ║ └╥┘┌─┐
   q_2: ─────────░──╫──╫─┤M├
                 ░  ║  ║ └╥┘
meas: 3/════════════╩══╩══╩═
                    0  1  2 
{'010': 1024}

Running it with level 3 instead drops my swap and gives a semantically different circuit with a different result:

from qiskit import QuantumCircuit, transpile, Aer, execute
from qiskit.circuit.library.standard_gates import *
qc = QuantumCircuit(3)
qc.x(0)
qc.swap(0,1)
qc = transpile(qc, optimization_level=3)
qc.measure_all()
qc.draw(fold=-1)
counts = execute(qc, backend=Aer.get_backend('qasm_simulator'), shots=1024).result().get_counts(qc)
        ┌───┐ ░ ┌─┐      
   q_0: ┤ X ├─░─┤M├──────
        └───┘ ░ └╥┘┌─┐   
   q_1: ──────░──╫─┤M├───
              ░  ║ └╥┘┌─┐
   q_2: ──────░──╫──╫─┤M├
              ░  ║  ║ └╥┘
meas: 3/═════════╩══╩══╩═
                 0  1  2 
{'001': 1024}
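The two count dictionaries can be cross-checked by hand, because the circuit only applies X and SWAP to a computational basis state. Below is a minimal sketch in plain Python (no Qiskit needed); `run_classical` and its tuple-based gate encoding are hypothetical helpers made up for this illustration. It relies only on the fact that Qiskit prints qubit 0 as the rightmost bit of a counts key.

```python
def run_classical(num_qubits, ops):
    """Apply X and SWAP gates to the all-zeros basis state and return the
    measured bitstring in Qiskit's ordering (qubit 0 is the rightmost bit)."""
    bits = [0] * num_qubits
    for name, qubits in ops:
        if name == "x":
            bits[qubits[0]] ^= 1          # flip the target bit
        elif name == "swap":
            a, b = qubits
            bits[a], bits[b] = bits[b], bits[a]
        else:
            raise ValueError(f"unsupported gate: {name}")
    return "".join(str(bit) for bit in reversed(bits))


# Circuit as written: X on qubit 0, then SWAP(0, 1).
with_swap = run_classical(3, [("x", (0,)), ("swap", (0, 1))])
# Circuit after the swap has been dropped by the optimization.
without_swap = run_classical(3, [("x", (0,))])

print(with_swap)     # 010
print(without_swap)  # 001
```

So '010' is the result the original circuit must produce, and '001' is what you get once the swap is gone.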

What should happen?

I would have expected the optimization to preserve the semantics of the input circuit.

Any suggestions?

Via interactive debugging I narrowed the problem down to the optimization pass OptimizeSwapBeforeMeasure. It removes swaps that come directly before a measurement, and also swaps that come directly before the final DAGOutNode nodes (i.e. the end of the circuit). The second case assumes that no measurement will ever be added after the swap, which is not true for a sub-component. I therefore suggest disabling this optimization for swaps followed by DAGOutNode nodes, because we cannot guarantee a semantically equivalent circuit.

https://github.com/Qiskit/qiskit-terra/blob/fcec842f1de9fd12120e30a1bf73bf7c52b1bf81/qiskit/transpiler/passes/optimization/optimize_swap_before_measure.py#L45

        swaps = dag.op_nodes(SwapGate)
        for swap in swaps[::-1]:
            if swap.op.condition is not None:
                continue
            final_successor = []
            for successor in dag.successors(swap):
                final_successor.append(
                    isinstance(successor, DAGOutNode)
                    or (isinstance(successor, DAGOpNode) and isinstance(successor.op, Measure))
                )
            if all(final_successor):
                # the node swap needs to be removed and, if a measure follows, needs to be adapted
                swap_qargs = swap.qargs
                measure_layer = DAGCircuit()
                for qreg in dag.qregs.values():
                    measure_layer.add_qreg(qreg)
                for creg in dag.cregs.values():
                    measure_layer.add_creg(creg)
                for successor in list(dag.successors(swap)):
                    if isinstance(successor, DAGOpNode) and isinstance(successor.op, Measure):
                        # replace measure node with a new one, where qargs is set with the "other"
                        # swap qarg.
                       ...
                dag.compose(measure_layer)
                dag.remove_op_node(swap)

Going back to my bug-triggering code: if the measurement had been included in the circuit before the transpile call, it would have worked perfectly. But we do not always know how a sub-circuit will be reused in other parts of the algorithm.
