Avoid `ExtraInstructionAttributes` allocation on `unit="dt"`

jakelishman commented 2 months ago

Summary

The default value for Instruction.unit is "dt". Previously, the OperationFromPython extraction logic would only suppress allocation of the extra instruction attributes if all the contained fields were None, but None is not actually a valid value of Instruction.unit (which must be a string). This meant that OperationFromPython would always allocate and store extra attributes, even for the default cases. This did not affect standard gates appended using their corresponding QuantumCircuit methods (since no Python-space extraction is performed in that case), but did affect standard calls to append, or anything else that entered from Python space.
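
The difference between the two checks can be modelled in a few lines of Python. This is only an illustrative sketch of the behaviour described above, not the Rust code in this PR: `ExtraAttrs`, `extract_old`, `extract_new` and the exact field list are hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional

DEFAULT_UNIT = "dt"  # default of Instruction.unit, as stated in the summary


@dataclass
class ExtraAttrs:
    """Hypothetical stand-in for ExtraInstructionAttributes (fields are assumed)."""
    label: Optional[str]
    duration: Optional[Any]
    unit: str
    condition: Optional[Any]


def extract_old(label, duration, unit, condition):
    """Model of the previous check: skip allocation only if every field is None."""
    if label is None and duration is None and unit is None and condition is None:
        return None
    return ExtraAttrs(label, duration, unit, condition)


def extract_new(label, duration, unit, condition):
    """Model of the new check: the default unit "dt" also counts as 'nothing to store'."""
    if label is None and duration is None and unit == DEFAULT_UNIT and condition is None:
        return None
    return ExtraAttrs(label, duration, unit, condition)


# A default instruction: unit is "dt", never None, so the old check always allocates.
print(extract_old(None, None, "dt", None) is None)
print(extract_new(None, None, "dt", None) is None)
```

In this model, an instruction with a non-default label or unit still gets its extra attributes under both checks; only the all-defaults case changes.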

This drastically reduces the memory usage of circuits built by append-like methods. Ignoring the inefficiency factor of the heap-allocation implementation, this saves 66 bytes plus small-allocation overhead for 2-byte heap allocations (another 14 bytes on macOS, but will vary depending on the allocator) per standard instruction, which is on the order of 40% memory-usage reduction.

Details and comments

I'm using the same sort of microbenchmarking script I've been using since #12730, but now modified to use append instead of the special methods on QuantumCircuit:

from qiskit.circuit import QuantumCircuit, library

QUBITS = 1000
REPS = 3000

def main_methods():
    qc = QuantumCircuit(QUBITS)
    for _ in range(REPS):
        for q in qc.qubits:
            qc.rz(0.0, q)
        for q in qc.qubits:
            qc.rx(0.0, q)
        for q in qc.qubits:
            qc.rz(0.0, q)
        for a, b in zip(qc.qubits[:-1], qc.qubits[1:]):
            qc.cx(a, b)

def main_append():
    qc = QuantumCircuit(QUBITS)
    rz = library.RZGate(0.0)
    rx = library.RXGate(0.0)
    cx = library.CXGate()
    for _ in range(REPS):
        for q in qc.qubits:
            qc.append(rz, (q,))
        for q in qc.qubits:
            qc.append(rx, (q,))
        for q in qc.qubits:
            qc.append(rz, (q,))
        for qs in zip(qc.qubits[:-1], qc.qubits[1:]):
            qc.append(cx, qs)
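
The description does not say how the memory figures were collected. One possible way to get a comparable number, assuming a Unix-like system, is to read the peak resident set size of the process after the workload has run. The helper names below (`peak_rss_bytes`, `run_and_report`, `stand_in_workload`) are hypothetical, and the stand-in workload is there only so the sketch runs without Qiskit installed.

```python
import resource
import sys


def peak_rss_bytes() -> int:
    """Return the peak resident set size of this process so far, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in kibibytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


def run_and_report(workload) -> int:
    """Run the workload once and print the process's peak RSS afterwards."""
    workload()
    peak = peak_rss_bytes()
    print(f"{workload.__name__}: peak RSS is about {peak / 1e9:.2f} GB")
    return peak


def stand_in_workload():
    # Placeholder allocation; replace with main_append or main_methods.
    data = [bytearray(1024) for _ in range(10_000)]
    return len(data)


run_and_report(stand_in_workload)
```

Because the peak is tracked per process, each of `main_methods` and `main_append` would need to run in a fresh interpreter for the two numbers to be comparable.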

The memory usage of main_append for the parent of this PR is approximately 2.3GB on both macOS and Linux with glibc, whereas with the PR it drops to 1.35GB on macOS and 1.06GB on Linux/glibc. I suspect there's something additional going on in the macOS case, because while the Linux/glibc figure drops to match the memory usage of main_methods (as expected), the macOS figure remains 300MB higher. The difference might be down to the Python versions: I used 3.10 on macOS and 3.12 on Linux.
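
As a rough cross-check, the per-instruction estimate from the summary can be multiplied by the number of instructions that main_append creates. The sketch below assumes "GB" means 10^9 bytes and uses the macOS figure of 66 + 14 bytes per instruction; both are assumptions made for the arithmetic, not additional measurements.

```python
QUBITS = 1000
REPS = 3000

# Per repetition: three single-qubit layers plus one CX per neighbouring pair.
instructions = REPS * (3 * QUBITS + (QUBITS - 1))

# 66 bytes of attributes plus roughly 14 bytes of allocator overhead (macOS figure).
saved_per_instruction = 66 + 14
estimated_saving = instructions * saved_per_instruction

# Reported macOS figures: 2.3 GB before, 1.35 GB after (assuming 1 GB = 1e9 bytes).
observed_saving = 2.3e9 - 1.35e9

print(f"instructions:     {instructions:,}")
print(f"estimated saving: {estimated_saving / 1e9:.2f} GB")
print(f"observed saving:  {observed_saving / 1e9:.2f} GB")
print(f"observed saving as a fraction of 2.3 GB: {observed_saving / 2.3e9:.0%}")
```

Under these assumptions the estimate and the macOS measurement agree to within a few percent, and the relative reduction comes out close to the 40% quoted in the summary. The Linux/glibc drop is larger, which would be consistent with a different small-allocation overhead there.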

qiskit-bot commented 2 months ago

One or more of the following people are relevant to this code:

@Qiskit/terra-core
@kevinhartman
@mtreinish

coveralls commented 2 months ago

Pull Request Test Coverage Report for Build 10680563760

Details

17 of 26 (65.38%) changed or added relevant lines in 2 files are covered.
31 unchanged lines in 6 files lost coverage.
Overall coverage decreased (-0.02%) to 89.133%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
crates/circuit/src/circuit_instruction.rs	16	25	64.0%
Total:	17	26	65.38%

Files with Coverage Reduction	New Missed Lines	%
crates/circuit/src/circuit_instruction.rs	1	87.41%
crates/qasm2/src/expr.rs	1	94.02%
crates/accelerate/src/two_qubit_decompose.rs	1	90.84%
crates/circuit/src/dag_circuit.rs	3	88.86%
crates/qasm2/src/lex.rs	7	91.48%
crates/qasm2/src/parse.rs	18	96.69%
Total:	31

Totals
Change from base Build 10679848479:	-0.02%
Covered Lines:	71849
Relevant Lines:	80609
