Qiskit / rustworkx

A high performance Python graph library implemented in Rust.
https://www.rustworkx.org
Apache License 2.0
1.08k stars 147 forks source link

Port collect 2q blocks function from qiskit-terra #265

Closed mtreinish closed 3 years ago

mtreinish commented 3 years ago

What is the expected enhancement?

In qiskit-terra there is a transpiler pass collect2q blocks which primarily consists of a graph traversal. Doing the actual graph traversal and search will be much faster in retworkx, so we should add a function that can be used in place of terra's python version. The terra function code is:

https://github.com/Qiskit/qiskit-terra/blob/b058856e2aee9c5819120dc9c95f1e2478c95f4e/qiskit/transpiler/passes/optimization/collect_2q_blocks.py#L55-L246

This is similar to collect_runs which already has a native retworkx version:

https://github.com/Qiskit/retworkx/blob/b1f973630cb352950fbdeabd4b4c60191a62a6e7/src/lib.rs#L1121-L1173

and will operate in a similar manner with a filter function argument except it will collect 2q blocks instead of 1q runs.

nahumsa commented 3 years ago

I'll take a look on this issue.

IvanIsCoding commented 3 years ago

I'm working on it

IvanIsCoding commented 3 years ago

Giving an update on this after profiling the code and trying to convert it to Rust.

I am a bit concerned about porting all the Collect2qBlocks.run function to retworkx because it is very specific and it calls Python code very frequently. It won't generalize as collect_runs, because it has multiple conditions for handling cases and most of them mix checking DAG properties and quantum properties. So porting it might make it much harder to maintain the transpiler pass in qiskit-terra and the function in retworkx.

Nevertheless, I still think we can improve the performance by porting some core parts to Rust. Mainly about porting quantum_successors, quantum_predecessors, and some Python code that checks for elements in successors and predecessors.

I profiled the code with:

from random import randint
from qiskit import QuantumRegister, QuantumCircuit
from qiskit.dagcircuit import DAGCircuit
from qiskit.converters import circuit_to_dag
from qiskit.transpiler.passes.optimization import Collect2qBlocks

q = QuantumRegister(1000, 'q')
qc = QuantumCircuit(q)
for i in range(100000):
    a = randint(0, 999)
    b = randint(0, 999)
    if a == b:
        qc.h(a)
    else:
        if i % 2 == 0:
            qc.cz(a, b)
        else:
            qc.cx(a, b)

dag = circuit_to_dag(qc)

c2q = Collect2qBlocks()

for i in range(100):
    c2q.run(dag)

The results showed we spend around 10% of the computing time in quantum_predecessors, 7% of the time in quantum_successors, 7% in successors and 6% in predecessors.

I think the culprits are this and this check in predecessors/successors.

To address them, instead of rewriting the whole function I think we can:

I will make a Draft PR with the additions and rewrite the function in qiskit-terra using those new functions. Hopefully that helps boost speed and make the code remain maintainable.