Qiskit / qiskit

Qiskit is an open-source SDK for working with quantum computers at the level of extended quantum circuits, operators, and primitives.
https://www.ibm.com/quantum/qiskit
Apache License 2.0
5.28k stars 2.37k forks source link

Oxidize `UnitarySynthesis` pass #13141

Closed ElePT closed 1 month ago

ElePT commented 2 months ago

Summary

Closes #12210. This is an initial effort to port UnitarySynthesis to Rust, currently limited to the target + default plugin path (custom plugins will run in Python).

Details and comments

TODO:

coveralls commented 2 months ago

Pull Request Test Coverage Report for Build 11409692028

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details


Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/accelerate/src/two_qubit_decompose.rs 14 15 93.33%
crates/accelerate/src/unitary_synthesis.rs 761 825 92.24%
<!-- Total: 784 849 92.34% -->
Files with Coverage Reduction New Missed Lines %
qiskit/circuit/library/standard_gates/u.py 1 93.0%
crates/qasm2/src/expr.rs 1 94.02%
crates/accelerate/src/elide_permutations.rs 1 97.06%
crates/accelerate/src/error_map.rs 1 85.42%
crates/accelerate/src/circuit_library/quantum_volume.rs 1 96.69%
qiskit/circuit/library/standard_gates/rz.py 2 95.83%
qiskit/circuit/library/quantum_volume.py 2 95.45%
crates/circuit/src/lib.rs 2 96.15%
qiskit/synthesis/stabilizer/stabilizer_decompose.py 2 96.77%
crates/circuit/src/bit_data.rs 2 94.62%
<!-- Total: 856 -->
Totals Coverage Status
Change from base Build 11241264267: -0.3%
Covered Lines: 73846
Relevant Lines: 83409

💛 - Coveralls
ElePT commented 2 months ago

Performance is starting to look promising:

Benchmarks that have improved:

| Change   | Before [2fa6dd65] <main>   | After [739a5c84] <oxidize-unitary-synthesis-benchmarking-3>   |   Ratio | Benchmark (Parameter)                                              |
|----------|----------------------------|---------------------------------------------------------------|---------|--------------------------------------------------------------------|
| -        | 1.30±0.04s                 | 1.17±0.02s                                                    |    0.9  | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')              |
| -        | 483±60ms                   | 375±2ms                                                       |    0.78 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')  |
| -        | 695±90ms                   | 476±20ms                                                      |    0.69 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')  |
| -        | 705±100ms                  | 481±10ms                                                      |    0.68 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr') |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
ElePT commented 2 months ago

Benchmarking status as of eb815d1 (more benchmarks show improvement but the improvement looks slightly smaller).

| Change   | Before [2fa6dd65] <unitary-test>   | After [266a22f0] <oxidize-unitary-synthesis>   |   Ratio | Benchmark (Parameter)                                              |
|----------|------------------------------------|------------------------------------------------|---------|--------------------------------------------------------------------|
| -        | 1.38±0.04s                         | 1.21±0.03s                                     |    0.87 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')               |
| -        | 425±8ms                            | 361±8ms                                        |    0.85 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')  |
| -        | 566±10ms                           | 467±10ms                                       |    0.83 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')  |
| -        | 3.94±0.2s                          | 3.04±0.02s                                     |    0.77 | utility_scale.UtilityScaleBenchmarks.time_qv('ecr')                |
| -        | 4.21±0.1s                          | 3.15±0.04s                                     |    0.75 | utility_scale.UtilityScaleBenchmarks.time_qv('cz')                 |
| -        | 626±40ms                           | 456±4ms                                        |    0.73 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr') |
qiskit-bot commented 2 months ago

One or more of the following people are relevant to this code:

ElePT commented 1 month ago

Benchmark for 4958425 compared with the most recent commit in main:

| Change   | Before [dcd41e9f]    | After [49584253] <oxidize-unitary-synthesis>   |   Ratio | Benchmark (Parameter)                                              |
|----------|----------------------|------------------------------------------------|---------|--------------------------------------------------------------------|
| -        | 1.27±0.1s            | 1.14±0.01s                                     |    0.9  | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')              |
| -        | 1.39±0.09s           | 1.21±0.02s                                     |    0.87 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')               |
| -        | 434±10ms             | 377±20ms                                       |    0.87 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')  |
| -        | 574±30ms             | 474±3ms                                        |    0.83 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')  |
| -        | 336±70ms             | 274±10ms                                       |    0.82 | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')             |
| -        | 3.18±0.1s            | 2.53±0.09s                                     |    0.79 | utility_scale.UtilityScaleBenchmarks.time_qv('cx')                 |
| -        | 603±20ms             | 454±5ms                                        |    0.75 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr') |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

I originally was running the benchmarks using an older commit as reference and I was noticing a hit in performance but it seems to be in tests that don't use this pass. Any theories on why this is the case?

| Change   | Before [2fa6dd65] <unitary-test>   | After [69eabc42] <oxidize-unitary-synthesis>   |   Ratio | Benchmark (Parameter)                                                        |
|----------|------------------------------------|------------------------------------------------|---------|------------------------------------------------------------------------------|
| +        | 393±100ms                          | 558±40ms                                       |    1.42 | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')                       |
| +        | 79.8±20ms                          | 114±2ms                                        |    1.42 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx') |
| +        | 412±90ms                           | 577±80ms                                       |    1.4  | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')                      |
| +        | 24.0±4ms                           | 33.6±3ms                                       |    1.4  | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')             |
| -        | 1.06±0.08s                         | 928±40ms                                       |    0.88 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')            |
| -        | 1.06±0.2s                          | 897±40ms                                       |    0.84 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')           |
ElePT commented 1 month ago

The PR now completely avoids the return type enum and defines clearer paths for the dag and sequence outputs. This gets a bit ugly when evaluating the final synthesized circuits because I needed to collect the sequences on one side, dags on the other, and then do the final match depending on the errors. I am open to other proposals. I know the use of dag circuit methods is still not really consistent (I went for whatever looks shortest, for example, when thinking whether to use push_back or apply_operation_back or extend), and I still didn't use the new rust-based converters, but at least the performance is finally better so I re-requested a review.

ElePT commented 1 month ago

This is the latest benchmarking output after pulling from main, again a smaller number of tests improve but the improvement seems better.

| Change   | Before [90e92a46] <oxidize-unitary-synthesis^2>   | After [9cacf6fc] <oxidize-unitary-synthesis>   |   Ratio | Benchmark (Parameter)                                              |
|----------|---------------------------------------------------|------------------------------------------------|---------|--------------------------------------------------------------------|
| +        | 14.7±0.6ms                                        | 18.5±1ms                                       |    1.25 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100(‘cx’)    |
| -        | 512±100ms                                         | 363±9ms                                        |    0.71 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg(‘cx’)  |
| -        | 841±100ms                                         | 444±30ms                                       |    0.53 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg(‘cz’)  |
| -        | 858±70ms                                          | 446±30ms                                       |    0.52 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg(‘ecr’) |
ElePT commented 1 month ago

I also ran the transpiler benchmarks for QV, which might give a better idea of the speedup than the utility scale ones, and got the following numbers on the tests that use the rust path (the rust path is only followed if transpile is called with a target, and some of these tests give coupling map/basis gates directly). I added a rust-path test for the 50x20 example to have more data to look at.

| Change   | Before [90e92a46] <oxidize-unitary-synthesis^2>   | After [9cacf6fc] <oxidize-unitary-synthesis>   |   Ratio | Benchmark (Parameter)                                                                       |
|----------|---------------------------------------------------|------------------------------------------------|---------|---------------------------------------------------------------------------------------------|
| -        | 138±0.8ms                                         | 121±2ms                                        |    0.88 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)                    |
| -        | 691±20ms                                          | 600±30ms                                       |    0.87 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20_target(2) |
| -        | 100±1ms                                           | 87.4±4ms                                       |    0.87 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2)                    |
| -        | 62.4±0.8ms                                        | 52.8±4ms                                       |    0.85 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)                    |
| -        | 321±3ms                                           | 262±10ms                                       |    0.82 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20_target(0) |
| -        | 62.0±1ms                                          | 44.9±0.5ms                                     |    0.72 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)                    |
ShellyGarion commented 1 month ago

I think this is about ready in my opinion. We'll see about the panicking and errror handling on the synth_sequences based on the comments I just left. I'd appreciate if @ShellyGarion or @alexanderivrii can take a look at the arithmetic, or a second look from the team.

The general framework looks good to me. For a 1-qubit unitary we use euler_one_qubit_decomposer. For a 2-qubit unitary we use the TwoQubitBasisDecomposer (default for the basis gates 'cx', 'cz' or 'ecr') or the XXDecomposer. Later, we also plan to add the TwoQubitControlledUDecomposer (which should be default for gates like 'rzz' with an arbitrary angle), see #13139 and #13320. For a larger unitary, we use the qs_decomposition.

raynelfss commented 1 month ago

Currently reviewing... Any small changes I'll commit myself and merge