Oxidize `UnitarySynthesis` pass

ElePT commented 2 months ago

Summary

Closes #12210. This is an initial effort to port UnitarySynthesis to Rust, currently limited to the target + default plugin path (custom plugins will run in Python).

Details and comments

TODO:

[x] Add unit tests for qsd path and 1q synthesis path
[ ] Address drawer issue after calling dag_to_circuit (not reflected in unit tests)
[x] Clean up comments
[x] Review use of dag.push_back and see if it can be done with the new dag.apply_operation_back method

coveralls commented 2 months ago

Pull Request Test Coverage Report for Build 11409692028

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

784 of 849 (92.34%) changed or added relevant lines in 5 files are covered.
856 unchanged lines in 37 files lost coverage.
Overall coverage decreased (-0.3%) to 88.535%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
crates/accelerate/src/two_qubit_decompose.rs	14	15	93.33%
crates/accelerate/src/unitary_synthesis.rs	761	825	92.24%
<!--	Total:	784	849	92.34%	-->

Files with Coverage Reduction	New Missed Lines	%
qiskit/circuit/library/standard_gates/u.py	1	93.0%
crates/qasm2/src/expr.rs	1	94.02%
crates/accelerate/src/elide_permutations.rs	1	97.06%
crates/accelerate/src/error_map.rs	1	85.42%
crates/accelerate/src/circuit_library/quantum_volume.rs	1	96.69%
qiskit/circuit/library/standard_gates/rz.py	2	95.83%
qiskit/circuit/library/quantum_volume.py	2	95.45%
crates/circuit/src/lib.rs	2	96.15%
qiskit/synthesis/stabilizer/stabilizer_decompose.py	2	96.77%
crates/circuit/src/bit_data.rs	2	94.62%
<!--	Total:	856	-->

Totals
Change from base Build 11241264267:	-0.3%
Covered Lines:	73846
Relevant Lines:	83409

💛 - Coveralls

ElePT commented 2 months ago

Performance is starting to look promising:

Benchmarks that have improved:

| Change   | Before [2fa6dd65] <main>   | After [739a5c84] <oxidize-unitary-synthesis-benchmarking-3>   |   Ratio | Benchmark (Parameter)                                              |
|----------|----------------------------|---------------------------------------------------------------|---------|--------------------------------------------------------------------|
| -        | 1.30±0.04s                 | 1.17±0.02s                                                    |    0.9  | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')              |
| -        | 483±60ms                   | 375±2ms                                                       |    0.78 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')  |
| -        | 695±90ms                   | 476±20ms                                                      |    0.69 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')  |
| -        | 705±100ms                  | 481±10ms                                                      |    0.68 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr') |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

ElePT commented 2 months ago

Benchmarking status as of eb815d1 (more benchmarks show improvement but the improvement looks slightly smaller).

| Change   | Before [2fa6dd65] <unitary-test>   | After [266a22f0] <oxidize-unitary-synthesis>   |   Ratio | Benchmark (Parameter)                                              |
|----------|------------------------------------|------------------------------------------------|---------|--------------------------------------------------------------------|
| -        | 1.38±0.04s                         | 1.21±0.03s                                     |    0.87 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')               |
| -        | 425±8ms                            | 361±8ms                                        |    0.85 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')  |
| -        | 566±10ms                           | 467±10ms                                       |    0.83 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')  |
| -        | 3.94±0.2s                          | 3.04±0.02s                                     |    0.77 | utility_scale.UtilityScaleBenchmarks.time_qv('ecr')                |
| -        | 4.21±0.1s                          | 3.15±0.04s                                     |    0.75 | utility_scale.UtilityScaleBenchmarks.time_qv('cz')                 |
| -        | 626±40ms                           | 456±4ms                                        |    0.73 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr') |

qiskit-bot commented 2 months ago

One or more of the following people are relevant to this code:

@Qiskit/terra-core
@kevinhartman
@levbishop
@mtreinish

ElePT commented 1 month ago

Benchmark for 4958425 compared with the most recent commit in main:

| Change   | Before [dcd41e9f]    | After [49584253] <oxidize-unitary-synthesis>   |   Ratio | Benchmark (Parameter)                                              |
|----------|----------------------|------------------------------------------------|---------|--------------------------------------------------------------------|
| -        | 1.27±0.1s            | 1.14±0.01s                                     |    0.9  | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')              |
| -        | 1.39±0.09s           | 1.21±0.02s                                     |    0.87 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')               |
| -        | 434±10ms             | 377±20ms                                       |    0.87 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')  |
| -        | 574±30ms             | 474±3ms                                        |    0.83 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')  |
| -        | 336±70ms             | 274±10ms                                       |    0.82 | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')             |
| -        | 3.18±0.1s            | 2.53±0.09s                                     |    0.79 | utility_scale.UtilityScaleBenchmarks.time_qv('cx')                 |
| -        | 603±20ms             | 454±5ms                                        |    0.75 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr') |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

I originally was running the benchmarks using an older commit as reference and I was noticing a hit in performance but it seems to be in tests that don't use this pass. Any theories on why this is the case?

| Change   | Before [2fa6dd65] <unitary-test>   | After [69eabc42] <oxidize-unitary-synthesis>   |   Ratio | Benchmark (Parameter)                                                        |
|----------|------------------------------------|------------------------------------------------|---------|------------------------------------------------------------------------------|
| +        | 393±100ms                          | 558±40ms                                       |    1.42 | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')                       |
| +        | 79.8±20ms                          | 114±2ms                                        |    1.42 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx') |
| +        | 412±90ms                           | 577±80ms                                       |    1.4  | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')                      |
| +        | 24.0±4ms                           | 33.6±3ms                                       |    1.4  | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')             |
| -        | 1.06±0.08s                         | 928±40ms                                       |    0.88 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')            |
| -        | 1.06±0.2s                          | 897±40ms                                       |    0.84 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')           |

ElePT commented 1 month ago

The PR now completely avoids the return type enum and defines clearer paths for the dag and sequence outputs. This gets a bit ugly when evaluating the final synthesized circuits because I needed to collect the sequences on one side, dags on the other, and then do the final match depending on the errors. I am open to other proposals. I know the use of dag circuit methods is still not really consistent (I went for whatever looks shortest, for example, when thinking whether to use push_back or apply_operation_back or extend), and I still didn't use the new rust-based converters, but at least the performance is finally better so I re-requested a review.

ElePT commented 1 month ago

This is the latest benchmarking output after pulling from main, again a smaller number of tests improve but the improvement seems better.

| Change   | Before [90e92a46] <oxidize-unitary-synthesis^2>   | After [9cacf6fc] <oxidize-unitary-synthesis>   |   Ratio | Benchmark (Parameter)                                              |
|----------|---------------------------------------------------|------------------------------------------------|---------|--------------------------------------------------------------------|
| +        | 14.7±0.6ms                                        | 18.5±1ms                                       |    1.25 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100(‘cx’)    |
| -        | 512±100ms                                         | 363±9ms                                        |    0.71 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg(‘cx’)  |
| -        | 841±100ms                                         | 444±30ms                                       |    0.53 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg(‘cz’)  |
| -        | 858±70ms                                          | 446±30ms                                       |    0.52 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg(‘ecr’) |

ElePT commented 1 month ago

I also ran the transpiler benchmarks for QV, which might give a better idea of the speedup than the utility scale ones, and got the following numbers on the tests that use the rust path (the rust path is only followed if transpile is called with a target, and some of these tests give coupling map/basis gates directly). I added a rust-path test for the 50x20 example to have more data to look at.

| Change   | Before [90e92a46] <oxidize-unitary-synthesis^2>   | After [9cacf6fc] <oxidize-unitary-synthesis>   |   Ratio | Benchmark (Parameter)                                                                       |
|----------|---------------------------------------------------|------------------------------------------------|---------|---------------------------------------------------------------------------------------------|
| -        | 138±0.8ms                                         | 121±2ms                                        |    0.88 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)                    |
| -        | 691±20ms                                          | 600±30ms                                       |    0.87 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20_target(2) |
| -        | 100±1ms                                           | 87.4±4ms                                       |    0.87 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2)                    |
| -        | 62.4±0.8ms                                        | 52.8±4ms                                       |    0.85 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)                    |
| -        | 321±3ms                                           | 262±10ms                                       |    0.82 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20_target(0) |
| -        | 62.0±1ms                                          | 44.9±0.5ms                                     |    0.72 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)                    |

ShellyGarion commented 1 month ago

I think this is about ready in my opinion. We'll see about the panicking and errror handling on the synth_sequences based on the comments I just left. I'd appreciate if @ShellyGarion or @alexanderivrii can take a look at the arithmetic, or a second look from the team.

The general framework looks good to me. For a 1-qubit unitary we use euler_one_qubit_decomposer. For a 2-qubit unitary we use the TwoQubitBasisDecomposer (default for the basis gates 'cx', 'cz' or 'ecr') or the XXDecomposer. Later, we also plan to add the TwoQubitControlledUDecomposer (which should be default for gates like 'rzz' with an arbitrary angle), see #13139 and #13320. For a larger unitary, we use the qs_decomposition.

raynelfss commented 1 month ago

Currently reviewing... Any small changes I'll commit myself and merge

Qiskit / qiskit