Closed ElePT closed 1 month ago
This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Changes Missing Coverage | Covered Lines | Changed/Added Lines | % | ||
---|---|---|---|---|---|
crates/accelerate/src/two_qubit_decompose.rs | 14 | 15 | 93.33% | ||
crates/accelerate/src/unitary_synthesis.rs | 761 | 825 | 92.24% | ||
<!-- | Total: | 784 | 849 | 92.34% | --> |
Files with Coverage Reduction | New Missed Lines | % | ||
---|---|---|---|---|
qiskit/circuit/library/standard_gates/u.py | 1 | 93.0% | ||
crates/qasm2/src/expr.rs | 1 | 94.02% | ||
crates/accelerate/src/elide_permutations.rs | 1 | 97.06% | ||
crates/accelerate/src/error_map.rs | 1 | 85.42% | ||
crates/accelerate/src/circuit_library/quantum_volume.rs | 1 | 96.69% | ||
qiskit/circuit/library/standard_gates/rz.py | 2 | 95.83% | ||
qiskit/circuit/library/quantum_volume.py | 2 | 95.45% | ||
crates/circuit/src/lib.rs | 2 | 96.15% | ||
qiskit/synthesis/stabilizer/stabilizer_decompose.py | 2 | 96.77% | ||
crates/circuit/src/bit_data.rs | 2 | 94.62% | ||
<!-- | Total: | 856 | --> |
Totals | |
---|---|
Change from base Build 11241264267: | -0.3% |
Covered Lines: | 73846 |
Relevant Lines: | 83409 |
Performance is starting to look promising:
Benchmarks that have improved:
| Change | Before [2fa6dd65] <main> | After [739a5c84] <oxidize-unitary-synthesis-benchmarking-3> | Ratio | Benchmark (Parameter) |
|----------|----------------------------|---------------------------------------------------------------|---------|--------------------------------------------------------------------|
| - | 1.30±0.04s | 1.17±0.02s | 0.9 | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr') |
| - | 483±60ms | 375±2ms | 0.78 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx') |
| - | 695±90ms | 476±20ms | 0.69 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz') |
| - | 705±100ms | 481±10ms | 0.68 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr') |
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
Benchmarking status as of eb815d1 (more benchmarks show improvement but the improvement looks slightly smaller).
| Change | Before [2fa6dd65] <unitary-test> | After [266a22f0] <oxidize-unitary-synthesis> | Ratio | Benchmark (Parameter) |
|----------|------------------------------------|------------------------------------------------|---------|--------------------------------------------------------------------|
| - | 1.38±0.04s | 1.21±0.03s | 0.87 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz') |
| - | 425±8ms | 361±8ms | 0.85 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx') |
| - | 566±10ms | 467±10ms | 0.83 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz') |
| - | 3.94±0.2s | 3.04±0.02s | 0.77 | utility_scale.UtilityScaleBenchmarks.time_qv('ecr') |
| - | 4.21±0.1s | 3.15±0.04s | 0.75 | utility_scale.UtilityScaleBenchmarks.time_qv('cz') |
| - | 626±40ms | 456±4ms | 0.73 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr') |
One or more of the following people are relevant to this code:
@Qiskit/terra-core
@kevinhartman
@levbishop
@mtreinish
Benchmark for 4958425 compared with the most recent commit in main:
| Change | Before [dcd41e9f] | After [49584253] <oxidize-unitary-synthesis> | Ratio | Benchmark (Parameter) |
|----------|----------------------|------------------------------------------------|---------|--------------------------------------------------------------------|
| - | 1.27±0.1s | 1.14±0.01s | 0.9 | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr') |
| - | 1.39±0.09s | 1.21±0.02s | 0.87 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz') |
| - | 434±10ms | 377±20ms | 0.87 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx') |
| - | 574±30ms | 474±3ms | 0.83 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz') |
| - | 336±70ms | 274±10ms | 0.82 | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz') |
| - | 3.18±0.1s | 2.53±0.09s | 0.79 | utility_scale.UtilityScaleBenchmarks.time_qv('cx') |
| - | 603±20ms | 454±5ms | 0.75 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr') |
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
I originally was running the benchmarks using an older commit as reference and I was noticing a hit in performance but it seems to be in tests that don't use this pass. Any theories on why this is the case?
| Change | Before [2fa6dd65] <unitary-test> | After [69eabc42] <oxidize-unitary-synthesis> | Ratio | Benchmark (Parameter) |
|----------|------------------------------------|------------------------------------------------|---------|------------------------------------------------------------------------------|
| + | 393±100ms | 558±40ms | 1.42 | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx') |
| + | 79.8±20ms | 114±2ms | 1.42 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx') |
| + | 412±90ms | 577±80ms | 1.4 | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr') |
| + | 24.0±4ms | 33.6±3ms | 1.4 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr') |
| - | 1.06±0.08s | 928±40ms | 0.88 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz') |
| - | 1.06±0.2s | 897±40ms | 0.84 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr') |
The PR now completely avoids the return type enum and defines clearer paths for the dag and sequence outputs. This gets a bit ugly when evaluating the final synthesized circuits because I needed to collect the sequences on one side, dags on the other, and then do the final match depending on the errors. I am open to other proposals. I know the use of dag circuit methods is still not really consistent (I went for whatever looks shortest, for example, when thinking whether to use push_back
or apply_operation_back
or extend
), and I still didn't use the new rust-based converters, but at least the performance is finally better so I re-requested a review.
This is the latest benchmarking output after pulling from main, again a smaller number of tests improve but the improvement seems better.
| Change | Before [90e92a46] <oxidize-unitary-synthesis^2> | After [9cacf6fc] <oxidize-unitary-synthesis> | Ratio | Benchmark (Parameter) |
|----------|---------------------------------------------------|------------------------------------------------|---------|--------------------------------------------------------------------|
| + | 14.7±0.6ms | 18.5±1ms | 1.25 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100(‘cx’) |
| - | 512±100ms | 363±9ms | 0.71 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg(‘cx’) |
| - | 841±100ms | 444±30ms | 0.53 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg(‘cz’) |
| - | 858±70ms | 446±30ms | 0.52 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg(‘ecr’) |
I also ran the transpiler benchmarks for QV, which might give a better idea of the speedup than the utility scale ones, and got the following numbers on the tests that use the rust path (the rust path is only followed if transpile
is called with a target, and some of these tests give coupling map/basis gates directly). I added a rust-path test for the 50x20 example to have more data to look at.
| Change | Before [90e92a46] <oxidize-unitary-synthesis^2> | After [9cacf6fc] <oxidize-unitary-synthesis> | Ratio | Benchmark (Parameter) |
|----------|---------------------------------------------------|------------------------------------------------|---------|---------------------------------------------------------------------------------------------|
| - | 138±0.8ms | 121±2ms | 0.88 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3) |
| - | 691±20ms | 600±30ms | 0.87 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20_target(2) |
| - | 100±1ms | 87.4±4ms | 0.87 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2) |
| - | 62.4±0.8ms | 52.8±4ms | 0.85 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1) |
| - | 321±3ms | 262±10ms | 0.82 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20_target(0) |
| - | 62.0±1ms | 44.9±0.5ms | 0.72 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0) |
I think this is about ready in my opinion. We'll see about the panicking and errror handling on the
synth_sequences
based on the comments I just left. I'd appreciate if @ShellyGarion or @alexanderivrii can take a look at the arithmetic, or a second look from the team.
The general framework looks good to me.
For a 1-qubit unitary we use euler_one_qubit_decomposer
.
For a 2-qubit unitary we use the TwoQubitBasisDecomposer
(default for the basis gates 'cx', 'cz' or 'ecr') or the XXDecomposer
.
Later, we also plan to add the TwoQubitControlledUDecomposer
(which should be default for gates like 'rzz' with an arbitrary angle), see #13139 and #13320.
For a larger unitary, we use the qs_decomposition
.
Currently reviewing... Any small changes I'll commit myself and merge
Summary
Closes #12210. This is an initial effort to port UnitarySynthesis to Rust, currently limited to the target + default plugin path (custom plugins will run in Python).
Details and comments
TODO:
dag_to_circuit
(not reflected in unit tests)dag.push_back
and see if it can be done with the newdag.apply_operation_back
method