Improving the general ``Gate.control`` mechanism

Cryoris commented 3 years ago

What is the expected enhancement?

Improve the way we implement controls of general gates, e.g. by

checking whether the gate instance we try to control implements an efficient way to control (rather than hardcoding a list of which gates we know how to control in add_control.py)
improving the generic control mechanism for an arbitrary unitary that doesn't implement an efficient control method

Example

For a circuit that contains a gate that's in the hardcoded gate list in add_control (like 'x'):

>>> from qiskit import QuantumCircuit
>>> qc = QuantumCircuit(1)
>>> qc.x(0)
>>> qc.control().decompose().draw()

q1_0: ──■──
      ┌─┴─┐
 q_0: ┤ X ├
      └───┘

but for one that isn't (like 'sx'):

>>> qc = QuantumCircuit(1)
>>> qc.sx(0)
<qiskit.circuit.instructionset.InstructionSet object at 0x7f9f88081a50>
>>> qc.control().decompose().draw()
                                          ┌────────┐
q2_0: ─■───────────────■─────────■────────┤ P(π/4) ├
       │P(-π/2) ┌──────┴───────┐ │P(-π/2) └────────┘
 q_0: ─■────────┤ U(π/2,0,π,0) ├─■──────────────────
                └──────────────┘

even though the SXGate defines an efficient control method:

>>> from qiskit.circuit.library import SXGate
>>> SXGate().control().definition.draw()

q_0: ──────■─────────────
     ┌───┐ │U1(π/2) ┌───┐
q_1: ┤ H ├─■────────┤ H ├
     └───┘          └───┘

jayanth260 commented 2 months ago

I am not sure following conern comes under this issue.

Converting an arbitrary gate (generated by UnitaryGate()) to its controlled version is computationally expensive, primarily due to operations on the isometry.

Instead of performing all operations on the isometry, we can use UnitaryGate(_compute_control_matrix()) for the controlled version of the given base unitary. This approach maintains the appearance of a controlled gate while significantly reducing computation time.

Create a unitary matrix U' for the controlled version of unitary matrix U: U' = |control⟩⟨control| ⊗ U + (I - |control⟩⟨control|) ⊗ I
Use this U' in a UnitaryGate() constructor.
Set appropriate parameters for printing to resemble a controlled gate.

This method produces the same result as U.control() but with substantially reduced computation time. Am I missing something or is there any better method to improve U.control() implementation for arbitrary unitary gate?

Cryoris commented 2 months ago

I'm not sure that generating the matrix representation of the controlled gate and subsequently synthesizing the now larger matrix is efficient -- both are exponentially expensive in the number of qubits. The current control mechanism should just try to unroll the circuit to be controlled and then control every operation within -- which should scale linearly with the number of operations.

jayanth260 commented 2 months ago

comparison_QvsC These plots, which depict the execution times for implementing a controlled gate of an arbitrary unitary matrix, provide a compelling comparison, clearly demonstrating the substantial performance advantage of the custom method. As observed, for an 11-qubit system with 9 target qubits, the execution time exceeds 45 minutes. Additionally, implementing the same system with 10 target qubits fails to complete even after more than an hour. Conversely, directly generating the matrix representation of the controlled gate takes significantly less time, as shown in the second plot.

It seems that Qiskit's implementation might be engaging in unnecessary computational overhead when constructing controlled gates. In theory, these two methods might converge in performance as the number of qubits becomes extremely large. However, in practice, we don't usually work with such large numbers of qubits. For most real-world applications, the difference in performance that we see in these plots is very significant.

jayanth260 commented 1 month ago

of course, bruteforce method is not optimal, but present method is having a large overhead. I want to work on this issue.

alexanderivrii commented 1 month ago

Thanks for starting this discussion, @jayanth260! In addition to execution times, do you have some data on the size/depth of the synthesized circuits? Could you please share the scripts to produce the data above?

If I understand it correctly, there are two complementary approaches for synthesizing controlled unitary gates:

Synthesize the base unitary gate into a set of basic gates + synthesize the controlled version for each such basic gate. This is what's currently implemented in Qiskit, but, as @Cryoris described, the implementation is far from being optimal and needs to be improved.
Construct the larger unitary matrix corresponding to the controlled-unitary gate, and synthesize it directly using an appropriate algorithm (question: which algorithm do we actually use in this step)?

I believe that (1) needs to be implemented in any case, as it's more general and can bring immediate benefits across the board. I believe that (2) is a very interesting investigation -- I can certainly see how synthesizing the whole controlled-unitary matrix at once can be more efficient than synthesizing many separate subproblems.

One additional thing to keep in mind: when we add a controlled-unitary gate to a quantum circuit, we almost probably want to store it either as a controlled unitary or an annotated unitary gate, and not replace it by the larger unitary gate. The decision which synthesis method to use should be done during transpilation.

I have assigned you to the issue, @jayanth260, but could you please elaborate on what exactly you are planning to do? Thanks!

jakelishman commented 1 month ago

When you're timing, please be sure you're timing the synthesis of the gate sequence for the control at the end, and not simply the calculation of the control matrix. It might help to share the code you used to benchmark to be sure.

jayanth260 commented 1 month ago

Click here to access the script that generates the data discussed. The script includes details on the number of qubits and target qubits used in the analysis.

Proposed Plan:

1. Identifying Overhead Sources:

My first step will be to conduct a detailed analysis to pinpoint the exact sources of computational overhead in the current implementation.

2. Analyzing Alternative Approaches for Unitary Gate Synthesis:

I plan to explore various established algorithms for synthesizing larger unitary gates, such as:

Cosine-Sine Decomposition (CSD): Known for its recursive decomposition approach that breaks down unitary matrices into smaller sub-blocks.
Quantum Shannon Decomposition (QSD): A quantum information-theoretic method that reduces general unitary matrices to a set of basic quantum gates.
Lie-Trotter-Suzuki Decomposition, KAK Decomposition, and other matrix factorization techniques.

Each approach will be analyzed based on its theoretical efficiency and practical applicability in the context of multi-qubit controlled gates. I aim to select the most efficient synthesis method based on circuit size, gate depth, and run-time performance.

3. Exploring a Hybrid Approach:

In addition to evaluating individual algorithms, I will investigate the feasibility of a hybrid approach. Depending on the gate type and matrix size, different algorithms may offer varying levels of efficiency:

For smaller gates, direct matrix synthesis using a method like CSD may be more efficient.
For larger gates, an alternative method, such as Quantum Shannon Decomposition, may provide better performance through reduced gate count and depth.

This dynamic, hybrid approach would allow the transpiler to select the optimal algorithm based on the specific gate characteristics and the target hardware.

I welcome any suggestions or insights you may have on this approach. In particular, if there are additional algorithms, optimization techniques, or practical considerations that could enhance the synthesis process, I would appreciate your input.

jayanth260 commented 1 month ago

I am using time it takes to "compute controlled unitary matrix" + "generating unitary gate using controlled unitary matrix"

jakelishman commented 1 month ago

If your benchmarks are only dealing with the matrices, they're not including the actual gate syntheis time for the matrix-based approach. Just calling UnitaryGate(make_controlled_array(other)) doesn't synthesise the gates - you also need to access the definition attribute.

jayanth260 commented 1 month ago

oh! I observed that U.control() takes much time because of iso.definition(). does all unitary gates go through .definition() while transpiling? I missed this, sorry (so above benchmarkings might not be correct) !! but constructing a circuit itself is taking much longer with U.control() than constucting controlled unitary matrix and then generating unitary gate

jakelishman commented 1 month ago

Yeah, at the moment, all transpilation will end up calling .definition (or some other function that does that job). @alexanderivrii, @ShellyGarion and @Cryoris have a few methods, and we have some other mechanisms (AnnotatedOperation) to delay the syntheis of multiple controls til later in the transpilation pipeline, but at some point, we always have to turn the matrix into a sequence of gates in the correct basis, and that's where the explosive cost will come in, unfortunately.

jayanth260 commented 1 month ago

Ok, I will try various decompositions and let you know any improvements. I am really sorry for my false data above

Qiskit / qiskit