Open Cryoris opened 3 years ago
I am not sure whether the following concern falls under this issue.
Converting an arbitrary gate (generated by UnitaryGate()) to its controlled version is computationally expensive, primarily due to operations on the isometry.
Instead of performing all operations on the isometry, we can use UnitaryGate(_compute_control_matrix()) for the controlled version of the given base unitary. This approach maintains the appearance of a controlled gate while significantly reducing computation time.
UnitaryGate() constructor. This method produces the same result as U.control() but with substantially reduced computation time. Am I missing something, or is there a better way to improve the U.control() implementation for arbitrary unitary gates?
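For reference, the matrix-level construction described above can be sketched without Qiskit. This is a minimal illustration, assuming a single control placed on the least-significant qubit (Qiskit's little-endian convention); `controlled_matrix` is a hypothetical helper, not the library's `_compute_control_matrix`. It only builds the larger matrix and performs no gate synthesis.

```python
def controlled_matrix(u):
    """Return the matrix of controlled-U with one control on qubit 0.

    `u` is a square list of lists of dimension d. The basis index of the
    result is 2 * target_index + control_bit (control = least-significant bit).
    """
    d = len(u)
    cu = [[0j] * (2 * d) for _ in range(2 * d)]
    for row in range(d):
        # Control bit 0: act as the identity on the target register.
        cu[2 * row][2 * row] = 1 + 0j
        for col in range(d):
            # Control bit 1: apply U to the target register.
            cu[2 * row + 1][2 * col + 1] = complex(u[row][col])
    return cu


x_gate = [[0, 1], [1, 0]]
cx = controlled_matrix(x_gate)
```

The result doubles in dimension with every added control, so the matrix itself has 4^(n+1) entries for an n-qubit base unitary.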
I'm not sure that generating the matrix representation of the controlled gate and subsequently synthesizing the now larger matrix is efficient -- both are exponentially expensive in the number of qubits. The current control mechanism should just try to unroll the circuit to be controlled and then control every operation within -- which should scale linearly with the number of operations.
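The linear scaling of the unroll-and-control approach can be illustrated with a toy model. This is only a sketch: the `(gate_name, qubits)` tuples are a hypothetical stand-in for a circuit's instruction list, and a real implementation would also have to handle the global phase of the base circuit and the synthesis of each controlled operation.

```python
def control_each_operation(operations, control_qubit):
    """Add one control to every operation of an unrolled circuit.

    `operations` is a list of (gate_name, qubits) tuples. The output has
    exactly one controlled operation per input operation.
    """
    controlled = []
    for name, qubits in operations:
        if control_qubit in qubits:
            raise ValueError("control qubit overlaps with the base circuit")
        # Prefix the name with "c" and prepend the control qubit.
        controlled.append(("c" + name, (control_qubit, *qubits)))
    return controlled


base = [("h", (0,)), ("cx", (0, 1)), ("rz", (1,))]
controlled = control_each_operation(base, control_qubit=2)
```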
These plots, which depict the execution times for implementing a controlled gate of an arbitrary unitary matrix, provide a compelling comparison, clearly demonstrating the substantial performance advantage of the custom method. As observed, for an 11-qubit system with 9 target qubits, the execution time exceeds 45 minutes. Additionally, implementing the same system with 10 target qubits fails to complete even after more than an hour. Conversely, directly generating the matrix representation of the controlled gate takes significantly less time, as shown in the second plot.
It seems that Qiskit's implementation might be engaging in unnecessary computational overhead when constructing controlled gates. In theory, these two methods might converge in performance as the number of qubits becomes extremely large. However, in practice, we don't usually work with such large numbers of qubits. For most real-world applications, the difference in performance that we see in these plots is very significant.
Of course, the brute-force method is not optimal, but the present method has a large overhead. I want to work on this issue.
Thanks for starting this discussion, @jayanth260! In addition to execution times, do you have some data on the size/depth of the synthesized circuits? Could you please share the scripts to produce the data above?
If I understand it correctly, there are two complementary approaches for synthesizing controlled unitary gates:
I believe that (1) needs to be implemented in any case, as it's more general and can bring immediate benefits across the board. I believe that (2) is a very interesting investigation -- I can certainly see how synthesizing the whole controlled-unitary matrix at once can be more efficient than synthesizing many separate subproblems.
One additional thing to keep in mind: when we add a controlled-unitary gate to a quantum circuit, we almost certainly want to store it either as a controlled unitary or an annotated unitary gate, and not replace it by the larger unitary gate. The decision on which synthesis method to use should be made during transpilation.
I have assigned you to the issue, @jayanth260, but could you please elaborate on what exactly you are planning to do? Thanks!
When you're timing, please be sure you're timing the synthesis of the gate sequence for the control at the end, and not simply the calculation of the control matrix. It might help to share the code you used to benchmark to be sure.
Click here to access the script that generates the data discussed. The script includes details on the number of qubits and target qubits used in the analysis.
My first step will be to conduct a detailed analysis to pinpoint the exact sources of computational overhead in the current implementation.
I plan to explore various established algorithms for synthesizing larger unitary gates, such as:
Cosine-Sine Decomposition (CSD): Known for its recursive decomposition approach that breaks down unitary matrices into smaller sub-blocks.
Quantum Shannon Decomposition (QSD): A quantum information-theoretic method that reduces general unitary matrices to a set of basic quantum gates.
Lie-Trotter-Suzuki Decomposition, KAK Decomposition, and other matrix factorization techniques.
Each approach will be analyzed based on its theoretical efficiency and practical applicability in the context of multi-qubit controlled gates. I aim to select the most efficient synthesis method based on circuit size, gate depth, and run-time performance.
In addition to evaluating individual algorithms, I will investigate the feasibility of a hybrid approach. Depending on the gate type and matrix size, different algorithms may offer varying levels of efficiency:
For smaller gates, direct matrix synthesis using a method like CSD may be more efficient.
For larger gates, an alternative method, such as Quantum Shannon Decomposition, may provide better performance through reduced gate count and depth.
This dynamic, hybrid approach would allow the transpiler to select the optimal algorithm based on the specific gate characteristics and the target hardware.
I welcome any suggestions or insights you may have on this approach. In particular, if there are additional algorithms, optimization techniques, or practical considerations that could enhance the synthesis process, I would appreciate your input.
I am measuring the time it takes for "computing the controlled unitary matrix" + "generating the unitary gate from the controlled unitary matrix".
If your benchmarks are only dealing with the matrices, they're not including the actual gate synthesis time for the matrix-based approach. Just calling UnitaryGate(make_controlled_array(other)) doesn't synthesise the gates - you also need to access the definition attribute.
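One way to keep the two costs apart is to time construction and synthesis as separate stages. The harness below is a minimal sketch; `time_stages` and the stand-in callables are hypothetical and exist only so the example runs without Qiskit.

```python
import time


def time_stages(build_gate, synthesize):
    """Time gate construction and gate synthesis separately."""
    start = time.perf_counter()
    gate = build_gate()
    built = time.perf_counter()
    result = synthesize(gate)
    done = time.perf_counter()
    return {"build": built - start, "synthesize": done - built}, result


# Hypothetical stand-ins. In a real benchmark, build_gate would construct
# the UnitaryGate from the controlled matrix, and synthesize would access
# its `definition` attribute to force the gate sequence to be generated.
timings, _ = time_stages(
    build_gate=lambda: list(range(1000)),
    synthesize=lambda gate: sorted(gate, reverse=True),
)
total = timings["build"] + timings["synthesize"]
```

Reporting both entries of `timings`, rather than only the build time, makes it clear which stage dominates.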
Oh! I observed that U.control() takes a long time because of iso.definition(). Do all unitary gates go through .definition() while transpiling? I missed this, sorry (so the benchmarks above might not be correct)! But constructing a circuit itself takes much longer with U.control() than constructing the controlled unitary matrix and then generating a unitary gate from it.
Yeah, at the moment, all transpilation will end up calling .definition (or some other function that does that job). @alexanderivrii, @ShellyGarion and @Cryoris have a few methods, and we have some other mechanisms (AnnotatedOperation) to delay the synthesis of multiple controls until later in the transpilation pipeline, but at some point, we always have to turn the matrix into a sequence of gates in the correct basis, and that's where the explosive cost will come in, unfortunately.
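To give a rough sense of that cost, the sketch below evaluates the published closed-form CX count for the optimized Quantum Shannon Decomposition of a generic n-qubit unitary, (23/48)·4^n − (3/2)·2^n + 4/3. This is the figure from the literature, not a measurement of what Qiskit's synthesis actually emits, and `qsd_cx_count` is a hypothetical helper name.

```python
from fractions import Fraction


def qsd_cx_count(num_qubits):
    """CX count of the optimized QSD for a generic unitary (num_qubits >= 2)."""
    if num_qubits < 2:
        raise ValueError("the closed form applies to two or more qubits")
    count = (
        Fraction(23, 48) * 4**num_qubits
        - Fraction(3, 2) * 2**num_qubits
        + Fraction(4, 3)
    )
    return int(count)


for targets in (2, 5, 9):
    # Controlling a generic unitary via its matrix means synthesizing
    # a generic unitary on one extra qubit.
    print(targets, qsd_cx_count(targets), qsd_cx_count(targets + 1))
```

Each extra qubit multiplies the count by roughly four, which is why delaying synthesis only postpones the cost rather than removing it.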
Ok, I will try various decompositions and let you know of any improvements. I am really sorry for my incorrect data above.
What is the expected enhancement?
Improve the way we implement controls of general gates, e.g. by
add_control.py)
Example
For a circuit that contains a gate that's in the hardcoded gate list in add_control (like 'x'):
but for one that isn't (like 'sx'):
even though the SXGate defines an efficient control method: