[Pre-trained weights] Fidelity of encoding for arbitrary unitaries

Greetings there,

Hope all are well. I came across your great repo and have been experimenting with it for a few days now. I have been using the Floki00/qc_unitary_3qubit for the moment and noticed that the fidelity of the encoding for arbitrary unitaries almost never reaches above 0.7, which is a considerable error for 3 qubits.

I am aware that the model may need fine-tuning; however, I'd like to confirm that I am using it correctly and that the infidelity mentioned above is expected.

Here is my current wrapper of genQC:

from __future__ import annotations

__all__ = ["Diffusion"]

from genQC.pipeline.diffusion_pipeline import DiffusionPipeline
from genQC.inference.infer_compilation import generate_comp_tensors, convert_tensors_to_circuits
import genQC.util as util
import numpy as np
from numpy.typing import NDArray
from qiskit.quantum_info import Operator as QiskitOperator
import torch
from typing import Sequence, SupportsIndex, TYPE_CHECKING

import qickit
if TYPE_CHECKING:
    from qickit.circuit import Circuit
from qickit.primitives.operator import Operator
from qickit.synthesis.unitarypreparation import UnitaryPreparation

# Set the random seed for reproducibility
np.random.seed(0)
torch.manual_seed(0)

class Diffusion(UnitaryPreparation):
    """ `qickit.synthesis.unitarypreparation.Diffusion` is the class for performing unitary
    compilation using diffusion models (DMs).
    ref: https://arxiv.org/abs/2311.02041

    Notes
    -----
    The default pre-trained model used in this class is for 3-qubit unitaries only.

    Parameters
    ----------
    `output_framework` : type[qickit.circuit.Circuit]
        The quantum circuit framework.
    `model` : str, optional, default="Floki00/qc_unitary_3qubit"
        The pre-trained model to use.
    `prompt` : str, optional, default="Compile using: ['h', 'cx', 'z', 'ccx', 'swap']"
        The prompt to use for the compilation.
    `max_num_gates` : int, optional, default=12
        The maximum number of gates to use in the compilation.
    `num_samples` : int, optional, default=128
        The number of samples to use in the compilation.
    `min_fidelity` : float, optional, default=0.9
        The minimum fidelity to accept the solution.

    Attributes
    ----------
    `output_framework` : type[qickit.circuit.Circuit]
        The quantum circuit framework.
    `model` : str
        The pre-trained model to use.
    `prompt` : str
        The prompt to use for the compilation.
    `max_num_gates` : int
        The maximum number of gates to use in the compilation.
    `num_samples` : int
        The number of samples to use in the compilation.
    `pipeline` : genQC.pipeline.diffusion_pipeline.DiffusionPipeline
        The pre-trained model pipeline.

    Raises
    ------
    TypeError
        If the output framework is not a subclass of `qickit.circuit.Circuit`.
    ValueError
        If the minimum fidelity is not in the range [0, 1].
    """
    def __init__(
            self,
            output_framework: type[Circuit],
            model: str="Floki00/qc_unitary_3qubit",
            prompt: str="Compile using: ['h', 'cx', 'z', 'ccx', 'swap']",
            max_num_gates: int=12,
            num_samples: int=128,
            min_fidelity: float=0.9
        ) -> None:

        super().__init__(output_framework)
        self.model = model
        self.prompt = prompt
        self.max_num_gates = max_num_gates
        self.num_samples = num_samples

        if not (min_fidelity >= 0 and min_fidelity <= 1):
            raise ValueError("The minimum fidelity should be in the range [0, 1].")
        self.min_fidelity = min_fidelity

        # Determine the device to use (CPU or GPU)
        device = util.infer_torch_device()

        # Clean the memory
        util.MemoryCleaner.purge_mem()

        # Load the pre-trained model
        self.pipeline = DiffusionPipeline.from_pretrained(repo_id=model, device=device)

    def apply_unitary(
            self,
            circuit: Circuit,
            unitary: NDArray[np.complex128] | Operator,
            qubit_indices: int | Sequence[int]
        ) -> Circuit:
        """ Apply the unitary to the circuit.

        Parameters
        ----------
        `circuit` : qickit.circuit.Circuit
            The quantum circuit to apply the unitary to.
        `unitary` : NDArray[np.complex128] | qickit.primitives.Operator
            The unitary to apply.
        `qubit_indices` : int | Sequence[int]
            The qubit indices to apply the unitary to.

        Returns
        -------
        circuit : `qickit.circuit.Circuit`
            The quantum circuit with the unitary applied.

        Raises
        ------
        ValueError
            If the unitary is not a 3-qubit unitary.
            If no solution with fidelity > `min_fidelity` is found.
        """
        if isinstance(unitary, np.ndarray):
            unitary = Operator(unitary)

        if isinstance(qubit_indices, SupportsIndex):
            qubit_indices = [qubit_indices]

        if self.model == "Floki00/qc_unitary_3qubit" and unitary.num_qubits != 3:
            raise ValueError("The default pre-trained model is for 3-qubit unitaries only.")

        num_qubits = unitary.num_qubits

        # As the neural network works only with real numbers, we first separate
        # the two components and create a 2 dimensional tensor for the magnitude
        # of each component
        U_r, U_i = torch.Tensor(np.real(unitary.data)), torch.Tensor(np.imag(unitary.data))
        U_tensor = torch.stack([U_r, U_i], dim=0)

        # Now we generate a tensor representation of the desired quantum circuit using the DM based on the prompt and U
        # This is also known as inference
        out_tensors = generate_comp_tensors(
            pipeline=self.pipeline,
            prompt=self.prompt,
            U=U_tensor,
            samples=self.num_samples,
            num_of_qubits=num_qubits,
            system_size=num_qubits,
            max_gates=self.max_num_gates,
            g=10
        )

        # Convert the generated tensors into circuits
        qc_list, _ = convert_tensors_to_circuits(out_tensor=out_tensors, gate_pool=self.pipeline.gate_pool)

        # Find the best solution in terms of the number of gates and fidelity
        depths = []

        for qc in qc_list:
            qc_unitary = QiskitOperator(qc).data

            fidelity = np.abs(
                np.dot(
                    np.conj(qc_unitary.flatten()), # type: ignore
                    unitary.data.flatten()
                )
            )/2**num_qubits

            if fidelity > self.min_fidelity:
                depths.append(qc.depth())

        if len(depths) == 0:
            raise ValueError(f"No solution found with fidelity > {self.min_fidelity}.")

        # Find the shortest circuit with fidelity > `self.min_fidelity`
        best_qc = qc_list[depths.index(min(depths))]

        circuit = qickit.circuit.Circuit.from_qiskit(best_qc, self.output_framework)

        return circuit

I tried cuda-quantum's notebook too, to be safe, and the infidelity (almost 0.5) is present there as well. I have played around with a larger number of samples as well as a larger maximum number of gates (30, 40, and even 50, which is beyond the theoretical limit of most exact encoders, $2^{N+1}$, where N is the number of qubits) to no avail. I'd deeply appreciate some guidance if possible.

Oh, FYI, qickit is my package which is not published yet, so if you like to test it yourself, please let me know.

Hello @ACE07-Sev 👋, thanks for your interest in this project.

Regarding the limitations of the Floki00/qc_unitary_3qubit weights, this model was trained under some restrictions: 3 qubits, a maximum of 12 gates, and the (discrete) gate set ['h', 'cx', 'z', 'x', 'ccx', 'swap']. When compiling a unitary U, keep in mind that the model can only find a solution if an exact circuit for that U exists within these constraints. Hence, the set of compilable unitaries is considerably smaller than the whole unitary group.
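As a rough illustration of a target that lies inside these constraints, the sketch below builds a 3-qubit unitary from four gates of the trained gate set using plain NumPy. The gate sequence, the helper names (`one_qubit`, `cx`), and the qubit ordering (qubit 0 is the leftmost Kronecker factor, which differs from Qiskit's little-endian convention) are assumptions made for this example, not part of genQC.

```python
import numpy as np
from functools import reduce

# Single-qubit building blocks from the trained gate set.
I2 = np.eye(2)
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
P0 = np.diag([1.0, 0.0])  # projector |0><0|
P1 = np.diag([0.0, 1.0])  # projector |1><1|


def one_qubit(gate, qubit, n=3):
    """Embed a single-qubit gate on `qubit` (qubit 0 = leftmost factor)."""
    ops = [I2] * n
    ops[qubit] = gate
    return reduce(np.kron, ops)


def cx(control, target, n=3):
    """CNOT as |0><0|_c (x) I + |1><1|_c (x) X_t."""
    ops0 = [I2] * n
    ops0[control] = P0
    ops1 = [I2] * n
    ops1[control] = P1
    ops1[target] = X
    return reduce(np.kron, ops0) + reduce(np.kron, ops1)


# Hypothetical target circuit: h(0), cx(0,1), cx(1,2), z(2).
gates = [one_qubit(H, 0), cx(0, 1), cx(1, 2), one_qubit(Z, 2)]

# Gates are applied left to right, so each new gate multiplies from the left.
target = reduce(lambda acc, g: g @ acc, gates, np.eye(8))
```

A unitary built this way uses only 4 of the 12 allowed gates, so an exact solution is guaranteed to exist for it, unlike for a generic element of the unitary group.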

Within these constraints, the expected compilation distribution is shown in Figure 4 of our paper, Quantum circuit synthesis with diffusion models. We also found that “the model successfully identifies the correct exact unitary for 92.6 % of the 3100 tested unitaries”, where exact means a fidelity of 1.

However, we are working on bigger and better models, which we plan to release alongside future publications. Stay tuned for that 😄.

Out of curiosity, what kind of unitaries are you testing on and are interested in? Can you give an example of a unitary matrix and maybe a corresponding circuit which the model should find? This could help us benchmark and tune the next models. We are also interested in what tasks a user may be looking for, which we can take into account for future considerations of the project.

Greetings there,

Hope you are well. Thank you very much for the prompt response, and apologies for the delay on my side. So, I tried random gates within the gate set (nothing beyond a depth of 5 or 6) as well as randomly generated unitaries using scipy.stats's unitary_group for 3 qubits.

Results are mostly 65 percent fidelity or so at best. I'll send a comprehensive report here today or tomorrow for your kind reference.

So, as for what users would be looking for, I'd say high-fidelity compilation with as low a depth as possible. I have been working on this task for a year now, and found the MPS approach to be more suitable (e.g., I encoded a 22-qubit state with a depth of ~12111 instead of ~8M, at 97 percent fidelity). Whilst MPS methods are very effective at lowering depth, they take a bit of time to run given the low-level QR and SVD operations needed. If RL models can become as robust with lower inference times, that would certainly be useful.

I would overall suggest focusing more on general unitary compilation. Given that's what I'm actively working on, I'd love to contribute and even collaborate if you're interested.

The low accuracy on unitaries from scipy.stats.unitary_group is expected: it is very unlikely that such a random U can be decomposed (exactly) into ['h', 'cx', 'z', 'x', 'ccx', 'swap'] using a maximum of 12 gates. Note that there are no parametrized gates (e.g. rx or ry). While the gate set is universal (h and ccx alone already are), this only holds for a very large number of gates.
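To get a feel for how far a random U sits from any one fixed circuit, here is a minimal NumPy sketch. It samples Haar-random 8x8 unitaries with the standard QR-based construction and evaluates the same overlap metric as the wrapper above, $|\mathrm{Tr}(V^\dagger U)| / 2^n$, against a fixed reference V. The identity is used as a stand-in for V; by the invariance of the Haar measure, any other fixed unitary gives the same statistics. The helper names `haar_unitary` and `overlap` are made up for this example.

```python
import numpy as np


def haar_unitary(dim, rng):
    """Sample a Haar-random unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((dim, dim))
         + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Fix the phase ambiguity of QR so the distribution is exactly Haar.
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # scales column j of q by phases[j]


def overlap(u, v):
    """|Tr(v^dagger u)| / dim, the metric used in the wrapper above."""
    return np.abs(np.trace(v.conj().T @ u)) / u.shape[0]


rng = np.random.default_rng(0)
fixed = np.eye(8)  # stand-in for the unitary of one fixed 3-qubit circuit

overlaps = [overlap(haar_unitary(8, rng), fixed) for _ in range(1000)]
mean_overlap = float(np.mean(overlaps))
max_overlap = float(np.max(overlaps))
print(f"mean overlap: {mean_overlap:.3f}, max overlap: {max_overlap:.3f}")
```

The model returns the best of many sampled circuits, so the overlaps it reports are higher than the single-circuit values printed here, but the sketch shows why a generic U is not expected to land close to a short discrete circuit.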

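You can also check the example notebook on unitary compilation: [doc](https://florianfuerrutter.github.io/genQC/examples/unitary_compilation.html), [notebook](https://github.com/FlorianFuerrutter/genQC/blob/main/src/examples/2_unitary_compilation.ipynb).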

When you mention “high fidelity compilation; ... MPS approach” do you refer to quantum state preparation or unitary matrix to circuit compilation? In your case, where would these “general unitary compilation” tasks stem from (e.g. a unitary evolution from a particular Hamiltonian)?

> The low accuracy on unitaries from scipy.stats.unitary_group is expected: it is very unlikely that such a random U can be decomposed (exactly) into ['h', 'cx', 'z', 'x', 'ccx', 'swap'] using a maximum of 12 gates. Note that there are no parametrized gates (e.g. rx or ry). While the gate set is universal (h and ccx alone already are), this only holds for a very large number of gates. You can also check the example notebook on unitary compilation: [doc](https://florianfuerrutter.github.io/genQC/examples/unitary_compilation.html), [notebook](https://github.com/FlorianFuerrutter/genQC/blob/main/src/examples/2_unitary_compilation.ipynb).

Yes, I instead used random circuits made of the gate set mentioned within the maximum number of gates constraint and it worked better. I get the gist now, though I would like to see the scaling of this for an arbitrary U compared to exact approaches such as QSD.

My hope would be that the RL approach would fare better in terms of circuit depth, otherwise, it would be better to use Shende/QSD for exact encoding, or MPS/MPO for approximate encoding. Each approach has its pros and cons. For instance, whilst Shende is much faster, it also produces circuits that are exponential in depth. On the other hand, whilst the MPS approach produces significantly shallower circuits, it also consumes a considerable amount of time with low-level QR and SVD operations.
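For a sense of the exponential scaling mentioned here, the CNOT count usually quoted for the optimized Quantum Shannon Decomposition of an n-qubit unitary is $\frac{23}{48} 4^n - \frac{3}{2} 2^n + \frac{4}{3}$. That figure comes from the QSD literature rather than from this thread, and the function name below is made up for the example.

```python
from fractions import Fraction


def qsd_cnot_count(num_qubits: int) -> int:
    """CNOT count quoted for the optimized QSD: (23/48) 4^n - (3/2) 2^n + 4/3."""
    if num_qubits < 2:
        raise ValueError("The closed form is only meaningful for num_qubits >= 2.")
    count = (Fraction(23, 48) * 4**num_qubits
             - Fraction(3, 2) * 2**num_qubits
             + Fraction(4, 3))
    # Exact rational arithmetic: the result is an integer for every n >= 2.
    assert count.denominator == 1
    return int(count)


counts = {n: qsd_cnot_count(n) for n in range(2, 7)}
for n, c in counts.items():
    print(f"{n} qubits: {c} CNOTs")
```

For 3 qubits this gives 20 CNOTs, which is the exact-synthesis baseline any approximate or learned approach on 3-qubit unitaries would be compared against.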

The appeal of MPS/MPO is that it proposes an answer to "Can vs Can't". With the emergence of other alternatives, we can later contemplate the "Should vs Shouldn't" question.

> When you mention “high fidelity compilation; ... MPS approach” do you refer to quantum state preparation or unitary matrix to circuit compilation?

So, I have currently done state preparation (using an MPS) and am actively working on adding the unitary preparation (using an MPO).

> In your case, where would these “general unitary compilation” tasks stem from (e.g. a unitary evolution from a particular Hamiltonian)?

So, I interpret quantum computing as simply $A\ket{\psi}$ where $A$ is a unitary matrix over N qubits, and $\ket{\psi}$ is a vector over N qubits (simply a tensor contraction backend). These two primitives form our high-level code. Now, to perform actual quantum computing they need to be compiled to low-level code, aka quantum circuits. The main focus of my work is enabling this conversion to be done as shallowly as possible.

This is useful because it is domain agnostic. Regardless of whether you do quantum chemistry, physics simulation, optimization, or QML, at the end of the day you are working with these primitives and need to represent them as quantum circuits so that you can run them on actual quantum hardware. Given the exponential depth scaling of both state preparation and unitary preparation, the problems we can meaningfully implement on actual hardware are often limited to trivial ones. The goal of the compiler I am making is to reduce this cost so that larger, more practical problems become implementable on current hardware. For this to be possible, I must cover arbitrary vectors as well as unitary matrices up to an adequate fidelity.

A good analogy would be the impact that a much more efficient Python compiler has on code written in Python. Regardless of what code you write (or its use case), you will observe the improvement. So, that's where the need for efficient general unitary compilation comes from.

By the way, I also found a rather silly bug in the code I sent before. It's supposed to find the shortest circuit with 0.9+ fidelity, but it was missing the solution_circuits list, so the index computed from the filtered depths was applied to the unfiltered qc_list.

# Convert the generated tensors into circuits
qc_list, _ = convert_tensors_to_circuits(out_tensor=out_tensors, gate_pool=self.pipeline.gate_pool)

# Find the best solution in terms of the number of gates and fidelity
depths = []
solution_circuits = []

for qc in qc_list:
    qc_unitary = QiskitOperator(qc).data

    fidelity = np.abs(
        np.dot(
            np.conj(qc_unitary.flatten()), # type: ignore
            unitary.data.flatten()
        )
    )/2**num_qubits

    if fidelity > self.min_fidelity:
        depths.append(qc.depth())
        solution_circuits.append(qc)

if len(depths) == 0:
    raise ValueError(f"No solution found with fidelity > {self.min_fidelity}.")

# Find the shortest circuit with fidelity > `self.min_fidelity`
best_qc = solution_circuits[depths.index(min(depths))]

circuit = qickit.circuit.Circuit.from_qiskit(best_qc, self.output_framework)

return circuit
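The selection step above can also be written without the parallel depths list. The sketch below is self-contained and uses a hypothetical `Candidate` stand-in instead of real Qiskit circuits, so the names and numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Stand-in for a generated circuit with a precomputed depth and fidelity."""
    name: str
    depth: int
    fidelity: float


def pick_shallowest(candidates, min_fidelity=0.9):
    """Return the lowest-depth candidate whose fidelity exceeds `min_fidelity`."""
    accepted = [c for c in candidates if c.fidelity > min_fidelity]
    if not accepted:
        raise ValueError(f"No solution found with fidelity > {min_fidelity}.")
    # min() works on the filtered list directly, so no index can drift
    # out of sync with the unfiltered candidates.
    return min(accepted, key=lambda c: c.depth)


candidates = [
    Candidate("a", depth=3, fidelity=0.50),  # shallow but rejected
    Candidate("b", depth=7, fidelity=0.99),
    Candidate("c", depth=5, fidelity=0.95),
]
best = pick_shallowest(candidates)
print(best.name)
```

With these values, the earlier indexing into the unfiltered list would have returned candidate "b" (index 1 of the filtered depths [7, 5] applied to the full list), whereas filtering first returns "c".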

Thank you for the clarification.

I will close this issue now, as this is a limitation of this particular set of pre-trained weights and is more of a feature request for better models.

FlorianFuerrutter / genQC

[Pre-trained weights] Fidelity of encoding for arbitrary unitaries #4