using HPC system with the compile

manu123416 commented 5 months ago

Hi, I tried to generate a circuit for a 16x16 unitary matrix using the code below:

circuit_transpose_Q = compile(transpose_Q, optimization_level=1, max_synthesis_size=4, synthesis_epsilon=1e-1, error_threshold=5e-1)

Eventually, my job timed out (TIMEDOUT) on an HPC system. I am probably doing something wrong.

Below is my sbatch file, which I used for running my experiment:

#!/bin/bash
#SBATCH --job-name=qiskit           # Job name
#SBATCH --partition=sixhour           # Partition Name (Required)
#SBATCH --mail-user=manu.chaudhary@ku.edu   # Where to send mail    
#SBATCH --mail-type=BEGIN,END,FAIL

#SBATCH --ntasks=42                # Request 42 tasks (one CPU each, see --cpus-per-task)
#SBATCH --cpus-per-task=1
#SBATCH --mem=128gb
#SBATCH --time=0-06:00:00             # Time limit days-hrs:min:sec
#SBATCH --output=%j.log               # Standard output and error log
#SBATCH --gres=gpu                     # 1 GPU
pwd; hostname; date

# Put your code below this line
module load conda/latest
eval "$(conda shell.bash hook)"
conda activate /panfs/pfs.local/scratch/i2s/m9c693/BQSKit_study

echo "running test"

python /home/m9c693/work/hhl_c2q_bqskit/BQSKit_sample_codes/bqskit-tutorial-main/convert_matrix_to_circuit/matrix_to_circuit_3qubit_v2.py
date

I would appreciate any help with this.

edyounis commented 5 months ago

Sure! With 4-qubit synthesis, the default workflow will use the LEAP algorithm. Depending on how difficult your unitary is to synthesize and on the computational resources at hand, this may take longer than 6 hours. While we are constantly working on improving the performance of our synthesis algorithms, there are a few things to check and a few tricks we can explore to synthesize this unitary much faster.

I would first recommend enabling logging so that we can track the algorithm's progress. This will give us insight into how "far away" it was from a solution, or whether it was stuck and there is a bigger issue. It is unlikely that synthesis got stuck, but we have had some reports of HPC systems causing issues in BQSKit's attached mode, so it would be good to confirm that synthesis did start and that everything launched correctly on the HPC system. There is a guide to running BQSKit in detached mode that I'll leave here for reference. Detached mode is more robust on HPC systems, even on a single node.
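As a minimal sketch of the logging step, the snippet below raises the log level using only Python's standard logging module. It assumes that BQSKit emits its messages through loggers named under "bqskit"; the helper name enable_bqskit_logging is made up for this example and is not part of the library. Call it before starting the compilation.

```python
import logging


def enable_bqskit_logging(verbose: bool = False) -> logging.Logger:
    """Raise the log level of the "bqskit" logger hierarchy.

    Assumption: BQSKit modules log under names starting with "bqskit".
    This helper is illustrative and not part of BQSKit itself.
    """
    logger = logging.getLogger("bqskit")
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)

    # Attach a single stream handler so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()  # writes to stderr
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)-8s | %(name)s: %(message)s"),
        )
        logger.addHandler(handler)
    return logger


# Turn on verbose output before compiling.
logger = enable_bqskit_logging(verbose=True)
```

Because the handler writes to stderr, the messages should land in the Slurm log file when the job's output setting captures both standard output and standard error, as the sbatch file above does.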

If everything is working normally, then there are a few options to speed up the synthesis. If you have domain-specific knowledge about your problem that gives you an idea of the final circuit structure (layout and type of gates) or of some "building block" inside the final structure, then we can leverage this to speed up search-based synthesis algorithms like LEAP. To accomplish this, we can build a LayerGenerator by subclassing it and implementing the two functions gen_initial_layer and gen_successors. You can look at our pre-built generators for examples. You may even want to just try the stair generator. The fundamental concept here is that building deeper circuits more quickly will lead to faster synthesis results, but potentially at the cost of result quality. Leveraging domain-specific knowledge allows you to mitigate this trade-off.
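To make the trade-off concrete, here is a library-free toy model of the two roles a layer generator plays: producing the root of the search tree and producing successors of a candidate circuit. It does not use BQSKit's classes. A "circuit" here is just a tuple of qubit pairs, and all names (initial_layer, successors_one_block, successors_full_stair) are hypothetical. It also assumes that a "stair" means one two-qubit block on each consecutive pair of neighboring qubits, which may differ from what the pre-built stair generator does.

```python
from typing import List, Tuple

Block = Tuple[int, int]         # a two-qubit building block on qubits (a, b)
ToyCircuit = Tuple[Block, ...]  # a toy circuit: an ordered sequence of blocks


def initial_layer() -> ToyCircuit:
    """Root of the search tree: no two-qubit blocks placed yet."""
    return ()


def successors_one_block(circuit: ToyCircuit, num_qubits: int) -> List[ToyCircuit]:
    """Fine-grained expansion: one successor per neighboring pair.

    Each successor is one block deeper, so the search has many
    candidates to evaluate per step and grows the circuit slowly.
    """
    return [circuit + ((q, q + 1),) for q in range(num_qubits - 1)]


def successors_full_stair(circuit: ToyCircuit, num_qubits: int) -> List[ToyCircuit]:
    """Coarse expansion: a single successor that adds a whole stair.

    The circuit gets deeper much faster and there is only one candidate
    per step, but the result may contain blocks that were not needed.
    """
    stair = tuple((q, q + 1) for q in range(num_qubits - 1))
    return [circuit + stair]


if __name__ == "__main__":
    n = 4
    root = initial_layer()
    print("one-block successors:", successors_one_block(root, n))
    print("full-stair successors:", successors_full_stair(root, n))
```

In a real generator, each successor would be an actual circuit with parameterized gates, and domain knowledge would decide which blocks are worth adding at each step.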

Once you have built your LayerGenerator, you can just pass this into LEAPSynthesisPass and execute it in a custom workflow:

from bqskit.passes import *
from bqskit.compiler import Compiler
from bqskit.ir.circuit import Circuit

custom_layer_gen = ...

workflow = [LEAPSynthesisPass(layer_generator=custom_layer_gen, success_threshold=1e-1)]

circuit = Circuit.from_unitary(transpose_Q) # just a wrapper around the unitary

with Compiler() as compiler:
    compiled_circuit = compiler.compile(circuit, workflow)

If you would like to also check the error on the final output, you can do a direct calculation since this is 4 qubits:

assert compiled_circuit.get_unitary().get_distance_from(transpose_Q) < 1e-1
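For intuition about what such a check measures, here is a small pure-Python sketch of a phase-insensitive Hilbert-Schmidt distance between two unitaries, sqrt(1 - (|Tr(U†V)| / N)^2). That BQSKit's distance takes exactly this form is an assumption here, and hs_distance is a hypothetical helper, not a library function.

```python
import cmath
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[complex]]


def hs_distance(u: Matrix, v: Matrix) -> float:
    """Phase-insensitive distance between two N x N unitaries.

    Computes sqrt(1 - (|Tr(U^dagger V)| / N)^2). The value is 0 when the
    matrices agree up to a global phase and grows toward 1 as they differ.
    """
    n = len(u)
    # Tr(U^dagger V) equals the sum over all entries of conj(u_ij) * v_ij.
    trace = sum(
        complex(u[i][j]).conjugate() * v[i][j]
        for i in range(n)
        for j in range(n)
    )
    overlap = abs(trace) / n
    # Clamp at zero to absorb tiny negative values from rounding.
    return math.sqrt(max(0.0, 1.0 - overlap ** 2))


identity: List[List[complex]] = [[1, 0], [0, 1]]
pauli_x: List[List[complex]] = [[0, 1], [1, 0]]
phase = cmath.exp(0.3j)
phased_identity = [[phase, 0], [0, phase]]

print(hs_distance(identity, phased_identity))  # global phase is ignored
print(hs_distance(identity, pauli_x))          # orthogonal in the trace inner product
```

A threshold such as 1e-1 in the assertion above is then a bound on this kind of distance between the compiled circuit's unitary and the target.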

The other route you can take is to use a more scalable synthesis algorithm, such as QFAST or QPredict.

In either case, your workflow may look like:

workflow = [
    QFastDecompositionPass(), # or QPredictDecompositionPass()
    ForEachBlockPass(QSearchSynthesisPass()),
    UnfoldPass(),
]

You can take a look at the QFAST example for reference.

manu123416 commented 5 months ago

Thank you @edyounis. Today I tried the BQSKit compile function again after adjusting the time limit in the sbatch file, and it now works correctly. Thank you for the great help.

manu123416 commented 3 months ago

Hi @edyounis, I tried using the compile API on an HPC system but could not get a circuit beyond 9 qubits. I am still not sure which APIs I should use, either together with compile or separately, to increase the execution speed. Could I get some sample code for using BQSKit in detached mode? Also, could I get some sample code for LayerGenerator, gen_initial_layer, gen_successors, the pre-built generators, and LEAPSynthesisPass, all of which I think should work similarly to compile, i.e. generating a circuit from a matrix?

edyounis commented 3 months ago

Direct synthesis for unitaries with 9+ qubits is going to be a costly operation. There are definitely tricks that can make it faster, but these will be domain-specific. As a good place to start, I recommend checking out our GPU implementation of QFactor to use as an instantiater with LEAP and a custom layer generator that uses large building blocks. Alternatively, you can try out QPredict with the qfactor GPU instantiater.
