HaoTy opened this issue 2 months ago
Thanks for the detailed code and questions in the post. It makes it easy to understand your situation and help. There are two issues here:

1) Why does `compile` give something seemingly incorrect?
2) Why does the direct synthesis workflow take longer than `compile`?

I have been able to reproduce both of these, but neither is a bug. The first is because `compile` performs mapping, which does not respect input and output qudit orderings. If you set `with_mapping=True` in the compile call, you will get two additional return values describing the input and output qudit orderings:
```python
compiled_circuit, pre, post = compile(
    Circuit.from_unitary(target_unitary),  # fast
    # target_unitary,  # extremely slow
    MachineModel(n, CouplingGraph.linear(n)),
    optimization_level=1,
    max_synthesis_size=n,
    with_mapping=True,
)
```
These orderings can then be used to verify the result:
```python
from bqskit.qis import PermutationMatrix

Pi = PermutationMatrix.from_qubit_location(n, pre)
Pf = PermutationMatrix.from_qubit_location(n, post)
print(1 - np.abs(
    np.trace(
        compiled_circuit.get_unitary()
        @ (Pf.T @ target_unitary @ Pi).conj().T
    )
) / 2**n)
```
This gives `0.0`, but I wouldn't be surprised if you get something on the order of `1e-15` on some runs. The verification procedure is also mathematically equivalent to:

```python
print(compiled_circuit.get_unitary().get_distance_from(Pf.T @ target_unitary @ Pi, 1))
```
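One property worth noting about the trace-based check above is that it ignores global phase, which is why two circuits that differ only by a phase factor verify as identical. A BQSKit-free numpy sketch of this (my own construction, not BQSKit code):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # 3 qubits

# Build a random unitary via QR decomposition of a complex Gaussian matrix,
# normalizing the phases of R's diagonal so U is exactly unitary.
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
Q, R = np.linalg.qr(A)
U = Q * (np.diagonal(R) / np.abs(np.diagonal(R)))

# The quantity 1 - |tr(V U^dagger)| / dim is invariant under a global
# phase, so a phase-shifted copy of U sits at distance ~0 from U.
V = np.exp(0.7j) * U
dist = 1 - abs(np.trace(V @ U.conj().T)) / dim
print(dist)
```

Up to floating-point noise, `dist` comes out as zero, matching the `0.0`-versus-`1e-15` behavior described above.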
For (2), the different workflows take varying amounts of time because they do different things. The "Off-the-Shelf" `compile` workflow at optimization level 1 will synthesize the circuit with all-to-all connectivity and then map using SABRE. The direct synthesis call will synthesize directly for linear connectivity. This will usually take longer, as the search procedure needs to go deeper into the search tree to find a solution for less-connected architectures. These differences lead to different output quality:
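To see why search tends to go deeper for less-connected hardware, compare how many qubit pairs can host a two-qubit gate under each coupling. A plain-Python sketch, independent of BQSKit's `CouplingGraph`:

```python
n = 5

# Linear coupling: two-qubit gates only between nearest neighbors (i, i+1).
linear_edges = [(i, i + 1) for i in range(n - 1)]

# All-to-all coupling: any pair of qubits may interact directly.
all_to_all_edges = [(i, j) for i in range(n) for j in range(i + 1, n)]

print(len(linear_edges), len(all_to_all_edges))  # 4 10
```

With far fewer allowed interactions per step, each node in the search tree has fewer children that make progress toward the target, so solutions tend to sit deeper in the tree.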
```python
print(compiled_circuit.gate_counts)
```
yields `{CNOTGate: 23, U3Gate: 53}` for `compile`, and `{RYGate: 38, CNOTGate: 19, U3Gate: 4, RXGate: 19, RZGate: 19}` for the direct synthesis call. So, even though the direct synthesis call took a little longer, it gave you better results. These will definitely vary from run to run unless you fix the seed. Neither is the best we can do with BQSKit. If you really want the best result and are okay trading some more time for quality, then you should use PAS or PAM for the QFT. This can be done with `optimization_level=4` or by building the workflow yourself using `build_seqpam_mapping_optimization_workflow`.
The real issue here may be documentation. If you feel this information should be included somewhere in the documentation, I will gladly accept a PR. I will leave this issue open. Feel free to follow up with more questions or close it.
Thanks for your fast response and detailed explanation! I now understand BQSKit better! However, the second issue is not

- Why does the direct synthesis workflow take longer than `compile`?

but the opposite. The default `compile` unitary synthesis workflow was drastically slower for me than directly specifying
```python
workflow = [
    SetModelPass(MachineModel(n, CouplingGraph.linear(n))),
    SetTargetPass(UnitaryMatrix(target_unitary)),
    LEAPSynthesisPass(),
]
```
even though they are almost the same (the differences are some `IfThenElsePass`es and a single-qudit retarget pass for optimization level 1). I couldn't follow the debug tracing very well due to the worker mechanism, but the extra time seemed to be spent somewhere other than LEAP.
When `max_synthesis_size` is 4 or above, directly synthesizing a unitary takes significantly longer than wrapping the unitary in a circuit. Reproducible example:

In addition, the above compiled circuit is only correct for the all-zero initial state:
which does not seem to be by design judging from the workflow implementation.
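A check against only the all-zero state can pass even when the unitary is wrong, since agreeing on one input column says nothing about the others. A small BQSKit-independent numpy sketch of why:

```python
import numpy as np

# Two different 2-qubit operators that act identically on |00>:
# the identity, and a permutation that swaps |01> and |10>.
identity = np.eye(4)
swap_middle = np.eye(4)[[0, 2, 1, 3]]  # rows reordered: |01> <-> |10>

zero_state = np.zeros(4)
zero_state[0] = 1.0

print(np.allclose(identity @ zero_state, swap_middle @ zero_state))  # True
print(np.allclose(identity, swap_middle))                            # False
```

So a compiled circuit validated only on `|0...0>` can still differ from the target on every other input state.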
Directly running LEAP as follows correctly synthesizes the unitary: