rht opened this issue 1 year ago
I didn't specify any path-finding algorithm, so it must be the default option. I don't think the code below adds much context, but just in case:
```python
def run_with_oe(circuit, pauli_string):
    myconverter = CircuitToEinsum(circuit, backend=cupy)
    expression, operands = myconverter.expectation(pauli_string, lightcone=True)
    memory_limit = "max_input"
    # memory_limit = None
    path, path_info = oe.contract_path(expression, *operands, memory_limit=memory_limit)
    output = oe.contract(expression, *operands, optimize=path, memory_limit=memory_limit)
    return output
```
Where `CircuitToEinsum` is from https://github.com/NVIDIA/cuQuantum, for converting quantum circuits to einsum expressions.
The contraction is probably just not small, hence the initial OOM (tensor network contractions are in general exponentially expensive in both time and space).
When you turn on the `memory_limit`, the algorithm simply stops doing pairwise contractions at a certain point and falls back to a single einsum (i.e. explicit summation), resulting in a likely huge equation with too many subscripts (which is also exponentially slower than the pairwise contraction itself).
Printing the `path_info` is your friend here, and any time a contraction is big!
Likely you will need: A) a higher quality contraction path optimizer, or B) a more modern technique like 'slicing' to deal with OOM, if the path is still high quality enough.
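To make the "print the `path_info`" advice concrete, here is a generic `opt_einsum` sketch (a toy chain of matrix multiplications standing in for the real circuit expression) showing the statistics to check before running a big contraction:

```python
import numpy as np
import opt_einsum as oe

# a small stand-in network: a chain of matrix multiplications
eq = "ab,bc,cd,de->ae"
operands = [np.random.rand(8, 8) for _ in range(4)]

path, path_info = oe.contract_path(eq, *operands)

# the summary shows the chosen pairwise order, the total FLOP estimate,
# and the size of the largest intermediate tensor (the OOM culprit)
print(path_info)
print("largest intermediate:", path_info.largest_intermediate, "elements")
print("estimated FLOPs:", path_info.opt_cost)
```

If `largest_intermediate` already exceeds what fits in memory, no amount of re-running `contract` will help; the path itself has to change.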
Thank you for the reply.
> resulting in a likely huge equation with too many subscripts (and also exponentially slower than the pairwise contraction itself).
Your statement is indeed echoed in the error message. I misinterpreted it as a configuration problem, whereas it actually means that a single einsum is not a viable option. I think the error message is not explicit enough.
> A) a higher quality contraction path optimizer
I have found cuTensorNet to generally find better contraction paths than opt_einsum, but it takes longer to do so. I resorted to opt_einsum because I wasn't able to ~~find a low-memory configuration for~~ configure cuTensorNet to limit its memory usage. I agree that the most viable solution is to find a higher quality contraction path.
> B) a more modern way like 'slicing' to deal with OOM if the path still is high quality enough
This doesn't ring a bell for me, except for the step-dependent slicing in https://arxiv.org/abs/2012.02430. Your advice on how to do this is appreciated!
Coincidentally, the circuit I have been trying to run is the QAOA circuit from that same paper (https://arxiv.org/abs/2012.02430).
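For reference, the core idea of slicing can be shown in a few lines of plain numpy (a minimal sketch, not tied to any of the libraries above): fix one index to each of its values, contract the smaller sliced networks, and sum the results, trading peak memory for repeated cheaper contractions.

```python
import numpy as np

d = 4
A = np.random.rand(d, d)   # indices a, b
B = np.random.rand(d, d)   # indices b, c
C = np.random.rand(d, d)   # indices c, a

# full contraction: the scalar sum_{a,b,c} A[a,b] B[b,c] C[c,a]
full = np.einsum("ab,bc,ca->", A, B, C)

# slice over index b: each term only ever holds b-free intermediates
sliced = sum(
    np.einsum("a,c,ca->", A[:, b], B[b, :], C)
    for b in range(d)
)

assert np.allclose(full, sliced)
```

Each sliced term is independent, which is also why sliced contractions parallelize well across GPUs.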
I develop the library cotengra, which is meant for larger tensor network contractions (https://cotengra.readthedocs.io/en/latest/) and can do slicing etc. There are some academic slicing references here: https://github.com/alibaba/acqdp#references, and that library and a few others are also options.
Here's an example, simulating https://arxiv.org/abs/2012.02430 relatively easily without slicing:
```python
# setup graph
import networkx as nx

reg = 3
n = 210
seed = 666
G = nx.random_regular_graph(reg, n, seed=seed)
terms = {(i, j): 1 for i, j in G.edges}

# setup circuit and TN
import quimb as qu
import quimb.tensor as qtn

p = 1
gammas = qu.randn(p)
betas = qu.randn(p)
circ_ex = qtn.circ_qaoa(terms, p, gammas, betas)
tn = circ_ex.amplitude_tn()

# find contraction tree
import cotengra as ctg

opt = ctg.HyperOptimizer(
    minimize='combo',
    # just subtree reconf:
    reconf_opts={},
    # if you did need dynamic slicing:
    # slicing_reconf_opts=dict(target_size=2**30),
    progbar=True,
)
tree = tn.contraction_tree(opt, output_inds=())
# log2[SIZE]: 27.00 log10[FLOPs]: 10.46: 100%|████| 128/128 [00:22<00:00, 5.66it/s]

# perform the contraction
tree.contract(tn.arrays, progbar=True)
# 100%|███████████████████████████████████████████| 314/314 [00:03<00:00, 91.08it/s]
# array(-1.76825283e-44-9.7394392e-46j)
```
I do think that paper makes a mistake in claiming they can sample QAOA using frugal rejection, though; one needs something like "gate-by-gate" sampling (https://arxiv.org/abs/2112.08499) to sample with only amplitude access to a non-chaotic quantum circuit.
I got `KeyError: 'combo'`, so I had to comment out that argument. What I have found so far is that cotengra, with `slicing_reconf_opts` enabled, consistently gave results in less than 1 min for n = 32, whereas cutensornet almost always OOM-ed. Because of this, I wasn't able to benchmark n = 210.
I assume that cotengra uses opt_einsum for contraction. But if I use cuquantum's `CircuitToEinsum` and do path finding via cotengra,
```python
# The pauli_string being passed is a 1-qubit string `{qubits[10]: "Z"}`
def run_with_ctg(circuit, pauli_string):
    myconverter = CircuitToEinsum(circuit, backend=cupy)
    expression, operands = myconverter.expectation(pauli_string, lightcone=True)
    opt = ctg.HyperOptimizer(
        # minimize="combo",
        reconf_opts={},
        # if you did need dynamic slicing:
        # 1 GiB
        slicing_reconf_opts=dict(target_size=1 * 1024**3),
        progbar=True,
    )
    tic = time.time()
    path, path_info = oe.contract_path(expression, *operands, optimize=opt)
    output = oe.contract(expression, *operands, optimize=path)
    elapsed = time.time() - tic
    print("Elapsed oe", round(elapsed, 3))
    return output
```
it took ages to finish the computation. The main differentiator seems to be that `quimb.tensor` generates a more efficient einsum expression than the one generated by cuquantum. I have already made sure that the circuits are the same, i.e. as described in the quimb documentation.
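One way to quantify "more efficient expression" without timing full runs (a generic sketch; the toy equation below stands in for the actual cuquantum/quimb output) is to compare the optimizer's own cost estimates for each expression:

```python
import math
import numpy as np
import opt_einsum as oe

def expression_cost(eq, operands):
    """Return (log10 FLOPs, largest intermediate size) for one expression."""
    _, info = oe.contract_path(eq, *operands)
    return math.log10(info.opt_cost), info.largest_intermediate

# toy stand-in for a generated expression
ops = [np.random.rand(16, 16) for _ in range(4)]
flops, biggest = expression_cost("ab,bc,cd,de->ae", ops)
print(f"log10[FLOPs]={flops:.2f}, largest intermediate={biggest}")
```

Comparing these numbers for the two generated expressions separates "the expression is worse" from "the path optimizer did worse on it".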
QAOA digression aside, I think this issue is a duplicate of #95 (i.e. the automatic memory slicing and automatic contraction feature)? #95 is solved if #125 is resolved. At this point, is it a matter of packaging cotengra as an optional dependency of opt_einsum? A quick short-term solution could be to tweak the error message to refer to cotengra.
Yes, possibly `'combo'` isn't in a released version of opt_einsum yet.
cotengra uses its own contraction implementation for a few reasons - one being how each pairwise contraction is dispatched, rather than always going through a bare `einsum` call (this is probably the slowdown you saw; a change I do have an open PR for in numpy here - https://github.com/numpy/numpy/pull/23513 - which opt_einsum could then benefit from). This is particularly apparent for quantum circuits with diagonal gates, which produce hyper indices, where the difference can be many orders of magnitude. quimb uses cotengra's implementation here too.
It would be great to include a lot of this stuff in opt_einsum, but currently I do not have time, since things have to be done quite carefully for opt_einsum as a library which many depend on. Also, cotengra was designed around large contractions and built around quite different data structures like the `ContractionTree`. I suppose my plan is instead to gradually reproduce some of opt_einsum's interfaces in cotengra, which should make it easier to swap things in at a higher level.
I have a `contract_path` and `contract` sequence on a tensor network that would OOM when `memory_limit` is `None`. I tried to reduce the memory consumption of the computation (I don't mind if the computation runs slowly, as long as it gives me an answer), but upon limiting it, I encountered an error during the `contract` step. Is there any extra step that I have to do? I have gone through the documentation and issues, and can't find any relevant information about this.