Setting memory_limit in contract_path causes "ValueError: invalid subscript 'Ų' in einstein sum subscripts string, subscripts must be letters"

rht commented 1 year ago

I have a contract_path and contract sequence on a tensor that would OOM when memory_limit is None. I tried to reduce the memory consumption of the computation (I don't mind if the computation runs slow, as long as it gives me answer), but upon limiting it, I encountered the error

ValueError: invalid subscript 'Ų' in einstein sum subscripts string, subscripts must be letters

During the contract step. Is there any extra step that I have to do? I have gone through the documentation and issues, and can't find any relevant information about this.

rht commented 1 year ago

I didn't specify any path finding algorithm, so it must be the default option. I don't think the code below will help with extra context, but just in case.

def run_with_oe(circuit, pauli_string):
    myconverter = CircuitToEinsum(circuit, backend=cupy)
    expression, operands = myconverter.expectation(pauli_string, lightcone=True)
    memory_limit = "max_input"
    # memory_limit = None
    path, path_info = oe.contract_path(expression, *operands, memory_limit=memory_limit)
    output = oe.contract(expression, *operands, optimize=path, memory_limit=memory_limit)
    return output

Where CircuitToEinsum is from https://github.com/NVIDIA/cuQuantum for converting quantum circuits to einsum.

jcmgray commented 1 year ago

I think probably the contraction is not small thus the OOM initial problem (contractions in general are exponentially time and space expensive).

When you turn on the memory_limit, the algorithm simply stops doing pairwise contractions at a certain point and tries doing a single einsum (i.e. explicit summation), resulting in a likely huge equation with too many subscripts (and also exponentially slower than the pairwise contraction itself).

Printing the path_info is your friend here and anytime a contraction is big!

Likely you will need: A) a higher quality contraction path optimizer B) a more modern way like 'slicing' to deal with OOM if the path still is high quality enough

rht commented 1 year ago

Thank you for the reply.

resulting in a likely huge equation with too many subscripts (and also exponentially slower than the pairwise contraction itself).

Your statement is indeed echoed in the error message. I misinterpreted it as a configuration problem, whereas it is meant to say that it is not a viable option to do a single einsum. I think the error message is not explicit enough.

A) a higher quality contraction path optimizer

I have found cuTensorNet to generally find better contraction paths than opt_einsum, but takes longer to do so. I resorted to opt_einsum because I wasn't able to ~find a low-memory configuration for~ configure cuTensorNet to have its memory limited. I agree that the most viable solution is to find a higher quality contraction path.

B) a more modern way like 'slicing' to deal with OOM if the path still is high quality enough

This doesn't ring any bell to me, except for the step-dependent slicing in https://arxiv.org/abs/2012.02430. Your advice on how to do this is appreciated!

rht commented 1 year ago

Coincidentally, the circuit I have been trying to run is the QAOA circuit in that https://arxiv.org/abs/2012.02430 paper.

jcmgray commented 1 year ago

I develop the library cotengra which is meant for larger tensor network contractions: https://cotengra.readthedocs.io/en/latest/ and can do slicing etc. There are some academic slicing references here https://github.com/alibaba/acqdp#references, and that library and a few others are also options.

Here's an example, simulating https://arxiv.org/abs/2012.02430 relatively easily without slicing:

# setup graph
import networkx as nx

reg = 3
n = 210
seed = 666
G = nx.random_regular_graph(reg, n, seed=seed)
terms = {(i, j): 1 for i, j in G.edges}

# setup circuit and TN
import quimb as qu
import quimb.tensor as qtn

p = 1
gammas = qu.randn(p)
betas = qu.randn(p)
circ_ex = qtn.circ_qaoa(terms, p, gammas, betas)
tn = circ_ex.amplitude_tn()

# find contraction tree
import cotengra as ctg

opt = ctg.HyperOptimizer(
    minimize='combo',
    # # just subtree reconf:
    reconf_opts={},
    # # if you did need dynamic slicing:
    # slicing_reconf_opts=dict(target_size=2**30),
    progbar=True,
)
tree = tn.contraction_tree(opt, output_inds=())
# log2[SIZE]: 27.00 log10[FLOPs]: 10.46: 100%|████| 128/128 [00:22<00:00,  5.66it/s]

# perform the contraction
tree.contract(tn.arrays, progbar=True)
# 100%|███████████████████████████████████████████| 314/314 [00:03<00:00, 91.08it/s]
# array(-1.76825283e-44-9.7394392e-46j)

I do think that paper makes a mistake in claiming they can sample QAOA using frugal rejection though, one needs something like "gate by gate sampling": https://arxiv.org/abs/2112.08499 to sample with only amplitude access to a non chaotic quantum circuit.

rht commented 1 year ago

I got KeyError: 'combo', that I had to comment out that argument. What I have found so far is that cotengra, with the slicing_reconf_opts enabled, consistently gave results in less than 1 min for n = 32, whereas cutensornet almost always OOM-ed. Because of this, I wasn't able to benchmark n = 210.

I assume that cotengra uses opt_einsum for contraction. But if I use cuquantum's CircuitToEinsum, and do path finding via cotengra,

# The pauli_string being passed is a 1-qubit string `{qubits[10]: "Z"}`
def run_with_ctg(circuit, pauli_string):
    myconverter = CircuitToEinsum(circuit, backend=cupy)
    expression, operands = myconverter.expectation(pauli_string, lightcone=True)
    opt = ctg.HyperOptimizer(
        # minimize="combo",
        reconf_opts={},
        # # if you did need dynamic slicing:
        # 1 GiB
        slicing_reconf_opts=dict(target_size=1 * 1024**3),
        progbar=True,
    )
    tic = time.time()
    path, path_info = oe.contract_path(expression, *operands, optimize=opt)
    output = oe.contract(expression, *operands, optimize=path)
    elapsed = time.time() - tic
    print("Elapsed oe", round(elapsed, 3))
    return output

it took ages to finish the computation. The main differentiator seems to be that quimb.tensor has a more efficient einsum expression than the one generated by cuquantum. I have already made sure that the circuit are the same, as in, they are described in the documentation in the quimb docs.

QAOA Digression aside, I think this issue is a duplicate of #95 (i.e. the automatic memory slicing and automatic contraction feature)? #95 is solved if #125 is resolved. At this point, it is a matter of packaging cotengra as an optional dependency of opt_einsum? A quick short-term solution could be to tweak the error message to refer to cotengra.

jcmgray commented 1 year ago

Yes possibly 'combo' isn't in a released version of opt_einsum yet.

cotengra uses its own contraction implementation for a few reasons:

it avoids calling numpys low level C einsum, (this is probably the slow down you saw). (This change I do have an open PR for in numpy here - https://github.com/numpy/numpy/pull/23513, which opt_einsum could then benefit from). This is particularly apparent for quantum circuits with diagonal gates which produce hyper indices, e.g. many orders of magnitude.
it abstracts the backend etc using https://github.com/jcmgray/autoray, the same as quimb

It would be great to include a lot of this stuff in opt_einsum, but currently I do not have time, since things have to be done quite carefully for opt_einsum as a library which many depend on. Also cotengra was designed around large contractions and built around quite different data structures like the ContractionTree. I suppose my plan is to instead gradually reproduce some of opt_einsum interfaces in cotengra, which should make it easier to swap things in at a higher level.

dgasmith / opt_einsum

Setting memory_limit in contract_path causes "ValueError: invalid subscript 'Ų' in einstein sum subscripts string, subscripts must be letters" #217