Question: optimal contractions and batch dimension

dgasmith / opt_einsum

⚡️Optimizing einsum functions in NumPy, Tensorflow, Dask, and more with contraction order optimization.

https://dgasmith.github.io/opt_einsum/

MIT License

863 stars 68 forks source link

Question: optimal contractions and batch dimension #158

Closed Linux-cpp-lisp closed 3 years ago

Linux-cpp-lisp commented 3 years ago

Hi all,

Thanks for your work on this great project!

I'm working on applying opt_einsum in e3nn, an application where I need to predetermine a good contraction path for later use. Typical examples of einsums I need to do this for are:

uvw,ijk,zuvij->zwk
zuvw,ijk,zuvij->zwk
zui,zvj->zuvij
...

where all dimensions except z — the batch dimension — are of fixed, known size.

I've run a small experiment to see how the optimal contraction varies with z (using the 'optimal' optimization strategy) and got the result that the optimal contraction is completely unaffected: Here each line is for a different einsum. "Path taken" just indexes unique (to each einsum) paths returned by opt_einsum.contract_path.

Is this universally true? That is, when all shapes except a batch dimension that is not contracted over are fixed, is the optimal path invariant to the size of the batch dimension?

Thanks for your time!

dgasmith commented 3 years ago

A single dimension changing leaving the contraction order unchanged isn't generally true, but occurs in many scenarios. A popular counter example is the following:

A_ij B_jk C_kl -> D_il

Where ijl are fixed, if k is smaller or larger than the rest of the indices the optimal order will switch between A(BC) and (AB)C.

jcmgray commented 3 years ago

I'll just add that if one takes 'batch dimension' to mean an index of size d that appears on all the inputs and in the output, then in terms of contraction cost, it scales every pairwise contraction cost by d and so yes the path optimization is completely independent.

In some of your equations the index z doesn't appear on all inputs but the statement will probably mostly hold still if the index appears on all the 'expensive' inputs / contractions.

Linux-cpp-lisp commented 3 years ago

@dgasmith thanks for the example — it seems there, though, that you are contracting over the index whose size you are changing.

@jcmgray thanks for confirming this. My quick-and-dirty reasoning suggests that this should also hold for the einsums I showed above:

uvw,ijk,zuvij->zwk
zuvw,ijk,zuvij->zwk

because

Only up to two index sets lack z
When there are two, those two sets do not share any indexes
Thus, they will not be contracted
Thus, the first contraction will always involve a z
Thus, all following contractions will always involve a z
Thus, the cost identically for all paths in z.

Does that seem right? I think this reasoning should hold for any einsum where the batch dimension is on either all but one operands or is on all but two and those two share no indexes.

jcmgray commented 3 years ago

It will hold when z appears in every contraction - like you have found, if the terms that it doesn't appear on (could be many more than 2) don't share any indices, they will (almost) never be contracted together first - either because the path optimizers don't search outer products, or because outer products rarely appear in good contraction paths.

There are some very perverse cases where the optimal path involves an outer product, e.g. I think when u, v, w, i, j, k are all size 1, but I don't think you need to worry about those.

Linux-cpp-lisp commented 3 years ago

That makes sense @jcmgray, thanks for your help!

i,j,k for us is always >1 (we handle 1 in our own special case). I guess in the case where all of those dimensions are 1 almost any contraction will run fast anyway.

dgasmith commented 3 years ago

This appears to be closed, thank you for the discussion!