I have an application where there is a large tensor contraction in the inner loop, with some arrays that are fixed and others that are updated between iterations. Operations only involving the constant terms could be done ahead of time (outside the loop), but this may or may not actually improve performance.
It could be nice if opt-einsum had a way to mark some arguments as constant, in which case the cost of contracting them could be neglected when computing the cost of paths only involving these arguments.
@shoyer Interesting, can you write up a small example that I can hack on?
I don't know how interesting an example it is, but here's the one that turned up in our project:
sizes = {'a': 29, 'b': 29, 'c': 4, 'd': 4, 'f': 3, 'i': 3, 'j': 10, 'o': 3, 's': 50}
einsum_string = 'sabj,sfab,ifo,cdj,siac->sobd'
The first three terms are fixed; the last two vary in each iteration.
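For anyone who wants to hack on it, a minimal setup (assuming NumPy and opt_einsum are available) would be:

import numpy as np
import opt_einsum as oe

sizes = {'a': 29, 'b': 29, 'c': 4, 'd': 4, 'f': 3, 'i': 3, 'j': 10, 'o': 3, 's': 50}
einsum_string = 'sabj,sfab,ifo,cdj,siac->sobd'

# build a random array for each input term, matching the index sizes
terms = einsum_string.split('->')[0].split(',')
arrays = [np.random.rand(*[sizes[idx] for idx in term]) for term in terms]

# inspect the optimized path and its cost estimates
path, info = oe.contract_path(einsum_string, *arrays)
print(info)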
Sorry it took me a bit to get back to this. If you look at the full contraction we see:
  Complete contraction:  sabj,sfab,ifo,cdj,siac->sobd
         Naive scaling:  9
     Optimized scaling:  6
      Naive FLOP count:  1.453e+09
  Optimized FLOP count:  1.482e+07
   Theoretical speedup:  98.028
  Largest intermediate:  5.046e+05 elements
--------------------------------------------------------------------------------
scaling   BLAS           current                        remaining
--------------------------------------------------------------------------------
   6      TDOT      siac,ifo->focas         sabj,sfab,cdj,focas->sobd
   6      False     focas,sfab->ocabs       sabj,cdj,ocabs->sobd
   6      False     ocabs,sabj->ocjbs       cdj,ocjbs->sobd
   6      TDOT      ocjbs,cdj->sobd         sobd->sobd
The possible intermediates to form are:

sabj,sfab,ifo->objia    (N^7 scaling, 13x cost)
sabj,sfab->abjf         (N^7 scaling, 12x cost)
sfab,ifo->sabio         (N^6 scaling, 0.9x cost)
sabj,ifo->sabjifo       (N^9 scaling, ...)

So it doesn't look quite worth it for this particular example; I think I know how to build something that automatically detects this, however.
> So it doesn't look quite worth it for this particular example
Sorry, I should have clarified -- we already did this analysis by hand and came to the same conclusion :). But yes, in general this would be nice to have. I suspect the pattern of updating only some tensors inside an optimization and/or simulation loop is pretty common.
Well, at least we are coming up with the same thing! Any thoughts on the API if we have a function that looks like the following:

constant_contract(expression, constant_tensors, tensors)

where the order in the expression would be constant_tensors + tensors. This could return:

(new_expression, temporaries)

so you could call contract(new_expression, *(temporaries + args)), or the whole thing could be wrapped in a closure so that you simply call func(tensors).
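As a rough illustration of the intended usage pattern (the names and the choice of constant-only subexpression here are hypothetical, picked by hand), the manual equivalent today would be:

import numpy as np
import opt_einsum as oe

# two constant tensors and one tensor that changes every iteration
A = np.random.rand(10, 10)  # constant
B = np.random.rand(10, 10)  # constant

# contract the constant-only piece once, outside the loop; this plays
# the role of the returned temporaries, and 'ik,k->i' the role of
# new_expression
temporary = oe.contract('ij,jk->ik', A, B)

for _ in range(100):
    x = np.random.rand(10)  # varying tensor
    out = oe.contract('ik,k->i', temporary, x)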
I think this is also possible now, to an extent, using dask as the backend with #17, since it will reuse intermediates if the arrays are kept the same.
Good point. I wonder if it would be worth revisiting this in a more explicit manner, however. Thinking about the ML libraries, it would be much better if we could reuse parts that stay on a GPU.
Indeed. I have a bit of my library that can view any tensor network as a scipy.LinearOperator, allowing for various nice iterative decompositions just involving matvec. Obviously between matvec calls the tensors are all constant apart from the new vector. I think the contractions generally start with the vector and contract everything into it (and so there are no reusables), but it would be nice to automatically detect when this is not the case.
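A generic sketch of the pattern (illustrative only, not the actual library code): wrap the fixed tensors in a scipy.sparse.linalg.LinearOperator whose matvec contracts them with each incoming vector.

import numpy as np
import scipy.sparse.linalg as spla
import opt_einsum as oe

# constant tensors defining the operator; only the vector changes
# between matvec calls
A = np.random.rand(8, 8)
B = np.random.rand(8, 8)

def matvec(x):
    # contract the fixed tensors with the (reshaped) incoming vector;
    # this computes (A kron B) @ x as a single tensor contraction
    return oe.contract('ij,kl,jl->ik', A, B, x.reshape(8, 8)).ravel()

op = spla.LinearOperator(shape=(64, 64), matvec=matvec, dtype=np.float64)

# iterative routines now only ever call matvec
w = spla.eigs(op, k=1, return_eigenvectors=False)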
My suggestion for an API would be:
>>> # assume we have four shapes and matching arrays with 2 & 4 constant
>>> expr = contract_expression("abc,bcd,cde,ef", shape1, array2, shape3, array4)
>>> out = expr(array1, array3)
I.e. if you supply arrays to a contract_expression it internalizes them, and then to the expression itself you only supply arrays matching the remaining shapes.
EDIT: I think this API would be pretty easy to implement, but short of just using dask internally, working out the reusables would be a bit more involved.
What if someone does contract_expression("i,i->", (1, 2, 3), np.array([1, 2, 3])) expecting to take an inner product? I think I was stumped for a clean API last time this came up.
Ah yes. I think you'd have to explicitly enforce an 'if it doesn't have a shape, it's not an array' policy. Producing helpful errors to that effect should be possible, since shapes are only 1D and wouldn't match the expression if viewed as arrays, but I see your point.
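To make that rule concrete, a hypothetical check could be as simple as:

def is_constant_array(arg):
    # policy: anything exposing a .shape attribute is a concrete
    # (constant) array; a bare tuple/list is taken to be a shape
    return hasattr(arg, 'shape')

Under this rule the (1, 2, 3) above is unambiguously a shape, even if the user meant it as a vector, which is exactly the pitfall being pointed out.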
Another option would just be:
contract_expression("abc,bcd,cde,ef", shape0, array1, shape2, array3, constant=[1, 3])
I could be more comfortable with that. shape0/shape2 could also be arrays with shape attributes as well.
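A rough sketch of how that signature could behave, layered on the current machinery (constant= is the proposal, not an implemented keyword, and no constant-only pre-contraction is attempted here):

import numpy as np
import opt_einsum as oe

def contract_expression_const(expression, *ops, constant=()):
    # sketch: positions listed in `constant` are concrete arrays to
    # internalize; every other position is just a shape
    constant = set(constant)

    def expr(*arrays):
        it = iter(arrays)
        operands = [op if i in constant else next(it)
                    for i, op in enumerate(ops)]
        return oe.contract(expression, *operands)

    return expr

# arrays at positions 1 and 3 are fixed; calls then supply only the
# arrays matching shape0 and shape2
shape0, shape2 = (2, 3, 4), (4, 5, 6)
array1 = np.random.rand(3, 4, 5)  # term 'bcd'
array3 = np.random.rand(6, 7)     # term 'ef'

expr = contract_expression_const("abc,bcd,cde,ef",
                                 shape0, array1, shape2, array3,
                                 constant=[1, 3])
out = expr(np.random.rand(*shape0), np.random.rand(*shape2))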