dgasmith / opt_einsum

⚡️ Optimizing einsum functions in NumPy, TensorFlow, Dask, and more with contraction order optimization.
https://dgasmith.github.io/opt_einsum/
MIT License

memory_limit with 'greedy' contractions #8

Closed jcmgray closed 6 years ago

jcmgray commented 6 years ago

I notice that with path='greedy', memory_limit is locked at the maximum input size, for 'algorithmic' reasons. Is this still an issue, and are there any prospects of it being resolved?
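
For reference, the kind of call I mean is something like this (a minimal sketch with made-up sizes; the memory_limit value is hypothetical):

>>> import numpy as np
>>> from opt_einsum import contract
>>> x, y, z = (np.random.rand(50, 50) for _ in range(3))
>>> # the memory_limit passed here currently gets pinned to the largest input size
>>> contract("ab,bc,cd->ad", x, y, z, path='greedy', memory_limit=10**8)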

I did some brief testing with the limit just removed and all seemed ok (and faster!).

P.S. thanks for the nice project!

dgasmith commented 6 years ago

For expressions that do not have any useful intermediates, the greedy algorithm will keep building outer products, leading to a very slow computation. There are a few strategies we could employ to prevent this, such as checking that the cost does not exceed that of the unoptimized contraction. So yes, the algorithm can certainly be improved. If you have edge cases where greedy misses large speedups that optimal finds, they would be very useful to add to the test suite.
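
As a rough illustration of that cost check (a sketch only, not opt_einsum's internals; it assumes the path-info object returned by contract_path exposes opt_cost and naive_cost):

import numpy as np
from opt_einsum import contract, contract_path

tensors = [np.random.rand(10, 10, 10) for _ in range(4)]
eq = "abc,cfd,dbe,efa"

# Ask greedy for a path, then compare its estimated FLOP count against the
# naive single-step cost; fall back if greedy made things worse.
path, info = contract_path(eq, *tensors, path='greedy')
if info.opt_cost > info.naive_cost:
    path = [tuple(range(len(tensors)))]  # contract everything in one step

result = contract(eq, *tensors, path=path)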

jcmgray commented 6 years ago

OK, I see. So here is one (slightly contrived) example where the speedup is significant:

>>> import numpy as np
>>> from opt_einsum import contract
>>> tensors = [np.random.rand(20, 20, 20) for _ in range(4)]
>>> %timeit contract("abc,cfd,dbe,efa", *tensors, path='greedy')
358 ms ± 3.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And with a higher memory limit and the optimal path:

>>> %timeit contract("abc,cfd,dbe,efa", *tensors, path='optimal', memory_limit=-1)
1.07 ms ± 26.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This scenario, i.e. one where some initial increase in tensor rank is preferable, would I think come up in e.g. 2D tensor network contractions. However, it might not actually be such an issue in the simpler cases I originally had in mind.

dgasmith commented 6 years ago

I couldn't figure out a case where greedy would blow up into these very large tensors and get itself stuck. I do remember finding such cases previously and worrying about them, however. So to help alleviate this issue we now sieve by both size and cost, ensuring that greedy does not go completely awry. I can now obtain the same path as optimal for your contraction:

>>> contract_path("abc,cfd,dbe,efa", *tensors, path='greedy', memory_limit=-1)
  Complete contraction:  abc,cfd,dbe,efa->
         Naive scaling:  6
     Optimized scaling:  5
      Naive FLOP count:  2.560e+08
  Optimized FLOP count:  1.282e+07
   Theoretical speedup:  19.975
  Largest intermediate:  1.600e+05 elements
--------------------------------------------------------------------------------
scaling   BLAS                  current                                remaining
--------------------------------------------------------------------------------
   5      GEMM            cfd,abc->abdf                           dbe,efa,abdf->
   5      TDOT            abdf,dbe->aef                                efa,aef->
   3     False                aef,efa->                                       ->
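
Roughly, the sieve amounts to something like the following (an illustrative sketch only, not the actual opt_einsum code): candidate pairwise contractions are dropped if their intermediate would exceed the memory limit or if their estimated cost exceeds the naive cost.

def sieve(candidates, memory_limit, naive_cost):
    # Each candidate is a (flop_cost, intermediate_size, positions) tuple
    # describing one possible pairwise contraction.
    kept = []
    for flop_cost, intermediate_size, positions in candidates:
        if memory_limit is not None and intermediate_size > memory_limit:
            continue  # intermediate too large to store
        if flop_cost > naive_cost:
            continue  # costlier than the unoptimized contraction
        kept.append((flop_cost, intermediate_size, positions))
    return kept
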
jcmgray commented 6 years ago

Great, thanks for taking a look. I'll close this.