Closed jcmgray closed 6 years ago
For expressions that do not have any useful intermediates the greedy
algorithm will keep building outer products leading to a very slow computation. There are a few strategies that we could employ to prevent this such as checking that the cost does not exceed the unoptimized algorithm. So yes, the algorithm can certainly be improved. If you have edge cases where greedy
is missing large speeds up that is found by optimal
they could be very useful to add to the test suite.
OK I see. so here is one (slightly contrived) example where the speedup is significant:
>>> tensors = [np.random.rand(20, 20, 20) for _ in range(4)]
>>> %timeit contract("abc,cfd,dbe,efa", *tensors, path='greedy')
358 ms ± 3.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
And with an higher limit memory optimal/greedy approach:
>>> %timeit contract("abc,cfd,dbe,efa", *tensors, path='optimal', memory_limit=-1)
1.07 ms ± 26.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
This scenario, i.e. where some initial increase of tensor rank is preferable, I think would come up in e.g. 2d tensor network contractions. However, it actually might not be such an issue in the simpler cases I originally thought of.
I couldn't figure out a case where greedy
would blow out these large tensors and would get itself stuck. I do remember finding these previously and worrying about however. So to help alleviate this issue we sieve by both size and cost ensuring that greedy
does not completely go awry. I can now obtain the same path as optimal
for your contraction:
>>> contract_path("abc,cfd,dbe,efa", *tensors, path='greedy', memory_limit=-1)
Complete contraction: abc,cfd,dbe,efa->
Naive scaling: 6
Optimized scaling: 5
Naive FLOP count: 2.560e+08
Optimized FLOP count: 1.282e+07
Theoretical speedup: 19.975
Largest intermediate: 1.600e+05 elements
--------------------------------------------------------------------------------
scaling BLAS current remaining
--------------------------------------------------------------------------------
5 GEMM cfd,abc->abdf dbe,efa,abdf->
5 TDOT abdf,dbe->aef efa,aef->
3 False aef,efa-> ->
Great, thanks for taking a look. I'll close this.
I notice that with
path='greedy'
,memory_limit
is locked at the max input size, for 'algorithmic' reasons. Is this still an issue? / any prospects of this being resolved?I did some brief testing with the limit just removed and all seemed ok (and faster!).
P.S. thanks for the nice project!