dgasmith / opt_einsum

⚡️Optimizing einsum functions in NumPy, TensorFlow, Dask, and more with contraction order optimization.
https://dgasmith.github.io/opt_einsum/
MIT License

DP tweaks #105

Closed dgasmith closed 5 years ago

dgasmith commented 5 years ago

Description

Adds a quick description of the dp algorithm, updates auto, and refreshes the timing images thanks to @jcmgray.

@mrader1248 Please write up a small paragraph on the approach when you get the chance, similar to the greedy/branch descriptions. Also feel free to rewrite the short description.
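For context, the new path type is selected by name when computing a path. A minimal sketch, assuming the opt_einsum API at the time of this PR (where `contract_path` accepts `optimize="dp"` for the dynamic-programming optimizer):

```python
import numpy as np
import opt_einsum as oe

# Three small operands forming a matrix chain: (ab)(bc)(cd) -> ad
a = np.random.rand(8, 8)
b = np.random.rand(8, 8)
c = np.random.rand(8, 8)

# Ask only for the contraction path, using the dynamic-programming optimizer
path, info = oe.contract_path("ab,bc,cd->ad", a, b, c, optimize="dp")

print(path)           # a list of pairwise contraction steps, e.g. [(0, 1), (0, 1)]
print(info.opt_cost)  # estimated FLOP count for the chosen path
```

The same string can be passed to `oe.contract(...)` directly when the result, not just the path, is wanted.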

Status

codecov[bot] commented 5 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@4c123e0). The diff coverage is 100%.


jcmgray commented 5 years ago

Here are up-to-date versions of the graphs:

[image: oe-times-graph]

[image: oe-flops-graph]

I've used the usual 'random-greedy' (32 repeats), as this is the pre-defined setting. We might want to hold off on adding 'dp' to 'auto' for the moment, as it's a bit slower at these small sizes with the current cost-limit method.
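For reference, the repeat count can also be set explicitly by passing an optimizer object instead of the `'random-greedy'` string preset. A sketch, assuming opt_einsum exposes `RandomGreedy` with a `max_repeats` parameter:

```python
import numpy as np
import opt_einsum as oe

eq = "ab,bc,cd,de->ae"
ops = [np.random.rand(10, 10) for _ in range(4)]

# The 'random-greedy' preset runs a number of randomized greedy trials;
# an explicit optimizer object lets us control the repeat count ourselves.
optimizer = oe.RandomGreedy(max_repeats=32)
path, info = oe.contract_path(eq, *ops, optimize=optimizer)

print(path)
print(info.opt_cost)
```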

dgasmith commented 5 years ago

@jcmgray Thanks for the new charts! I have reverted the auto change, but I don't quite understand why the scaling here differs from the previous charts in #102.

mrader1248 commented 5 years ago

Just a small remark: when making decisions for the 'auto' path, maybe you should take into account not only the time for finding a contraction path but also the possible speedup, since usually much more time is spent on the actual contractions. (I really don't want to be pushy in favour of my own optimiser; this is a more general remark.)

jcmgray commented 5 years ago

I agree that this kind of thing would be very nice, but it becomes quite complex to implement: a 10x theoretical speedup for small contractions might not appear in practice, whereas for large contractions a 1.1x speedup could be very helpful. I think the idea for the 'auto' mode was to target < 1 ms if possible, so as to depart as little as possible, in constant overhead, from numpy.einsum (since many people are not doing large or complex contractions, but many small, simple ones).

One way would be to keep track of the estimated contraction time (from the other thread I think I found it was about 10^-7.5 times the estimated cost, though with huge variance) and say "I'm happy to spend X% of this estimated time on path finding", but then you need an estimate of that cost, of how long 'dp' will take, and so on.
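That rule of thumb can be made concrete with a little arithmetic. The 10^-7.5 s/FLOP figure comes from the estimate above, but the 10% budget fraction here is purely illustrative:

```python
# Estimate a contraction's runtime from its path's FLOP count, then derive a
# path-finding budget as a fraction of that estimate.
SECONDS_PER_FLOP = 10 ** -7.5  # rough figure from the earlier thread; huge variance

def pathfinding_budget(estimated_flops, fraction=0.10):
    """Time we are 'happy to spend' searching for a contraction path."""
    est_runtime = estimated_flops * SECONDS_PER_FLOP
    return fraction * est_runtime

# A ~1e9-FLOP contraction -> ~30 s estimated runtime -> ~3 s search budget
print(pathfinding_budget(1e9))
```

The circularity noted above remains: the FLOP estimate itself requires a path, so in practice this would have to work from a cheap first-pass path.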

My preference, in terms of simplicity, would be to add an 'auto-hq' mode that uses 'dp' and then switches over to 'random-greedy', where the 1 ms target becomes more like 1 s.
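Such an 'auto-hq' preset could be sketched as a simple size-based dispatch. The thresholds and the function name `choose_optimizer` below are hypothetical, just to illustrate the idea:

```python
def choose_optimizer(num_terms, hq=False):
    """Pick a path optimizer name by problem size (hypothetical sketch).

    'auto' targets ~1 ms of path-finding, so it falls back to cheap
    heuristics quickly; 'auto-hq' targets ~1 s, so it can afford 'dp'
    for mid-sized problems before switching to 'random-greedy'.
    """
    if not hq:
        # 'auto'-style behaviour: exhaustive search only for tiny problems
        return "optimal" if num_terms <= 4 else "greedy"
    # proposed 'auto-hq': dynamic programming while it is affordable
    return "dp" if num_terms <= 20 else "random-greedy"

print(choose_optimizer(3))            # optimal
print(choose_optimizer(10))           # greedy
print(choose_optimizer(10, hq=True))  # dp
print(choose_optimizer(50, hq=True))  # random-greedy
```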