dgasmith / opt_einsum

⚡️Optimizing einsum functions in NumPy, Tensorflow, Dask, and more with contraction order optimization.
https://dgasmith.github.io/opt_einsum/
MIT License

infinite loop in dynamic-programming with 'max_input' memory limit #153

Closed: rc closed this issue 3 years ago

rc commented 3 years ago

Hi,

first, thank you for your excellent package!

Lately I have been playing with various optimization methods and stumbled on a problem demonstrated by the following snippet:

import numpy as np
import opt_einsum as oe

a1 = np.ones((1, 4))
a2 = np.ones((1, 4, 3, 4))
a3 = np.ones((3, 3, 6))
a4 = np.ones((1, 4, 6, 6))

# OK
path, path_info = oe.contract_path('cq,cqje,rji,cqik,cqlf,slk->cresf',
                                   a1, a2, a3, a4, a2, a3,
                                   optimize='dynamic-programming')
print(path)
print(path_info)

# Infinite while loop in opt_einsum/paths.py, line 1014.
path, path_info = oe.contract_path('cq,cqje,rji,cqik,cqlf,slk->cresf',
                                   a1, a2, a3, a4, a2, a3,
                                   optimize='dynamic-programming',
                                   memory_limit='max_input')
print(path)
print(path_info)

The 'dynamic-programming' optimization loops "infinitely" when the 'max_input' memory limit (or any other limit that cannot be satisfied by pairwise contractions) is given.

Is it possible to modify DynamicProgramming.__call__() to detect and report such cases? My idea was to compare x to its previous values to detect stagnation, but the cost_cap increases complicate things. Maybe impose a limit on the maximum cost_cap? I am willing to give it a try, if somebody familiar with DynamicProgramming confirms that limiting cost_cap would be feasible.
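
For concreteness, a standalone sketch of the kind of guard I have in mind (the names x, cost_cap and the limit are illustrative; this is not the actual DynamicProgramming code):

def check_stagnation(x, prev_x, cost_cap, cost_cap_limit):
    """Illustrative guard: raise instead of looping forever once the DP state
    `x` has stopped changing and `cost_cap` has grown past a chosen ceiling."""
    if x == prev_x and cost_cap > cost_cap_limit:
        raise RuntimeError(
            "dynamic-programming optimization stalled: the memory limit "
            "cannot be satisfied by pairwise contractions"
        )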

jcmgray commented 3 years ago

I would have thought checking something about the lengths in x would be sufficient to catch this.

Out of interest, what's your use case?

My thinking these days is that generally it's best not to put the restriction on memory within the path finding, especially one as strict as 'max_input'. Some other options:

rc commented 3 years ago

Thanks for the suggestions! I "solved" my problem by removing the restriction - I was just testing what the memory limit did :) I am evaluating einsum-based calculations of finite element matrices to be included in sfepy.

Also, thank you for mentioning the slicing - I was going to ask about it... For the moment I am doing that manually, using an outer for loop in Python, when needed. In the finite element evaluations it is often preferable to proceed element-by-element (but not always!).
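
For concreteness, a minimal sketch of that outer loop, reusing the shapes from the snippet above (slicing over the cell index 'c' and stacking the per-cell results; this is just an illustration, not the actual sfepy code):

import numpy as np
import opt_einsum as oe

a1 = np.ones((1, 4))
a2 = np.ones((1, 4, 3, 4))
a3 = np.ones((3, 3, 6))
a4 = np.ones((1, 4, 6, 6))

# Build the per-cell expression once (the 'c' index is sliced away), then
# apply it cell by cell and stack the results back along 'c'.
per_cell = oe.contract_expression('q,qje,rji,qik,qlf,slk->resf',
                                  a1.shape[1:], a2.shape[1:], a3.shape,
                                  a4.shape[1:], a2.shape[1:], a3.shape,
                                  optimize='dynamic-programming')

out = np.stack([per_cell(a1[c], a2[c], a3, a4[c], a2[c], a3)
                for c in range(a1.shape[0])])

# Same result as the unsliced contraction.
full = oe.contract('cq,cqje,rji,cqik,cqlf,slk->cresf',
                   a1, a2, a3, a4, a2, a3, optimize='dynamic-programming')
assert np.allclose(out, full)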

As for this issue, the main problem was that the optimization did not fail, so the timing tests never finished (before I removed the memory limit). It took me some time to discover the reason (was it really cycling infinitely, or just taking a long time?), so having an exception thrown quickly would be nice. I was looking at the items in x, but the fact that those do not change is not enough - they might still change later as cost_cap increases. That is why I thought about detecting a too-large cost_cap (it very quickly grows to an integer several screens long).

jcmgray commented 3 years ago

> Also, thank you for mentioning the slicing - I was going to ask about it... For the moment I am doing that manually, using an outer for loop in Python, when needed. In the finite element evaluations it is often preferable to proceed element-by-element (but not always!).

Ah yes, well cotengra now has some pretty powerful methods for finding the best sliced indices + contraction path, if you wanted to explore more.

> As for this issue, the main problem was that the optimization did not fail, so the timing tests never finished (before I removed the memory limit). It took me some time to discover the reason (was it really cycling infinitely, or just taking a long time?), so having an exception thrown quickly would be nice. I was looking at the items in x, but the fact that those do not change is not enough - they might still change later as cost_cap increases. That is why I thought about detecting a too-large cost_cap (it very quickly grows to an integer several screens long).

And agreed, the silent failure is an issue, and I see your point about the cost_cap increases.

* When `minimize='size'` the `cost_cap` really is the max size, so that is an easy comparison to make.

* When `minimize='flops'` the `cost_cap` is the number of operations, which is probably bounded by the max_size in some way... I.e., for `N` inputs with a given max intermediate size `M`, I believe the most expensive contraction would be equivalent to `N - 1` matrix multiplications each of 'side length' `M**(1/2)`, so that the total cost is `(N - 1) * M**(3/2)`. If the `cost_cap` goes above this then one can probably stop. Might be worth double checking that logic however?

rc commented 3 years ago

> Also, thank you for mentioning the slicing - I was going to ask about it... For the moment I am doing that manually, using an outer for loop in Python, when needed. In the finite element evaluations it is often preferable to proceed element-by-element (but not always!).

> Ah yes, well cotengra now has some pretty powerful methods for finding the best sliced indices + contraction path, if you wanted to explore more.

Nice! cotengra looks very powerful, but my "networks" are too simple - it is IMO always best to slice by the cell index ('c' in the snippet above) - cotengra agreed :).

> As for this issue, the main problem was that the optimization did not fail, so the timing tests never finished (before I removed the memory limit). It took me some time to discover the reason (was it really cycling infinitely, or just taking a long time?), so having an exception thrown quickly would be nice. I was looking at the items in x, but the fact that those do not change is not enough - they might still change later as cost_cap increases. That is why I thought about detecting a too-large cost_cap (it very quickly grows to an integer several screens long).

> And agreed, the silent failure is an issue, and I see your point about the cost_cap increases.

> * When `minimize='size'` the `cost_cap` really is the max size, so that is an easy comparison to make.

> * When `minimize='flops'` the `cost_cap` is the number of operations, which is probably bounded by the max_size in some way... I.e., for `N` inputs with a given max intermediate size `M`, I believe the most expensive contraction would be equivalent to `N - 1` matrix multiplications each of 'side length' `M**(1/2)`, so that the total cost is `(N - 1) * M**(3/2)`. If the `cost_cap` goes above this then one can probably stop. Might be worth double checking that logic however?

So M is max_size? I do not follow the logic; could you elaborate on what you mean by 'side length'? Is stopping at the "naive" flops count too late?

jcmgray commented 3 years ago

Oh yes, naive flops is probably fine. I'm happy to put a quick PR together unless you want to, @rc.
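
For reference, the naive flop count is already computed during path finding and, in recent opt_einsum versions, exposed on the returned PathInfo as `naive_cost`, so for the reproduction snippet above it is cheap to obtain as a bound:

import numpy as np
import opt_einsum as oe

a1 = np.ones((1, 4))
a2 = np.ones((1, 4, 3, 4))
a3 = np.ones((3, 3, 6))
a4 = np.ones((1, 4, 6, 6))

_, info = oe.contract_path('cq,cqje,rji,cqik,cqlf,slk->cresf',
                           a1, a2, a3, a4, a2, a3,
                           optimize='dynamic-programming')

# The single-shot flop count; per the discussion here, once cost_cap exceeds
# this the search can safely give up instead of looping.
print(info.naive_cost)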


Just for interest, to be a bit clearer about the tighter bound I suggested: if you consider the set of all contractions with `N` inputs and a maximum intermediate size `M`, the question is which contraction has the highest cost `C`. We can use this to stop the search once `cost_cap` rises above it.

Then following the logic:

* any contraction of `N` inputs can always be carried out as `N - 1` pairwise contractions;
* the most expensive pairwise contraction whose inputs and output all have size at most `M` is equivalent to multiplying two `M**(1/2)` by `M**(1/2)` matrices, which costs `M**(3/2)`;

suggesting that the maximum cost for any contraction of `N` inputs with max intermediate `M` is `(N - 1) * M**(3/2)`, which is likely quite a bit lower than the naive cost `M**(N + 1)`.
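
A hedged sketch of what a stopping rule based on that bound might look like (the names are illustrative, not the actual opt_einsum internals):

def cost_cap_exceeds_bound(cost_cap, num_inputs, max_size):
    """Illustrative check: True once cost_cap exceeds the largest possible
    cost of contracting num_inputs tensors whose intermediates all fit in
    max_size, i.e. (num_inputs - 1) * max_size**(3/2)."""
    return cost_cap > (num_inputs - 1) * max_size ** 1.5

Raising an explicit error at that point would turn the current silent loop into a quick failure.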

rc commented 3 years ago

Thanks for the clarification! It would be great if you find time to make the PR - I do not know the code base well, so while I am willing to give it a shot, having it done by an expert is better.

rc commented 3 years ago

Thanks for the fix!