Support more subscript symbols

dgasmith / opt_einsum

⚡️Optimizing einsum functions in NumPy, Tensorflow, Dask, and more with contraction order optimization.

https://dgasmith.github.io/opt_einsum/

MIT License


Support more subscript symbols #67

Closed liwt31 closed 6 years ago

liwt31 commented 6 years ago

The project now supports more than 52 subscripts, but only in integer form, and juggling that many integers quickly gets confusing. My suggestion is to support this kind of syntax:

arr1 = np.zeros((2, 3))
sub1 = '[left edge][bond]'
arr2 = np.zeros((3, 2))
sub2 = '[bond][right edge]'
out = '[left edge][right edge]'
opt_einsum.contract(arr1, sub1, arr2, sub2, out)

I know that it's equivalent to:

arr1 = np.zeros((2, 3))
sub1 = [0, 1]
arr2 = np.zeros((3, 2))
sub2 = [1, 2]
out = [0, 2]

However, IMHO, when contracting lots of tensors that have different meanings, it would be nice to give them proper names.

I'm more than happy to prepare a PR for this; I think it's not extremely difficult.
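As a rough illustration of why this looks tractable, here is a minimal sketch of translating the bracketed strings into the integer form shown above. The helpers `parse_labels` and `labels_to_ints` are hypothetical names made up for this example, not part of opt_einsum, and the bracket syntax is only the proposal from this comment.

```python
import re


def parse_labels(subscript):
    """Split a bracketed subscript such as '[left edge][bond]' into labels.

    Hypothetical helper: the bracket syntax is only a proposal.
    """
    return re.findall(r"\[([^\]]*)\]", subscript)


def labels_to_ints(subscripts):
    """Map each distinct label to an integer, in order of first appearance."""
    label_ids = {}
    result = []
    for subscript in subscripts:
        ints = []
        for label in parse_labels(subscript):
            # setdefault assigns the next free integer to an unseen label
            ints.append(label_ids.setdefault(label, len(label_ids)))
        result.append(ints)
    return result


converted = labels_to_ints(
    ["[left edge][bond]", "[bond][right edge]", "[left edge][right edge]"]
)
print(converted)  # [[0, 1], [1, 2], [0, 2]]
```

The resulting lists are the same `sub1`, `sub2`, and `out` as in the integer version above.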

jcmgray commented 6 years ago

It may not be as readable as actual words, but I thought I'd note that opt_einsum does support more than 52 subscripts in non-integer mode via unicode symbols. One way to access these conveniently is via:

>>> from opt_einsum import get_symbol
>>> get_symbol(100)
'ð'

On the other hand a syntax like you suggest:

contract(arr1, ['left edge', 'bond'], arr2, ['bond', 'right edge'])

I can definitely see the appeal of!

Also, you might be interested in (my project) quimb, which while notionally for quantum calculations, has very generic support for labelled tensors and tensor networks, backed by opt_einsum:

>>> import quimb.tensor as qtn
>>> X = qtn.rand_tensor((2, 3), inds=['left edge', 'bond'], tags={'X'})
>>> Y = qtn.rand_tensor((3, 2), inds=['bond', 'right edge'], tags={'Y'})
>>> X @ Y
Tensor(shape=(2, 2), inds=('left edge', 'right edge'), tags={'Y', 'X'})

liwt31 commented 6 years ago

Thank you for the clarification and I am deeply impressed by the quimb.tensor module. Have you ever thought about separating it as an individual package? I think it has a much broader application than in quimb or even quantum problems. By the way, I reckon the syntax you are suggesting should be

contract(arr1, ['left edge', 'bond'], arr2, ['bond', 'right edge'], ['left edge', 'right edge'])

jcmgray commented 6 years ago

Yes, that would be how to specify the output indices. For other contraction syntaxes this doesn't need to be specified; it's just the alphabetic order of any uncontracted (single) indices. I imagine alphabetic or input order would be a good default here as well.
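A minimal sketch of what such a default could look like for labelled indices, assuming "uncontracted" means a label that appears exactly once across all inputs. The function `default_output` and its `order` parameter are hypothetical and only illustrate the two candidate orderings.

```python
from collections import Counter


def default_output(*inputs, order="input"):
    """Return the labels that appear exactly once across all inputs.

    Hypothetical helper. order="input" keeps first-appearance order;
    order="alphabetic" sorts the labels, which requires them to be
    mutually comparable (e.g. all strings or all ints).
    """
    counts = Counter(label for labels in inputs for label in labels)
    single = [
        label for labels in inputs for label in labels if counts[label] == 1
    ]
    if order == "alphabetic":
        return sorted(single)
    return single


print(default_output(["left edge", "bond"], ["bond", "right edge"]))
# ['left edge', 'right edge']
```

For this example both orderings agree; they differ for something like `['z', 'k'], ['k', 'a']`, where input order gives `['z', 'a']` and alphabetic order gives `['a', 'z']`.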

> I am deeply impressed by the quimb.tensor module. Have you ever thought about separating it as an individual package?

Thanks! Yes, it occurred to me, but I haven't had the time or need to decouple it quite yet.

dgasmith commented 6 years ago

@jcmgray Should we consider moving the edge/bond syntax into opt_einsum or leave it elsewhere? I would be a bit concerned that a third input form on contract could become unwieldy, as we have the above-mentioned integer input as well.

jcmgray commented 6 years ago

Yes, I am in two minds about it too. One possibility is just to broaden the current case to anything hashable (which includes ints), so there would not be two separate input forms. Here is a useful snippet that could replace the current logic:

>>> import itertools
>>> from collections import defaultdict
>>> import opt_einsum as oe
>>> symbols = (oe.get_symbol(i) for i in itertools.count())
>>> symbol_map = defaultdict(lambda: next(symbols))
>>> symbol_map['left']
'a'
>>> symbol_map['right']
'b'
>>> symbol_map['left']
'a'

then you could replace the lines here and here

subscripts += get_symbol(s)

just with

subscripts += symbol_map[s]

at basically no cost.
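To make the idea concrete, here is a minimal sketch of how hashable labels could be turned into an einsum equation string with such a map. The names `stand_in_symbol` and `labels_to_equation` are hypothetical; `stand_in_symbol` is only an assumed stand-in for `get_symbol` so that the sketch runs without opt_einsum installed, and it is not the code that was merged.

```python
import itertools
import string
from collections import defaultdict


def stand_in_symbol(i):
    """Assumed stand-in for get_symbol: letters first, then other unicode."""
    if i < 52:
        return string.ascii_letters[i]
    return chr(i + 140)


def labels_to_equation(*operand_labels, output=None):
    """Build an einsum equation from lists of hashable index labels."""
    symbols = (stand_in_symbol(i) for i in itertools.count())
    # Each unseen label pulls the next unused symbol; seen labels reuse theirs.
    symbol_map = defaultdict(lambda: next(symbols))
    equation = ",".join(
        "".join(symbol_map[label] for label in labels)
        for labels in operand_labels
    )
    if output is not None:
        # Note: an output label absent from the inputs would silently get
        # a fresh symbol here; a real implementation should reject it.
        equation += "->" + "".join(symbol_map[label] for label in output)
    return equation


print(labels_to_equation(
    ["left edge", "bond"],
    ["bond", "right edge"],
    output=["left edge", "right edge"],
))  # ab,bc->ac
```

Because ints are hashable too, `labels_to_equation([0, 1], [1, 2], output=[0, 2])` goes through the same path and gives the same equation, which is the point of not adding a separate input form.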

dgasmith commented 6 years ago

Closed by #68.