RFC: add support for batched matmul / einsum / batched tensordot

arogozhnikov commented 6 months ago

Currently I don't see a way to implement einsum or batched matmul using the standard. There is a way with explicit loop, but that does not count.

Some ways this can be addressed:

minimal implementation would include batched_matmul (signature bij, bjk -> bik). Other operations can be reduced to it with reshapes. This may include additional copies
modification of tensordot to include batched dimensions thus should be more efficient, but AFAIK this method is not surfaced in any major framework
include einsum in the standard

leofang commented 6 months ago

matmul already supports batching in the spec. In fact, batching is the number one requirement for linalg routines, so wherever it's applicable we allow it.

Batched tensordot is indeed like you said not implemented in any framework, most likely because there's matmul and einsum in the existing frameworks. Semantically, it's not straightforward to allow batching while keeping the same API style (I think it likely needs an extra kw-only argument to specify the batch dimensions).

I was tasked to propose einsum, as I had strong opinons after the involvement in implementing cuQuantum's counterpart, but unfortunately I lacked of time. I have a very radical version in mind (some NumPy behavior is simply bad IMHO) that has not been implemented in all but a few (cuQuanutm & opt_einsum) libraries. Feel free to share your thoughts or even draft a PR, we'll get it discussed 🙂

arogozhnikov commented 6 months ago

matmul already supports batching in the spec. In fact, batching is the number one requirement for linalg routines, so wherever it's applicable we allow it.

Great, I've missed that

(I think it likely needs an extra kw-only argument to specify the batch dimensions).

yup.

Feel free to share your thoughts

All I ever needed is ellipsis and explicit notation (and I like using spaces anywhere, e.g. 'b ... c, c d -> b ... d ', but that's just a taste). Any standard that requires this minimum done is already perfect for me.

I don't use explicit path specification as framework should be in a better position to guess contraction path (that's not just number of operations - but also how memory-friendly operations become, and how much memory available). Should I need to specify a contraction path I'd prefer a number of einsums instead.

Implicit notation is of questionable value to me - I want to see result, not infer it from pattern. As examples: why ij,jk and nl,lm are two different operations? Why a,a and ...,... are not the same for 1d tensors? etcetc.

Index-based interface is useless - when searching github for usages of it, saw dozens of different CI tests but I found zero production uses.

asmeurer commented 6 months ago

Index-based interface is useless - when searching github for usages of it, saw dozens of different CI tests but I found zero production uses.

The index interface is more useful for instances where you generate an einsum programmatically.

arogozhnikov commented 5 months ago

The index interface is more useful for instances where you generate an einsum programmatically.

Yes, I guess that was in thinking when designed, but this interleaving of indices and tensors is so awkward - it is not friendly to coding (and specially to type hinting).

asmeurer commented 5 months ago

Perhaps the syntax could be improved. In principle an explicit form should be much better for type hinting than a string that is only parsed at runtime.

arogozhnikov commented 5 months ago

In principle an explicit form should be much better for type hinting than a string that is only parsed at runtime.

Why? For type hinting signature einsum(str, *tensor) -> tensor is good enough.

To actually check an operation all of them aren't good: einsum(misplaced_x, x, (1, 2, 4), y, (1, 2, 5, 6, -1), (0, 3, 100, ...)). In strings that's apparent that something is off, but not in indices.

There is some expectation that having tuples makes code more statically checkable, but it is not the case. Strings are more human-checkable.

Should there be index interface for einsum (should?), I'd prefer them being separated in API (einops_str, einops_indices or alike).

leofang commented 5 months ago

I am very eager to respond in length, but unfortunately I am a bit swamped this week. Nevertheless, the discussion so far is very encouraging, as I was thinking

I have a very radical version in mind (some NumPy behavior is simply bad IMHO)

and I am very happy to know I am not alone

The index interface is more useful for instances where you generate an einsum programmatically.

Implicit notation is of questionable value to me - I want to see result, not infer it from pattern.

I'd prefer them being separated in API (einops_str, einops_indices or alike).

Let me turn on my radical side and add why I dislike the einsum string form (and I promise I'll (try to) be more responsive next week). These are two expressions taken from our test suite:

'ÄivrÃEË,JÓIT,ÐOáÙà,Åx,oÒØÏmUÍ,OàW,SgÌCÈ,NgSoJPÌ,vcXz×,HÚGÞ,Æw,ÝD,GsÑi,QÛlÔqzL,IÓqhBdÜ,ÄV,KÎAÇ,uTÖDQÕ,ÆstFÝA,ÛtjÂK,pÔYMbe,FÂkÑÎÁ,rlCVÉÜwaP,ÐÖÀnxU,Þhym,nÏÙßØRÚ,HZÇßk,WÊMË,ÊfÀÍÕLZ,×ÃÈEuRp,ÅjÉáYNB,ÒÁXy->cbefad'

and

abc,def,ghi,jkl,mno,pqr,stu,vwx,yzA,BCD,EFG,HIJ,KLM,NOP,QRS,TUV,WXY,ZÀÁ,ÂÃÄ,ÅÆÇ,ÈÉÊ,ËÌÍ,ÎÏÐ,ÑÒÓ,ÔÕÖ,×ØÙ,ÚÛÜ,ÝÞß,àáâ,ãäå,æçè,éêë,cfìí,íîï,ilîð,ðñò,orñó,óôõ,uxôö,ö÷ø,AD÷ù,ùúû,GJúü,üýþ,MPýÿ,ÿĀā,SVĀĂ,ĂăĄ,YÁăą,ąĆć,ÄÇĆĈ,ĈĉĊ,ÊÍĉċ,ċČč,ÐÓČĎ,ĎďĐ,ÖÙďđ,đĒē,ÜßĒĔ,ĔĕĖ,âåĕė,ėĘę,èëĘĚ,Ěìě,ïòĜĝ,ĝĞğ,õøĞĠ,ĠġĢ,ûþġģ,ģĤĥ,āĄĤĦ,ĦħĨ,ćĊħĩ,ĩĪī,čĐĪĬ,ĬĭĮ,ēĖĭį,įİı,ęěİĲ,ĲĜĳ,ğĢĴĵ,ĵĶķ,ĥĨĶĸ,ĸĹĺ,īĮĹĻ,ĻļĽ,ıĳļľ,ľĴĿ,ķĺŀŁ,ŁłŃ,ĽĿłń,ŅņŇ,ňŉŊ,ŋŌō,ŎŏŐ,őŒœ,ŔŕŖ,ŗŘř,ŚśŜ,ŝŞş,ŠšŢ,ţŤť,ŦŧŨ,ũŪū,ŬŭŮ,ůŰű,ŲųŴ,ŵŶŷ,ŸŹź,ŻżŽ,žſƀ,ƁƂƃ,ƄƅƆ,ƇƈƉ,ƊƋƌ,ƍƎƏ,ƐƑƒ,ƓƔƕ,ƖƗƘ,ƙƚƛ,ƜƝƞ,ƟƠơ,ƢƣƤ,ŇŊƥƦ,ƦƧƨ,ōŐƧƩ,Ʃƪƫ,œŖƪƬ,ƬƭƮ,řŜƭƯ,ƯưƱ,şŢưƲ,ƲƳƴ,ťŨƳƵ,ƵƶƷ,ūŮƶƸ,Ƹƹƺ,űŴƹƻ,ƻƼƽ,ŷźƼƾ,ƾƿǀ,Žƀƿǁ,ǁǂǃ,ƃƆǂǄ,Ǆǅǆ,ƉƌǅǇ,Ǉǈǉ,ƏƒǈǊ,Ǌǋǌ,ƕƘǋǍ,ǍǎǏ,ƛƞǎǐ,ǐǑǒ,ơƤǑǓ,Ǔƥǔ,ƨƫǕǖ,ǖǗǘ,ƮƱǗǙ,ǙǚǛ,ƴƷǚǜ,ǜǝǞ,ƺƽǝǟ,ǟǠǡ,ǀǃǠǢ,ǢǣǤ,ǆǉǣǥ,ǥǦǧ,ǌǏǦǨ,ǨǩǪ,ǒǔǩǫ,ǫǕǬ,ǘǛǭǮ,Ǯǯǰ,ǞǡǯǱ,Ǳǲǳ,ǤǧǲǴ,ǴǵǶ,ǪǬǵǷ,ǷǭǸ,ǰǳǹǺ,ǺǻǼ,ǶǸǻǽ,ǽǹǼ,ńŀŃ,çéƠƢ,äæƝƟ,áãƚƜ,ÞàƗƙ,ÛÝƔƖ,ØÚƑƓ,Õ×ƎƐ,ÒÔƋƍ,ÏÑƈƊ,ÌÎƅƇ,ÉËƂƄ,ÆÈſƁ,ÃÅżž,ÀÂŹŻ,XZŶŸ,UWųŵ,RTŰŲ,OQŭů,LNŪŬ,IKŧũ,FHŤŦ,CEšţ,zBŞŠ,wyśŝ,tvŘŚ,qsŕŗ,npŒŔ,kmŏő,hjŌŎ,egŉŋ,bdņň,êaƣŅ->

For large tensor networks, it is really an unpleasant interface to work with, we've see expressions that can take a whole slide. There're not many unicode characters that can be used, NumPy/CuPy do not support unicode input, and parsing/processing this kind of long expressions is very tedious. When interfacing a C library like cuTensorNet (or even NumPy internal), the index form is a much better choice.

arogozhnikov commented 5 months ago

happy to know I am not alone

😉

We're talking about very different scenarios - I'm trying to cover common use-case: 2 or at most 4 tensors, usually a direct interface for applied scientist.

For large tensor networks, it is really an unpleasant interface to work with

(your argument isn't good - long array of tuples isn't shorter or more readable, API should not target CI as a main scenario). Regardless, I agree if you generate TNs from some other abstraction - strings are not an appropriate interface (unless there is name-based interface for user - in this case sense to propagate names to simplify error handling logic).

I assume you want to model q entanglement, so I see why you want many indices. Side remark: einops allows multi-char names (and names can have digits).

Also question to you: aren't you constrained by "single ellipsis"? Do you use ellipsis at all?

data-apis / array-api

RFC: add support for batched matmul / einsum / batched tensordot #727