Mikolaj opened this issue 2 years ago
The best name imo is `matMul`. One sometimes sees `groupedMatMul`, but it's not needed.
A sensible semantics is given in https://github.com/Mikolaj/horde-ad/issues/64#issuecomment-1323655853
This should be doable in some form: either giving it four type-level lists, as in the Tensorflow function, or, orthotope-style, working on the outermost dimensions, transposing as needed, which is almost free in orthotope. However, I'm not sure how to handle multiple dimensions to contract and/or batch in the latter API. That might require nested orthotope arrays, which is doable via `Data.Array.Shaped`, but, being boxed, these can be slower than specifying lists of dimensions. Transforming between unboxed and nested boxed arrays is a combination of `transpose` and `ravel`/`unravel`, so it's noisier than just `transpose`. Another option is being less general than Tensorflow, with the user having an option to manually `ravel`/`unravel` to recover the generality (a sketch of this variant is below).
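A minimal sketch of that less general variant, using plain lists of hmatrix matrices in place of orthotope arrays (all names here are illustrative, not existing horde-ad API; `chunksOf` is from the `split` package):

```haskell
import Prelude hiding ((<>))
import Numeric.LinearAlgebra (Matrix, (<>))
import Data.List.Split (chunksOf)  -- from the `split` package

-- The primitive handles exactly one batch dimension.
batchedMatMul :: [Matrix Double] -> [Matrix Double] -> [Matrix Double]
batchedMatMul = zipWith (<>)

-- Two batch dimensions are recovered manually: flatten them into one
-- (the ravel direction), multiply, then chunk the results back
-- (the unravel direction). All inner lists are assumed equally long.
batchedMatMul2 :: [[Matrix Double]] -> [[Matrix Double]] -> [[Matrix Double]]
batchedMatMul2 as bs =
  chunksOf (length (head as)) (batchedMatMul (concat as) (concat bs))
```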
This is a continuation of https://github.com/Mikolaj/horde-ad/issues/64#issuecomment-1250029728.
The idea is to implement a batched matrix multiplication where the extra dimensions in `sh` behave as in mini-batches, that is, ordinary matrix multiplication is performed for each array contained within the extra dimensions and the results are embedded in the extra dimensions analogously. E.g., if we have one extra dimension of size 3, then matrix multiplication would be performed three times and we'd get a tensor corresponding to a 3-element vector of the resulting matrices. We'd need @awf to confirm this is the generalization (and the name) that makes the most sense.
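For concreteness, here is the size-3 example in list form, with hmatrix standing in for the eventual tensor operation (a sketch of the intended semantics only, not the horde-ad operation):

```haskell
import Prelude hiding ((<>))
import Numeric.LinearAlgebra

-- One extra dimension of size 3: three independent 2x4 * 4x5 products.
as, bs :: [Matrix Double]
as = [(2><4) [1 ..], (2><4) [9 ..], (2><4) [17 ..]]
bs = [(4><5) [1 ..], (4><5) [21 ..], (4><5) [41 ..]]

-- The result corresponds to a 3-element vector of 2x5 matrices.
main :: IO ()
main = mapM_ print (zipWith (<>) as bs)
```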
We already have `(<>$)`, which should be generalized (and most probably used in the generalization, though the `(<>!)` operation from hmatrix may be used directly as well).

Sadly, an elegant solution with the `m` argument enlarged using `broadcast`
(see https://hackage.haskell.org/package/orthotope/docs/Data-Array-ShapedS.html) and then, after some uses of `transpose`, matrix multiplication performed in all inner matrices using `rerank2`, doesn't work. That's because `rerank2` requires both its argument tensors to be of exactly equal shape, while the two arguments to matrix multiplication (an m×n and an n×p matrix) don't need to have equal shapes. Generalizing `rerank2` seems hard, given that we don't even have a dual number counterpart of `rerank2` yet (to be done, if at all possible, in #28).
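To see the obstacle in miniature, here is a list-level analogue of that restriction (a sketch only; orthotope's `rerank2` constrains array shapes, for which the shared element type `a` stands in here):

```haskell
-- Analogue of rerank2's equal-shape requirement: both lists share the
-- element type `a` (one fixed inner shape), so matrix multiplication's
-- differently shaped argument pairs don't fit this signature.
rerank2L :: (a -> a -> c) -> [a] -> [a] -> [c]
rerank2L = zipWith
```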
A plausible solution is to turn the `t` tensor into a list of lists of lists of matrices (the `unravel` operation), perform the multiplications and roll the lists back into a tensor (the `ravel` operation). This is going to be tricky to type, because the length of `sh` is arbitrary, but at least we already have dual number counterparts of `unravel` and `ravel`.
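Finally, a sketch of how the arbitrary length of `sh` might be handled in list form, where recursion over the length of `sh` becomes recursion over the nesting depth (hypothetical code using hmatrix, not the horde-ad implementation; the class and method names are made up and the dual number machinery is omitted entirely):

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Prelude hiding ((<>))
import Numeric.LinearAlgebra (Matrix, (<>))

-- `t` plays the role of a tensor of shape sh ++ [m, n], unravelled into
-- nested lists; each list level corresponds to one dimension of sh.
class BatchedMM t where
  bmm :: t -> t -> t

-- Base case: ordinary matrix multiplication at the bottom.
instance BatchedMM (Matrix Double) where
  bmm = (<>)

-- Inductive case: one unravel, pointwise multiplication, one ravel,
-- here collapsed into a single zipWith (equal batch sizes assumed).
instance BatchedMM t => BatchedMM [t] where
  bmm = zipWith bmm
```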