Mikolaj / horde-ad

Higher Order Reverse Derivatives Efficiently - Automatic Differentiation library based on the paper "Provably correct, asymptotically efficient, higher-order reverse-mode automatic differentiation"

Future work: vectorize using rugged tensors (irregular arrays) and/or Blelloch's flattening #79

Open · Mikolaj opened 1 year ago

Mikolaj commented 1 year ago

Once vectorization works on strictly regular array expressions (expressions that result in regular arrays even after vectorization), generalize it to irregular arrays. Optionally, at the end, check the regularity of the result and cast it back to regular arrays (which is close to what Futhark does, sometimes incurring a proof obligation). Alternatively, flatten the irregular tensors, early or late, obtaining regular tensors of lower dimensions (but try not to flatten everything down to vectors, or somehow add a portion of the dimensions back, if possible).
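
For concreteness, here is a minimal sketch of that flat representation, using plain Data.Vector rather than any horde-ad type (all names are illustrative): a rugged two-dimensional structure stored as one regular payload vector plus a regular segment-length descriptor.

```haskell
import qualified Data.Vector as V

-- Illustrative sketch: a rugged 2D "tensor" stored flat, Blelloch-style,
-- as two regular vectors: per-row lengths and the concatenated payload.
data Ragged a = Ragged
  { segLengths :: V.Vector Int  -- length of each inner row
  , payload    :: V.Vector a    -- all rows concatenated
  }

fromRows :: [[a]] -> Ragged a
fromRows rows =
  Ragged (V.fromList (map length rows)) (V.fromList (concat rows))

toRows :: Ragged a -> [[a]]
toRows (Ragged lens xs) = go (V.toList lens) (V.toList xs)
  where
    go [] _        = []
    go (n : ns) ys = take n ys : go ns (drop n ys)
```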

This is needed to expand the language we are able to support (more of the shape-influencing Ints can be dynamic expressions). Currently, due to emergent irregularity, we need to reject some expressions even in the restricted language (or just not vectorize them, e.g., building tensors of delta expressions instead).
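
A hypothetical toy example of such emergent irregularity, in plain Haskell: a build whose element shape depends on the loop index produces rows of different lengths, so its result cannot be a regular tensor.

```haskell
-- Hypothetical example: the shape of each element depends on the
-- index i, so the built structure is rugged, not a regular 2D tensor.
raggedBuild :: Int -> [[Int]]
raggedBuild n = [ [0 .. i] | i <- [0 .. n - 1] ]

-- raggedBuild 4 == [[0], [0,1], [0,1,2], [0,1,2,3]]
```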

Mikolaj commented 1 year ago

BTW, a way of telling this story is this: we decided to vectorize vector programs, and it turned out we needed regular tensors to express some results of the vectorization. So we added tensors to the language. In turn, when vectorizing those (I'm not sure if proper tensors are needed or if vectors suffice, but the story is better in the former case), we discovered we need irregular (rugged) tensors to express what the vectorization produces.

Now the question arises: if we vectorize irregular tensors (or, possibly, recursively vectorize any of the previous classes to the same effect), are all the results legal rugged tensors or do we need an ever broader class? What class could it even be?

The other question, posed already in the ticket description, is whether going back to the simpler classes is feasible and what trade-offs are involved (it's always possible to go back by full flattening, but then all of the structure is lost, impacting type safety, performance, and possibly more). Another question is how to restrict the relevant classes so that they are closed under vectorization.

Mikolaj commented 1 year ago

BTW, I've just learned from https://youtu.be/FzfofFPOGOs?t=815 that Accelerate does not have nested arrays (which means rugged tensors, I think), citing hardware limitations, but it has segmented arrays, which probably are rugged tensors represented in the flat way Blelloch proposed. So not only did DPH have that representation years ago, it has also found its way into this excellent and modern library.

This may mean rugged tensors are not practical without this flat representation, and the representation probably cannot be hidden, for some practical reasons.

Mikolaj commented 1 year ago

I think, before this ticket, we should attempt recasting the vectorized AD language in Data.Array.Shaped. This would involve writing lots of hard dependently typed code, but the trade-offs are likely to be better than either fully dynamic or fully static assignment of shapes (these would be static, but parameterized on type variables). This is going to be hard, especially without a good library (with a low dependency footprint) for sized lists indexed by Nat (I don't know of any).
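
For reference, a minimal standalone sketch of the kind of Nat-indexed sized list meant here (hypothetical; not taken from any existing library):

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}
import Data.Kind (Type)
import GHC.TypeLits (Nat, type (+))

-- A length-indexed list: the Nat parameter tracks the number of
-- elements, so length mismatches become type errors.
data SizedList (n :: Nat) (a :: Type) where
  ZL    :: SizedList 0 a
  (:::) :: a -> SizedList n a -> SizedList (n + 1) a
infixr 5 :::
```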

tomsmeding commented 1 year ago

(I randomly found this issue in the list)

> Accelerate does not have nested arrays (which means rugged tensors, I think), citing hardware limitations, but it has segmented arrays, which probably are rugged tensors represented in the flat way Blelloch proposed. So not only did DPH have that representation years ago, it has also found its way into this excellent and modern library.

Accelerate has a collection of segmented reductions and segmented scans as primitives, indeed. However, that's the current extent of flattening support in Accelerate -- there is no automatic flattening of any code. DPH was, if I've understood correctly, much more ambitious in that it actively flattened all the things, so the user could be oblivious to the problems of irregular arrays.
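
To illustrate what such a segmented primitive computes, here is a plain-Haskell sketch (not Accelerate's actual API): a segmented sum makes one pass over the flat payload, restarting the accumulator at each boundary given by the segment lengths.

```haskell
-- Sketch of a segmented sum over the flat representation:
-- segment lengths plus concatenated payload, one sum per segment.
sumSeg :: Num a => [Int] -> [a] -> [a]
sumSeg [] _        = []
sumSeg (n : ns) xs = sum (take n xs) : sumSeg ns (drop n xs)

-- sumSeg [1, 3, 2] [10, 1, 2, 3, 4, 5] == [10, 6, 9]
```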

> I think, before this ticket, we should attempt recasting the vectorized AD language in Data.Array.Shaped.

If I understand correctly, this would mean that the user of horde-ad would also have to write said dependently-typed code, right? Futhark kind of does this, but they have the advantage of being able to tweak the type system to be precisely as they want it. I'd be interested to see how the user experience of such an API in Haskell would be.

Mikolaj commented 1 year ago

> Accelerate

Interesting. I fear the flattening resulting in segments is a horror for both the tool writer and the user, because it can't be fully abstracted away, for whatever reasons (probably performance). I wonder if there are any really transparent solutions. Maybe for sparse arrays the rugged aspect can be virtually free and covered by the same machinery as sparsity.

> If I understand correctly, this would mean that the user of horde-ad would also have to write said dependently-typed code, right? Futhark kind of does this, but they have the advantage of being able to tweak the type system to be precisely as they want it. I'd be interested to see how the user experience of such an API in Haskell would be.

It's not that bad. See the complete neural networks (with variants) coded here:

https://github.com/Mikolaj/horde-ad/blob/612e0c6a06f7981c26ca96693ff905a72f4d9228/README.md

https://github.com/Mikolaj/horde-ad/blob/612e0c6a06f7981c26ca96693ff905a72f4d9228/example/MnistFcnnShaped.hs

https://github.com/Mikolaj/horde-ad/blob/612e0c6a06f7981c26ca96693ff905a72f4d9228/example/MnistRnnShaped.hs

And the current iteration of the horde-ad interface has a chance of being more regular and pleasant, if we manage to port it to shaped tensors (I'm certainly thankful we didn't start this with shaped tensors; I'd have gone mad).

In the worst case you'd use unsafeCoerce (which, IIRC, I avoided in these examples at the cost of passing many proxy arguments; these, however, serve as a more explicit and self-documenting interface and make the errors better).
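
As a standalone illustration of that proxy-passing style (hypothetical names, not the horde-ad API): the proxies pin down the width type variables at the call site, so shape errors mention concrete literals instead of unsolved unification variables.

```haskell
{-# LANGUAGE DataKinds, ScopedTypeVariables, TypeApplications #-}
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, natVal)

-- Hypothetical sketch: layer widths as type-level Nats, fixed
-- explicitly by proxy arguments rather than left to inference.
zeroWeights :: forall inW outW. (KnownNat inW, KnownNat outW)
            => Proxy inW -> Proxy outW -> [[Double]]
zeroWeights _ _ =
  replicate (fromInteger (natVal (Proxy @outW)))
            (replicate (fromInteger (natVal (Proxy @inW))) 0)

-- zeroWeights (Proxy @784) (Proxy @10)  -- 10 rows of 784 zeros
```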

tomsmeding commented 1 year ago

> I fear the flattening resulting in segments is a horror for both the tool writer and the user, because it can't be fully abstracted away, for whatever reasons (probably performance).

That's the reason I'm aware of too, yes. I believe flattening "just works", but gives you pretty bad constant factors if you always apply it naively.

> It's not that bad. See the complete neural networks (with variants) coded here

Interesting. It does seem that apart from having to encode the full shapes in types (which goes without saying), you don't have to do much magic to make it work. Perhaps that changes when the size changes are beyond the powers of ghc-typelits-natnormalise.
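
A small self-contained example of size arithmetic that is beyond bare GHC but within the plugin's powers (assuming ghc-typelits-natnormalise is enabled): in the reversal below, the checker must equate (m + 1) + j with m + (j + 1), which the plugin normalises away.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, ScopedTypeVariables,
             TypeOperators #-}
{-# OPTIONS_GHC -fplugin GHC.TypeLits.Normalise #-}
import Data.Kind (Type)
import GHC.TypeLits (Nat, type (+))

data Vec (n :: Nat) (a :: Type) where
  VNil  :: Vec 0 a
  VCons :: a -> Vec n a -> Vec (n + 1) a

-- Accumulator-style reversal: the recursive call returns
-- Vec ((m + 1) + j) a where Vec (m + (j + 1)) a is expected;
-- the plugin proves the two equal (and 0 + n ~ n at the top).
vreverse :: forall n a. Vec n a -> Vec n a
vreverse = go VNil
  where
    go :: Vec m a -> Vec k a -> Vec (m + k) a
    go acc VNil         = acc
    go acc (VCons x xs) = go (VCons x acc) xs
```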

> In the worst case you'd use unsafeCoerce (which, IIRC, I avoided in these examples at the cost of passing many proxy arguments; these, however, serve as a more explicit and self-documenting interface and make the errors better).

What happens if the user uses unsafeCoerce in their code to make the shapes match up? Horde will then receive a program that has inconsistent typing, in some sense. Will anything break? I feel like that's very dangerous territory -- using unsafeCoerce when calling a library that you know doesn't actually do anything with the types (like orthotope) is "fine" as long as you have your external proof that the coerced shapes are indeed equal. But if the library (in this case, horde-ad) uses the types internally, I wonder if there are subtle ways in which everything can collapse.

Mikolaj commented 1 year ago

> What happens if the user uses unsafeCoerce in their code to make the shapes match up? Horde will then receive a program that has inconsistent typing, in some sense.

That's a good point. Perhaps a safer way is to convert to completely unshaped tensors and back (there is an untyped "rank" in both standard horde-ad and even, in minimal form, in the simplified one), which I do in a couple of places. Or flatten and reshape back. Both methods fail at runtime if the number of elements doesn't match, and nothing else is required. Such "believe me" terms are fully supported by horde-ad, though see another ticket, which lists AstReshape as the hardest nut to crack for vectorization. :)
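
A minimal stand-in for that escape hatch, with plain lists instead of real tensors (names illustrative): the only invariant checked is the element count, and a mismatch fails at runtime.

```haskell
-- Sketch of the flatten-and-reshape "believe me" pattern: the target
-- shape is trusted except for the total element count.
reshape2 :: Int -> Int -> [a] -> [[a]]
reshape2 rows cols xs
  | length xs == rows * cols = chunk xs
  | otherwise = error "reshape2: element count mismatch"
  where
    chunk [] = []
    chunk ys = take cols ys : chunk (drop cols ys)

-- reshape2 2 3 [1 .. 6] == [[1, 2, 3], [4, 5, 6]]
-- reshape2 2 4 [1 .. 6] fails at runtime
```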

If, OTOH, horde-ad had differentiable higher-order functions, or anything else that can't be summarized as "has X elements", then we might be in trouble.