You're not the only one who would find this useful, but it's probably more in the realm of "research project" than a quick hack. Most languages that support operator overloading are able to implement this fairly easily by using generic programming, where the functions in question can be run in a mode that produces statement traces instead of their actual result. This is probably not viable for Futhark, since you have less powerful methods of data abstraction, and you cannot overload parallel constructs such as `map` at all.
For Futhark, AD would likely involve writing an actual tool that transforms the Futhark AST directly. This is not as crazy as it sounds, as Futhark is a pretty simple language - especially if you start out by compiling away all the modules and polymorphism and such, which is explicitly designed to be possible.
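(For illustration, a minimal Haskell sketch of the overloading-based, dual-number style of forward-mode AD described above; this is not Futhark code and not the module discussed later in this thread, and all names are illustrative.)

```haskell
-- Forward-mode AD via operator overloading: a dual number carries a value
-- together with its derivative with respect to one chosen input.
data Dual = Dual { primal :: Double, tangent :: Double }
  deriving Show

instance Num Dual where
  Dual x x' + Dual y y' = Dual (x + y) (x' + y')
  Dual x x' - Dual y y' = Dual (x - y) (x' - y')
  Dual x x' * Dual y y' = Dual (x * y) (x' * y + x * y')
  abs (Dual x x')       = Dual (abs x) (x' * signum x)
  signum (Dual x _)     = Dual (signum x) 0
  fromInteger n         = Dual (fromInteger n) 0

instance Fractional Dual where
  Dual x x' / Dual y y' = Dual (x / y) ((x' * y - x * y') / (y * y))
  fromRational r        = Dual (fromRational r) 0

instance Floating Dual where
  pi                = Dual pi 0
  exp (Dual x x')   = Dual (exp x) (x' * exp x)
  log (Dual x x')   = Dual (log x) (x' / x)
  sqrt (Dual x x')  = Dual (sqrt x) (x' / (2 * sqrt x))
  sin (Dual x x')   = Dual (sin x) (x' * cos x)
  cos (Dual x x')   = Dual (cos x) (negate x' * sin x)
  asin (Dual x x')  = Dual (asin x) (x' / sqrt (1 - x * x))
  acos (Dual x x')  = Dual (acos x) (negate x' / sqrt (1 - x * x))
  atan (Dual x x')  = Dual (atan x) (x' / (1 + x * x))
  sinh (Dual x x')  = Dual (sinh x) (x' * cosh x)
  cosh (Dual x x')  = Dual (cosh x) (x' * sinh x)
  asinh (Dual x x') = Dual (asinh x) (x' / sqrt (x * x + 1))
  acosh (Dual x x') = Dual (acosh x) (x' / sqrt (x * x - 1))
  atanh (Dual x x') = Dual (atanh x) (x' / (1 - x * x))

-- Derivative of a scalar function at a point: seed the tangent with 1.
diff :: (Dual -> Dual) -> Double -> Double
diff f x = tangent (f (Dual x 1))

-- Example: the derivative of \x -> x*x + sin x at 2 is 2*2 + cos 2.
main :: IO ()
main = print (diff (\x -> x * x + sin x) 2.0)
```

The point of the trace/overloading remark above is that every numeric operation the function performs is intercepted by these instances, so no access to the function's syntax tree is needed.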
This is indeed a research project, in good part already ongoing. Forward and adjoint modes work by instrumentation and, a priori, sequential computation. The goal is to compile source programs into their total derivatives in a form that is massively data parallel. This will require and exploit the purely functional semantics of Futhark. Now, all we need is a bit of research funding to bring it not only to (Haskell) DSLs, but to full Futhark. Use cases with very high dimensionality, as in deep learning, are particularly interesting.
Yes, it's clear that this can't be done at the library level in Futhark - transforming the AST is more what I'm imagining, and as you say, doing it at some stage where everything has been inlined and made concrete. The big cost is that you'd need to keep around a lot of intermediate state that otherwise could be optimized away...
Presumably you'd subsequently pass the generated AST through the ordinary compilation pipeline. For reasons related to now-abandoned ideas about program slicing, the Futhark compiler actually supports very powerful dead code removal (including rewriting expressions whose results are only partially used). I think it would be able to do a decent job on function derivatives.
For interest's sake I implemented a module that can do forward mode automatic differentiation using dual numbers: https://github.com/cfhammill/futhark-forward-AD. I ran into a lot of problems getting it to work, and I'm still trying to figure out which are due to my ignorance and which are potential issues with the Futhark compiler. To have the module be really usable I'll need to go back and add significant manual type annotations, which is a bit of a drag.
Sadly forward mode is rarely the one that you want.
That's quite impressive work. If you had asked me before, I would probably have told you that AD through dual numbers could not be implemented in Futhark, but here we are. What problems are you encountering?
All the hard work was done by Daniel Brice, I just mimicked his Haskell implementation.
A few pain points:
- `include` caused surprising breakages. Most of my time writing this was spent debugging here. In the module file you'll notice I copied the definitions of `from_prim` and `real` into the module type definition. Before, I used `import real`, but then I got errors like `cannot match d.t to (f64, f64)` in my examples. This could well be me misunderstanding the import system.
- The `real` module. At first I thought the removal of recursive functions and a global open would allow me to use the operators unadorned; that turned out not to be the case, and going back and prefixing every single operator was tedious. In OCaml I think you can do `M.{ <expression-with-M-opened> }`; it would have been nice to be able to do that once within each numerical function to control which operators were in scope.
- `set_dual` was out of scope in my example, and I found the error pretty cryptic. This is definitely ignorance on my end, but I expected that since the type is fixed in the module definition I shouldn't need to annotate types in the module body.
- Not all `real` functions are differentiable; it might be worth having a differentiable module that is `real` minus `floor`/`ceil`/`round` etc.

I'm a bit curious why you thought it wouldn't be possible, it seems to me that the module system is really powerful.
- You can write `include from_prim with t = (r,r)`. This is definitely a subtle issue, and the error message is atrocious (but this seems to be a general problem for ML-style modules that I find difficult to fix).
- We do have the `M.(x+y)` syntax from OCaml. Is that what you need? Maybe we forgot to document that.
- I tried removing the type annotation on `set_dual` and it still compiled. Are you using a slightly older Futhark compiler? Type inference is a very recent addition (in fact, it is not yet part of any released version of Futhark), so it has undergone quite a burst of debugging over the past two weeks.
- Parametric modules (like `linalg`) should not require more operations than they need. For example, `linalg` right now requires the argument to be of module type `numeric`, but I think it only uses a small subset of the `numeric` operations.

With respect to inclusion, I was able to use this module type:
```
{
  type r = T.t
  type t = (r, r)
  val inject: r -> t
  val set_dual: t -> r -> t
  val get_dual: t -> r
  val make_dual: r -> r -> t
  include from_prim with t = (r,r)
  include numeric with t = (r,r)
  include real with t = (r,r)
}
```
Oh awesome, that will clean up the module signature a fair bit.
Great to know regarding the local module opening, that will hopefully let me clean the code up a bit. I find `(T.acos x, T.negate x' T./ (T.sqrt (T.i32 1 T.- x T.** T.i32 2)))` pretty ugly, although `T.acos` is kind of funny; `T.((acos x, negate x' / sqrt (i32 1 - x ** i32 2)))` will be a lot more readable.
I'll try updating the compiler and seeing if I can remove those type annotations.
I'll probably clean things up and add some more utility functions like `replicate`, but after that I'd be happy to contribute it back to futlib if you want.
Sure, why not. The more the merrier! At some point we will need to think about package management, but for now it's probably fine to just put stuff into futlib.
We will probably still want to do compiler-based AD at some point, though. Dual numbers are super convenient, but they have their limitations.
Agreed re dual numbers, reverse mode is really what everybody wants anyway. Although forward mode is trivially parallelizable, the scaling with the number of inputs is troublesome (one pass per input). I just started trying to wrap my mind around http://conal.net/papers/essence-of-ad/essence-of-ad-extended.pdf. @conal's approach may lead to an efficient implementation of reverse mode that does not depend on compiling the total derivatives as @henglein mentioned. Given the power of your module system it may be possible to do this in Futhark with some minor type system modifications.
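(As an aside, for readers following along: a naive Haskell sketch of what reverse mode computes. This is not Conal's construction and not what a Futhark compiler pass would do; each value carries a closure that, given the output sensitivity, returns gradient contributions for the inputs. Real implementations record a tape instead, to avoid recomputing shared subterms. All names are illustrative.)

```haskell
import qualified Data.Map.Strict as M

-- A value together with a backpropagator: given the sensitivity of the final
-- result with respect to this value, return gradient contributions for the
-- (numbered) inputs it depends on.
data Rev = Rev { value :: Double, backprop :: Double -> M.Map Int Double }

input :: Int -> Double -> Rev
input i x = Rev x (\s -> M.singleton i s)

constant :: Double -> Rev
constant x = Rev x (const M.empty)

addR, mulR :: Rev -> Rev -> Rev
addR (Rev x bx) (Rev y by) = Rev (x + y) (\s -> M.unionWith (+) (bx s) (by s))
mulR (Rev x bx) (Rev y by) = Rev (x * y) (\s -> M.unionWith (+) (bx (s * y)) (by (s * x)))

-- Value and full gradient of an n-input function from a single backward pass,
-- in contrast to forward mode, which needs one pass per input.
grad :: ([Rev] -> Rev) -> [Double] -> (Double, [Double])
grad f xs = (value out, [M.findWithDefault 0 i g | i <- [0 .. length xs - 1]])
  where
    out = f (zipWith input [0 ..] xs)
    g   = backprop out 1

-- Example: f (x, y) = (x + y) * y has gradient (y, x + 2*y).
main :: IO ()
main = print (grad (\[x, y] -> (x `addR` y) `mulR` y) [3, 4])  -- (28.0,[4.0,11.0])
```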
@Athas you were totally right: I updated the compiler and the inference errors went away, and with your fix the module definition is much cleaner. I also switched to local module opening, which made the implementation easier to read.
It's relatively easy to do symbolic differentiation in Haskell using algebraic data types and operator overloading. If Futhark had sum types, then it might be possible in Futhark as well.
That's a different problem. AD is about differentiating functions at a single point (or in a small neighborhood) that may not be symbolically differentiable.
Hm. I understand AD as being symbolic differentiation (SD) but done at compile time, when everything is symbolic. As such, AD and classic SD are both about differentiating at a point in the domain, and both require symbolic/analytic/exact differentiability.
I think you're right; the dichotomy I presented is an oversimplification. I guess I don't have a very firm definition that distinguishes AD from classic SD, except that the former can generally handle more complicated expressions/programs. But in any case, the Haskell program linked by @jarble is more like what you'd do in the Futhark compiler (or some other AD tool), and not something that's useful to write in Futhark itself.
Forward or reverse AD differentiates through the algorithm; SD differentiates through the mathematical expression.
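(To make the contrast concrete, here is a minimal Haskell sketch of expression-level symbolic differentiation using an algebraic data type, as suggested a few comments up; it differentiates the expression itself rather than the algorithm that produced it.)

```haskell
-- Expressions in one variable, as a plain algebraic data type.
data Expr
  = Var          -- the variable x
  | Const Double
  | Add Expr Expr
  | Mul Expr Expr
  | Sin Expr
  | Cos Expr
  deriving Show

-- Symbolic derivative with respect to the single variable.
deriv :: Expr -> Expr
deriv Var       = Const 1
deriv (Const _) = Const 0
deriv (Add a b) = Add (deriv a) (deriv b)
deriv (Mul a b) = Add (Mul (deriv a) b) (Mul a (deriv b))
deriv (Sin a)   = Mul (Cos a) (deriv a)
deriv (Cos a)   = Mul (Const (-1)) (Mul (Sin a) (deriv a))

-- Evaluate an expression at a point.
eval :: Double -> Expr -> Double
eval x Var       = x
eval _ (Const c) = c
eval x (Add a b) = eval x a + eval x b
eval x (Mul a b) = eval x a * eval x b
eval x (Sin a)   = sin (eval x a)
eval x (Cos a)   = cos (eval x a)

-- Example: d/dx (x * sin x) = sin x + x * cos x, evaluated at 2.
main :: IO ()
main = print (eval 2 (deriv (Mul Var (Sin Var))))
```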
I think a library solution could implement forward or reverse AD (e.g. autograd in Python), but so-called source-to-source AD would be a compiler trick (e.g. https://github.com/gaika/AutoDiffSource.jl), with an extra AD pass to augment the AST with derivatives.
Swift is adding AD as a core feature which could provide some inspiration
https://github.com/apple/swift/blob/master/docs/DifferentiableProgramming.md
We're actually working on adding AD as a compiler pass (trick) to Futhark right now. (In this case, a Core IR to Core IR transformation.)
Superseded by #1249.
For deep neural networks in particular, and optimization of complex functions in general, it would be very valuable to be able to automatically derive gradients for continuous functions. Doing this (on GPUs) is a big part of what frameworks like TensorFlow or PyTorch give you.
Concretely, what I'd like to see is a way to take any function like `f [n] (a: [n]f64): f64` (including for multi-dimensional arrays), and produce an `f' [n] (a: [n]f64): (f64, [n]f64)`, which returns not just the value of `f` but also the gradient of its input `a` with regard to that value. This can be done surprisingly efficiently using reverse-mode autodifferentiation (also referred to as "backprop" in the neural network context).
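(A hand-worked illustration of the requested f / f' pairing, in Haskell rather than Futhark and with the gradient derived by hand rather than by AD: for f(a) = sum of a_i^2, the gradient with respect to a is 2*a.)

```haskell
-- f maps a vector to a scalar; f' returns the same value together with the
-- gradient of f with respect to its input (here written out by hand).
f :: [Double] -> Double
f a = sum (map (^ 2) a)

f' :: [Double] -> (Double, [Double])
f' a = (f a, map (2 *) a)

main :: IO ()
main = print (f' [1, 2, 3])  -- (14.0,[2.0,4.0,6.0])
```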