diku-dk / futhark

A data-parallel functional programming language
http://futhark-lang.org

auto-differentiation #425

Closed: avibryant closed this 3 years ago

avibryant commented 7 years ago

For deep neural networks in particular, and optimization of complex functions in general, it would be very valuable to be able to automatically derive gradients for continuous functions. Doing this (on GPUs) is a big part of what frameworks like TensorFlow or PyTorch give you.

Concretely, what I'd like to see is a way to take any function like f [n](a: [n]f64): f64 (including for multi-dimensional arrays) and produce an f' [n](a: [n]f64): (f64,[n]f64), which returns not just the value of f but also the gradient of f with respect to its input a. This can be done surprisingly efficiently using reverse-mode automatic differentiation (also referred to as "backprop" in the neural network context).
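To make the desired shape concrete, here is a hand-written sketch for one particular f (a sum of squares), with the gradient derived by hand rather than automatically; an AD transformation would produce the second function mechanically from the first:

-- Illustration only: f a = sum of a_i^2, so the gradient is 2*a_i.
let f [n] (a: [n]f64): f64 = f64.sum (map (\x -> x*x) a)
let f' [n] (a: [n]f64): (f64, [n]f64) = (f a, map (\x -> 2*x) a)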

athas commented 7 years ago

You're not the only one who would find this useful, but it's probably more in the realm of "research project" than a quick hack. Most languages that support operator overloading are able to implement this fairly easily by using generic programming, where the functions in question can be run in a mode that produces statement traces instead of their actual result. This is probably not viable for Futhark, since you have less powerful methods of data abstraction, and you cannot overload parallel constructs such as map at all.

For Futhark, AD would likely involve writing an actual tool that transforms the Futhark AST directly. This is not as crazy as it sounds, as Futhark is a pretty simple language - especially if you start out by compiling away all the modules and polymorphism and such, which is explicitly designed to be possible.

henglein commented 7 years ago

This is indeed a research project, in good part already ongoing. Forward and adjoint modes work by instrumentation and, a priori, sequential computation. The goal is to compile source code to its total derivatives in a massively data-parallel form. This will require and exploit the purely functional semantics of Futhark. Now, all we need is a bit of research funding to bring it not only to (Haskell) DSLs, but to full Futhark. Use cases with very high dimensionality, as in deep learning, are particularly interesting.

avibryant commented 7 years ago

Yes, it's clear that this can't be done at the library level in Futhark - transforming the AST is more what I'm imagining, and as you say, doing it at some stage where everything has been inlined and made concrete. The big cost is that you'd need to keep around a lot of intermediate state that otherwise could be optimized away...

athas commented 7 years ago

Presumably you'd subsequently pass the generated AST through the ordinary compilation pipeline. For reasons related to now-abandoned ideas about program slicing, the Futhark compiler actually supports very powerful dead code removal (including rewriting expressions whose results are only partially used). I think it would be able to do a decent job on function derivatives.

cfhammill commented 6 years ago

For interest's sake, I implemented a module that can do forward-mode automatic differentiation using dual numbers: https://github.com/cfhammill/futhark-forward-AD. I ran into a lot of problems getting it to work; I'm still trying to figure out which are due to my ignorance and which are potential issues with the Futhark compiler. To have the module be really usable I'll need to go back and add significant manual type annotations, which is a bit of a drag.

Sadly forward mode is rarely the one that you want.
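For readers unfamiliar with the technique, the core of dual-number forward mode is small. A minimal self-contained sketch (an illustration only, not the linked module's code):

-- A dual number pairs a value with its derivative (tangent).
type dual = (f64, f64)

let dual_const (x: f64): dual = (x, 0) -- constants carry derivative 0
let dual_var (x: f64): dual = (x, 1)   -- the variable of differentiation

let dual_add ((x, x'): dual) ((y, y'): dual): dual = (x + y, x' + y')
let dual_mul ((x, x'): dual) ((y, y'): dual): dual = (x*y, x'*y + x*y')

-- f(x) = x*x + 3*x at x = 2 evaluates to (10, 7), since f'(x) = 2x + 3.
let example = dual_add (dual_mul (dual_var 2) (dual_var 2))
                       (dual_mul (dual_const 3) (dual_var 2))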

athas commented 6 years ago

That's quite impressive work. If you had asked me before, I would probably have told you that AD through dual numbers could not be implemented in Futhark, but here we are. What problems are you encountering?

cfhammill commented 6 years ago

All the hard work was done by Daniel Brice; I just mimicked his Haskell implementation.

A few pain points:

I'm a bit curious why you thought it wouldn't be possible, seems to me that the module system is really powerful.

athas commented 6 years ago

With respect to inclusion: I was able to use this module type:

{
  type r = T.t      -- the underlying scalar type
  type t = (r, r)   -- a dual number: (value, derivative)
  val inject: r -> t
  val set_dual: t -> r -> t
  val get_dual: t -> r
  val make_dual: r -> r -> t

  -- Inherit the standard numeric interfaces at the dual type.
  include from_prim with t = (r,r)
  include numeric with t = (r,r)
  include real with t = (r,r)
}
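For context, a rough sketch of the shape of the parametric module that such a signature would describe (names and bodies hypothetical, not the actual futhark-forward-AD code):

module mk_dual (T: real) = {
  type r = T.t
  type t = (r, r)
  let inject (x: r): t = (x, T.i32 0)
  let make_dual (x: r) (x': r): t = (x, x')
  let get_dual ((_, x'): t): r = x'
  -- ... plus the numeric/real operations implemented on pairs
}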
cfhammill commented 6 years ago

Oh awesome, that will clean up the module signature a fair bit.

Great to know regarding the local module opening, that will hopefully let me clean the code up a bit. I find

(T.acos x, T.negate x' T./ (T.sqrt (T.i32 1 T.- x T.** T.i32 2))) 

pretty ugly, although T.acos is kind of funny

T.((acos x, negate x' / sqrt (i32 1 - x ** i32 2)))

will be a lot more readable.

I'll try updating the compiler and seeing if I can remove those type annotations.

I'll probably clean things up and add some more utility functions like replicate, but after that I'd be happy to contribute it back to futlib if you want.

athas commented 6 years ago

Sure, why not. The more the merrier! At some point we will need to think about package management, but for now it's probably fine to just put stuff into futlib.

We will probably still want to do compiler-based AD at some point, though. Dual numbers are super convenient, but they have their limitations.

cfhammill commented 6 years ago

Agreed re dual numbers; reverse mode is really what everybody wants anyway. Although forward mode is trivially parallelizable, the scaling in the number of inputs n is troublesome. I just started trying to wrap my mind around http://conal.net/papers/essence-of-ad/essence-of-ad-extended.pdf. @conal's implementation may lead to an efficient implementation of reverse mode that does not depend on compiling the total derivatives as @henglein mentioned. Given the power of your module system it may be possible to do this in Futhark with some minor type system modifications.
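To spell out the scaling problem: one forward-mode pass yields a single directional derivative, so the full gradient of an f: [n]f64 -> f64 takes n passes, one per basis vector. A sketch, assuming a hypothetical dual-number-lifted version f_fwd of the function:

-- Each pass seeds one input with tangent 1 and reads off one
-- component of the gradient: n evaluations of f_fwd in total.
let grad_fwd [n] (f_fwd: [n](f64, f64) -> (f64, f64)) (a: [n]f64): [n]f64 =
  tabulate n (\i ->
    let duals = map2 (\j x -> (x, if j == i then 1f64 else 0f64)) (iota n) a
    let (_, d) = f_fwd duals
    in d)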

cfhammill commented 6 years ago

@Athas you were totally right: I updated the compiler and the inference errors went away, and with your fix the module definition is much cleaner. I also switched to local module opening, which made the implementation easier to read.

jarble commented 5 years ago

It's relatively easy to do symbolic differentiation in Haskell using algebraic data types and operator overloading. If Futhark had sum types, then it might be possible in Futhark as well.

athas commented 5 years ago

That's a different problem. AD is about differentiating functions at a single point (or small neighborhood) that may not be symbolically differentiable.

conal commented 5 years ago

Hm. I understand AD as being symbolic differentiation (SD) but done at compile time, when everything is symbolic. As such, AD and classic SD are both about differentiating at a point in the domain, and both require symbolic/analytic/exact differentiability.

athas commented 5 years ago

I think you're right; the dichotomy I presented is an oversimplification. I guess I don't have a very firm definition that distinguishes AD from classic SD, except that the former can generally handle more complicated expressions/programs. But in any case, the Haskell program linked by @jarble is more like what you'd do in the Futhark compiler (or some other AD tool), and not something that's useful to write in Futhark itself.

maedoc commented 4 years ago

Forward- or reverse-mode AD differentiates through the algorithm; SD differentiates through the math expression.
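To illustrate the distinction: AD carries derivatives along the program's actual control flow at a concrete input, e.g. through a loop, where SD would need a closed-form expression for the whole computation. A small hand-differentiated sketch:

-- Computes x^8 by repeated squaring.
let f (x: f64): f64 =
  loop acc = x for _i < 3 do acc * acc

-- The same loop, forward-differentiated: carry (value, derivative).
-- At input x this returns (x^8, 8*x^7).
let f_fwd (x: f64): (f64, f64) =
  loop (v, d) = (x, 1f64) for _i < 3 do (v*v, 2*v*d)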

I think a library solution could implement forward or reverse AD (e.g. autograd in Python), but so-called source-to-source AD would be a compiler trick (e.g. https://github.com/gaika/AutoDiffSource.jl), with an extra AD pass to augment the AST with derivatives.

Swift is adding AD as a core feature, which could provide some inspiration:

https://github.com/apple/swift/blob/master/docs/DifferentiableProgramming.md

zfnmxt commented 4 years ago

We're actually working on adding AD as a compiler pass (trick) to Futhark right now. (In this case, a Core IR to Core IR transformation.)

athas commented 3 years ago

Superseded by #1249.