compiling-to-categories / concat

Compiling to Categories
http://conal.net/papers/compiling-to-categories
BSD 3-Clause "New" or "Revised" License
431 stars 49 forks source link

Documentation on Running Simple Program using AD #24

Open barak opened 6 years ago

barak commented 6 years ago

I'm trying to play with the AD facilities, so decided to start with a minimal example.

import ConCat.ADFun

d :: (Double -> a) -> Double -> a
d f x = derF f x 1

main = do
  print $ d id 0                     -- should be 1
  print $ d (\x -> x*(d (x*) 2)) 1   -- should be 2

Placing this as nest.hs in the top-level concat directory and running stack ghc nest.hs generates nest but

$ ./nest
nest: Oops: toCcc' called

I tried to turn on options etc, see below, to get this migrated to compile-time, but must not have tickled the right ones. This is a beautiful system, and I'd like to play with it using the nice exported API without delving down into the internals. A few tiny examples of how to do this would really help.

{-# OPTIONS_GHC -fplugin=ConCat.Plugin #-}
{-# OPTIONS_GHC -Wall #-}
{-# OPTIONS_GHC -O2 #-}

import ConCat.ADFun

-- Define a restricted form of the d operator.
d :: (Double -> a) -> Double -> a
d f x = derF f x 1
{-# INLINE d #-}

unnested :: Double
unnested = d id 0               -- should be 1
{-# INLINE unnested #-}

nested :: Double
nested = d (\x -> x*(d (x*) 2)) 1 -- should be 2
{-# INLINE nested #-}

main :: IO ()
main = do
  putStr "unnested, should be 1: "
  print unnested
  putStr "nested, should be 2: "
  print nested
cfhammill commented 6 years ago

@barak I tried your example as well, the unnested case works fine, the nested case doesn't, I'm getting ghc panic due to boxing. This is true even for simpler examples like d (*2) 1. I wonder if this is related to #8 and #9.

TimPut commented 5 years ago

As I see it the immediate reason why this fails is that d needs to be inlined for the plugin to be able to work its magic. However a function can only be inlined if it's (syntactically) fully applied. In its current form, d when fully applied returns a value in the codomain of a differentiable function, but not in general a differentiable function.

I suspect the real reason why this isn't easy to fix is that the category of differentiable functions is not equal to the category of twice-differentiable functions.

conal commented 5 years ago

I remember thinking that the issue here was GHC applying rewrite rules and other optimizations in an unfortunate order. Although one can probably define a category of infinitely differentiable functions (lazily generating as many derivatives as you might want), another option (used above, IIRC) is to differentiate, extract a regular function, and differentiate again. The plugin would then perform two translations from GHC Core to categorical form. I do such cascading translations often, for instance to differentiate and then generate a circuit.

Now I'm wondering if @TimPut is onto a simpler explanation and solution. GHC inlining, however, does not require syntactic full application. Rather there have to be as many explicit arguments as there were parameters on the LHS of a definition. The important difference here is that we can encourage d to inline by moving one or more parameters from LHS to RHS:

d :: (Double -> a) -> Double -> a
d f = \ x -> derF f x 1
{-# INLINE d #-}
barak commented 4 years ago

That's funny. (Logically speaking, the information on the number of parameters that must be supplied in order to trigger inlining belongs in the INLINE directive rather than buried in the parameter count on the LHS of the definition. But I digress.) Will give it a shot.

I was trying to test whether nesting can be expressed, and if so whether it works correctly. For the case here, that pretty much comes down to handing variable scoping correctly. But if differentiation of higher-order functions is supported, there's a trickier case of nesting described in https://arxiv.org/abs/1211.4892

conal commented 4 years ago

I was trying to test whether nesting can be expressed, and if so whether it works correctly. For the case here, that pretty much comes down to handing variable scoping correctly.

All variables are eliminated during the translation to categorical form (described here), and that translation is independent of AD.

I don't think the perturbation confusion issue arises with the categorical approach. The extended essence-of-ad paper contains correctness proofs, which fortunately are fairly simple, since differentiation (really local affine approximation) is compositional over the categorical vocabulary in very simple ways, unlike with the lambda calculus. I'd be interested in exploring the question together, hopefully resolving it to our mutual satisfaction.

But if differentiation of higher-order functions is supported, there's a trickier case of nesting described ....

Thanks for the warning. Which example in the paper? Differentiation and higher-order functions seems to be a tricky combination for the categorical approach as well.

barak commented 4 years ago

Let me explain the simple meat of arXiv:1211.4892. Start with a derivative-taking operator d, restricted to taking a scalar input for clarity. Define a shift function s. Then take the derivative of s and use it in a nested fashion:

d (f :: R -> a) (x :: R) = (derivative of f at x) :: (tangent space of a at f(x))

s (u :: R) (f :: R -> a) (x :: R) = f (x + u)

dHat = d s 0

The functions d and dHat should behave identically. So check if d (d h) y and dHat (dHat h) y yield the same result.

Note that d can take the derivative of higher-order functions. Without that ability, this issue cannot be expressed.

barak commented 4 years ago

differentiation (really local affine approximation) is compositional over the categorical vocabulary in very simple ways

Yes, that is a very nice property. One issue is that even if a defined function is first-order, it may internally use higher-order constructs.

f ys y = foldl (+) 0 (map (^2) (map (y -) ys))

If we have a compositional formulation, and we push it into code like this, we're forced to define derivatives of higher-order functions in a fashion consonant with our first-order derivatives.

I'd very much welcome the chance to work through some of these issues with you. I feel like we each have a good handle on one appendage of the elephant, and if we pull together we might be able to really take it apart. Should we take the discussion to email?

TimPut commented 4 years ago

@barak I for one would appreciate it if the discussion stayed out in the open simply so that I can follow along and learn from you and Conal. Please consider this an expression of appreciation and not a demand.

barak commented 4 years ago

@TimPut I'm fine with that, until I start embarrassing myself!

silky commented 4 years ago

none of these examples work, presently.

> stack ghc nest.hs && ./nest
[1 of 1] Compiling Main             ( nest.hs, nest.o )
Linking nest ...
nest: Oops: toCcc' called

with

-- nest.hs
import ConCat.ADFun

d :: (Double -> a) -> Double -> a
d f = \x -> derF f x 1
{-# INLINE d #-}

main = do
  print $ d id 0                     -- should be 1
  print $ d (\x -> x*(d (x*) 2)) 1   -- should be 2