ekmett / ad

Automatic Differentiation
http://hackage.haskell.org/package/ad
BSD 3-Clause "New" or "Revised" License

Determining when Jacobians are sparse #40

Closed. dmcclean closed this issue 9 years ago.

dmcclean commented 9 years ago

To what extent is it possible for automatic differentiation to tell us when the partial derivative of a function with respect to one of its inputs is everywhere zero?

The general AD approach works by evaluating derivatives only at finitely many explicitly specified points, which seems to rule out any global claim.

But it seems that this analysis, or at least a very good conservative approximation of it, is syntax-directed on the grammar of expressions that Num and Floating and friends let us construct, so maybe there is hope?

ekmett commented 9 years ago

AD really only gives an answer at the point in question; nothing prevents you from taking derivatives of functions that have discontinuities, for example, so you can't really say anything about the global behavior of the function.

ekmett commented 9 years ago

e.g. the AD package can't tell if you are evaluating `x` or `abs x` or `if x > 100 then x else -x`. It doesn't get to see the cuts.
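
As a concrete sketch of this point, using `diff` from the ad package's Numeric.AD (treat the exact import as approximate, since the module layout has shifted across versions): the conditional picks a branch based on the value at the evaluation point, and AD never sees the untaken branch or the cut between the two.

```haskell
import Numeric.AD (diff)

-- The Ord instance on the AD types compares the underlying (primal)
-- values, so the branch is chosen by the point of evaluation.
main :: IO ()
main = do
  print (diff (\x -> if x > 100 then x else -x) (0   :: Double)) -- -1.0
  print (diff (\x -> if x > 100 then x else -x) (200 :: Double)) --  1.0
```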

barak commented 9 years ago

After determining that f'(x)=0 for some particular x, where f' is generated using AD, you could use interval arithmetic to show that f'([x-c,x+c])=[0,0] for some particular c. Interval arithmetic can use roughly the same machinery as Forward AD, in fact we could easily add an interval arithmetic mode to this package. Depending on the details of your problem, this might suffice?
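
To illustrate the "same machinery" point, here is a minimal (and deliberately rounding-naive) sketch of interval arithmetic as yet another nonstandard `Num` interpretation, much like forward-mode dual numbers; the intervals package mentioned below does this properly:

```haskell
-- A naive interval type; invariant: lo <= hi. Rounding is ignored
-- here, which is exactly the caveat discussed later in this thread.
data Interval = I !Double !Double deriving Show

instance Num Interval where
  I a b + I c d = I (a + c) (b + d)
  I a b - I c d = I (a - d) (b - c)
  I a b * I c d = I (minimum ps) (maximum ps)
    where ps = [a * c, a * d, b * c, b * d]
  negate (I a b) = I (negate b) (negate a)
  abs i@(I a b)
    | a >= 0    = i
    | b <= 0    = negate i
    | otherwise = I 0 (max (negate a) b)
  signum (I a b) = I (signum a) (signum b)
  fromInteger n = let x = fromInteger n in I x x
```

Note that `I 1 2 - I 1 2` evaluates to `I (-1.0) 1.0`: the conservatively wide `x - x` behavior that comes up a couple of comments below.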

ekmett commented 9 years ago

@barak: Amusingly, @dmcclean actually works with me on the intervals package. =)

(It also works directly with the AD modes here out of the box.)

barak commented 9 years ago

@ekmett: cool. You know how in interval arithmetic, x-x gives a conservatively wide interval? One can imagine using AD to generate information which could allow less conservative intervals. In fact this was done, in work combining reverse-mode AD with interval arithmetic to get tighter bounds.

ekmett commented 9 years ago

I'm somewhat familiar with the work on Taylor models to do this.

It has been a low-level background obsession of mine for a few years to get to the point where I can talk about them nicely in Haskell.

ekmett commented 9 years ago

intervals actually spun out of earlier work in that space.

I had to cripple it in the end though, because I couldn't correctly control rounding modes on any of the libm calls.

My rounded package has a properly rounded interval arithmetic type, which rounds the interval outward, the upper bound toward +∞ and the lower bound toward −∞, to ensure the interval really does contain the answer.

With the new GMP hooks in GHC 7.10, we should be able to use the host library version of mpfr directly, rather than having to monkey-patch in-process a version that has a dozen or so hacks; that will make rounded much, much easier to install. This will enable me to actually start playing around with numerics again. I pretty much stopped because getting it installed was too painful for users for me to actually, you know, have users. =)

barak commented 9 years ago

You probably know this, but in practice interval arithmetic packages leave the rounding mode set to either toward +∞ or toward −∞ and then surround operations with negations when they want the opposite of the one that is set. This is because setting the rounding mode is enormously more expensive on most modern CPUs than a couple of negations. E.g., z0 = −((−x0) + (−y0)).
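
A minimal sketch of that trick in Haskell terms; `addUp` here is a hypothetical stand-in for an FFI binding that adds with the rounding mode locked toward +∞ (GHC exposes no such primitive natively):

```haskell
-- Hypothetical primitive: addition with the FPU rounding mode locked
-- toward +Infinity. The definition below is a placeholder only.
addUp :: Double -> Double -> Double
addUp = (+)

-- With the mode locked toward +Infinity, addition rounded toward
-- -Infinity falls out by negation, per the identity above:
--   roundDown (x + y) = negate (roundUp ((negate x) + (negate y)))
addDown :: Double -> Double -> Double
addDown x y = negate (addUp (negate x) (negate y))
```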

ekmett commented 9 years ago

Yep. Sadly I can't even do that. =(

libm gives not just slightly wrong answers, but completely batshit wrong answers when you change the rounding mode, even just to, say, toward −∞ rather than toward 0, despite being the library that defined how you change the rounding modes. =/ If you ask for a cos with the rounding mode changed, libm will basically give you nonsense. The main problem is that even if I don't call cos on Double myself, once I set the rounding mode I can't know that somebody else won't.

In theory I could use another approach, e.g. encourage users to link against crlibm, which gives correct rounding for all of the libm functionality. But crlibm isn't widely distributed and the linkage instructions are complicated, and even then I'd have to change the default rounding mode, leading to odd answers in code that doesn't expect this library to be in effect.

If I was doing something special purpose rather than a library that should be able to link against anything already written in Haskell, I'd have a lot more freedom here.

As I'm more concerned about causing silent semantic changes than about speed, and I'm not in a hurry to reimplement all my trig functions from first principles portably as if I were writing CORDICs on a calculator by hand, I'm currently using the MPFR library, which gives proper rounding (at a huge hit to performance) for any fixed number of significand bits you want.

On the plus side, it is very easy for me to increase precision, and I should be able to explore the trade-off between increased precision and more terms in the Taylor model if/when I get back to playing with Taylor models.

On the minus side, I probably won't win any prizes for speed.

barak commented 9 years ago

Well it should still be possible to do it with round-toward-0, at the expense of casing off on signs all over the place. Another attitude would be to just declare that a little fuzz at the endpoints of intervals is par for the course. Just roll your eyes and cite "worse is better" at anyone who tries to quibble.

ekmett commented 9 years ago

Even with round towards zero, libm does all sorts of bad things near the last couple of ulps. =( Ignoring rounding is basically what the intervals package does today.

The problem comes in when you start playing with Taylor models: you really start using every single one of those ulps, because the intervals are very tiny, just covering the error terms of your truncated Taylor series, and we're constantly slopping error into them.

The worse is better solution is worse enough to completely invalidate that whole technique. =/

barak commented 9 years ago

Taking a functional perspective, anything that is sensitive to the rounding mode should either take it as a parameter or needs to live in a monad that implicitly threads it. Otherwise you can easily violate referential transparency, and the compiler would have license to, under suitable conditions, migrate your numeric operations into the wrong rounding context. So Haskell either needs to keep the rounding mode locked, or expose it in a monad and push all operations on Double into a RoundingMode monad or into the IO monad or whatever. All these alternatives are pretty horrible to contemplate for anyone who wants to get numeric work done and wants to fiddle with the rounding mode.
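
A minimal sketch of the monadic option, assuming hypothetical names (`RoundingMode`, `addWithMode`): it is just a Reader-style monad threading the ambient mode, since GHC today exposes no real rounding-mode primitives to plug in.

```haskell
-- Hypothetical rounding modes; GHC provides no way to set them today.
data RoundingMode = ToNearest | TowardZero | TowardPosInf | TowardNegInf

-- A Reader-style monad that implicitly threads the rounding mode.
newtype Rounded a = Rounded { runRounded :: RoundingMode -> a }

instance Functor Rounded where
  fmap f (Rounded g) = Rounded (f . g)

instance Applicative Rounded where
  pure x = Rounded (const x)
  Rounded f <*> Rounded x = Rounded (\m -> f m (x m))

instance Monad Rounded where
  Rounded x >>= f = Rounded (\m -> runRounded (f (x m)) m)

-- Assumed primitive performing one addition under a given mode; the
-- body is a placeholder for a real correctly-rounded operation.
addWithMode :: RoundingMode -> Double -> Double -> Double
addWithMode _ = (+)

-- Operations on Double then live in the monad rather than being pure:
addR :: Double -> Double -> Rounded Double
addR x y = Rounded (\m -> addWithMode m x y)
```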

One idea would be that the rounding mode specifies a specific interpretation of some floating point code, and so a general mechanism for performing and manipulating nonstandard interpretations might be a good fit.

ekmett commented 9 years ago

Yep. Right now Haskell just pretends the rounding mode doesn't exist and offers no tools to change it. MPFR at least lets me have a local numeric type with correct rounding down to the last ulp, side-stepping the issue for now.

dmcclean commented 9 years ago

Agreed with the side discussion that the rounding mode limitation is annoying and it will be cool when the ideas in rounded can be used more easily.

I think I stated my original question in a slightly unclear way. I know that it's undecidable to discover the global behavior of a function. What I'm wondering about, I guess, isn't properly differentiation so much as analyzing whether certain arguments are used at all. There are so many ways to write `\x -> x - x` where we can see by inspection that the derivative is everywhere zero, but where an analysis to establish that wouldn't be at all easy, and in general is undecidable. But can we hope to see that the partial derivative of `\x y z -> x + cos z` with respect to `y` is everywhere zero, using the same sorts of techniques a compiler would use for finding free occurrences?

ekmett commented 9 years ago

I think in general to do so you'd need to switch from working with arbitrary Haskell functions to something like source-to-source translation, as in ADIC or ADIFOR. It could be written, but getting there takes you outside of what you can say with operator-overloading AD.

Alternately you can write your entire program in an EDSL so you can get the intensional information out. This is the approach taken by hakaru.

dmcclean commented 9 years ago

Cool. Thanks for the pointers. And of course for this library, which is impressive. :)

dmcclean commented 9 years ago

If you are interested, take a look at this gist: https://gist.github.com/dmcclean/b02baa8a04825c14cbae

It appears to work, but that might be because I am not thinking of the right tough test cases.

```
*ApproximateJacobian> let f = \[x, y, z] -> [x + cos z, 6 * y]
*ApproximateJacobian> approximateJacobian f 3
[[Just 1.0,Just 0.0],[Just 0.0,Just 6.0],[Nothing,Just 0.0]]
```
dmcclean commented 9 years ago

(The 3 there is the number of arguments that f takes.)

dmcclean commented 9 years ago

Oh, now I get it. (Sorry, this takes me a while sometimes.) The problem is that there is no way to write a useful Ord instance, or to otherwise teach it that the abstract interpretation of branching constructs is the conservative combination of what each of the branches would do.

ekmett commented 9 years ago

Exactly. For that you'd need to remove all ability to discriminate values in the numeric type in question, and replace it with things that are expressed in an EDSL you can extract intensional information from.

barak commented 9 years ago

Regarding the main question here: one way to phrase it is that you're looking for (a conservative approximation to) the dependency relation between input and output variables. This is something compilers do using abstract interpretation. You could do it using some kind of nonstandard interpretation mechanism and a tangent algebra holding booleans expressing the dependency structure, but the nonstandard interpretation mechanism would have to allow not just things like `+` and `cos` to be given new interpretations, but also conditional constructs, in order to trace all conditionals and appropriately merge the results.
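
One concrete rendering of that suggestion, as a minimal sketch (using sets of input indices in place of a boolean vector, which carries the same information; all names here are illustrative). As discussed above, `Ord`-based branching is exactly what this interpretation cannot see:

```haskell
import qualified Data.IntSet as IntSet

-- A value in the nonstandard interpretation: the set of input
-- variables this result may depend on.
newtype Deps = Deps IntSet.IntSet

union2 :: Deps -> Deps -> Deps
union2 (Deps a) (Deps b) = Deps (IntSet.union a b)

instance Num Deps where
  (+) = union2; (-) = union2; (*) = union2
  abs = id; signum = id; negate = id
  fromInteger _ = Deps IntSet.empty  -- constants depend on nothing

instance Fractional Deps where
  (/) = union2
  fromRational _ = Deps IntSet.empty

instance Floating Deps where
  pi = Deps IntSet.empty
  exp = id; log = id; sin = id; cos = id
  asin = id; acos = id; atan = id
  sinh = id; cosh = id; asinh = id; acosh = id; atanh = id

-- For each output of f, the (conservative) set of inputs it uses.
dependsOn :: ([Deps] -> [Deps]) -> Int -> [[Int]]
dependsOn f n = [IntSet.toList s | Deps s <- f inputs]
  where inputs = [Deps (IntSet.singleton i) | i <- [0 .. n - 1]]

-- dependsOn (\[x, y, z] -> [x + cos z, 6 * y]) 3 == [[0,2],[1]]
```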