dfdx / XGrad.jl

eXpression gradients in Julia

differentiating distributions #17

Open atteson opened 6 years ago

atteson commented 6 years ago

My use case is optimizing distributions. For example, the following simplified code fails in xdiff:

using Distributions
using XGrad
xdiff( p -> pdf( Normal(p[1],p[2]), 1.0 ); p=[0.0,1.0] )

In addition, the following (which the above ends up calling) also fails in xdiff:

using XGrad
using StatsFuns
xdiff( StatsFuns.normpdf; z=0.0 )

dfdx commented 6 years ago

Thanks for reporting. Basically, XGrad works by parsing source code and differentiating everything it can deal with. Although it can handle functions, tuples, and even structs, things like conditionals and macros, which Distributions uses heavily, are still out of scope.

The good news is that there's an alternative to parsing source code: some time ago I started an alternative approach based on function overloading. It worked for some basic code in Distributions, but it is far from finished yet. Let me check your examples later this week and update the code to cover what you have described.
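For readers unfamiliar with the overloading approach: the idea is that a custom numeric type carries a derivative alongside each value, and overloaded arithmetic propagates both. This is a generic dual-number sketch in Python for illustration only; the Dual class and function names here are hypothetical and are not Espresso's actual tracker:

```python
import math

class Dual:
    """A value paired with its derivative; arithmetic propagates both."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def _wrap(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._wrap(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = self._wrap(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def dexp(x):
    # chain rule for exp: (e^u)' = e^u * u'
    return Dual(math.exp(x.val), math.exp(x.val) * x.der)

def normpdf(z):
    # standard normal density written with overloaded arithmetic
    return dexp(-0.5 * z * z) * (1.0 / math.sqrt(2 * math.pi))

# seed dz/dz = 1, then read the derivative off the result
z = Dual(0.0, 1.0)
print(normpdf(z).der)  # -> 0.0 (the standard normal pdf is flat at z = 0)
```

Tracking in Espresso/XGrad records a full expression graph rather than a single forward-mode derivative, but the overloading mechanism is the same in spirit.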

dfdx commented 6 years ago

Not everything works smoothly, but in general function overloading solves the problem for Distributions.jl. Please check out the latest master of both Espresso.jl and XGrad.jl and try the following:

using XGrad
using Distributions

df = xdiff( (m, s) -> pdf( Normal(m,s), 1.0 ); ctx=Dict(:method => :track), m=0.0, s=1.0 )
df(5.0, 10.0)

The main difference is passing an option to use variable tracking instead of source code parsing: ctx=Dict(:method => :track). I've also split the single parameter p into two separate arguments, since getindex isn't properly implemented yet. This should be fixed later, but for now I hope passing two arguments instead of one works for you.
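For reference, the gradient such a call should produce can be checked against the closed-form derivatives of the normal density: for N(m, s) evaluated at x, the partials are pdf·(x−m)/s² with respect to m and pdf·((x−m)²/s² − 1)/s with respect to s. A quick standalone check at the values from the example above (plain Python, independent of XGrad; normal_pdf is my own helper, not a library function):

```python
import math

def normal_pdf(x, m, s):
    # density of N(m, s) at x
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

x, m, s = 1.0, 5.0, 10.0
p = normal_pdf(x, m, s)

# closed-form partial derivatives
dm = p * (x - m) / s**2
ds = p * ((x - m) ** 2 / s**2 - 1.0) / s

# central finite differences as an independent check
h = 1e-6
dm_fd = (normal_pdf(x, m + h, s) - normal_pdf(x, m - h, s)) / (2 * h)
ds_fd = (normal_pdf(x, m, s + h) - normal_pdf(x, m, s - h)) / (2 * h)

print(abs(dm - dm_fd) < 1e-9, abs(ds - ds_fd) < 1e-9)  # -> True True
```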

It also works with StatsFuns.normpdf:

using StatsFuns
xdiff( StatsFuns.normpdf; ctx=Dict(:method => :track), z=0.0 )

There are still plenty of little gotchas in the function-overloading approach, so please report any issues you encounter.