JuliaAttic / ReverseDiffSource.jl

Reverse automated differentiation from source
MIT License

Making ReverseDiffSource compatible with Distributions #40

papamarkou closed this issue 8 years ago

papamarkou commented 8 years ago

Hi @fredo-dedup, I hope you are well. I have a quick question about how easy it would be to make ReverseDiffSource work with Distributions, in particular with the logpdf() function, which is useful for MCMC. Here is a simple example where I define a function f as the log-pdf of a multivariate t-distribution and rdiff fails; if it is not too complicated to fix this, it would be great:

julia> using Distributions

julia> using ReverseDiffSource

julia> n = 5
5

julia> Σ = eye(n)
5x5 Array{Float64,2}:
 1.0  0.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0  1.0

julia> ν = 30.
30.0

julia> f(x) = logpdf(MvTDist(ν, zeros(n), (ν-2)*Σ/ν), x)
f (generic function with 1 method)

julia> rdiff(f, (ones(5),), order=2)
ERROR: no derivation rule for Distributions.logpdf at arg #1
 in error at /Applications/Julia-0.4.6.app/Contents/Resources/julia/lib/julia/sys.dylib
 in getrule at /Users/theodore/.julia/v0.4/ReverseDiffSource/src/deriv_rule.jl:79
 in rev at /Users/theodore/.julia/v0.4/ReverseDiffSource/src/reversegraph.jl:45
 in reversepass! at /Users/theodore/.julia/v0.4/ReverseDiffSource/src/reversegraph.jl:246
 in reversegraph at /Users/theodore/.julia/v0.4/ReverseDiffSource/src/reversegraph.jl:23
 in rdiff at /Users/theodore/.julia/v0.4/ReverseDiffSource/src/rdiff.jl:137
 in rdiff at /Users/theodore/.julia/v0.4/ReverseDiffSource/src/frdiff.jl:56

cc @eford

fredo-dedup commented 8 years ago

Hi Theodore, I hope you are doing well and that all your projects are moving forward.

I managed to get something working, but it is not entirely straightforward:

Luckily, you are only interested in the derivative with respect to x, so we can just give dummy rules for Distributions.GenericMvTDist and PDMat. The important rule is the one for logpdf when its first argument is a GenericMvTDist distribution. I didn't try to work it out; I just put a dummy rule in the working example below.

using Distributions
using PDMats  
using ReverseDiffSource

n = 5
Σ = eye(n)
ν = 30.

function f(x)
  d = Distributions.GenericMvTDist(ν, zeros(n), PDMat((ν-2)*Σ/ν) )
  logpdf(d, x)
end

f(ones(n)) # -7.177

# derivation rules
@deriv_rule Distributions.GenericMvTDist(a,b,c) a 0.
@deriv_rule Distributions.GenericMvTDist(a,b,c) b 0.
@deriv_rule Distributions.GenericMvTDist(a,b,c) c 0.

@deriv_rule PDMat(M) M zeros(M)

@deriv_rule logpdf(d::MvTDist, x::Vector) d 0.   # dummy rule

# This rule is the important one (but I put a random expression, to be changed!)
@deriv_rule logpdf(d::MvTDist, x::Vector) x ds * sum(d.Σ.mat) * dot(x,x) 

df = rdiff(f, (ones(5),), order=2)

v, dv, d2v = df(ones(n))
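
For reference, here is an untested sketch of what the real rule for x could look like, based on the closed-form gradient of the multivariate t log-pdf, -(ν+k) Σ⁻¹(x-μ) / (ν + (x-μ)'Σ⁻¹(x-μ)) with k = length(x). It assumes GenericMvTDist exposes the fields df and μ in addition to the Σ.mat used above:

# untested sketch: gradient of the MvT log-pdf with respect to x
# assumes d.df (degrees of freedom) and d.μ (mean vector) are valid field names
@deriv_rule logpdf(d::MvTDist, x::Vector) x ds .* (-(d.df + length(x)) / (d.df + dot(x - d.μ, inv(d.Σ.mat) * (x - d.μ)))) .* (inv(d.Σ.mat) * (x - d.μ))
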
papamarkou commented 8 years ago

Thanks Frederic, all is going well here, I hope you are doing well too.

Your reply is very helpful because it gives me a better understanding of how ReverseDiffSource can be extended via @deriv_rule, and this is more important than the t-distribution example itself.

The tricky thing about the log-pdf definition of the t-distribution in Distributions is that it doesn't come in one block as a single function; it invokes other nested functions. I made it work with ReverseDiffSource manually, by reverse-engineering the log-pdf definition and putting it into the single function below:

function plogtarget(p::Vector, v::Vector)
  hdf = 0.5*ν
  hdim = 0.5*n
  shdfhdim = hdf+hdim
  v = lgamma(shdfhdim)-lgamma(hdf)-hdim*log(ν)-hdim*log(pi)-0.5*logdet(Σt)
  z = p-μ
  v-shdfhdim*log1p(dot(z, Σtinv*z)/ν)
end
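
Here ν, n, μ, Σt and Σtinv come from the enclosing scope; for illustration, the setup would be along these lines (the concrete values below are only placeholders):

# illustrative outer-scope definitions assumed by plogtarget (placeholder values)
n = 20
ν = 30.0
μ = zeros(n)
Σt = (ν - 2) * eye(n) / ν    # scale matrix; in the experiment below it had highly correlated components
Σtinv = inv(Σt)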

What I then realised is that reverse-mode autodiff for a 20-dimensional t-distribution with high correlation between its components ran about 7.5 times faster than forward-mode autodiff (as expected, since reverse mode is better suited to functions f: R^n -> R, the typical case in statistical inference). This is why it is important to be able to rely on your package for MCMC and statistics.

It is OK with me if you want to close this issue, as it is not straightforward to implement all this for the t-distribution in its current form in Distributions. To understand the internals a bit better, can I ask for a clarification: why did you define the deriv rule for PDMat the way you did?

eford commented 8 years ago

Hi Theo,

If you're looking at performance optimization, would making types explicit help? E.g., function plogtarget{T<:Number}(p::Vector{T}, v::Vector{T}), and/or annotating the types of the variables coming in from the outer scope?
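
Something along these lines, perhaps (just a sketch of the idea; I also renamed the local v so it doesn't shadow the argument):

# sketch: parametric element type; the globals ν, n, μ, Σt and Σtinv could also be declared const
function plogtarget{T<:Number}(p::Vector{T}, v::Vector{T})
  hdf = 0.5 * ν
  hdim = 0.5 * n
  shdfhdim = hdf + hdim
  c = lgamma(shdfhdim) - lgamma(hdf) - hdim * log(ν) - hdim * log(pi) - 0.5 * logdet(Σt)
  z = p - μ
  c - shdfhdim * log1p(dot(z, Σtinv * z) / ν)
end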

Cheers, Eric

papamarkou commented 8 years ago

Good point, Eric; this could possibly affect performance. I haven't checked it yet, but I will look into it.

fredo-dedup commented 8 years ago

The expression in @deriv_rule should yield the new value of the derivative accumulator for the variable (M in @deriv_rule PDMat(M) M zeros(M)). Since the accumulator for a matrix is a matrix of the same dimensions (and I am not interested in its actual value in the example above), I used zeros(M). You'll note that I mistakenly broke this rule in the @deriv_rule Distributions.GenericMvTDist(a,b,c) rules (they return a scalar 0. even though b and c are not scalars), but it did not matter in this case.
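
For example, a shape-consistent rule for a scalar-valued function of a vector could look like this (g and its rule are purely illustrative):

# g maps a vector to a scalar; the expression for x must have the same shape as x
g(x) = sum(x .^ 2)
@deriv_rule g(x::Vector) x ds .* 2 .* x    # vector-shaped, scaled by the output accumulator ds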

fredo-dedup commented 8 years ago

Closing as this was a question and a solution was provided.