Closed papamarkou closed 8 years ago
Hi Theodore, I hope you are doing well and that all your projects are moving forward.
I managed to have something working but it is not entirely straightforward :
ReverseDiffSource
comes only with the definition for the most basic functions). Here you have to define the rules for logpdf
and MvTDist
.ReverseDiffSource
has problems with that apparently. You have to work with the original type : Distributions.GenericMvTDist
through the relation MvTDist(df, μ, Σ) = GenericMvTDist(df, μ, PDMat(Σ))
. But once you use the new definition you also have to define the derivation rule for PDMat
!Luckily, you are only interested in the derivation wrt x
, so we can just give dummy rules for Distributions.GenericMvTDist
and PDMat
. The important rule will be for logpdf
when its first argument is a GenericMvTDist distribution. I didn't try to find it, I just gave a dummy rule in the working example below.
using Distributions
using PDMats
using ReverseDiffSource
n = 5
Σ = eye(n)
ν = 30.
function f(x)
d = Distributions.GenericMvTDist(ν, zeros(n), PDMat((ν-2)*Σ/ν) )
logpdf(d, x)
end
f(ones(n)) # -7.177
# derivations rules
@deriv_rule Distributions.GenericMvTDist(a,b,c) a 0.
@deriv_rule Distributions.GenericMvTDist(a,b,c) b 0.
@deriv_rule Distributions.GenericMvTDist(a,b,c) c 0.
@deriv_rule PDMat(M) M zeros(M)
@deriv_rule logpdf(d::MvTDist, x::Vector) d 0. # dummy rule
# This rule is important (but I put a random expression, to be changed !)
@deriv_rule logpdf(d::MvTDist, x::Vector) x ds * sum(d.Σ.mat) * dot(x,x)
df = rdiff(f, (ones(5),), order=2)
v, dv, d2v = df(ones(n))
Thanks Frederic, all is going well here, I hope you are doing well too.
Your reply is very helpful because it provides me with a better understanding of how ReverseDiffSource
can be expanded via @deriv_rule
, and this is more important that the t-distribution example.
The tricky thing about the log-pdf definition of the t-distribution in Distributions
is that it doesn't come in one block as a single function, but it invokes other nested functions. I made it work with ReverseDiffSource
in a manual way, by reverse-engineering the log-pdf definition and putting it in the single following function
function plogtarget(p::Vector, v::Vector)
hdf = 0.5*ν
hdim = 0.5*n
shdfhdim = hdf+hdim
v = lgamma(shdfhdim)-lgamma(hdf)-hdim*log(ν)-hdim*log(pi)-0.5*logdet(Σt)
z = p-μ
v-shdfhdim*log1p(dot(z, Σtinv*z)/ν)
end
What I then realised is that reverse autodiff for a 20-dimensional t-distribution with high correlation between its components ran about 7.5 times faster than forward autodiff (as it is a well-known fact that reverse mode is better for stats inference, aka for f:R^n->R), this is why it's important to rely on your package for MCMC and stats.
It is ok with me if you want to close this issue, as it is not so straightforward how to implement all this for the t-distribution in its current format under Distributions
. In order to understand a bit better the internals, can I ask for a clarification - why did you define the deriv rule of PDMat
the way you did?
Hi Theo,
If you're looking at performance optimization, would making types explicit help? E.g., function plogtarget{T<:Number}(p::Vector{T}, v::Vector{T}) and/or type annotating the variables coming out from outside scope?
Cheers, Eric
On Mon, Aug 1, 2016 at 10:26 AM, Theodore Papamarkou < notifications@github.com> wrote:
Thanks Frederic, all is going well here, I hope you are doing well too.
Your reply is very helpful because it provides me with a better understanding of how ReverseDiffSource can be expanded via @deriv_rule, and this is more important that the t-distribution example.
The tricky thing about the log-pdf definition of the t-distribution in Distributions is that it doesn't come in one block as a single function, but it invokes other nested functions. I made it work with ReverseDiffSource in a manual way, by reverse-engineering the log-pdf definition and putting it in the single following function
function plogtarget(p::Vector, v::Vector) hdf = 0.5_ν hdim = 0.5_n shdfhdim = hdf+hdim v = lgamma(shdfhdim)-lgamma(hdf)-hdim_log(ν)-hdim_log(pi)-0.5_logdet(Σt) z = p-μ v-shdfhdim_log1p(dot(z, Σtinv*z)/ν) end
What I then realised is that reverse autodiff for a 20-dimensional t-distribution with high correlation between its components ran about 7.5 times faster than forward autodiff (as it is a well-known fact that reverse mode is better for stats inference, aka for f:R^n->R), this is why it's important to rely on your package for MCMC and stats.
It is ok with me if you want to close this issue, as it is not so straightforward how to implement all this for the t-distribution in its current format under Distribtutions. In order to understand a bit better the internals, can I ask for a clarification - why did you define the deriv rule of PDMat the way you did?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/JuliaDiff/ReverseDiffSource.jl/issues/40#issuecomment-236595769, or mute the thread https://github.com/notifications/unsubscribe-auth/ABQywpuEzoHDqXJ_3ccQHazGzZRf5opKks5qbgH5gaJpZM4JXMf- .
Eric Ford Professor of Astronomy & Astrophysics Center for Exoplanets & Habitable Worlds Center for Astrostatistics Institute for CyberScience Penn State Astrobiology Research Center Pennsylvania State University
Good point, Eric, this could possibly affect performance, I haven't checked it yet, will look into it.
The expression in the @deriv_rule
should yield the new value for the derivative accumulator of the variable (M
in @deriv_rule PDMat(M) M zeros(M)
). Since the accumulator for a matrix is a matrix of the same dimension (and I am not interested in its actual value in the above example) I used zeros(M)
.
You'll note that I mistakenly broke this rule in @deriv_rule Distributions.GenericMvTDist(a,b,c)
but it did not matter in this case.
Closing as this was a question and a solution was provided.
Hi @fredo-dedup, I hope you are well. I have a quick question about how easy it would be to make ReverseDiffSource work with Distributions, and in particular with the
logpdf()
function, which is useful for MCMC. Here is a simple example, where I define a functionf
as the log-pdf of a multivariate t-distribution and it fails; if it is not too complicated to fix this, it would be great:cc @eford