cscherrer / Soss.jl

Probabilistic programming via source rewriting
https://cscherrer.github.io/Soss.jl/stable/
MIT License

Quadratic approximation (Laplace) #162

Open zenon opened 4 years ago

zenon commented 4 years ago

Hi Chad,

while trying to understand "Statistical Rethinking" by Richard McElreath, I see that he uses quadratic approximation for the first half of the book.

As far as I understand, it approximates the posterior by estimating the mode of the posterior and then finding the quadratic curvature of the log density at that point. So the result is a Gaussian.
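(In symbols, if I got it right: with θ̂ the posterior mode and H = −∇² log p(θ) evaluated at θ̂, the second-order Taylor expansion

log p(θ) ≈ log p(θ̂) − ½ (θ − θ̂)ᵀ H (θ − θ̂)

is, up to a constant, the log-density of a Gaussian N(θ̂, H⁻¹).)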

He mentions in the lecture that this is in many cases a surprisingly good approximation, and that it takes much less computational effort than MCMC.

His R implementation of quap is here https://github.com/rmcelreath/rethinking/blob/master/R/map-quap.r

Do you think something like this can be made for Soss too?

Sadly, I still don't feel qualified to do so. So, yes, this is a feature request.

Kind greetings, z.

cscherrer commented 4 years ago

Hi @zenon , sorry for the delayed response.

From your description, this is also often called a Laplace approximation. I'll change the title, hope this is ok.

I think we'd want the domain of the quadratic to be ℝⁿ, so we'd need to transform (xform in Soss). That's pretty easy.

Then we'll need to

  1. Find the posterior mode (MAP estimate)
  2. Find the Hessian, probably using Zygote.
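
Here's a rough sketch of those two steps on a toy problem, using Optim.jl for the mode and ForwardDiff for the Hessian (the data, priors, and logpost below are just hypothetical stand-ins for the logpdf of a transformed Soss model):

using Optim, ForwardDiff, Distributions, LinearAlgebra

# Hand-written log-posterior on ℝ², in (μ, log σ) coordinates
data = [1.2, 0.8, 1.5, 0.9]

function logpost(θ)
    μ, logσ = θ
    σ = exp(logσ)
    lp = logpdf(Normal(0, 10), μ)                # prior on μ
    lp += logpdf(Exponential(1), σ) + logσ       # prior on σ, plus log-Jacobian of exp
    lp += sum(logpdf.(Normal(μ, σ), data))       # likelihood
    return lp
end

# 1. MAP estimate: minimize the negative log-posterior
opt = optimize(θ -> -logpost(θ), zeros(2), BFGS())
θ̂ = Optim.minimizer(opt)

# 2. Hessian at the mode (ForwardDiff here; Zygote should work too)
H = ForwardDiff.hessian(logpost, θ̂)

# Laplace approximation: a Gaussian centered at the mode,
# with covariance given by the inverse negative Hessian
approx = MvNormal(θ̂, Symmetric(-inv(H)))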
zenon commented 4 years ago

Hi Chad,

Well, I'm impressed by the traffic in the Soss project these days, so I didn't have the impression that you were idling :-)

Cool, yes, that sounds right. It may be that we aren't always on ℝⁿ; some dimensions may be restricted to intervals. That might make it more difficult. Or is this what xform is for? And I expect that we need to give some initial values (in case these can be inferred automatically, that would be better, of course).

Thank you for the response!

Kind greetings, z.

cscherrer commented 4 years ago

> Well, I'm impressed by the traffic in the Soss project these days, so I didn't have the impression that you were idling :-)

Yeah, it's great the way things have picked up. We've been fortunate to have some really great contributors, and lately Dilum and Joey have been doing great work pushing things forward.

> It may be that we aren't always on ℝⁿ; some dimensions may be restricted to intervals. That might make it more difficult. Or is this what xform is for?

Yep! Here's a silly example. Say I have

m = @model begin
    a ~ Exponential()
    b ~ Normal()
    p ~ Beta(a, b^2)
end

Then

julia> tr = xform(m())
TransformVariables.TransformTuple{NamedTuple{(:b, :a, :p),Tuple{TransformVariables.Identity,TransformVariables.ShiftedExp{true,Float64},TransformVariables.ScaledShiftedLogistic{Float64}}}}((b = asℝ, a = asℝ₊, p = as𝕀), 3)

julia> tr(10.0 .+ randn(3))
(b = 11.451659890547628, a = 49022.08588576738, p = 0.9999880279321084)

julia> tr(-10.0 .+ randn(3))
(b = -11.0897193463806, a = 1.382748478446301e-5, p = 2.24356167877106e-5)

This turns any point in ℝⁿ into something the model can use.

zenon commented 4 years ago

Wow. Cool!

zenon commented 3 years ago

Only now trying to find out what xform does here.

What is its outcome? tr doesn't seem to be a model, as

 rand(tr()) 

doesn't work. And what is the role of the arguments you give to tr?

cscherrer commented 3 years ago

xform returns one of these: https://tamaspapp.eu/TransformVariables.jl/stable/

The idea is that sampling a model returns a NamedTuple, with some constraints due to differing supports across distributions.

We can evaluate the logpdf on any of these, but the structure of "constrained named tuple space" makes it inconvenient to explore.

So, the result of xform gives us a function that takes a Vector of the right length and returns a named tuple we can evaluate. Composing these lets us treat the model as if it's just a function on ℝⁿ.
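
For example (hedging on the exact calls, which vary across Soss versions), with the model m from above:

post = m()                    # a JointDistribution
tr = xform(post)              # Vector in ℝ³ → constrained NamedTuple (b, a, p)
ℓ(v) = logpdf(post, tr(v))    # the model, viewed as a function on ℝ³

ℓ(randn(3))                   # the log-density can now be evaluated anywhere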

zenon commented 3 years ago

Hi, is there any documentation for xform? Or examples?

 m1 = @model N begin
     a ~ Beta(1,1)
     X ~ Bernoulli(a) |> iid(N)
 end;

Then xform(m1(N = 5)) gives me

MethodError: no method matching asTransform(::UnitRange{Int64})

Closest candidates are:

asTransform(!Matched::Distributions.RealInterval) at /home/mesh/.julia/packages/Soss/LvmPD/src/primitives/xform.jl:78

xform(::Distributions.Bernoulli{Float64}, ::NamedTuple{(),Tuple{}})@xform.jl:70
xform(::Soss.iid{Int64,Distributions.Bernoulli{Float64}}, ::NamedTuple{(),Tuple{}})@xform.jl:121
macro expansion@closure_conv.jl:121[inlined]
_xform(::Type{TypeEncoding(Main.workspace6)}, ::Soss.Model{NamedTuple{(:N,),T} where T<:Tuple,TypeEncoding(begin a ~ Beta(1, 1) X ~ Bernoulli(a) |> iid(N) end),TypeEncoding(Main.workspace6)}, ::NamedTuple{(:N,),Tuple{Int64}}, ::NamedTuple{(),Tuple{}})@closure_conv.jl:121
xform(::Soss.JointDistribution{NamedTuple{(:N,),Tuple{Int64}},NamedTuple{(:N,),T} where T<:Tuple,TypeEncoding(begin a ~ Beta(1, 1) X ~ Bernoulli(a) |> iid(N) end),TypeEncoding(Main.workspace6)}, ::NamedTuple{(),Tuple{}})@xform.jl:15

And I'm stuck again.

cscherrer commented 3 years ago

Right, that's because X is discrete, so its support doesn't biject to any ℝⁿ. You can see what it's trying to do like this:

julia> sourceXform(m1)
quote
    _result = NamedTuple()
    a = rand(Beta(1, 1))
    _t = xform(Beta(1, 1), _data)
    _result = merge(_result, (a = _t,))
    X = rand(iid(N, Bernoulli(a)))
    _t = xform(iid(N, Bernoulli(a)), _data)
    _result = merge(_result, (X = _t,))
    (TransformVariables.as)(_result)
end

xform is mostly an internal thing, though we should definitely add some developer docs. What are you trying to do that you need to use it directly?

zenon commented 3 years ago

Ah, of course. laughs

Hello Chad! Thank you for the answer.

> What are you trying to do

Well, I'm trying to understand Soss and probabilistic programming. That's the main task, and I'm not there yet. Within that, this subtask is about understanding how to do the Laplace approximation. First part of the subtask: implement MAP.

I assume I can use something like Optim.jl (at least, as I learn from your answer, when everything is continuous). With xf = xform(model), I can use logpdf(model, xf(any vector of suitable dimension)) to compute a likelihood, and Optim can find the MAP. So slowly I seem to understand the first step.
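Something like this is what I have in mind (just a sketch; I'm not sure about the exact API, and dimension comes from TransformVariables):

using Optim
using TransformVariables: dimension

xf = xform(m())                       # m as in your example above
negℓ(v) = -logpdf(m(), xf(v))         # negative log-density on ℝⁿ
res = optimize(negℓ, zeros(dimension(xf)), BFGS())
θ̂ = xf(Optim.minimizer(res))          # MAP, mapped back to the support

(As I understand it, optimizing without the log-Jacobian gives the mode in the original parameterization rather than in the transformed space.)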

That's where I am at the moment.

Kind greetings from Hamburg, z.

cscherrer commented 3 years ago

That sounds great! From there, you'll need the Hessian (the matrix of second derivatives). Then things differ depending on whether you do the approximation in the support or in the transformed space. And very often the log-determinant will come into play, because the transformation stretches the space in funny ways. Probably best to get MAP going before worrying too much about that.
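For the transformed-space version, TransformVariables can return the log-Jacobian along with the transformed value. A sketch, with tr and m() as in the examples above and vmap standing in for a MAP estimate in ℝⁿ:

using ForwardDiff
using TransformVariables: transform_and_logjac

# Log-posterior in the unconstrained space, including the log-Jacobian term
function ℓ(v)
    θ, logjac = transform_and_logjac(tr, v)
    logpdf(m(), θ) + logjac
end

H = ForwardDiff.hessian(ℓ, vmap)   # curvature at the mode in ℝⁿ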

Warm regards from Seattle :)

jkbest2 commented 3 years ago

Just stumbled on this issue. I started working on a Laplace approximation package a while ago, but it has a long way to go before it's generally usable. Ideally I'd love it to be usable for (marginal) maximum likelihood and composable with the various MCMC samplers/posterior approximations, similar to this paper that uses the Laplace approximation for some parameters and NUTS for the rest. I found Section 3.4 of GPML helpful. The Stan developers recently published this paper, which presents a more efficient way to do the approximation and to get derivatives with respect to the hyperparameters it depends on. GPML and the Stan developers' paper only give results for a few observation likelihoods because you need third derivatives. TMB (paper), on the other hand, is very generic, so it's clearly possible to generalize.