cscherrer / Soss.jl

Probabilistic programming via source rewriting
https://cscherrer.github.io/Soss.jl/stable/
MIT License

Nested models #130

Closed cscherrer closed 3 years ago

cscherrer commented 4 years ago

In this issue, @trappmartin suggests models roughly equivalent to

using Soss

m1 = @model a,b begin
    p ~ Beta(a, b)
    x ~ Normal(p, 1.0) |> iid(3)
end

m2 = @model begin
    a ~ Beta(0.5, 0.5)
    b ~ Beta(1, 0.5)
    m ~ m1(a=a, b=b)
end

This currently fails in Soss:

julia> dynamicHMC(m2(), (m=(x=rand(3),),))
ERROR: UndefVarError: p not defined

The challenge here is that the m1 model is not "sampled" or "observed", but instead a little of both.

trappmartin commented 4 years ago

Cool. Another challenge is that you cannot sample these nested models with HMC. :)

You need to be able to automatically derive importance/proposal weights in a nested manner to be able to handle such models, as the log joint of m2 is intractable.
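To make this concrete, here is a minimal sketch of the kind of inner weight computation this would require (plain Distributions.jl rather than Soss; the function name and particle scheme are illustrative): the submodel's marginal likelihood p(x | a, b), estimated by importance sampling with the prior as the proposal.

using Distributions, StatsFuns

# Illustrative: estimate log p(x | a, b) for m1 by importance sampling,
# proposing p from its prior Beta(a, b).
function log_marginal_x(a, b, x; nparticles = 1_000)
    logw = map(1:nparticles) do _
        p = rand(Beta(a, b))                # draw p from its prior
        sum(logpdf.(Normal(p, 1.0), x))     # log-likelihood of x given p
    end
    logsumexp(logw) - log(nparticles)       # log of the particle average
end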

cscherrer commented 4 years ago

Do you mean for nested models in general, or for this model? As far as HMC goes, this works:

m3 = @model begin
    a ~ Beta(0.5, 0.5)
    b ~ Beta(1, 0.5)
    p ~ Beta(a, b)
    x ~ Normal(p, 1.0) |> iid(3)
end

I'd think between m2 and m3 it's just a matter of getting the namespaces right.

trappmartin commented 4 years ago

Nested models, like the one with m1 and m2, in general. The reason for this is that you would need to be able to compute the conditional p(m | a, b, x), which is intractable (in general).

See my post here: https://github.com/TuringLang/DynamicPPL.jl/issues/4#issuecomment-610315490

This is a very peculiar situation (I know :) ), but I think quite interesting. If you think about the expected value w.r.t. the posterior of these models, you will see that you have nested expectations. This is why you cannot unroll m1 into m2. Well, you can, but then you are performing inference in a different model than the one specified.

-- edit --

In short, the model m3 is not equivalent to the one in m2. In fact, you cannot represent m2 using a "simple" probabilistic program, as simple programs form a more restricted class than the one m2 is in.

cscherrer commented 4 years ago

Sorry, I'm still not seeing it. For a given (a, b), sampling from m1 means sampling a p and then an x. There's nothing I can see to integrate over; it's just a matter of namespaces. How can the model really change by unrolling? Maybe we need something even simpler, like a coin flip model, to really see the semantics.
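For instance, something like this (a hypothetical sketch in the same @model syntax, with made-up names):

coin = @model θ begin
    y ~ Bernoulli(θ)        # the submodel draws a single flip
end

outer = @model begin
    θ ~ Beta(1, 1)
    c ~ coin(θ = θ)         # nested: c = (y,)
end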

trappmartin commented 4 years ago

Have you checked the comment I linked to?

cscherrer commented 4 years ago

Yes, it's still not clear to me.

trappmartin commented 4 years ago

Say you have this model:

m1 = @model a,b begin
    m ~ Beta(a, b)                    # p(m | a, b)
    x ~ Normal(m, 1.0) |> iid(3)      # p(x | a, b, m)
end

m2 = @model begin
    a ~ Beta(0.5, 0.5)    # p(a)
    b ~ Beta(1, 0.5)      # p(b)
    m ~ m1(a=a, b=b)      # p(m | a, b, x)
end

and you want to perform approximate posterior inference in it. Then you would usually do this by sampling according to the unnormalised posterior distribution. For this model, the unnormalised posterior distribution is given as

p(a, b, m | x) ∝ p(a) p(b) p(m | a, b, x)
              = p(a) p(b) p(x | m) p(m | a, b) / ∫ p(x | m') p(m' | a, b) dm'

-- edit -- Now with the correct equation, sorry. -- edit 2 -- I added some comments in the code, which hopefully help make clear what I'm talking about.

So, in order to be able to perform inference, we would need to be able to compute the denominator in the last equation. This is, however, intractable in general.
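For this particular submodel the integral is one-dimensional, so it can at least be checked numerically (an illustrative sketch using QuadGK.jl; in general no such shortcut exists):

using Distributions, QuadGK

# The denominator ∫ p(x | m) p(m | a, b) dm for fixed (a, b) and data x.
function denominator(a, b, x)
    integrand(m) = pdf(Beta(a, b), m) * prod(pdf.(Normal(m, 1.0), x))
    first(quadgk(integrand, 0, 1))    # quadgk returns (integral, error_estimate)
end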

trappmartin commented 4 years ago

If you, however, unroll the model, the joint becomes

p(a, b, m, x) = p(a) p(b) p(m | a, b) p(x | m)

which is a different model than the one shown above.

trappmartin commented 4 years ago

Nested models are simply not the same as the model in m3. You can perform inference in nested models using, e.g., importance sampling, as illustrated in https://github.com/TuringLang/DynamicPPL.jl/issues/4#issuecomment-610315490 .

trappmartin commented 4 years ago

Hope that helps. :)

baggepinnen commented 4 years ago

This problem comes up in the SIR example with a MarkovChain when the initial state is considered unknown. It appears if, instead of taking the initial state as an argument, as in

m = @model s0 begin
    α ~ Uniform()
    β ~ Uniform()
    γ ~ Uniform()
    pars = (α=α, β=β, γ=γ)
    x ~ MarkovChain(pars, mstep(pars=pars, state=s0))
end

the initial state is sampled inside the model:

m = @model begin
    α ~ Uniform()
    β ~ Uniform()
    γ ~ Uniform()
    pars = (α=α, β=β, γ=γ)
    s ~ some_dist
    i ~ some_dist
    r ~ some_dist
    s0 = @namedtuple(s,i,r) # The initial state
    x ~ MarkovChain(pars, mstep(pars=pars, state=s0))
end

trappmartin commented 4 years ago

Cool, could you please repost this on https://github.com/TuringLang/DynamicPPL.jl/issues/4 so that we can keep track of it in Turing?

Thanks!

cscherrer commented 4 years ago

Thanks @baggepinnen . For some additional context, this is in terms of the code here: https://gist.github.com/cscherrer/d649f7d4477bde4f0843f2492442cf20

It's really interesting that this causes a problem, because in a way it's the opposite problem from @trappmartin's example. There, partial observation of the submodel caused issues. Here, partial observation is fine, but allowing s0 to vary from the top model breaks things.

I'll need to think more about the commonalities in these two cases.

cscherrer commented 4 years ago

@trappmartin I'm still not sure this is right. One thing that's confusing me (at least) is the repeated m. Let's rewrite it like this:

m1 = @model a,b begin
    q ~ Beta(a, b)                    # p(q | a, b)
    x ~ Normal(q, 1.0) |> iid(3)      # p(x | q)
end

m2 = @model begin
    a ~ Beta(0.5, 0.5)    # p(a)
    b ~ Beta(1, 0.5)      # p(b)
    s ~ m1(a=a, b=b)      # p(s | a, b)
end

Then m2 factors as

p(a, b, s) = p(a) p(b) p(s | a, b)

But s = (q, x), and m1 tells us that

p(s | a, b) = p(q, x | a, b) = p(q | a, b) p(x | q)

So this gives us the factorization of the model as a whole: p(a, b, q, x) = p(a) p(b) p(q | a, b) p(x | q).

Maybe it can help to look at another example. A t distribution can be written as a (scale) mixture of Gaussians, so we could compare (sketched after the list):

  1. A model with a t distribution
  2. A model where the t is implemented as a separate mixture of Gaussians model
  3. Same as (2), but unrolled
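An illustrative sketch of the three variants (hypothetical model names, using the standard inverse-gamma scale-mixture representation of the t):

# 1. a t distribution directly
t1 = @model ν begin
    y ~ TDist(ν)
end

# 2. the t as a nested scale-mixture-of-Gaussians submodel
gmix = @model ν begin
    σ² ~ InverseGamma(ν / 2, ν / 2)
    y ~ Normal(0.0, sqrt(σ²))
end

t2 = @model ν begin
    s ~ gmix(ν = ν)
end

# 3. same as (2), but unrolled into one model
t3 = @model ν begin
    σ² ~ InverseGamma(ν / 2, ν / 2)
    y ~ Normal(0.0, sqrt(σ²))
end
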
trappmartin commented 4 years ago

As I mentioned before, this is a different model. If you take nested models "seriously", then you can't get around the fact that you have nested expectations and an intractable joint. This is not something I made up.

I can send you a few papers on this topic if you like.

trappmartin commented 4 years ago

This: https://user-images.githubusercontent.com/1184449/79049707-0da18e80-7bda-11ea-9bac-5d84d54b3db4.png

is only correct if you want to unroll the model, which, in turn, is a different model than the one I wrote. In general, this equality does not hold; it holds only under the assumption you made above, i.e. that you want to unroll the model.

trappmartin commented 4 years ago

Ah, now I see it. Yes, you are indeed thinking about a different model. The reuse of m is intentional. Otherwise you can unroll it, as it is not a nested model.

cscherrer commented 4 years ago

Papers would be great, thank you. My big current questions:

trappmartin commented 4 years ago

Sure, will do once I have the time. For now, you can derive the conditional p(m | a, b, x) (or p(s | a, b, x) if you like). You should see for yourself that you cannot simply unroll this.

Alternatively, try computing an expectation under the posterior. This expectation will turn out to be nested, if you do it correctly. :)
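To spell out the nesting, a sketch of the estimator this implies (plain Distributions.jl; the function and particle counts are made up for illustration): each outer draw of (a, b) needs its own inner, self-normalised estimate, whose normaliser is exactly the intractable denominator above.

using Distributions, Statistics

# Nested Monte Carlo estimate of E[f(m) | x] for the model above:
# outer draws of (a, b) are weighted by an estimate of p(x | a, b),
# and each carries an inner estimate of E[f(m) | a, b, x].
function nested_expectation(f, x; nouter = 100, ninner = 1_000)
    draws = map(1:nouter) do _
        a, b = rand(Beta(0.5, 0.5)), rand(Beta(1, 0.5))    # outer prior draws
        ms = rand(Beta(a, b), ninner)                      # inner prior draws
        w = [prod(pdf.(Normal(m, 1.0), x)) for m in ms]    # likelihood weights
        (Z = mean(w), Ef = sum(w .* f.(ms)) / sum(w))      # p̂(x|a,b), inner E[f]
    end
    sum(d.Z * d.Ef for d in draws) / sum(d.Z for d in draws)
end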

cscherrer commented 4 years ago

This seems to me like a strange thing to be interested in, since s = (q, x). So you end up with P(s | a, b, x) = P(q, x | a, b, x). There's a delta function in here, but if the xs match you get

P(s|a,b,x) = P(q,x | a,b,x)
           = P(q | a,b,x)
           = P(q | a,b) P(x | q)

So I'm still not getting it; I guess I need to read the papers.

trappmartin commented 4 years ago

No worries, I needed to bang my head against it at first too. It's not too intuitive, I find.

cscherrer commented 3 years ago

Got it working!

julia> using Soss
[ Info: Precompiling Soss [8ce77f84-9b61-11e8-39ff-d17a774bf41c]

julia> m1 = @model a, b begin
                   p ~ Beta(a, b)
                   x ~ Normal(p, 1.0) |> iid(3)
               end
@model (a, b) begin
        p ~ Beta(a, b)
        x ~ Normal(p, 1.0) |> iid(3)
    end

julia> m2 = @model begin
                   a ~ Beta(0.5, 0.5)
                   b ~ Beta(1, 0.5)
                   m ~ m1(a = a, b = b)
               end
@model begin
        b ~ Beta(1, 0.5)
        a ~ Beta(0.5, 0.5)
        m ~ m1(a = a, b = b)
    end

julia> t = xform(m2() | (; m = (; x = rand(3))))
TransformVariables.TransformTuple{NamedTuple{(:b, :a, :m), Tuple{TransformVariables.ScaledShiftedLogistic{Float64}, TransformVariables.ScaledShiftedLogistic{Float64}, TransformVariables.TransformTuple{NamedTuple{(:p,), Tuple{TransformVariables.ScaledShiftedLogistic{Float64}}}}}}}((b = as𝕀, a = as𝕀, m = TransformVariables.TransformTuple{NamedTuple{(:p,), Tuple{TransformVariables.ScaledShiftedLogistic{Float64}}}}((p = as𝕀,), 1)), 3)

julia> logdensity(m2() | (; m = (; x = rand(3))), t(randn(3)))
-1.6705301923419098
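A hedged follow-up (not from the original thread): assuming dynamicHMC accepts a conditioned model the same way xform and logdensity do above, sampling should now be expressible as

post = dynamicHMC(m2() | (; m = (; x = rand(3))))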