Cool. Another challenge is that you cannot sample these nested models with HMC. :)
You need to be able to automatically derive importance/proposal weights in a nested manner to be able to handle such models, as the log joint of m2 is intractable.
Do you mean for nested models in general, or for this model? As far as HMC goes, this works:
m3 = @model begin
a ~ Beta(0.5, 0.5)
b ~ Beta(1, 0.5)
p ~ Beta(a, b)
x ~ Normal(p, 1.0) |> iid(3)
end
I'd think between m2 and m3 it's just a matter of getting the namespaces right.
Nested models, like the one with m1 and m2, in general. The reason for this is that you would need to be able to compute the conditional p(m | a, b, x), which is intractable (in general).
See my post here: https://github.com/TuringLang/DynamicPPL.jl/issues/4#issuecomment-610315490
This is a very peculiar situation (I know :) ), but I think quite interesting. If you think about the expected value w.r.t. the posterior of these models, you will see that you have nested expectations. This is why you cannot unroll m1 into m2. Well, you can, but then you are performing inference in a different model than the one specified.
-- edit --
In short, the model m3 is not equivalent to the one in m2. In fact, you cannot represent m2 using a "simple" probabilistic program, as simple probabilistic programs form a more restricted class than the one m2 is in.
Sorry, I'm still not seeing it. For a given (a, b), sampling from m1 means sampling a p, and then an x. There's nothing I can see to integrate over; it's just a matter of namespaces. How can the model really change by unrolling? Maybe we need something even simpler, like a coin flip model, to really see the semantics.
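For instance, a minimal coin flip version might look something like this (hypothetical models, just to have a concrete reference point; the distributions are arbitrary):
coin = @model α begin
    p ~ Beta(α, α)              # coin bias
    y ~ Bernoulli(p) |> iid(5)  # five flips
end

outer = @model begin
    α ~ Exponential(1.0)
    c ~ coin(α = α)             # nested submodel
end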
Have you checked the comment I linked to?
Yes, still not clear to me
Say you have this model:
m1 = @model a,b begin
m ~ Beta(a, b) # p(m | a, b)
x ~ Normal(m, 1.0) |> iid(3) # p(x | a, b, m)
end
m2 = @model begin
a ~ Beta(0.5, 0.5) # p(a)
b ~ Beta(1, 0.5) # p(b)
m ~ m1(a=a, b=b) # p(m | a, b, x)
end
and you want to perform approximate posterior inference in it. Then you would usually do this by sampling according to the unnormalised posterior distribution. For this model, the unnormalised posterior distribution is given as,
-- edit -- now with the correct equation, sorry. -- edit 2 -- I added some comments in the code, which hopefully help to explain what I'm talking about.
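Roughly, the equation in question has this form (a sketch, using the factors annotated in the code above):

p(a, b, m | x) ∝ p(a) p(b) p(m | a, b, x), where
p(m | a, b, x) = p(x | m) p(m | a, b) / ∫ p(x | m') p(m' | a, b) dm'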
So, in order to be able to perform inference, we would need to be able to compute the denominator in the last equation. This is, however, intractable in general.
If you, however, unroll the model, the joint becomes:
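Roughly (a sketch of the unrolled factorisation):

p(a, b, m, x) = p(a) p(b) p(m | a, b) p(x | m)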
Which is a different model than the one shown above.
Nested models are simply not the same as the model in m3.
You can perform inference in nested models using, e.g. IS as illustrated in https://github.com/TuringLang/DynamicPPL.jl/issues/4#issuecomment-610315490 .
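As a rough illustration of that idea, here is a minimal nested importance sampling sketch in plain Julia (Distributions.jl only, not Soss; the observations and sample sizes are made up for illustration):
using Distributions, Statistics

x_obs = [0.3, 0.7, 0.5]   # hypothetical observations

# Monte Carlo estimate of the submodel's marginal likelihood
# p(x | a, b) = ∫ p(x | p) p(p | a, b) dp, proposing from the prior on p
function inner_marginal(a, b, x; N = 1_000)
    ps = rand(Beta(a, b), N)
    ws = [prod(pdf.(Normal(p, 1.0), x)) for p in ps]   # likelihood p(x | p)
    return mean(ws)
end

# Outer self-normalised importance sampler over (a, b), proposing from the priors
function nested_is(x; N = 1_000)
    as = rand(Beta(0.5, 0.5), N)
    bs = rand(Beta(1, 0.5), N)
    ws = [inner_marginal(a, b, x) for (a, b) in zip(as, bs)]
    ws ./= sum(ws)
    return (a = sum(ws .* as), b = sum(ws .* bs))      # posterior mean estimates
end

nested_is(x_obs)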
Hope that helps. :)
This problem comes up in the SIR example with a MarkovChain when the initial state is considered unknown. If instead of
m = @model s0 begin
α ~ Uniform()
β ~ Uniform()
γ ~ Uniform()
pars = (α=α, β=β, γ=γ)
x ~ MarkovChain(pars, mstep(pars=pars, state=s0))
end
the initial state is specified as
m = @model begin
α ~ Uniform()
β ~ Uniform()
γ ~ Uniform()
pars = (α=α, β=β, γ=γ)
s ~ some_dist
i ~ some_dist
r ~ some_dist
s0 = @namedtuple(s,i,r) # The initial state
x ~ MarkovChain(pars, mstep(pars=pars, state=s0))
end
Cool, could you please repost this on https://github.com/TuringLang/DynamicPPL.jl/issues/4 so that we can keep track of it in Turing?
Thanks!
Thanks @baggepinnen . For some additional context, this is in terms of the code here: https://gist.github.com/cscherrer/d649f7d4477bde4f0843f2492442cf20
It's really interesting that this causes a problem, because in a way it's the opposite problem from @trappmartin's example. There, partial observation of the submodel caused issues. Here, partial observation is fine, but allowing s0 to vary from the top model breaks things.
I'll need to think more about commonalities in these two cases.
@trappmartin I'm still not sure this is right. One thing that's confusing me (at least) is the repeated m. Let's rewrite it like this:
m1 = @model a,b begin
q ~ Beta(a, b) # p(q | a, b)
x ~ Normal(q, 1.0) |> iid(3) # p(x | q)
end
m2 = @model begin
a ~ Beta(0.5, 0.5) # p(a)
b ~ Beta(1, 0.5) # p(b)
s ~ m1(a=a, b=b) # p(s | a, b)
end
Then m2 factors as
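(roughly, reading off the comments in the model)

p(a, b, s) = p(a) p(b) p(s | a, b)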
But s = (q, x), and m1 tells us that
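(roughly)

p(s | a, b) = p(q, x | a, b) = p(q | a, b) p(x | q)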
So this gives us the factorization of the model as a whole.
Maybe it can help to look at another example. A t distribution can be written as a mixture of Gaussians. So we could compare
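For example, a sketch using the usual scale-mixture-of-Gaussians parameterisation (model names are just illustrative):
mt = @model ν begin
    x ~ TDist(ν)                  # Student-t directly
end

mix = @model ν begin
    λ ~ Gamma(ν / 2, 2 / ν)       # precision: shape ν/2, scale 2/ν (i.e. rate ν/2)
    x ~ Normal(0.0, 1 / sqrt(λ))  # Gaussian given the precision λ
end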
As I mentioned before, this is a different model. If you take nested models "seriously", then you can't get around the fact that you have nested expectations and an intractable joint. This is nothing I made up.
I can send you a few papers on this topic if you like.
This: https://user-images.githubusercontent.com/1184449/79049707-0da18e80-7bda-11ea-9bac-5d84d54b3db4.png
is only correct if you want to unroll the model, which, in turn, is a different model than the one I wrote. In general, this equality is not correct; it holds only under the assumption you made above, i.e. that you want to unroll the model.
Ah, now I see it.
Yes, you are indeed thinking about a different model. The reuse of m is intentional. Otherwise you can unroll it, as it is not a nested model.
Papers would be great, thank you. My big current questions:
Sure, will do once I have the time.
For now, you can derive the conditional p(m | a, b, x) (or p(s | a, b, x) if you like). You should see for yourself that you cannot simply unroll this.
Alternatively, try computing the expectations of the posterior. This expectation will turn out to be nested, if you do it correctly. :)
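Sketching that out (under the same factorisation as above): for any function f of m, the posterior expectation takes the nested form

E[ f(m) | x ] = E_{p(a, b | x)}[ E_{p(m | a, b, x)}[ f(m) ] ]

and the inner expectation is taken with respect to the intractable conditional p(m | a, b, x).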
This seems to me like a strange thing to be interested in, since s = (q, x). So you end up with P(s | a, b, x) = P(q, x | a, b, x). There's a delta function in here, but if the xs match you get
P(s|a,b,x) = P(q,x | a,b,x)
= P(q | a,b,x)
= P(q | a,b) P(x | q)
So I'm still not getting it, guess I need to read the papers
No worries, I needed to bang my head against it at first too. It's not too intuitive I find.
Got it working!
julia> using Soss
[ Info: Precompiling Soss [8ce77f84-9b61-11e8-39ff-d17a774bf41c]
julia> m1 = @model a, b begin
p ~ Beta(a, b)
x ~ Normal(p, 1.0) |> iid(3)
end
@model (a, b) begin
p ~ Beta(a, b)
x ~ Normal(p, 1.0) |> iid(3)
end
julia> m2 = @model begin
a ~ Beta(0.5, 0.5)
b ~ Beta(1, 0.5)
m ~ m1(a = a, b = b)
end
@model begin
b ~ Beta(1, 0.5)
a ~ Beta(0.5, 0.5)
m ~ m1(a = a, b = b)
end
julia> t = xform(m2() | (; m = (; x = rand(3))))
TransformVariables.TransformTuple{NamedTuple{(:b, :a, :m), Tuple{TransformVariables.ScaledShiftedLogistic{Float64}, TransformVariables.ScaledShiftedLogistic{Float64}, TransformVariables.TransformTuple{NamedTuple{(:p,), Tuple{TransformVariables.ScaledShiftedLogistic{Float64}}}}}}}((b = as𝕀, a = as𝕀, m = TransformVariables.TransformTuple{NamedTuple{(:p,), Tuple{TransformVariables.ScaledShiftedLogistic{Float64}}}}((p = as𝕀,), 1)), 3)
julia> logdensity(m2() | (; m = (; x = rand(3))), t(randn(3)))
-1.6705301923419098
In this issue, @trappmartin suggests models roughly equivalent to
This currently fails in Soss:
The challenge here is that the m1 model is not "sampled" or "observed", but instead a little of both.