brian-j-smith / Mamba.jl

Markov chain Monte Carlo (MCMC) for Bayesian analysis in julia
Other
254 stars 52 forks source link

Adding sets of variables to a model #45

Closed nowozin closed 8 years ago

nowozin commented 8 years ago

Thanks for writing the Mamba package and its excellent documentation!

I would like to define sets of nodes each of which depends on a single other indexed node. Using the current Mamba API this translates into something like

T = Stochastic(1, @modelexpr(g, M, Distribution[Normal(g[i], 5.0) for i=1:M]), false),

Here M is the number of nodes, and g has the same dimension and length. However, it seems for large M this becomes horribly inefficient to sample from, and I believe this is because Mamba recreates a large Distribution array each time the stochastic node is sampled.

Instead, in an ideal world, I would like to be able to do something like:

model = Model()
for i=1:M
    push!(model, IndexedStochastic(:g, i, :Uniform(1.0, 5.0), false))
    push!(model, IndexedStochastic(:T, i, @modelexpr(model, :(g[i]), Normal(g[i], 5.0), false)))
end

This way Mamba knows that T does not densely depend on g, and we can still index T[i] and g[i] nicely once MCMC inference has completed. Each T[i] and g[i] would be separately sampled.

The goal of the above design would be to increase sampling efficiency for arrays of random variables by exposing additional dependency structure useful for sampling, but there may be a much better way to do the same thing in Mamba, the design of which I am largely ignorant of.

Context: In many models popular in machine learning we would like to define sets of random variables in a flexible manner. This is where the "plate notation" in directed graphical models and factor graphs is commonly used to define such sparse conditional dependencies as in the example above.

bdeonovic commented 8 years ago

Not exactly sure about the details of your model, but have you considered using multivariate normal? Here is a small example from the docs:

  y = Stochastic(1,
    @modelexpr(mu, s2,
      MvNormal(mu, sqrt(s2))
    ),
    false
  ),

  mu = Logical(1,
    @modelexpr(xmat, beta,
      xmat * beta
    ),
    false
  ),

  beta = Stochastic(1,
    :(MvNormal(2, sqrt(1000)))
  ),

  s2 = Stochastic(
    :(InverseGamma(0.001, 0.001))
  )

)
brian-j-smith commented 8 years ago

Hi @nowozin,

You've astutely pinpointed the inefficiency of using arrays when simpler relationships exist among the elements. As you also correctly note, Mamba only knows about relationships at the node level, and thus evaluates the entire node (all array elements) when any related node is sampled.

One solution is to break the array into individual nodes. Below is one way to do that with the current release of Mamba. (The sigma2 node in the example is not part of your original question and only included for illustrative purposes.) This may not quite be what you are looking for, because the stochastic node names will be g1, g2, ..., gM and T1, T2, ..., TM and not indexable as arrays. If it is important to index them as such in other parts of the model specification, logical nodes could be created to collect them into arrays. Specification of initial values and samplers would still have to be done using the stochastic node names. A bit of programming could be done to help automate that too. Let me know if you have any questions about this approach.

Another solution is to have more efficient multivariate samplers in the package. That is moving up on my list of priorities but may not be something that I can get to for a while.

## Dictionaries to store automatically generated nodes
gnodes = Dict{Symbol,Any}()
Tnodes = Dict{Symbol,Any}()

## Automatically generate M sets of nodes
M = 3
for i in 1:M
  global gi = symbol(string("g", i))
  global Ti = symbol(string("T", i))
  gnodes[gi] = Stochastic(:(Uniform(1.0, 5.0)), false)
  @eval begin
    Tnodes[:($Ti)] = Stochastic(
      @modelexpr($gi, Normal($gi, 5.0)),
      false
    )
  end
end

## Model specification (note the semicolon, ellipses, and commas)
model = Model(;
  gnodes...,
  Tnodes...,
  sigma2 = Stochastic(:(InverseGamma(0.001, 0.001)), false)
)
nowozin commented 8 years ago

Thanks Brian for confirming and for the workaround. I think the workaround will work for me, possibly with some helper methods to allow me to perform indexed accesses.

Overall I believe that arrays of random variables with fixed and sparse dependencies between them are quite important in many applications. One popular example is Latent Dirichlet Allocation where such indexing is needed.