TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

RFC: decouple Turing compiler and inference methods. #456

Closed - yebai closed this 5 years ago

yebai commented 6 years ago

Discussion - TBA.

Related issues:

Will's notes on syntax and compiler

willtebbutt commented 6 years ago

Thanks for opening this @yebai. Some of my comments at the end of the compiler note are a bit outdated in light of conversations we've had since - in particular, the parts relating to the variable naming / storage scheme, if we choose to enforce unique variable names at model scope.

xukai92 commented 6 years ago

In the last meeting, we generally agreed to decouple Turing.jl into 3 modules, namely Turing.Core, Turing.Modelling and Turing.Inference, whose corresponding responsibilities are:

Before starting work on the refactoring, we'd like to design and fix the API between the three modules. For this purpose, we need to know what is needed for the models and inference methods we want to support (now and in the future). Below are two summary tables, which I have started with what I can come up with now. Please feel free to update this list to reflect the latest discussion.

| Inference | Inference requirement | Practical requirement | Note |
| --- | --- | --- | --- |
| MH/HMC | density evaluation, gradient evaluation (for HMC) | variable state evaluation (e.g. is it transformed?) | R.v.s are stored as a flattened vector inside Turing.jl, so we need to store information for each r.v. so that we can reconstruct the Julia type. |
| IS/SMC/PG | sampling with part of the random variables given, in order | trace state maintenance | Replaying by order is enough for particle-based sampling. |
| Gibbs | density evaluation with part of the random variables given | variable state evaluation (e.g. does it belong to the sampler?) | This requires replaying by name. |
| Model | Model requirement | Practical requirement | Note |
| --- | --- | --- | --- |
| Non-parametric | | | @yebai had an implementation of Turing.IArray (infinite array) in an early version of Turing.jl |
| w/ stochastic control flow | | | |
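
To make the inference requirements above a little more concrete, here is a purely hypothetical Julia sketch of the kind of interface Turing.Inference might ask Turing.Modelling / Turing.Core to provide. None of these names are existing or agreed API; they just restate the three rows of the inference table as function declarations.

```julia
# Hypothetical interface sketch only - these functions do not exist in Turing.jl.
abstract type AbstractModel end

# MH/HMC: density evaluation on a flat parameter vector
# (gradients then come from AD applied to this function).
function logdensity end              # logdensity(model::AbstractModel, θ::Vector{<:Real})

# IS/SMC/PG: continue the program with the first k draws fixed (replay by order).
function continue_program end        # continue_program(model::AbstractModel, v_1k::Vector)

# Gibbs: evaluate the density with a named subset of variables fixed (replay by name).
function conditional_logdensity end  # conditional_logdensity(model, fixed::Dict{Symbol,Any})
```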

On internal typing

Another thing I forgot to mention is that some of the complexity comes from the difference between Turing.jl's internal variable types and Julia's native variable types. Comparing Stan's type design with Turing.jl's and Julia's:

| Stan | Turing | Julia |
| --- | --- | --- |
| real/int | - | Float |
| vector/row_vector/matrix | - | Array{Float,1 or 2} |
| array | - | Array{Union{Float, Array{Float,1 or 2}}, 1 or 2} |
| constrained | - | - |

I omitted Turing.jl's types because variables are currently stored as a plain vector and transformed back to the corresponding Julia type whenever necessary. Note that the design choice of flattening all variables is motivated by performance, as it reduces the internal computation in samplers like MH/HMC to simple vector and matrix operations.
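
Here is a small, purely illustrative Julia sketch of that flattening idea (not Turing.jl's actual implementation): all draws live in one Float64 vector, and per-variable metadata (name, range, shape) is kept so the original Julia values can be reconstructed when needed.

```julia
# Illustrative sketch only - not Turing.jl's internal storage.
vals   = Float64[]                       # flat storage used by MH/HMC
ranges = Dict{Symbol,UnitRange{Int}}()   # where each variable lives in `vals`
shapes = Dict{Symbol,Tuple}()            # original size, for reconstruction

function push_var!(name::Symbol, x::AbstractArray{<:Real})
    r = length(vals) .+ (1:length(x))
    append!(vals, vec(x))
    ranges[name] = r
    shapes[name] = size(x)
    return nothing
end

reconstruct(name::Symbol) = reshape(vals[ranges[name]], shapes[name])

push_var!(:m, [0.3])                 # scalar stored as a length-1 array
push_var!(:Σ, [1.0 0.1; 0.1 1.0])    # matrix flattened into `vals`
reconstruct(:Σ)                      # back to a 2×2 Matrix{Float64}
```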

A closely related issue is here: https://github.com/TuringLang/Turing.jl/issues/433

On possible ways to make use of modelling/inference assumptions

  1. Use some flag to set the safety level of modelling
  2. Allow users to explicitly define models using @static_model, @dynamic_model or just @model (see the sketch after this list).
  3. Run-time model (code) analysis
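
A rough, purely hypothetical sketch of how options 1 and 2 might surface to samplers: a "staticness" trait (set by a flag or by which macro was used) that a sampler can query before running. None of this is existing Turing.jl API.

```julia
# Hypothetical sketch only - not existing Turing.jl API.
abstract type ModelKind end
struct StaticModel  <: ModelKind end   # fixed set of random variables
struct DynamicModel <: ModelKind end   # stochastic control flow / changing support

modelkind(::Any) = DynamicModel()      # conservative default, overridden by a flag or macro

# A sampler that needs a fixed-dimensional parameter vector can then guard on it:
function check_static(model)
    modelkind(model) isa StaticModel ||
        error("this sampler assumes a static model; consider a particle method")
end
```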

On adapting current code for better API as a starting point

I think it's possible to make the current code follow the decoupling idea by wrapping internal functions or making minor changes. I feel it's a good starting point - after we finish this, we can start replacing each isolated module with either refactored code or a completely new implementation.

xukai92 commented 6 years ago

I added useful information and thoughts on decoupling. Please let me know if there is anything unclear.

xukai92 commented 6 years ago

Below is a list of API design notes on what we want each module to export, using the example code below. Please feel free to edit it.

```julia
@model gdemo(x) = begin
  s ~ InverseGamma(2, 3)
  m ~ Normal(0, sqrt(s))
  x[1] ~ Normal(m, sqrt(s))
  x[2] ~ Normal(m, sqrt(s))
  return s, m
end

mf = gdemo([1.0, 1.5])
```
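
To make the "density evaluation" requirement concrete for this example, here is a hand-written, purely illustrative stand-in for what the compiled model function could expose to Turing.Inference. This is not the actual @model expansion, just a sketch of the kind of callable the inference module would consume.

```julia
# Illustrative stand-in only - not what @model actually generates.
using Distributions

function gdemo_logdensity(x)
    function (s, m)
        lp  = logpdf(InverseGamma(2, 3), s)
        lp += logpdf(Normal(0, sqrt(s)), m)
        lp += logpdf(Normal(m, sqrt(s)), x[1])
        lp += logpdf(Normal(m, sqrt(s)), x[2])
        return lp
    end
end

ℓ = gdemo_logdensity([1.0, 1.5])
ℓ(1.0, 0.5)   # joint log density at s = 1.0, m = 0.5
```
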
yebai commented 6 years ago

@xukai92 Thanks, Kai. I'm also working on a note - will post it soon.

@willtebbutt We might be able to adapt most of your design. I'm re-thinking the BNP-related requirements and realised that many issues are now obsolete due to recent refactoring.

xukai92 commented 6 years ago

@yebai That's great. I actually feel it's hard to edit together or point to a specific sentence using GitHub, e.g. filling in the table together or questioning ideas others have written. Do you think it might be better for us to move this note somewhere else?

xukai92 commented 6 years ago

Some more notes which might be helpful

Current VarInfo fields explained

Some APIs needed

willtebbutt commented 6 years ago

Response to @xukai92's stuff above. I broadly agree with what you've said, but have a few comments:

xukai92 commented 6 years ago

> Naming proposal: "atomic distribution" - a distribution not implemented using the @model macro. "compound distribution" - anything created using the @model macro. May comprise atomic distributions and compound distributions, which I will refer to as "component distributions".

Good idea!

> Regarding the requirements for various inference algorithms, could you please elaborate a bit on the differences between IS / SMC / PG and Gibbs?

In short, SMC is a sequential version of IS, and PG (or conditional SMC) is running MCMC in such a way that each step is an SMC sweep plus resampling.

> I'm not quite clear what the difference between "sampling with part of random variables given in order" and "density evaluation with part of random variables given" is.

Sorry, I was not clear here. Given a probabilistic program whose random variables are sampled in the order v_1, v_2, v_3, ..., v_n, by "sampling with part of random variables given in order" I mean: the first k random variables, i.e. v_{1:k}, are given, and we want to continue the probabilistic program from that state. In contrast, "density evaluation with part of random variables given" means I treat some random variables as data, say v_1 and v_n, and I want to provide the values of the rest to evaluate the density, which is the wrapper thing we talked about for Gibbs.
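
A tiny illustrative sketch of the difference (not Turing.jl data structures): particle methods replay a prefix of the trace by position, while Gibbs-style density evaluation fixes variables by name.

```julia
# Illustrative only - not Turing.jl internals.
# IS/SMC/PG: the first k draws are replayed *in order*, then sampling continues.
trace  = [0.2, -1.3, 0.7]             # v_{1:3} recorded from a previous particle
prefix(k) = trace[1:k]                # hand v_{1:k} back to the program, in order

# Gibbs: a *named* subset is treated as data, the rest is supplied for evaluation.
observed = Dict(:v1 => 0.2, :vn => 1.1)   # fixed for this Gibbs step
proposed = Dict(:v2 => -1.3, :v3 => 0.7)  # values to evaluate the density at
inputs   = merge(observed, proposed)      # everything the density evaluation needs
```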

About the can_be_unconstrained function, I'm not sure what it's for - we currently assume all distributions can be transformed into unconstrained space. The problem I was pointing out is how to check, at runtime, whether a random variable is currently in its transformed space or its original space, because different samplers require them in different spaces.
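
As an illustration of that bookkeeping problem (a hedged sketch, not Turing.jl code), one can imagine keeping a per-variable flag that records which space the stored value is currently in:

```julia
# Illustrative sketch only - not Turing.jl's actual mechanism.
using Distributions

dist    = InverseGamma(2, 3)   # support (0, ∞), so transform with log/exp
val     = 1.5                  # stored value
istrans = false                # currently in the original (constrained) space

# HMC wants the unconstrained value:
if !istrans
    val, istrans = log(val), true
end

# A particle/Gibbs step wants the original value back:
if istrans
    val, istrans = exp(val), false
end
```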

willtebbutt commented 6 years ago

Aha, I see. Both situations should be covered by the proposed wrappers. Actually, it strikes me that we should be able to compile completely different logpdf and rand functions for each step of Gibbs sampling, which contain just the bits that we need to be able to update (but for now we could just wrap the same underlying function).
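
For instance (a purely illustrative sketch using the gdemo example, not generated code), the Gibbs step that updates only m could use a specialised density that drops terms constant in m:

```julia
# Illustrative only - hand-written, not compiled by the @model machinery.
using Distributions

# Full joint for the gdemo example, conditioned on x = [1.0, 1.5]:
logjoint(s, m) = logpdf(InverseGamma(2, 3), s) +
                 logpdf(Normal(0, sqrt(s)), m) +
                 logpdf(Normal(m, sqrt(s)), 1.0) +
                 logpdf(Normal(m, sqrt(s)), 1.5)

# Step-specific density for updating `m` with `s` held fixed; the InverseGamma
# term is constant in `m`, so it can be omitted.
logstep_m(m; s) = logpdf(Normal(0, sqrt(s)), m) +
                  logpdf(Normal(m, sqrt(s)), 1.0) +
                  logpdf(Normal(m, sqrt(s)), 1.5)
```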

> About the can_be_unconstrained function, I'm not sure what it's for - we currently assume all distributions can be transformed into unconstrained space.

Good to know. What do we currently do about, for example, discrete RVs? My point was generally that there are various properties of a model that are required for different samplers, and I suspect that for any given model we should be able to automatically deduce whether or not any particular property holds. Consequently, there's not really a need for an @static_model or @dynamic_model macro.

> The problem I was pointing out is how to check, at runtime, whether a random variable is currently in its transformed space or its original space, because different samplers require them in different spaces.

Ah, I see. I hadn't considered that that would be a thing, but it makes sense.

yebai commented 5 years ago

Closed in favour of https://github.com/TuringLang/Turing.jl/issues/634#issuecomment-471339521