TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

RFC: decouple Turing compiler and inference methods. #456

Closed - yebai closed this 5 years ago

yebai commented 6 years ago

Discussion - TBA.

Related issues:

Will's notes on syntax and compiler

willtebbutt commented 6 years ago

Thanks for opening this @yebai. Some of my comments at the end of the compiler note are a bit outdated in light of conversations we've had since - in particular, the parts relating to the variable naming / storage scheme, if we choose to enforce unique variable names at model scope.

xukai92 commented 6 years ago

In the last meeting, we generally agreed to decouple Turing.jl into 3 modules, namely Turing.Core, Turing.Modelling and Turing.Inference, whose corresponding responsibilities are:

Before starting work on the refactoring, we'd like to design and fix the API between the three modules. For this purpose, we need to know what is needed for the models and inference methods we want to support (now and in the future). Below are two summary tables, which I have started with what I can come up with now. Please feel free to update this list to reflect the latest discussion.

| Inference | Inference requirement | Practical requirement | Note |
| --- | --- | --- | --- |
| MH/HMC | density evaluation, gradient evaluation (for HMC) | variable state evaluation (e.g. is it transformed?) | R.v.s are stored as a flattened vector inside Turing.jl, so we need to store information for each r.v. so that we can reconstruct the Julia type. |
| IS/SMC/PG | sampling with part of the random variables given, in order | trace state maintenance | Replaying by order is enough for particle-based sampling. |
| Gibbs | density evaluation with part of the random variables given | variable state evaluation (e.g. does it belong to the sampler?) | This requires replaying by name. |
| Model | Model requirement | Practical requirement | Note |
| --- | --- | --- | --- |
| Non-parametric | | | @yebai had an implementation of Turing.IArray (infinite array) in an early version of Turing.jl |
| w/ stochastic control flow | | | |
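
To make the inference requirements above a little more concrete, here is a purely hypothetical Julia sketch of the kind of interface Turing.Inference might ask Turing.Modelling / Turing.Core to provide. None of these names are existing or agreed API; they just restate the three rows of the inference table as function declarations.

```julia
# Hypothetical interface sketch only - these functions do not exist in Turing.jl.
abstract type AbstractModel end

# MH/HMC: density evaluation on a flat parameter vector
# (gradients then come from AD applied to this function).
function logdensity end              # logdensity(model::AbstractModel, θ::Vector{<:Real})

# IS/SMC/PG: continue the program with the first k draws fixed (replay by order).
function continue_program end        # continue_program(model::AbstractModel, v_1k::Vector)

# Gibbs: evaluate the density with a named subset of variables fixed (replay by name).
function conditional_logdensity end  # conditional_logdensity(model, fixed::Dict{Symbol,Any})
```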

On internal typing

Another thing I forgot to mention is that some of the complexity comes from the difference between Turing.jl's internal variable types and Julia's native variable types. Comparing Stan's type design with Turing.jl's and Julia's:

| Stan | Turing | Julia |
| --- | --- | --- |
| real/int | - | Float |
| vector/row_vector/matrix | - | Array{Float,1 or 2} |
| array | - | Array{Union{Float, Array{Float,1 or 2}}, 1 or 2} |
| constrained | - | - |

I omitted Turing.jl's types because variables are currently stored as a plain vector and transformed back to the corresponding Julia type whenever necessary. Note that the design choice of flattening all variables is motivated by performance, as it reduces the internal computation in samplers like MH/HMC to simple vector and matrix operations.
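
Here is a small, purely illustrative Julia sketch of that flattening idea (not Turing.jl's actual implementation): all draws live in one Float64 vector, and per-variable metadata (name, range, shape) is kept so the original Julia values can be reconstructed when needed.

```julia
# Illustrative sketch only - not Turing.jl's internal storage.
vals   = Float64[]                       # flat storage used by MH/HMC
ranges = Dict{Symbol,UnitRange{Int}}()   # where each variable lives in `vals`
shapes = Dict{Symbol,Tuple}()            # original size, for reconstruction

function push_var!(name::Symbol, x::AbstractArray{<:Real})
    r = length(vals) .+ (1:length(x))
    append!(vals, vec(x))
    ranges[name] = r
    shapes[name] = size(x)
    return nothing
end

reconstruct(name::Symbol) = reshape(vals[ranges[name]], shapes[name])

push_var!(:m, [0.3])                 # scalar stored as a length-1 array
push_var!(:Σ, [1.0 0.1; 0.1 1.0])    # matrix flattened into `vals`
reconstruct(:Σ)                      # back to a 2×2 Matrix{Float64}
```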

A closely related issue is here: https://github.com/TuringLang/Turing.jl/issues/433

On possible ways to make use of modelling/inference assumptions

  1. Use some flag to set the safety level of modelling
  2. Allow users to explicitly define models using @static_model, @dynamic_model or just @model (see the sketch after this list).
  3. Run-time model (code) analysis
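
A rough, purely hypothetical sketch of how options 1 and 2 might surface to samplers: a "staticness" trait (set by a flag or by which macro was used) that a sampler can query before running. None of this is existing Turing.jl API.

```julia
# Hypothetical sketch only - not existing Turing.jl API.
abstract type ModelKind end
struct StaticModel  <: ModelKind end   # fixed set of random variables
struct DynamicModel <: ModelKind end   # stochastic control flow / changing support

modelkind(::Any) = DynamicModel()      # conservative default, overridden by a flag or macro

# A sampler that needs a fixed-dimensional parameter vector can then guard on it:
function check_static(model)
    modelkind(model) isa StaticModel ||
        error("this sampler assumes a static model; consider a particle method")
end
```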

On adapting current code for better API as a starting point

I think it's possible to make the current code follow the decoupling idea by wrapping internal functions or making minor changes. I feel it's a good starting point - after we finish this, we can start replacing each isolated module with either refactored code or a completely new implementation.

xukai92 commented 6 years ago

I added useful information and thoughts on decoupling. Please let me know if there is anything unclear.

xukai92 commented 6 years ago

Below is a list of API design notes on what we want each module to export, using the example code below. Please feel free to edit it.

```julia
@model gdemo(x) = begin
  s ~ InverseGamma(2, 3)
  m ~ Normal(0, sqrt(s))
  x[1] ~ Normal(m, sqrt(s))
  x[2] ~ Normal(m, sqrt(s))
  return s, m
end

mf = gdemo([1.0, 1.5])
```
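
To make the "density evaluation" requirement concrete for this example, here is a hand-written, purely illustrative stand-in for what the compiled model function could expose to Turing.Inference. This is not the actual @model expansion, just a sketch of the kind of callable the inference module would consume.

```julia
# Illustrative stand-in only - not what @model actually generates.
using Distributions

function gdemo_logdensity(x)
    function (s, m)
        lp  = logpdf(InverseGamma(2, 3), s)
        lp += logpdf(Normal(0, sqrt(s)), m)
        lp += logpdf(Normal(m, sqrt(s)), x[1])
        lp += logpdf(Normal(m, sqrt(s)), x[2])
        return lp
    end
end

ℓ = gdemo_logdensity([1.0, 1.5])
ℓ(1.0, 0.5)   # joint log density at s = 1.0, m = 0.5
```
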
yebai commented 6 years ago

@xukai92 Thanks, Kai. I'm also working on a note - will post it soon.

@willtebbutt We might be able to adapt most of your design. I'm re-thinking the BNP-related requirements and realised that many issues are now obsolete due to recent refactoring.

xukai92 commented 6 years ago

@yebai That's great. I actually feel it's hard to edit together or point to a specific sentence using GitHub, e.g. filling in the table together or questioning ideas others have written. Do you think it might be better for us to move this note somewhere else?

xukai92 commented 6 years ago

Some more notes which might be helpful

Current VarInfo fields explained

Some APIs needed

willtebbutt commented 6 years ago

Response to @xukai92's stuff above. I broadly agree with what you've said, but have a few comments:

xukai92 commented 6 years ago

> Naming proposal: "atomic distribution" - a distribution not implemented using the @model macro. "compound distribution" - anything created using the @model macro. May comprise atomic distributions and compound distributions, which I will refer to as "component distributions".

Good idea!

> Regarding the requirements for various inference algorithms, could you please elaborate a bit on the differences between IS / SMC / PG and Gibbs?

In short, SMC is a sequential version of IS, and PG (or conditional SMC) is running MCMC in such a way that each step is an SMC sweep plus resampling.

> I'm not quite clear what the difference between "sampling with part of random variables given in order" and "density evaluation with part of random variables given" is.

Sorry, I was not clear here. Given a probabilistic program whose random variables are sampled in the order v_1, v_2, v_3, ..., v_n, by "sampling with part of random variables given in order" I mean: the first k random variables, i.e. v_{1:k}, are given, and we want to continue the probabilistic program from that state. In contrast, "density evaluation with part of random variables given" means I treat some random variables as data, say v_1 and v_n, and I want to provide the values of the rest to evaluate the density, which is the wrapper thing we talked about for Gibbs.
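
A tiny illustrative sketch of the difference (not Turing.jl data structures): particle methods replay a prefix of the trace by position, while Gibbs-style density evaluation fixes variables by name.

```julia
# Illustrative only - not Turing.jl internals.
# IS/SMC/PG: the first k draws are replayed *in order*, then sampling continues.
trace  = [0.2, -1.3, 0.7]             # v_{1:3} recorded from a previous particle
prefix(k) = trace[1:k]                # hand v_{1:k} back to the program, in order

# Gibbs: a *named* subset is treated as data, the rest is supplied for evaluation.
observed = Dict(:v1 => 0.2, :vn => 1.1)   # fixed for this Gibbs step
proposed = Dict(:v2 => -1.3, :v3 => 0.7)  # values to evaluate the density at
inputs   = merge(observed, proposed)      # everything the density evaluation needs
```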

About the can_be_unconstrained function, I'm not sure what it's for - we currently assume all distributions can be transformed into unconstrained space. The problem I was pointing out is how to check, at runtime, whether a random variable is currently in its transformed space or its original space, because different samplers require them in different spaces.
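
As an illustration of that bookkeeping problem (a hedged sketch, not Turing.jl code), one can imagine keeping a per-variable flag that records which space the stored value is currently in:

```julia
# Illustrative sketch only - not Turing.jl's actual mechanism.
using Distributions

dist    = InverseGamma(2, 3)   # support (0, ∞), so transform with log/exp
val     = 1.5                  # stored value
istrans = false                # currently in the original (constrained) space

# HMC wants the unconstrained value:
if !istrans
    val, istrans = log(val), true
end

# A particle/Gibbs step wants the original value back:
if istrans
    val, istrans = exp(val), false
end
```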

willtebbutt commented 6 years ago

Aha, I see. Both situations should be covered by the proposed wrappers. Actually, it strikes me that we should be able to compile completely different logpdf and rand functions for each step of Gibbs sampling, which contain just the bits that we need to be able to update (but for now we could just wrap the same underlying function).
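
For instance (a purely illustrative sketch using the gdemo example, not generated code), the Gibbs step that updates only m could use a specialised density that drops terms constant in m:

```julia
# Illustrative only - hand-written, not compiled by the @model machinery.
using Distributions

# Full joint for the gdemo example, conditioned on x = [1.0, 1.5]:
logjoint(s, m) = logpdf(InverseGamma(2, 3), s) +
                 logpdf(Normal(0, sqrt(s)), m) +
                 logpdf(Normal(m, sqrt(s)), 1.0) +
                 logpdf(Normal(m, sqrt(s)), 1.5)

# Step-specific density for updating `m` with `s` held fixed; the InverseGamma
# term is constant in `m`, so it can be omitted.
logstep_m(m; s) = logpdf(Normal(0, sqrt(s)), m) +
                  logpdf(Normal(m, sqrt(s)), 1.0) +
                  logpdf(Normal(m, sqrt(s)), 1.5)
```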

> About the can_be_unconstrained function, I'm not sure what it's for - we currently assume all distributions can be transformed into unconstrained space.

Good to know. What do we currently do about, for example, discrete RVs? My point was generally that there are various properties of a model that are required for different samplers, and I suspect that for any given model we should be able to automatically deduce whether or not any particular property holds. Consequently, there's not really a need for an @static_model or @dynamic_model macro.

> The problem I was pointing out is how to check, at runtime, whether a random variable is currently in its transformed space or its original space, because different samplers require them in different spaces.

Ah, I see. I hadn't considered that that would be a thing, but it makes sense.

yebai commented 5 years ago

Closed in favour of https://github.com/TuringLang/Turing.jl/issues/634#issuecomment-471339521