Thanks for opening this @yebai . Some of my comments at the end of the compiler note are a bit outdated in light of conversations we've had since. In particular the stuff relating to the variable naming / storage scheme, if we choose to enforce unique variable names at model scope.
In the meeting last time, we generally agreed to decouple Turing.jl into 3 modules, namely `Turing.Core`, `Turing.Modelling` and `Turing.Inference`, whose corresponding responsibilities are:

- `Turing.Core`: a set of helper functions that glue the modelling and inference parts of Turing.jl.
- `Turing.Modelling`: mainly the compiler, which transforms models into the set of functions required by inference.
- `Turing.Inference`: currently the samplers; in the future we will probably also include other inference methods such as MAP or variational inference.

Before starting work on the refactoring, we'd like to design and fix the API between the three modules. For this purpose, we need to know what is needed for the models and inference methods we want to support (now and in the future). Below are two summary tables, which I have started with what I can come up with for now. Please feel free to update them to reflect the latest discussion.
Inference | Inference requirement | Practical requirement | Note |
---|---|---|---|
MH/HMC | density evaluation, gradient evaluation (for HMC) | variable state evaluation (e.g. is it transformed?) | R.v.s are stored as a flattened vector inside Turing.jl, so we need to store information for each r.v. so that we can reconstruct its Julia type. |
IS/SMC/PG | sampling with part of the random variables given, in order | trace state maintenance | Replaying by order is enough for particle-based sampling. |
Gibbs | density evaluation with part of the random variables given | variable state evaluation (e.g. does it belong to the sampler?) | This requires replaying by name. |
Model | Model requirement | Practical requirement | Note |
---|---|---|---|
Non-parametric | @yebai had an implementation of Turing.IArray (infinite array) in an early version of Turing.jl | | |
w/ stochastic control flow | | | |
Another thing I forgot to mention is that part of the complexity comes from the difference between Turing.jl's internal variable type and Julia's native variable types. Comparing Stan's type design with Turing.jl's and Julia's:
Stan | Turing | Julia |
---|---|---|
real/int | - | Float |
vector/row_vector/matrix | - | Array{Float,1 or 2} |
array | - | Array{Union{Float, Array{Float,1 or 2}}, 1 or 2} |
constrained | - | - |
I omitted Turing.jl's types because they are currently stored as a plain vector and transformed back to the corresponding Julia type whenever necessary. Note that the design choice of flattening all variables is made for performance, as it reduces the internal computation in samplers like MH/HMC to simple vector and matrix computation.
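As a toy illustration of that design (a self-contained sketch, not Turing's internal code; `store!`, `reconstruct`, and the containers are made-up names), flattening a scalar and a matrix into one plain vector and reconstructing them from recorded shape information could look like:

```julia
# Minimal illustration of the "flatten everything into one vector" design.
# A real implementation would also record distribution/constraint info.
vals   = Float64[]                      # concatenation of all variables
ranges = Dict{Symbol,UnitRange{Int}}()  # where each variable lives in `vals`
shapes = Dict{Symbol,Tuple}()           # original Julia shape of each variable

function store!(name::Symbol, x)
    flat = vec(collect(float(x)))       # flatten to a Vector{Float64}
    r = length(vals) + 1 : length(vals) + length(flat)
    append!(vals, flat)
    ranges[name] = r
    shapes[name] = size(x)
    return x
end

# Rebuild the stored value with its original shape from the flat vector.
reconstruct(name::Symbol) = reshape(vals[ranges[name]], shapes[name])

store!(:s, 2.0)                 # scalar
store!(:A, [1.0 2.0; 3.0 4.0])  # matrix
reconstruct(:A)                 # 2x2 Matrix{Float64}
```

Samplers such as MH/HMC can then operate directly on `vals` as plain vector arithmetic, which is the performance motivation mentioned above.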
A closely related issue is here: https://github.com/TuringLang/Turing.jl/issues/433
`@static_model`, `@dynamic_model` or just `@model`. I think it's possible to make the current code follow the decoupling idea by wrapping internal functions / with minor changes. I feel it's a good starting point - after we finish this, we can start to replace each isolated module with either refactored code or a completely new implementation.
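As a rough picture of that decoupled starting point, the layout might look something like this (a sketch only; the submodule contents and comments are placeholders, not an agreed API):

```julia
# Hypothetical layout after the split; submodule contents are placeholders.
module Turing

module Core
    # Glue code shared by modelling and inference, e.g. the flattened
    # variable container and replay machinery.
end

module Modelling
    # The compiler: turns an @model definition into the set of functions
    # (density evaluation, sampling, replay) that inference needs.
end

module Inference
    # Samplers (MH, HMC, IS, SMC, PG, Gibbs) and, later, other inference
    # methods such as MAP or variational inference.
end

end # module Turing
```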
I added useful information and thoughts on decoupling. Please let me know if there is anything unclear.
Below is a list, for the API design, of what we want each module to export, using the example code below. Please feel free to edit it.
```julia
@model gdemo(x) = begin
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    x[1] ~ Normal(m, sqrt(s))
    x[2] ~ Normal(m, sqrt(s))
    return s, m
end

mf = gdemo([1.0, 1.5])
```
- `Turing.Core`:
- `Turing.Modelling`:
  - `logpdf(mf, [1.0, 2.0])`
  - `logpdf(mf, Dict(:s => 1.0, :m => 2.0))`
- `Turing.Inference`:
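As a rough sketch of how the two `logpdf` entry points above could be distinguished, dispatch on the second argument's type can separate replay-by-order (a flat vector) from replay-by-name (a dictionary). Everything below is hypothetical: `GdemoModel` is a hand-written stand-in for what the compiler might emit for `gdemo`, not the actual Turing API.

```julia
using Distributions

struct GdemoModel
    x::Vector{Float64}   # observed data
end

# Replay-by-order: θ is the flat parameter vector (here [s, m]).
function Distributions.logpdf(mf::GdemoModel, θ::AbstractVector)
    s, m = θ[1], θ[2]
    lp  = logpdf(InverseGamma(2, 3), s) + logpdf(Normal(0, sqrt(s)), m)
    lp += sum(logpdf.(Normal(m, sqrt(s)), mf.x))
    return lp
end

# Replay-by-name: variables are looked up by name instead of by position.
function Distributions.logpdf(mf::GdemoModel, θ::AbstractDict{Symbol})
    return logpdf(mf, [θ[:s], θ[:m]])
end

mf = GdemoModel([1.0, 1.5])
logpdf(mf, [1.0, 2.0])
logpdf(mf, Dict(:s => 1.0, :m => 2.0))
```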
@xukai92 Thanks, Kai. I'm also working on a note - will post it soon.
@willtebbutt We might be able to adapt most of your design. I'm re-thinking the BNP-related requirements and realised that many issues are obsolete now due to recent refactoring.
@yebai That's great. I actually feel it's hard to edit together / point to a specific sentence using GitHub, e.g. filling in the table together or questioning some ideas others write. Do you think it might be better for us to move this note somewhere else?
Some more notes which might be helpful.

`VarInfo` fields explained:

- `idcs`: a dictionary that maps each variable (of type `VarName`) to a variable ID, which is the corresponding index of the variable in the other fields; this allows all other fields to simply be vectors.
- `vns`: a vector of variable names.
- `vals`: a plain vector which is simply a concatenation of all random variables, flattened in the `assume()` call for each variable.
- `ranges`: a vector of ranges which helps map each variable to the corresponding dimensions in `vals`.
- `rvs`: not currently used; it was introduced to allow temporarily storing variables in their original Julia types but was abandoned later.
- `dists`: a vector of distribution types; this provides information on the constraints and shape of each variable.
- `gids`: a vector of group IDs used to indicate which sampler the corresponding variable belongs to.
- `logp`: stores the evaluated log-joint probability.
- `pred`: a `Dict{Symbol,Any}` used as a container for outputting.
- `num_produce`: the number of `observe` statements called.
- `orders`: the `observe` statement orders associated with random variables.
- `flags`: used to indicate flags like "is deleted or not" and "is transformed or not"; a `Dict{String,Vector{Bool}}` which supports adding more flags without changing the fields.

Functions:

- `logpdf(vars)`: joint probability of model variables and data, using replaying-by-name.
- `logpdf(theta)`: joint probability of model variables and data, using replaying-by-order (`ranges` and `dists` are not required).
- `is_transform(var)`: check whether a variable `var` is in the transformed space or not.
- `get_sampler(var)`: return the sampler ID of the corresponding variable `var`.
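To make the field descriptions concrete, here is a stripped-down sketch of such a container together with `is_transform` / `get_sampler` style queries. Field names follow the list above, but the types, the `MiniVarInfo` name, and the explicit container argument in the helpers are my own guesses, not the real `VarInfo` implementation.

```julia
using Distributions

# Stripped-down sketch of a VarInfo-like container; not the real definition.
struct MiniVarInfo
    idcs   :: Dict{Symbol,Int}           # variable name => variable ID
    vals   :: Vector{Float64}            # all variables, flattened and concatenated
    ranges :: Vector{UnitRange{Int}}     # slice of `vals` per variable
    dists  :: Vector{Distribution}       # distribution (constraints + shape) per variable
    gids   :: Vector{Int}                # sampler / group ID per variable
    flags  :: Dict{String,Vector{Bool}}  # e.g. "trans", "del"; extensible without new fields
end

# Query sketches corresponding to is_transform(var) and get_sampler(var).
is_transform(vi::MiniVarInfo, var::Symbol) = vi.flags["trans"][vi.idcs[var]]
get_sampler(vi::MiniVarInfo, var::Symbol)  = vi.gids[vi.idcs[var]]

vi = MiniVarInfo(
    Dict(:s => 1, :m => 2),
    [1.0, 2.0],
    [1:1, 2:2],
    [InverseGamma(2, 3), Normal(0, 1)],
    [1, 2],
    Dict("trans" => [true, false], "del" => [false, false]),
)

is_transform(vi, :s)   # true
get_sampler(vi, :m)    # 2
```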
Response to @xukai92's stuff above. I broadly agree with what you've said, but have a few comments:
```julia
# Sketch: `Composite` and `component_types` are assumed helpers for compound models.
function can_be_unconstrained(model_type::Type{<:Composite})
    for component_type in component_types(model_type)
        can_be_unconstrained(component_type) || return false
    end
    return true
end

# Atomic cases: declared per distribution type.
can_be_unconstrained(d::Type{<:Distributions.Normal}) = true
can_be_unconstrained(d::Type{SomethingNotUnconstrainable}) = false
```
(I'm assuming that we already have something like this somewhere?)
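For concreteness, a toy usage of that trait (everything here is hypothetical: `GaussPair`, `component_types`, and the choice of `Poisson` as a discrete, non-unconstrainable example):

```julia
using Distributions

# Toy stand-ins; none of these names exist in Turing.
abstract type Composite end

struct GaussPair <: Composite end                      # a model with two Normal components
component_types(::Type{GaussPair}) = (Normal, Normal)  # hypothetical trait

can_be_unconstrained(::Type{<:Normal})  = true
can_be_unconstrained(::Type{<:Poisson}) = false        # e.g. a discrete RV

function can_be_unconstrained(model_type::Type{<:Composite})
    all(can_be_unconstrained, component_types(model_type))
end

can_be_unconstrained(GaussPair)   # true
```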
Given a model `d` and a collection of samplers `s1`, `s2`, ..., `sN`, we generate `N` separate "views" into the same model, which are themselves composite distributions that treat a subset of `d`'s component distributions as fixed parameters and the rest as random variables. This should make it possible to perform some static analysis to get rid of the overhead associated with knowing which RVs to update. I'm not sure how this will play with nonparametrics...
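A rough sketch of the "view" idea, using the `gdemo` model from above (the types, names, and the `logdensity` function are all hypothetical, just to make the fixed-vs-free split concrete):

```julia
using Distributions

# Hypothetical: each "view" fixes one component of gdemo and exposes the
# other as the random variable for one Gibbs step.
struct FixedMView           # Gibbs step for s, with m treated as a fixed parameter
    m::Float64
    x::Vector{Float64}
end

function logdensity(v::FixedMView, s::Real)
    logpdf(InverseGamma(2, 3), s) +
        logpdf(Normal(0, sqrt(s)), v.m) +
        sum(logpdf.(Normal(v.m, sqrt(s)), v.x))
end

struct FixedSView           # Gibbs step for m, with s treated as a fixed parameter
    s::Float64
    x::Vector{Float64}
end

function logdensity(v::FixedSView, m::Real)
    logpdf(InverseGamma(2, 3), v.s) +
        logpdf(Normal(0, sqrt(v.s)), m) +
        sum(logpdf.(Normal(m, sqrt(v.s)), v.x))
end

# Each sampler only ever sees the density of the variables it updates.
logdensity(FixedMView(0.5, [1.0, 1.5]), 1.2)
logdensity(FixedSView(1.2, [1.0, 1.5]), 0.5)
```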
Regarding "I omitted Turing.jl's...": is the type information in the wrong column? I'll comment on #433 separately regarding this discussion.

On possible ways to make use of modelling / inference assumptions: things like whether or not the dimensionality of the model is fixed, and whether it is possible to map it into an unconstrained space, should be possible to deduce statically (as discussed above). Thus I would propose to expose functions like `can_be_unconstrained` and `has_fixed_dims` (exact names tbd) which are available to the user, so that they can verify whether or not their model has the kinds of properties that they think it should, and so that we can use these functions internally to ensure that e.g. a particular sampler is compatible with a particular model. (As a side note, we should also provide debugging tools on this front that make it easier to figure out which components of your model have, for example, fixed dimensionality.) I can't think of any properties that can't be deduced statically, but it's perfectly possible that I'm missing something.

On adapting current code for better API as a starting point: I completely agree.

Naming proposal: "atomic distribution" - a distribution not implemented using the `@model` macro. "Compound distribution" - anything created using the `@model` macro; it may comprise atomic distributions and compound distributions, which I will refer to as "component distributions".
Good idea!
Regarding the requirements for various inference algorithms, could you please elaborate a bit on the differences between IS / SMC / PG and Gibbs?
In short, SMC is a sequential version of IS, and PG (or conditional SMC) runs MCMC in such a way that each step is an SMC run plus resampling.
I'm not quite clear what the difference between "sampling with part of random variables given in order" and "density evaluation with part of random variables given" is.
Sorry, I was not clear here. Given a probabilistic program whose random variables are sampled in the order `v_1`, `v_2`, `v_3`, ..., `v_n`, by "sampling with part of the random variables given in order" I mean: the first `k` random variables, i.e. `v_{1:k}`, are given, and we want to continue the probabilistic program from that state. In contrast, "density evaluation with part of the random variables given" means I treat some random variables as data, say `v_1` and `v_n`, and I want to provide the values of the rest to evaluate the density, which is the wrapper thing we talked about for Gibbs.
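A toy illustration of the two operations (hypothetical code, not the Turing internals): replay-by-order resumes a program given the prefix `v_{1:k}`, while replay-by-name evaluates a density with specific named variables held fixed.

```julia
using Distributions, Random

# Replay-by-order: the first k draws v_{1:k} are given; the program is
# continued by drawing the remaining variables in program order.
function continue_program(rng::AbstractRNG, given::Vector{Float64}, n::Int)
    v = copy(given)
    for i in length(given)+1:n
        push!(v, rand(rng, Normal(0, 1)))   # toy program: v_i ~ Normal(0, 1)
    end
    return v
end

# Replay-by-name: some variables are fixed by name (treated as data) and the
# density is evaluated with the rest supplied by the caller.
function logdensity_given(fixed::Dict{Symbol,Float64}, rest::Dict{Symbol,Float64})
    θ = merge(fixed, rest)                  # all of v1, v2, v3 must be present
    logpdf(Normal(0, 1), θ[:v1]) +
        logpdf(Normal(θ[:v1], 1), θ[:v2]) +
        logpdf(Normal(θ[:v2], 1), θ[:v3])
end

continue_program(MersenneTwister(1), [0.3, -1.2], 5)             # resume after v_{1:2}
logdensity_given(Dict(:v1 => 0.3), Dict(:v2 => 1.0, :v3 => 0.2)) # v1 treated as data
```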
About the `can_be_unconstrained` function, I'm not sure what it's for - we currently assume all distributions can be transformed into unconstrained space. The problem I was pointing out is how to check, at runtime, whether a random variable is currently in its transformed space or in its original space, because different samplers require them in different spaces.
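For example (a hand-rolled sketch, not the actual transform code in Turing): a positive-constrained variable such as `s ~ InverseGamma(2, 3)` can be stored either in its original space or in log space, and the stored number only makes sense if we also know which space it is currently in.

```julia
using Distributions

# Hand-rolled transform for a positive-constrained variable; a sketch only.
to_unconstrained(s)   = log(s)   # original space -> unconstrained space
from_unconstrained(y) = exp(y)   # unconstrained space -> original space

d = InverseGamma(2, 3)
s = 0.7                          # value in the original (constrained) space
y = to_unconstrained(s)          # value in the transformed space

# The transformed-space density needs the log-abs-det Jacobian correction;
# using y where s is expected (or vice versa) silently gives the wrong answer,
# hence the need for a runtime "is transformed?" flag per variable.
logpdf_constrained   = logpdf(d, s)
logpdf_unconstrained = logpdf(d, from_unconstrained(y)) + y   # + log|ds/dy| = y
```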
Aha, I see. Both situations should be covered by the proposed wrappers. Actually, it strikes me that we should be able to compile completely different `logpdf` and `rand` functions for each step of Gibbs sampling, which contain just the bits that we need to be able to update (but for now we could just wrap the same underlying function).
> About the `can_be_unconstrained` function, I'm not sure what it's for - we currently assume all distributions can be transformed into unconstrained space.
Good to know. What do we currently do about, for example, discrete RVs? My point was generally that there are various properties of a model that are required for different samplers, and I suspect that for any given model we should be able to automatically deduce whether or not any particular property holds. Consequently, there's not really a need for an `@static_model` or `@dynamic_model` macro.
> The problem I was pointing out is how to check, at runtime, whether a random variable is currently in its transformed space or in its original space, because different samplers require them in different spaces.
Ah, I see. I hadn't considered that that would be a thing, but it makes sense.
Closed in favour of https://github.com/TuringLang/Turing.jl/issues/634#issuecomment-471339521
Discussion - TBA.
Related issues:
Will's notes on syntax and compiler