Not exactly related to your specific question of sampler vs. info specialization, but I wrote the following to summarize my thoughts on separating the compiler and inference parts of Turing in general; #634 is relevant. FWIW, I think sampler specialization is probably neater, but info specialization is less hectic because we currently dispatch on `Sampler` in a lot of places to basically refer to any `AbstractSampler` that is not `SampleFromPrior()` or `SampleFromUniform()`. I haven't read the AHMC PR yet, so I may have further thoughts after reading it.
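To make that dispatch pattern concrete, here is a minimal, self-contained toy; the type names mirror Turing's, but these definitions are illustrative only, not the actual ones:

```julia
# Toy reconstruction of the dispatch pattern described above.
abstract type AbstractSampler end
struct SampleFromPrior   <: AbstractSampler end
struct SampleFromUniform <: AbstractSampler end
struct Sampler{A} <: AbstractSampler  # wraps an actual inference algorithm
    alg::A
end

# `::Sampler` below effectively means "any sampler with an algorithm
# attached", i.e. anything that is not SampleFromPrior/SampleFromUniform.
describe(spl::Sampler)        = "run a step of $(typeof(spl.alg))"
describe(::SampleFromPrior)   = "draw from the prior"
describe(::SampleFromUniform) = "draw uniformly (e.g. to initialize)"
```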
On the separation of compiler and inference, we currently have the following components on Turing's inference side (a toy sketch follows the list):

- `InferenceAlgorithm`: a minimal and immutable representation of the inference algorithm and its hyperparameters; no implementation-specific fields here.
- `Sampler`: an extended representation of the inference algorithm that has all the information and variables needed during the sampling process, i.e. to "implement" the inference algorithm. This additional implementation-specific information currently consists of `selector` and `info`. So `InferenceAlgorithm`s are part of the exported API, while the `Sampler` machinery is an implementation detail. Since we are putting most of the implementation-specific temporaries inside `Sampler`, shouldn't the `VarInfo`(s) and `model` also be part of it? Does it make sense to export `Sampler` so the user can use it to pre-allocate memory? Then we could call `sample(spl)` to sample from it.
- `sample`: a function that, given a model instance and an `InferenceAlgorithm`, creates the temporaries (`Sampler` and `VarInfo`(s)); a significant part of the inference algorithm is actually implemented here. Other parts of the inference algorithm are spread across `assume` and `observe`, which dispatch on the sampler type.
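A toy sketch of this split, assuming made-up HMC hyperparameters and field names (not Turing's actual definitions):

```julia
abstract type InferenceAlgorithm end

# Exported API: immutable, hyperparameters only (toy HMC shown here).
struct HMC <: InferenceAlgorithm
    epsilon::Float64     # leapfrog step size
    n_leapfrog::Int      # number of leapfrog steps
end

# Implementation detail: the algorithm plus everything needed to run it.
mutable struct Sampler{A<:InferenceAlgorithm}
    alg::A
    selector::Int             # stands in for the current `selector`
    info::Dict{Symbol,Any}    # stands in for the current `info` dict
end
Sampler(alg::InferenceAlgorithm) = Sampler(alg, 0, Dict{Symbol,Any}())

spl = Sampler(HMC(0.05, 10))  # pre-allocated state one could `sample` from
```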
One observation here is that in `sample`, `observe` and `assume` there is common Turing-specific functionality to do with `Sampler`, `VarInfo`, `model` and `Sample` bookkeeping, and there are other algorithmic implementation details/semantics; both are mixed together in these three places (`sample`, `assume` and `observe`). Separating the inference from the bookkeeping requires careful consideration of what each algorithm requires from the Turing side. For example, HMC may require the flattening of the parameters and a way to calculate the log joint probability of some parameters, but it has no special requirements for `assume` and `observe`. It is unclear to me how things like MH and the particle samplers may be separated out, though, since they rely on custom `assume` and `observe`, so the algorithm is not fully implemented without tampering with these somewhat internal functions, which use a fair bit of compiler jargon like `VarName` and `VarInfo`. Perhaps one way to separate the compiler jargon from the inference-algorithm semantics is to merge all the compiler jargon together and only expose documented functions that make sense to the inference-algorithm writer. These would then make it easier to write simple and readable custom `sample`, `observe` and `assume` methods for the algorithm at hand. So we could merge `vi::VarInfo` with `spl::Sampler` and `model`, together with the currently considered/active `vn::VarName`; the "algorithm writer" could then say something like `push!(spl::MergedSampler, dist, r)` when they want to register a new value and distribution for the active `vn` in `vi`. Behind the scenes, we can then take care of things like checking whether `vn` already exists in `vi` and, if it does, whether it is flagged "del", etc.
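Here is a rough, self-contained sketch of that merged-context idea; the `ToyVarInfo` stand-in and all field names are hypothetical, not Turing's internals:

```julia
using Distributions

# Hypothetical stand-in for VarInfo: values, distributions and flags per vn.
struct ToyVarInfo
    vals::Dict{Symbol,Any}
    dists::Dict{Symbol,Distribution}
    flags::Dict{Symbol,Set{String}}
end
ToyVarInfo() = ToyVarInfo(Dict{Symbol,Any}(), Dict{Symbol,Distribution}(),
                          Dict{Symbol,Set{String}}())

# Everything the algorithm writer needs, merged into one object.
mutable struct MergedSampler{S,M}
    spl::S            # algorithm-specific sampler state
    model::M          # the model instance
    vi::ToyVarInfo    # variable bookkeeping
    vn::Symbol        # the currently active variable name
end

# Register value `r` and its `dist` for the active `vn`; the existence
# check and the "del" flag are handled behind the scenes.
function Base.push!(ms::MergedSampler, dist::Distribution, r)
    vi, vn = ms.vi, ms.vn
    if !haskey(vi.vals, vn) || "del" in get(vi.flags, vn, Set{String}())
        vi.vals[vn] = r
        vi.dists[vn] = dist
    end
    return ms
end
```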
I think an important question we need to answer here is where each struct and function should live: in Turing or in external packages? Let's take the most complicated case, where we need a `sample`, `observe` and `assume` for the new sampler. If we are to overload a Turing function or subtype `Turing.AbstractSampler` in external packages, we need those packages to depend on Turing. But then we also need `using Turing` to make all these algorithms from external packages available. So I think this is a case where having something like `TuringBase` can be useful. `TuringBase` would have the abstract types that should be subtyped, e.g. `InferenceAlgorithm`, and the functions that can be overloaded by algorithm writers, e.g. `sample`, `assume` and `observe`. Turing can then depend on the external packages which implement specific algorithms, so Turing becomes a metapackage with multiple packages under its umbrella. If we decide to have all the algorithms implemented inside of Turing but in different submodules or files, then Turing.Core can play the role of `TuringBase`.
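Schematically, the dependency structure could look like this; the module names are the ones proposed above, and the contents are purely illustrative:

```julia
# The shared base: abstract types to subtype and functions to overload.
module TuringBase
    abstract type InferenceAlgorithm end
    function sample end
    function assume end
    function observe end
end

# An external algorithm package depends only on TuringBase.
module MyMHPackage
    using ..TuringBase
    struct MH <: TuringBase.InferenceAlgorithm
        proposal_std::Float64
    end
    TuringBase.sample(alg::MH, model) = nothing  # the algorithm lives here
end

# Turing itself then becomes a metapackage over the algorithm packages:
# module Turing
#     using TuringBase, MyMHPackage, ...  # re-export as needed
# end
```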
So, given the above, I think one thing that would help make Turing more modular is to combine all the compiler jargon into a single variable and to define functions for the common operations performed in the `sample`, `observe` and `assume` methods of all the algorithms. This makes it easier to write, debug and document new and existing inference algorithms, and it creates a clear barrier between Turing.Core and the algorithm semantics. Say an algorithm writer wants to do `xyz` on `spl`. They can ask, "Is there a function that does `xyz` on `spl`?" If not, someone on the core team can define, document and export this function, and the algorithm writer can then happily use it in their implementation.
The above is probably in line with the thinking in #634, with the main difference being that `rand` is proposed there to be used in place of `sample`, IIUC.
@mohamed82008 There are many good points in your comments. I have been thinking about some similar ideas. I think we need to carefully re-design some core APIs to make the boundary between Turing and its external world (e.g. `AdvancedHMC`, `RandomMeasures`, `GaussianProcesses`) clearer. This will take a few weeks, I think; in particular, the following 3 PRs will pave the way for the next-step improvements:

- `VarInfo`: https://github.com/TuringLang/Turing.jl/pull/742

At the moment, these PRs still need some effort on separating, simplifying and documenting the APIs and internal functions.
As a side note, we might need a more thorough code-review process and a few more team hackathons. Fortunately, the Turing code base is relatively small, so with time we can turn Turing into a Swiss-army-knife-style library for probabilistic machine learning and Bayesian statistics.
@mohamed82008 Most of what you wrote is pretty much in line with what I'm currently trying. Once we have the PRs ready and merged, we will probably need a few more rounds, but I feel we are going in the right direction.
@yebai A Swiss-army-knife-style library for probabilistic machine learning sounds awesome. This would be a great selling point for Turing. I think the documentation and tutorials of Turing already highlight that we are headed in this direction. Maybe we can put even more effort into the tutorials once in a while and showcase the wide application of Turing to various domains and tasks.
We currently use `spl.info`, which is a dictionary, and we would like to remove it. There are a few ways:

- define `Sampler` types like `HMCSampler`, `PGSampler`, etc., or
- define `HMCInfo`, `PGInfo`, etc. types and set them as `spl.info`.

Related issues: https://github.com/TuringLang/Turing.jl/issues/602
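For illustration, the two options could look roughly like this (the field contents are made up for the example):

```julia
# Option 1: specialize the sampler type itself.
mutable struct HMCSampler{A}
    alg::A
    momentum_cache::Vector{Float64}  # example of algorithm-specific state
end

# Option 2: keep a single Sampler type, but replace the `info` Dict with
# a concrete, algorithm-specific info type.
mutable struct HMCInfo
    momentum_cache::Vector{Float64}
end
mutable struct Sampler{A,I}
    alg::A
    info::I   # e.g. an HMCInfo instead of a Dict{Symbol,Any}
end
```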