TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

RFC Sampler type #771

Closed xukai92 closed 5 years ago

xukai92 commented 5 years ago

We currently use spl.info, which is a dictionary, and we would like to remove it. There are a few ways to do this:

Related issues: https://github.com/TuringLang/Turing.jl/issues/602

mohamed82008 commented 5 years ago

Not exactly related to your specific question of sampler vs. info specialization, but I wrote the following to summarize my thoughts on separating the compiler and inference parts of Turing in general; #634 is relevant. FWIW, I think sampler specialization is probably the neater design, but info specialization is less disruptive, because we currently dispatch on Sampler in a lot of places to refer to any AbstractSampler that is not SampleFromPrior() or SampleFromUniform(). I haven't read the AHMC PR yet, so I may have further thoughts after reading it.
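The dispatch pattern mentioned above can be illustrated with a minimal, self-contained sketch (the describe function is purely illustrative, not Turing API): methods taking a Sampler argument automatically catch every sampler except the special SampleFromPrior/SampleFromUniform singletons, which fall through to the AbstractSampler fallback.

```julia
# Minimal sketch of dispatching on Sampler vs. the special sampler singletons.
abstract type AbstractSampler end

struct SampleFromPrior   <: AbstractSampler end
struct SampleFromUniform <: AbstractSampler end

# Wraps a "real" inference algorithm of some type A.
struct Sampler{A} <: AbstractSampler
    alg::A
end

# Fallback: catches the special singleton samplers.
describe(::AbstractSampler) = "special-cased sampler"
# More specific method: catches every user-facing inference sampler.
describe(::Sampler) = "user-facing inference sampler"

describe(SampleFromPrior())   # hits the AbstractSampler fallback
describe(Sampler(:HMC))       # hits the Sampler-specific method
```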

On the separation of compiler and inference, currently we have the following components on Turing's inference side:

  1. InferenceAlgorithm: a minimal and immutable representation of the inference algorithm and its hyperparameters, no implementation-specific fields here.
  2. Sampler: an extended representation of the inference algorithm that holds all the information and variables needed during the sampling process, i.e. to "implement" the inference algorithm. This additional implementation-specific state currently consists of selector and info. So InferenceAlgorithms are part of the exported API, while Sampler is an implementation detail. Since we put most of the implementation-specific temporaries inside Sampler, shouldn't the VarInfo(s) and the model also be part of it? Does it make sense to export Sampler so users can use it to pre-allocate memory? Then we could call sample(spl) to sample from it.
  3. sample: a function that, given a model instance and an InferenceAlgorithm, creates the temporaries Sampler and VarInfo(s); a significant part of the inference algorithm is then actually implemented here. Other parts of the inference algorithm are spread across assume and observe, which dispatch on the sampler type.

     One observation here is that sample, observe and assume mix two kinds of code: common Turing-specific bookkeeping involving Sampler, VarInfo, model and Sample, and the algorithmic implementation details/semantics. Separating the inference from the bookkeeping requires careful consideration of what each algorithm needs from the Turing side. For example, HMC may require the flattening of the parameters and a way to calculate the log joint probability of some parameters, but it has no special requirements for assume and observe. It is unclear to me how things like MH and particle samplers could be separated out, though, since they rely on custom assume and observe, so the algorithm is not fully implemented without tampering with these somewhat internal functions, which use a fair bit of compiler jargon like VarName and VarInfo.

     Perhaps one way to separate the compiler jargon from the inference-algorithm semantics is to merge all the compiler jargon together and only expose documented functions that make sense to the inference-algorithm writer. These would make it easier to write simple and readable custom sample, observe and assume methods for the algorithm at hand. So we could merge vi::VarInfo with spl::Sampler and model, together with the currently considered/active vn::VarName; the "algorithm writer" could then say something like push!(spl::MergedSampler, dist, r) when they want to register a new value and distribution for the active vn in vi. Behind the scenes, we can then take care of things like checking if vn already exists in vi and, if it does, checking whether it is flagged "del", etc.
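Points 1 and 2 above can be sketched in a few lines. This is a hypothetical illustration, not Turing's actual definitions: HMC stands in for some concrete algorithm, and a plain Dict stands in for the current info field.

```julia
# Hypothetical sketch of the InferenceAlgorithm / Sampler split.
abstract type InferenceAlgorithm end

# 1. Minimal, immutable description of the algorithm and its hyperparameters;
#    no implementation-specific fields. (Illustrative hyperparameters only.)
struct HMC <: InferenceAlgorithm
    n_leapfrog::Int
    step_size::Float64
end

# 2. The algorithm plus implementation-specific state needed during sampling.
#    Here a Dict stands in for the current `info` field.
struct Sampler{A<:InferenceAlgorithm}
    alg::A
    selector::Int
    info::Dict{Symbol,Any}
end

Sampler(alg::InferenceAlgorithm) = Sampler(alg, 0, Dict{Symbol,Any}())

alg = HMC(10, 0.1)            # exported, user-facing description
spl = Sampler(alg)            # internal, implementation detail
spl.info[:cache] = zeros(3)   # implementation-specific temporary
```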

I think an important question we need to answer here is where each struct and function should live: Turing or external packages? Let's take the most complicated case, where we need a sample, observe and assume for the new sampler. If we are to overload a Turing function or subtype Turing.AbstractSampler in external packages, those packages need to depend on Turing. But then we also need using Turing to make all these algorithms from external packages available. So I think this is a case where having something like TuringBase can be useful. TuringBase would have the abstract types that should be subtyped, e.g. InferenceAlgorithm, and the functions that can be overloaded by algorithm writers, e.g. sample, assume and observe. Turing can then depend on the external packages which implement specific algorithms. So Turing becomes a metapackage with multiple packages under its umbrella. If we decide to have all the algorithms implemented inside Turing but in different submodules or files, then Turing.Core can play the role of TuringBase.
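The TuringBase idea sketches out naturally in code. Everything below is hypothetical: TuringBase and MyMHPackage are illustrative names, and the modules stand in for what would really be separate packages.

```julia
# Sketch of a hypothetical `TuringBase`-style package: abstract types to
# subtype and empty generic functions for algorithm packages to overload.
module TuringBase
    abstract type AbstractSampler end
    abstract type InferenceAlgorithm end
    function sample end
    function assume end
    function observe end
end

# An external algorithm package would then depend only on TuringBase,
# subtyping its abstract types and adding methods to its functions.
module MyMHPackage
    using ..TuringBase
    struct MH <: TuringBase.InferenceAlgorithm
        proposal_std::Float64
    end
    TuringBase.sample(alg::MH, n::Int) =
        "drew $n samples with MH(std=$(alg.proposal_std))"
end

# The metapackage (Turing) would pull such packages under one umbrella;
# here we just call the overloaded function directly.
result = TuringBase.sample(MyMHPackage.MH(0.5), 100)
```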

So given the above, I think one thing that can help make Turing more modular is to combine all the compiler jargon into a single variable, and to define functions for the common operations performed in sample, observe and assume across all the algorithms. This makes it easier to write, debug and document new and existing inference algorithms. It also creates a clear barrier between Turing.Core and algorithm semantics. Say an algorithm writer wants to do xyz on spl. They can ask, "Is there a function that does xyz on spl?" If not, someone on the core team can define, document and export this function. The algorithm writer can then happily use it in their implementation.
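The "merged sampler" idea floated above might look like the following. All names here (MergedSampler, register!) are illustrative, not actual Turing API, and plain Dicts stand in for VarInfo; the point is only the shape of the bookkeeping that the core could hide behind a documented function.

```julia
# Hypothetical merged handle: all compiler-side state behind one struct.
struct MergedSampler
    values::Dict{Symbol,Any}           # stands in for VarInfo
    flags::Dict{Symbol,Set{Symbol}}    # per-variable flags, e.g. :del
end

MergedSampler() = MergedSampler(Dict{Symbol,Any}(), Dict{Symbol,Set{Symbol}}())

# What a `push!(spl, dist, r)`-style call might do behind the scenes:
# register value `r` for variable `vn`, keeping the existing value unless
# the variable is absent or flagged :del.
function register!(spl::MergedSampler, vn::Symbol, r)
    if haskey(spl.values, vn) && !(:del in get(spl.flags, vn, Set{Symbol}()))
        return spl.values[vn]      # already registered and not flagged: keep it
    end
    spl.values[vn] = r             # (re)set the value
    delete!(spl.flags, vn)         # clear any flags once re-registered
    return r
end

spl = MergedSampler()
register!(spl, :x, 1.0)            # first registration stores 1.0
register!(spl, :x, 2.0)            # ignored: :x exists, not flagged :del
spl.flags[:x] = Set([:del])
register!(spl, :x, 2.0)            # :del flag set, so the value is replaced
```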

The above is probably in line with the thinking in #634 with the main difference being that rand is proposed to be used in place of sample IIUC.

cpfiffer commented 5 years ago

I had some similar thoughts here as well.

yebai commented 5 years ago

@mohamed82008 There are many good points in your comments. I have been thinking along similar lines. I think we need to carefully re-design some core APIs to make the boundary between Turing and its external world (e.g. AdvancedHMC, RandomMeasures, GaussianProcesses) clearer. This will take a few weeks' time, I think; in particular, the following 3 PRs will pave the way for the next round of improvements:

At the moment, these PRs still need some effort in separating, simplifying and documenting APIs and internal functions.

As a side note, we might need a more thorough code review process and a few more team hackathons. Fortunately, the Turing code base is relatively small, so with time we can turn Turing into a Swiss-army-knife-style library for probabilistic machine learning and Bayesian statistics.

trappmartin commented 5 years ago

@mohamed82008 Most of what you wrote is pretty much along the lines of what I'm currently trying. Once we have the PRs ready and merged we will probably need a few more iterations, but I feel we are going in the right direction.

@yebai A Swiss-army-knife-style library for probabilistic machine learning sounds awesome. This would be a great selling point for Turing. I think the documentation & tutorials of Turing already highlight that we are headed in this direction. Maybe we can put even more effort into the tutorials once in a while and showcase the wide application of Turing to various domains and tasks.

yebai commented 5 years ago

Duplicate of https://github.com/TuringLang/Turing.jl/issues/746#issuecomment-489387186