Making MCMC.jl more flexible and portable in terms of model specification

papamarkou commented 10 years ago

@LaurenceA, your suggestion in #34 to somehow allow use of MCMCChain and samples with external models is very useful. I want to work towards this direction and my first impression is that your suggestion is feasible and attainable. I would like to ask you for some input, from a user's perspective in order to see how exactly I am going to achieve the end goal.

Let's say that you have a matrix of Monte Carlo samples that you have somehow computed. This matrix will populate the MCMCChain.samples field. You may (or may not) have the relevant scores (grad of the log-likelihood for instance) stored in a separate matrix, which could populate the MCMCChain.gradients field. Assuming this situation, I would like to ask you the following:

you will also need to have a task defined, since this is one of the main components of the MCMC package. There is a reason of course for this, since tasks make it much easier to handle the simulations. This task needs in turn a triplet (model, sampler, runner) as its fields. The sampler and runner are easy to be instantiated using the existing samplers and runners. What you want is to define your own model rather than using the likelihood model of the MCMC package. Do I understand correct what you are trying to achieve?
If so, you would have to define this model as a subtype of MCMCModel or else we would have to simply change the MCMCTask.model field so as to be a subtype of Model instead of MCMCModel. The main question is what kind of statistical model you want to define. This is a more general question, since one may want to input here a model from GLM or from any other Julia package. @fredo-dedup, @johnmyleswhite, @dmbates, @andreasnoackjensen, @lindahua I am copying you in here so that you follow up the discussion (and so that you contribute if you have some ideas). If you can all give me an idea of possible models and situations that one may want to use with MCMC then I can start extending the codebase in MCMC to better work with external models.

papamarkou commented 10 years ago

I just realized I should also reference #20 here due to its direct relevance.

LaurenceA commented 10 years ago

I raised the original issue because I didn't realise that ess should be able to take a Vector, so I thought I needed to populate an MCMCChain to get access to your summary statistics.

So I'm not sure what it would give me if I were able to populate an MCMCChain with a Vector/Matrix of samples. Is there any functionality that shouldn't be available for a Vector of samples?

On Sun, Jan 26, 2014 at 11:30 AM, Theodore Papamarkou < notifications@github.com> wrote:

I just realized I should also reference #20https://github.com/JuliaStats/MCMC.jl/issues/20here due to its direct relevance.

— Reply to this email directly or view it on GitHubhttps://github.com/JuliaStats/MCMC.jl/issues/35#issuecomment-33314612 .

papamarkou commented 10 years ago

As far as the MCMC summary stats go, you have access to all the relevant functions by using vectors or matrices after yesterday's commit indeed.

It depends on what you are trying to achieve. For example all the samplers are now well integrated with MCMCChain, so they are of little use if you don't assign your model to the MCMCChain.task.model field. The whole point is that you can easily import MCMC.MCMCModel, then define your model as a subtype of MCMCModel, and then you can exploit the functionality of MCMC in terms of running its samplers. Does this make sense or I would need to elaborate some more?

johnmyleswhite commented 10 years ago

Sadly, I'm way too removed from this package's development to really give much trustworthy insight. But I think it would be extremely valuable to build a set of simple MCMC functions that work using only very standard Julia types like Vector{Float64} for parameters and Function for the log posterior. The slice sampler code that got merged recently was of that sort and would make it very easy to run MCMC without learning about the internals of this package. Right now, I don't often use this package because I don't know to call the elementary functions in it without working with the more complex machinery implied by the BUGS-like DSL.

It actually might be worth trying to build up an implementation that has two packages with fully orthogonal design: one package that runs standard samples on Vector{Float64} and another package that translates symbolically described models into the format used by the lower-level package. Right now I have the sense that things are fairly tightly coupled, which makes it hard for me (who doesn't get much time to look into things deeply) to keep up with the discussion.

papamarkou commented 10 years ago

Your remarks make sense to me John. I think that the current API offers flexible management of MCMC simulations by relying on tasks yet at the same time can be rather complicated to use. A first obvious improvement is to write up a short tutorial-user guide.

I agree that we should somehow decouple the two different levels of implementation you mentioned - some work has started towards this direction recently by trying to migrate the autodiff code to a standalone package (@fredo-dedup leads this and it will be ready within the next few days). I will put some careful thought on your recommendation for further compartmentalization, which intiuitively seems to be a reasonable idea, so as to see how it could be achieved in practice.

johnmyleswhite commented 10 years ago

Yes, splitting off Autodiff be a huge gain. Miles and I are both really interested in that code.

Compartmentalization, if you can design it right, is usually a big gain. But I'm just a bystander to this package, so don't place too much weight on my opinions.

papamarkou commented 10 years ago

I think that autodiff is going to be very useful indeed, and I am thankful too to @fredo-dedup for his reverse mode coding (we had already the forward mode but it is not so useful in probabilistic settings).

I agree, compartmentalization can be in general a huge gain. As far as the triplet (model, sampler, runner) goes, I think that the sampler and runner parts are rather reasonably set already, but I believe that we can put some thought on how the model part can be made a bit more readily available. Usually the model is simply the target function, from which we draw samples, possibly followed by its gradient, Hessian and derivatives of the Hessian (in general followed by its higher order derivatives). From that point of view, it seems straightforward to have a composite type holding these functions as its fields, which is what we currently do.

So the basic component of the current model is the target distribution. It just occurred to me that the first thing I should do in order to see how this works with external packages is:

use the GLM package as a case study to see how its models can be passed to MCMC,
look at the slice sampler code of yours and the one which is already merged in MCMC,
see how one can simply pass a distribution from Distributions as the model.

Starting with these three specific cases will give me some insight and will be a good lesson as to how to move forward.

papamarkou commented 10 years ago

Ok, I have now a very clear view of how to make the MCMC samplers more accessible, in a way similar to the accessible MCMC summary stats. I will need to take the following actions, which I will complete within 2-3 days unless anything unexpected comes up in terms of unforeseen technicalities:

move the field init from the models to the samplers, so that the samplers can run independently if needed,
write a lower layer of interface function signatures for the samplers, so that they can then work by simply passing the required functions to them as arguments. At a higher level, it will be still possible to utilize the existing MCMC interface based on the triplet (model, sampler, runner).
I already allowed for the creation of models by passing a distribution from Distributions and then invoking logpdf and gradloglik. This works with Random Walk Metropolis, it needs a little bit of debugging to work with the rest of the samplers, but I know where the problem is (I think!), so I will fix it by tomorrow once I find some free time.

fredo-dedup commented 10 years ago

Thanks for undertaking this. Your approach sounds good.

I see one minor problem with initial values moving from the model to the sampler : the init parameter was useful for AD for models described by the DSL, which is why I went to associate it with the model (the init values could be specified once for both AD and initial sampling point). But I understand that from a user standpoint this is not always logical : typically the starting point of the sampling should be independant of the model, for example when starting several chains. So you can go this way.

If possible, may be you could keep initial values as an optional field of the model and then have the samplers use those values in case they are called without initial values at their level. That would allow both behaviors to coexist.

papamarkou commented 10 years ago

Sure, that's a clever idea Frederic, thanks for coming up with it, I will follow your suggestion.

papamarkou commented 10 years ago

I spent quite a while thinking about this topic. The main a difference between a DSL implementation and a non-DCL implementation similar to the existing slice sampler is not the (model, sampler, triplet) interface... the main difference is the use of tasks as part of the DSL implementation (and consequently the use of the relevant task management functions, such as consume and produce). I personally very much favour the DSL implementation and if it were only for me I would stick to it. At the same time, just because I received several comments from users asking for an interface without tasks, similar to the slice sampler as it stands, I am trying to make these two coexist. I have two completely approaches in mind:

implement the non-DSL approach without using tasks (yak... tasks are really handy, why discard this functionality...?),
implement the non-DSL interface at a higher level, which will be a wrapper around the DSL approach. This means that the slice sampler for instance will be called with its existing arguments, it will then create a (model, sampler, runner) triplet without the user "knowing or worrying" about it and it will simply return an MCMC chain.

There are many reasons to go for the second option (I can elaborate on these reasons if you want me to). I just wanted to see if there are any reactions and if not, then after a couple of days, I will move on with the coding of this idea...

LaurenceA commented 10 years ago

I think I've got a much more concrete idea of what I was thinking about when I talked about how model specification in DSLs should be separated from samplers. This is what I envision for Church.jl:

Church.jl (and possibly your DSL) should not be tightly integrated with specific samplers. Instead, they should provide methods for:

Evaluating log probabilities and their gradients, ideally for subsets of variables.
Storing and updating the state of the Markov Chain.

Now lots of samplers could be written which take only:

An initial state (perhaps of only a subset of variables).
A function from a state (again, perhaps of a subset of variables) to the log probability, and potentially its gradients.

Finally, you just need a bit of glue that allows a user to specify which samplers to apply to which subsets of variables.

This approach has several big advantages:

New DSLs and custom samplers can easily be written that plug in to a large set of already written samplers.
New (potentially very specialised) samplers can easily be written and plugged into a DSL.

With regards to the specific interface for samplers, what would be nice is a function something like this:


next_sample(sampler::Sampler, logp::Function, initial_sample::Tuple)::Tuple

where sampler contains the sampler's parameters. I would want to use next_sample as a component in a bigger sampler. I'd therefore call it many thousands of times, so overhead (from, for instance, creating tasks and model, sampler, runner triplets) would probably be really bad. If you were just calling the same sampler multiple times, it would make alot of sense to wrap calls to next_sample in a task?

In conclusion, I think I'm arguing for your first option?

On Mon, Feb 10, 2014 at 5:50 PM, Theodore Papamarkou < notifications@github.com> wrote:

I spent quite a while thinking about this topic. The main a difference between a DSL implementation and a non-DCL implementation similar to the existing slice sampler is not the (model, sampler, triplet) interface... the main difference is the use of tasks as part of the DSL implementation (and consequently the use of the relevant task management functions, such as consume and produce). I personally very much favour the DSL implementation and if it were only for me I would stick to it. At the same time, just because I received several comments from users asking for an interface without tasks, similar to the slice sampler as it stands, I am trying to make these two coexist. I have two completely approaches in mind:

implement the non-DSL approach without using tasks (yak... tasks are really handy, why discard this functionality...?),

implement the non-DSL interface at a higher level, which will be a wrapper around the DSL approach. This means that the slice sampler for instance will be called with its existing arguments, it will then create a (model, sampler, runner) triplet without the user "knowing or worrying" about it and it will simply return an MCMC chain. There are many reasons to go for the second option (I can elaborate on these reasons if you want me to). I just wanted to see if there are any reactions and if not, then after a couple of days, I will move on with the coding of this idea...

Reply to this email directly or view it on GitHubhttps://github.com/JuliaStats/MCMC.jl/issues/35#issuecomment-34660182 .

papamarkou commented 10 years ago

I think several different issues are inter-weaved in our thread, which makes it challenging to disambiguate the end objectives. There is the general software engineering question as to how one may integrate the various Monte Carlo samplers with statistical modelling. There is a chicken-and-egg relationship between the various samplers as to which components will constitute the building blocks of the whole code structure.

There are several approaches to address this question. There is the BUGS and Stan one that are Gibbs- or DAG- centric, and your way of attempting to design the answer classifies in this category. There are pros and cons in your approach, and I overall think that it is a legitimate approach.

This indeed corresponds to option (a) above, and it would require a complete re-versioning of the DSL. Some of its missing components as it stands are i) the combination of different samplers for different subsets of the model parameters as you spotted, ii) a more organized classification of the samplers and iii) more elementary methods such as propose, step, update which would operate on some currently non-existing commonly named fields of the various samplers such as sampler.proposal, sampler.target.

I see your points and they agree with a more longer term roadmap I have in mind for MCMC. Option (b) would serve as a temporary and intermediary quicker fix not addressing the topics you touched.

I am open to discussion, we could arrange at some point with everyone interested an IRC chat since such overarching goals entail prolonged team thinking that can not be effective only via messages. I have some UML diagrams sketched with some thoughts I have made about the stable release I envisage for MCMC.

LaurenceA commented 10 years ago

I agree with your points.

We should think about whether Church.jl and your DSL could be merged - but I'm not sure this is possible. As it stands, Church.jl can work with infinite models, whereas your DSL can do automatic differentiation (possibly up to the hessian). I'm trying to get the gradient by automatic differentiation in Church.jl, but I don't think I'll ever get the hessian (although, perhaps I'm being too pessimistic).

On Tue, Feb 11, 2014 at 1:08 PM, Theodore Papamarkou < notifications@github.com> wrote:

I think several different issues are inter-weaved in our thread, which makes it challenging to disambiguate the end objectives. There is the general software engineering question as to how one may integrate the various Monte Carlo samplers with statistical modelling. There is a chicken-and-egg relationship between the various samplers as to which components will constitute the building blocks of the whole code structure.

There are several approaches to address this question. There is the BUGS and Stan one that are Gibbs- or DAG- centric, and your way of attempting to design the answer classifies in this category. There are pros and cons in your approach, and I overall think that it is a legitimate approach.

This indeed corresponds to option (i) above, and it would require a complete re-versioning of the DSL. Some of its missing components as it stands are i) the combination of different samplers for different subsets of the model parameters as you spotted, ii) a more organized classification of the samplers and iii) more elementary methods such as propose, step, update which would operate on some currently non-existing commonly named fields of the various samplers such as sampler.proposal, sampler.target.

I see your points and they agree with a more longer term roadmap I have in mind for MCMC. Option (b) would server as a temporary and intermediary quicker fix not addressing the topics you touched.

I am open to discussion, we could arrange at some point with everyone interested an IRC chat since such overarching goals entail prolonged team thinking that can not be effective only via messages. I have some UML diagrams sketched with some thoughts I have made about the stable release I envisage for MCMC.

Reply to this email directly or view it on GitHubhttps://github.com/JuliaStats/MCMC.jl/issues/35#issuecomment-34752217 .

papamarkou commented 10 years ago

I will try to find some time and look at Church, apart from the possibility of merging and joing efforts it may give a broader view of how our code may evolve...

papamarkou commented 10 years ago

There are several ways to approach the matter of optimizing the underlying Bayesian graphical model... the main problem is that this is an NP-hard problem, so it is a rabbit hole. Assuming for now that the graph is topologically sorted in its user-defined form is a good compromise. I just realized that we should be able to extend functionality towards what you said easily, since @fredo-dedup has already done a lot of work in terms of specifying models as expressions. Unfortunately this is not working right now but it will be up and running soon. We can then take it from there and exploit the existing structure + extend it if needed...

papamarkou commented 10 years ago

What I don't know about the code in src/autodiff/parsing.jl is:

whether the parsing of expressions to models is already separate from autodiff. It may be the case that the user wants to define the model as an expression but not rely on autodiff - for example the Distributions package has now the function gradloglik which gives the score function,
whether the expression can contain a sequence of assignments x ~ Distribution1; y ~ Distribution2(x) in a Gibbs fashion.

Even if all this functionality is not there, I am sure we can easily add it, I am going to make this my priority.

fredo-dedup commented 10 years ago

@scidom, sorry for not keeping you up to date : I am mostly working on the ReverseDiffSource code right now, I am almost completely rewriting the underlying logic (using a graph instead of manipulating expressions) to simplify the package code and produce more optimized results. It may also allow higher order derivatives (already working for R -> R functions). I'll get back to MCMC when I am finished (last time I tried though, it seemed to be working so not sure what is the problem you mention making it non-functional).

About parsing.jl :

yes it can parse an expression and produce a function without using autodiff if the gradient is not needed.
no, it can't produce a loglikelihood with partial parameter updates without recalculating the whole thing. I agree that this would be an interesting problem to investigate as this would allow more sophisticated samplings. The fact that ReverseDiffSource now builds the dependence tree internally might open up this possibility : all model parameters could affected to a group specified by the user, and the dsl parsing would build one function for each group that compute only the necessary branches of the tree. I'll think about it !

papamarkou commented 10 years ago

Thank you very much for working on this @fredo-dedup. Sorry for intervening, I didn't know you have been changing the interface to use graphs internally instead of expressions. Ok, in this case I will let you carry on working on it so that you follow the plan you have in mind. About the points above:

what I had in mind is for the model parser to check if Distributions has a gradloglik for the specified distribution and if yes, then use it, otherwise fall back to autodiff, if you agree.
It is awesome to hear that high order derivatives may be supported.
Yes, it would be great to support partial parameter updates, maybe in the future if it is not easy now.

It was the parsing example in readme that didn't work, but it doesn't matter now, I will let you focus on what you do.

papamarkou commented 10 years ago

@fredo-dedup just to keep you posted since we co-develop the MCMC package; after some recent reading, it has become clear to me that there are 2 steps that need to be taken imo. Firstly, the reverse mode autodiff based on source transformation must be separated from the implementation of MCMC. I know this is sth you have been working on for a while and I will leave it entirely up to you to complete. Once this first step is complete, there is the second step of separating the model logic from the package. As you know already, a declarative model representation is generic and this is exactly the strength of graphs - they can have they own semantics and own generic suite of algorithms. For this reason, the obviously right approach here is to write a second standalone package for probabilistic graphical models, based on the existing Graphs Julia package. For the time being, keep the more primitive node type definition you have in reverse mode autodiff so that you can get the revdiff up and running. Once you push the package to matadata, I will then start separating the probablistic graph logic in a separate package (using the Graphs Julia package), and we can then call this package from MCMC to define our model traversing here. I wanted to double check with you that you are happy with this longer term plan.

papamarkou commented 10 years ago

One further update regarging the input of models in MCMC. I had a long conversation today with one of the creators of Stan, who happens to be a postdoc in the same group where I am currently working. We discussed various aspects of Stan, one of which was the way that models are parsed by Stan (using the Boost Spirit library). The main point point for discussion was imperative model specification (as in Stan) versus a declarative one (as in BUGS). Although I had been in favour of the latter, I can now see that in the context of model parsing for MCMC a declarative approach based on Graphs is most certainly an overkill. I think we should after all stick with an imperative approach, as we have done so far and avoid usage of Graphs as I had previously proposed.

There is only one hugely important point that we do need to improve, following Stan's paradigm. We need to allow the user to define models where the coordinates of each vector can follow a different distribution. The way that Stan parses vectors versus their coordinates is very straightforward. If you write x~N(.,.), then x is a vector, while indices x[i]~N(.,.) refer to the components of a vector. @fredo-dedup I think this is easy to incorporate in our existing model parser and it will offer a huge generalization. May I kindly ask for an update as to where we are standing with model parsing? I am holding back from making any improvements on that end (that would be of use to other users and to me as well), while it's been a couple of months we have the expression parser partially broken as you know :) Do you think that it would be easy to disentangle the reverse mode autodiff you are developing from model parsing? This way you will be able to work on autodiff at your own time without rushing while we make sure that the model parsing development of the MCMC package is not halted or broken for so long (I can work on this).

johnmyleswhite commented 10 years ago

+1 for this kind of separation

FWIW, I personally think the following organization would be great:

MCMCBase: Simple MCMC tools that apply to objects in Base, like Vector.
GraphicalModels: Tools for representing probabilistic (and maybe causal) graphical models
MCMC: A DSL that gets analyzed to generate a graphical model object, a sampling strategy and, if needed, some derivatives of the model's log posterior function.

The main weakness of tools like JAGS and Stan is that they make it so difficult to see what's happening under the hood and modify it at run-time. I suspect you'd find a group of interested MCMCer's who'd be happy to work with Julia instead of JAGS/Stan if they found MCMC.jl had functionality they wanted that was missing from existing tools.

papamarkou commented 10 years ago

This sounds a great, well-structured separation. I think it has gradually become obvious that we need to create standalone APIs for the involved functionality, and the three packages you proposed seem (to me) the way to go.

The Graphs Julia package is a solid basis for coding the GraphicalModels package. I am more naturally included towards declarative modelling too, because it treats model representation as a separate entity with its own semantics, thus making it a generic valuable tool. I can work on this.

Separating MCMCBase from MCMC requires some more careful thinking on how to make this separation, and it is what we should try and do in the very near future.

lindahua commented 10 years ago

From a broader perspective, I think it would be useful to separate model specification and inference algorithms. MCMC is an important/big family of inference methodologies, however, there do exist other important methodologies for probabilistic model inference and estimation, which, for example, include variational inference, belief propagation, etc.

I suggest that we should completely separate model specification & DSL into a separate package, which other inference methods can also utilize. This package can focus on MCMC algorithms.

johnmyleswhite commented 10 years ago

Yes, I agree with that. At a minimum, you'd want a type that encodes just a graphical model with its dependency structure and then another type on top of that that allows you to define an inference algorithm around that model.

papamarkou commented 10 years ago

I completely agree with your views @lindahua, @johnmyleswhite.

papamarkou commented 10 years ago

Oh, in order to complete the triplet, we would add the learning component (i.e. learning the model from data and prior experience) to model representation and inference :)

fredo-dedup commented 10 years ago

Hi @scidom, Separating the DSL from the current MCMC package makes sense, and further down the road having a family of model building packages (DSL, graphs, whatever) coupled to a family of inference methods packages (MCMC being just one among many) through a common model abstraction would make a very neat infrastructure. Of course, there is the problem of defining the model abstraction :

should the model be stateful ? (currently not in the case of the MCMC DSL, but probably necessary if partial updates of graphical models are to be supported)
needs a way to publish what the model can provide to the inference engine : gradient ? hessian ? etc..
is the unique kind of output currently available (a log likelihood) sufficient/meaningful for the family of engines envisioned ?
how to publish the inference method preference for each model parameter (say one is an integer, another is a float with a given size scale, etc.. )

In short, 1) this will take some time to mature and clearly needs someone to come forward to make and submit proposals, and 2) we'll have to make sure that along the way we are not trying to abstract the whole universe of modelling problems and build an unworkable model type !

More down to earth, as a short term plan, what if :

I suspend my work on the ReverseDiffSource rewrite
I create a package : that implements the MCMC DSL, itself using the current version of ReverseDiffSource
Update MCMC, thereby restored in its functional purity ?

PS: I do think that at some point there will be a tension between ensuring a simple user experience (MCMC is simple and useful and should be broadly accessible ! ) and internally having over broad abstractions and numerous modules implementing each a narrow set of functionnality.

lindahua commented 10 years ago

Since different inference methods have vastly different ways of doing things, and thus use very different things as states. I think the model specification itself should not maintain any state that is specific to a particular type of inference methods.

However, specific inference package (e.g. MCMC) can take the model specification as input, and create an instance that maintains the states necessary for the inference.

I would suggest just separating out the DSL and model specification at this point. It is unclear at this point what other aspects that different categories of methods can share. We can separate out other stuff when we see the need/benefit to do it.

papamarkou commented 10 years ago

Hi @fredo-dedup, @lindahua, I am thinking over your suggestions. I agree with your main points.

Here is my tentative plan and its justification:

Firstly, the ReverseDiffSource needs to become a separate package. Your code for reverse mode autodiff using source transformation is useful to a broad audience, not necessarily overlapping with MCMC or probabilistic models. Model parsing must be separated from the autodiff code to a certain extent. If some glue code is needed as a mediator between ReverseDiffSource and MCMC-related model parsing, this should be accommodated in the MCMC package. On a separate note, about autodiff, I would like to make two more minor observations. Later on, when we will consider to heavily optimize our autodiff code, we can steal a key idea from Stan's autodiff API. Their main innovation is to separate the autodiff logic from the algebraic calculations done on the basis of this logic. This means that the autodiff code should only generate a "template" on the first "pilot" iteration of MCMC and successive MCMC iterations will use this template for deriving numerically the required derivatives. A second point is that given the growing autodiff functionality in Julia (there are already 4 packages with varying approaches sometimes even within each package for carrying out AD), we should think how to create a flexible API inside MCMC for selecting which autodiff method to use. In any case, these two points are far-fetched for now. The top priority is to create the ReverseAutodiffSource package asap.
We need to create a new package called GraphicalModels, which will accommodate probabilistic (and possibly causal) models. This will be as abstract and generic as possible. I agree with @lindahua that the MCMC package will take this graphical model representation and will create a new model object specific to the needs of MCMC inference. This second development can wait for a while. I am happy to contribute here. Another point to make here, we are talking about the whole modelling universe, setting aside quantum-based models (i am not being humorous, there is already a book on how to build probabilistic models by employing quantum entanglement and other quantum notions).
I agree that our MCMC package should basically host the existing DSL, which will slightly adapt to the above changes.

papamarkou commented 10 years ago

So, things are steadily moving on. A start towards step 1 above has been made, since ReverseDiffSource is now a standalone package. This is exciting!

As for step 2, a start has also been made by creating in the PGM package in JuliaStats:

https://github.com/JuliaStats/PGM.jl

This is for now a placeholder. It has been added to metadata and there is an existing proposal outlining how to implement probabilistic graphical models. MCMC, as well as other packages in JuliaStats, will wait until PGM is first up and running so as to employ the graph approach for model specification.

LaurenceA commented 10 years ago

Can I push for @johnmyleswhite's idea of: MCMCBase: Simple MCMC tools that apply to objects in Base, like Vector.

In particulary, it would be super-awesome to have functions for RHMC and NUTS that just take vectors and functions. Such functions enable you to define your own inference scheme, for instance alternating between doing something complicated and exciting (say RHMC) on the continuous variables, and Gibbs, (or something complicated and non-parametric) on the discrete.

What is the advantage of Tasks? Would it be possible to wrap an MCMC_step function in a task? If intermediate storage is the reason it uses tasks now, we could pass all the intermediate vectors and matricies into the function as an optional argument.

I'm particularly interested in this at the moment because I'm working on another probabilistic programming language, which again mixes discrete and continuous variables --- but I think it is a pretty general problem.

papamarkou commented 10 years ago

Yes, I completely agree with you @LaurenceA, and with @johnmyleswhite. MCMCTasks will be a separate package (it is useful to have this for a number of reasons). There will be an MCMCAPI which will allow a unified interface for MCMC with which it will be possible among else to have a package which runs the MCMC samplers using vectors.

Leave this with me, I have already made a tentative plan and have started coding MCMCAPI (https://github.com/scidom/MCMCAPI.jl.git) before summer holidays (then a long break prevented me from completing it), but I plan to have this ready within one or at most two months. You will hear from me soon.

johnmyleswhite commented 10 years ago

I'll do some work on starting MCMCBase today. I'm not totally sold on the name: StatsBase is methods for doing Stats using types from Base, but a base of functionality extended by other statistical packages. I think there should be such an MCMCBase package, but the version that acts like Optim but for sampling should probably have another name like BlackBoxMCMC.

papamarkou commented 10 years ago

Not quite sure what is the overlap there; MCMCAPI is meant to provide the machinery for being able to then write up the Monte Carlo sampling routines i) using vectors, ii) using Tasks in combination with Julia's native parallel tools, iii) using MPI. In other words MCMCAPI is basically MCMCBase or at least precedes it in terms of functionality. Would you like to hold off until I finish MCMCAPI and then use it? In my opinion, it would be more reasonable this way so as to collaborate, but if you want us to work independently and separately, this is also fine with me, I will carry on anyhow. In the latter case, I will write my own version of MC sampling routines with vectors after I am done with MCMCAPI, we can have them both under JuliaStats.

papamarkou commented 10 years ago

I made some progress over the last week and I would like to share some thoughts and hear your views before I proceed any further. I have created MCMCAPI, MCMCStats, MCMCTasks and MCMCModels temporarily under my account. Here is a short summary of the main changes I have made so far:

A unified summary for MCMC development in Julia can be found in MCMCAPI. A typical MCMC simulation, say for ex for RMHMC, may require about 20 input arguments so as to fully specify it. MCMCAPI carefully organizes these input arguments in types to facilitate MCMC usage and development.
Along these lines, MCMCIteration has been introduced, which holds 2 MCMCSample's (the proposed and current sample) and is conceptually one iteration-element of the MCMCChain.
The MCMCChain has been tidied up and simplified. Among else, its task field has been removed in order to make MCMCChain more generic and allow its usage in a non-task progamming context. Besides, MCMCChain is meant to hold just the output and the diagnostics of an MCMC simulation, so its task field is redundant from that point of view.
The gradient of the log-target is now optionally saved in the MCMCChain, thus avoiding unecessary memory allocation.
The tuner logic and types are now standalone as they have been separated from the sampler types.
The type related to Metropolis-Hastings-based sampling has been generalized to allow the user choose the proposal function.
MCMCStats has been reverted to each prior simplified state, since some redundancy has been introduced from some recent change related to the ARS sampler. Instead, I followed an idea of @fredo-dedup and introduced usage of a select function in order to filter the MCMC output.

Now, after this cleaning up of code+abstractions+additional functionality, I came to the conclusion that further abstraction can easily be seen and achieved at this stage. In particular:

No matter whether we use plain loops or Julia tasks or MPI, the sampler and runner routines are the same. The minor changes can be accommodated via functional programming. For example, an MCMC iteration would either use a return or a produce statement depending on whether one wants to write a plain MCMC routine or to use Julia tasks. In fact, even better one can fully avoid some implicit involved conditional statements via source code generation. By that I mean that depending on how the user wants to run the MCMC simulation, the code for the intended MCMC routine can be generated in place, in a way similar to how JuMP operates.
I was thinking of either fitting this work in the set of {MCMCAPI, MCMCTasks, MCMCStats, MCMCModels, MCMCPlain (this is what @johnmyleswhite called MCMCBase), MCMCMPI and MCMCSuite (this last one reexports them all and it will be the existing MCMC package renamed)} standalone packages or in JuMC (Julia Monte Carlo, yes, this is not JuMP!).
This could go either in JuliaBayes as previously suggested by @lindahua or in JuliaStats.

All this work and line of development, according to my frame of mind, is the best way to accommodate all the different MCMC approaches (of course MCMCModels has not been touched really, that by itself is a massive project, but I leave this for - much - later and could be done with collaboration with @fredo-dedup ideally by relying on a future PGM package). At the same time, I am aware that there are more than one ways to do things, so my question to the community is where you would like me to go with this; do you want me to incorporate this work under a Julia organization and to carry on with joint development? If you think that at this stage there is another line of MCMC development that you favour, I am sincerely equally happy to host these package(s) under scidom instead and follow a more independent line of development. Let me know what you think is the best way to proceed in your opinion.

One last thing, I haven't ported all the existing code from MCMC, I only ported the HMC sampler and the SerialMCMC runner as a working case.

papamarkou commented 10 years ago

Just to keep everyone up-to-speed; I made further progress and found a way of unifying the interface for task and plain MCMC simulations (by plain I mean without using Julia coroutines). I made this operational using HMC as a pilot. What I need to do now is to port all the above changes plus this new unification into MCMC.jl.

After some thought, I don't want to migrate these new changes to sub-packages such as MCMCAPI, MCMCTasks etc. The reason for this is dual; i) I want to maintain the momentum and audience that the existing MCMC.jl package has created so far, ii) after having seen where my coding has brought me, it fits better to accommodate everything under a single package.

@johnmyleswhite, if you would like to create a separate MCMCBase package under JuliaStats, be my guest. The MCMC.jl package will be offering the functionality you were planning to include in MCMCBase (i.e. will be capable of running MCMC without relying on tasks if desired by the user), but there is no harm in having multiple packages achieving the same end product of course.

I will be upgrading MCMC.jl within the next 2 (or wost case senario 3) weeks, once I have completed all the coding.

johnmyleswhite commented 10 years ago

I need to catch up with this material, but I'd like to clarify what I had in mind.

I would like an API that allows me to provide the following inputs to a sampling algorithm:

A log-posterior function provided as f
An optional gradient of the log-posterior function provided as g!
An optional Hessian of the log-posterior function provided as H!
A Vector{T <: Real} of M initial parameter values
A number, N, of desired samples
A number, B, of burn-in steps A number,C`, of chains

And will allow me to get back a set of N samples, stored in a Array{T, 3} with M rows, N columns and C repetitions.

Whether this function uses threads, tasks or other construct behind the scenes isn't really important to me. What matters is that I have an optimization-like interface for extracting samples from a model, so that I can first find a MAP estimate using optimize from Optim.jl, then extract samples using this interface.

Is this what you're working on @scidom?

papamarkou commented 10 years ago

What I was working on was somewhat more challenging technically - I found a way of factoring out the common code in MCMC samplers that uses simple loops or tasks to perform the sampler iterations.

You are right, this is a matter that should not be of concern to the user as long as the user interface and the MCMC implementation are decoupled. The thing is that the current version heavily relies on tasks (as input to most of the functions in the package), which means that the decoupling I was talking about is questionable.

To answer your question, this is what I worked on indirectly; after the changes I have made, the interface is task-agnostic no matter whether tasks are used in the background or not, therefore it is now straightforward to write a 2-line function wrapper that provides the interface you suggested. I will include this in the MCMC PR I will open very soon.

papamarkou commented 10 years ago

@johnmyleswhite I completed the core of the changes I had in mind using the Hamiltonian Monte Carlo (HMC) sampler as a pilot case. These changes are in my fork of MCMC.jl. I also wrote an alternative basic interface based on your query above. Before propagating the interface (function wrappers) across all the samplers and submitting a PR within the next few days, I want to have your feedback specifically with regards to the interface you would like to use in Optim.jl.

Here is the function signature of the wrapper for HMC:

HMC(f::Function, g::Function, init::Vector{Float64}, nsteps::Int, burnin::Int;
  nchains::Int=1, nleaps::Int=10, leapstep::Float64=0.1, thinning::Int=1)

As you can see, the required arguments are f (log-posterior), g (gradient of log-posterior, this is needed for HMC, not for Metropolis-Hastings for ex), init (a vector of initial values), nsteps (number of MCMC iterations), burnin (number of burnin iterations). The first optional argument, nchains, specifies the number of chains that will be run, and the rest of the optional arguments adjust other facets of the sampling process. Is this interface to your satisfaction or you want to suggest any improvements?

papamarkou commented 10 years ago

Sigh, this long-standing issue can now be closed. The latest major upgrade of MCMC has addressed the issue of providing a simplifed interface without forcing the user to deal with the underlying model, sampler and runner types. For now, you can get an idea of the simpler interface in src/ui.jl. Coherent documentation will follow in the next few days.

skanskan commented 6 years ago

What is more powerful now, Stan, DynamicHMC.jl, Lora.jl, Mamba.jl, Klara.jl...? Faster and converges with difficult problems. What I don't like about Stan is it isn't easy to deal with discrete parameters, and you need an external program (and language).

JuliaStats / Klara.jl

Making MCMC.jl more flexible and portable in terms of model specification #35