JuliaStats / Klara.jl

MCMC inference in Julia

Interest in a Julia <-> Stan interface #45

Closed dmbates closed 9 years ago

dmbates commented 10 years ago

Andrew Gelman contacted me about the possibility of a Julia interface to Stan, similar to the rstan package for R. I'll just quote our exchange here. Let me know if it would be more appropriate to move this to the milestones issue under StatsBase or to one of the discussion groups.

On Mon, Apr 28, 2014 at 6:01 AM, Andrew Gelman gelman@stat.columbia.edu wrote:

> Hi Doug. I hope all is well with you. Bob Carpenter was telling me that you’ve been active in the Stan list recently. I’ve also heard you’ve been working with Julia. We have some interest in putting together a Julia/Stan link, at the very least having a way to run Stan from Julia. One reason for this is that soon Stan should have the capability of quickly fitting a lot of models that we now do in R (lm, glm, lmer, etc.) using point estimation of various sorts as well as full HMC. Do you have any suggestions/thoughts about how a Julia link could be built? I don’t think any of us in stan-core use Julia, but we’ve been having a lot of contributions from outsiders (for example, Allen Riddell, who wrote PyStan and is now part of stan-core). Any advice would be appreciated. Thanks.

-- My reply

There is definitely interest in the Julia community in Stan. To get an idea of the current statistical capabilities in Julia take a look at the JuliaStats group on github.com. A more complete list of packages is at

http://julia.readthedocs.org/en/latest/packages/packagelist/

An interface between Julia and Stan is, in principle, no more complicated than rstan and could, in fact, be somewhat easier than rstan. A package implementing such an interface would typically be called something like JuliaStan (camel-case names for packages are preferred) and stored in a repository named JuliaStan.jl. There are sophisticated, although somewhat complicated, ways of building binary dependencies on various operating systems. I am currently struggling with implementing such dependencies for the Metis graph-partitioning package, now mostly complete.

Julia has very flexible facilities for calling C functions, see the section on "calling C and Fortran code" in http://docs.julialang.org/en/latest/ (note that the latest Julia release, 0.2, is badly out-of-date now relative to the 0.3-prerelease version and developers mostly work in 0.3). At present it is not easy to call C++ functions or methods directly because of name mangling but anything declared within

```cpp
extern "C" { ... }
```

is okay.
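Such an extern "C" symbol can then be reached from Julia with `ccall`. A minimal sketch (libm's `cos` stands in for a wrapped Stan function; the Stan wrapper names in the comment are hypothetical):

```julia
# Call a plain C function from a shared library via ccall:
# here the C math library's cos, evaluated at 0.0.
x = ccall((:cos, "libm"), Cdouble, (Cdouble,), 0.0)   # x == 1.0

# A hypothetical extern "C" wrapper around a Stan model's log density
# would be invoked the same way (library and symbol names made up here):
# lp = ccall((:stan_log_prob, "libstan_model"), Cdouble,
#            (Ptr{Cdouble}, Cint), params, length(params))
```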

The package structure is reasonably easy to pick up for anyone familiar with github.

I can imagine several ways of proceeding.

It really depends on whether you want to emphasize the Julia aspects or the Stan aspects of the project.

If you wish, I could block out a JuliaStan.jl repository based on rstan under my github login and then we could determine where to transfer it.

May I forward this message or a summary of this message to the JuliaStats members to indicate that the Stan community is interested in Julia?

There is an MCMC.jl project that seeks to implement HMC and NUTS samplers in Julia and there is active development of forward and backward automatic differentiation. I don't know the developers working on that project well and cannot vouch for their capabilities.

papamarkou commented 10 years ago

It would be great to combine Julia and Stan. I rely on Julia for the computing side of my research, but that is at present primarily related to the theory of rough paths (for which I will be developing a Julia package over the next few months), and I also hope to find time to contribute to PGM.jl soon, so I have very limited development time to support a Julia-Stan interface. If there is manpower to develop JuliaStan, I would certainly find it useful as a user.

StefanKarpinski commented 10 years ago

Juliastan sounds like a central Asian country. I think the name for that package would be Stan.jl.

dmbates commented 10 years ago

Rats, I was hoping to sneak the name Juliastan in just because it sounds like a central Asian country :-)

papamarkou commented 10 years ago

Haha... it should be Yulkistan if it were to sound truly eastern ;)

goedman commented 10 years ago

Hi! A very interesting issue thread and I think related to earlier issues #20, #34 and #35.

From the Stan side: having helped the Stan team last year with a C++11 issue (mainly in the Stan parser/compiler area), watching the struggles with differences between C++ compilers on different platforms (the Stan team is great at trying to help!), and going over Bob Carpenter's proposed user-function feature (very useful, but potentially difficult to maintain over time: which parts of C++/C to accept?) all started me thinking, a few weeks ago, about the options for getting Stan-like functionality in Julia.

From the Julia side, on the one hand there is the issue of Julia not playing nice with C++ (which is huge in this case, Stan uses Spirit and other Boost components and templated libraries very extensively). On the other hand, Julia provides already a great starting point (MCMC.jl, AD packages, Distributions and other JuliaStat components).

This morning, before reading this thread, I had gotten to the point where I started to think about

  1. Turning the Stan modeling language into a DSL (likely based to some extent on macros) on top of an extended version of MCMC.jl
  2. A second DSL that accepts Bugs/Jags models
  3. What it would take for MCMC.jl to accept and handle multiple target models/samplers
  4. A repository for MCMC models (e.g. MCMCExampleRepository)

Of course a much simpler solution, with clear limitations, is to initially use CmdStan (hey, for a number of years I have quite happily used Jags and stand-alone finite-element programs from R :-), using Julia to formulate the input files and read back the generated output files. Another shortcut could be to use PyStan as a starting point.
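That CmdStan shortcut can be sketched in a few lines of Julia (a hedged sketch only: file names follow CmdStan's bernoulli example, and a compiled model executable `./bernoulli` is assumed to already exist):

```julia
# 1. Write the data in the R dump format CmdStan reads.
open("bernoulli.data.R", "w") do io
    write(io, "N <- 10\ny <- c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1)\n")
end

# 2. Shell out to the compiled CmdStan model executable.
run(`./bernoulli sample data file=bernoulli.data.R output file=output.csv`)

# 3. Read the posterior draws back into Julia for post-processing.
samples = readcsv("output.csv")   # 2014-era Julia; readdlm would also work
```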

I do have some cycles, so I am quite interested in where this discussion goes.

bob-carpenter commented 10 years ago

Wrapping CmdStan should be pretty easy and get people most of what they'd want from a user perspective. We just wrote a JSON reader for model data and inits --- we should be supporting it from the new CmdStan soon, and it may make sense to target that rather than the current, rather quirky, R format.

The main disadvantage to just wrapping CmdStan is that you can't easily access the log probability function and gradients. These are super-useful tools if you just want Stan's modeling language and conversion to unconstrained parameters. A minor disadvantage is that you have to do a write-to-file and read both ways from Julia (unless data's already in a Stan-readable file format). That's usually not the bottleneck in any kind of interesting model.

As to providing yet another language, I think it'd be easier to just have users write in the Stan modeling language itself. All of our ongoing doc will be written that way and I don't think it'll be easy for non-Julia users to understand a Julia-specific DSL format. We have the same design issues with Python. Of course, just wrapping CmdStan doesn't have this issue.

StefanKarpinski commented 10 years ago

Much of the benefit of integration between Julia and Stan, as I understand it, would come from being able to use functions written in Julia as callbacks in Stan, which wouldn't work with CmdStan. So I'm not sure that's really the best route, although it certainly does seem easy. Is there no C API to Stan?

goedman commented 10 years ago

@bob-carpenter - Just to be sure, you mean creating/reusing files like bernoulli.data.R and bernoulli.stan (or their json equivalents) and reading back in files like output.csv and possibly a file containing the log probability and gradient values after stan is done? While stan is running one would have to use the new function feature in Stan's Modeling Language?

By the way, I did not express myself very clearly when talking about DSLs. I meant parsers for Stan's ML, Bugs/Jags ML and possibly formulas in R and MixedModels.jl (along the lines of rstanarm). PGM.jl does contain a proposal for a newer, broader modeling and query language, but that was not what I had in mind at this point in time.
@StefanKarpinski - Nope, no C API. The callback ability is indeed a major advantage, but what I also like quite a bit in MCMC.jl is being able to write:

```julia
mcchain = run(model * [HMC(1, 0.025), NUTS(), STAN()] * SerialMC(1000:10000))
```

But I do realize the MCMC.jl models/samplers still need quite a bit of work to get to that point, and in this example comparing two NUTS implementations may not be that useful.

Let's see if there is any other feedback.

bob-carpenter commented 10 years ago

On May 2, 2014, at 11:18 PM, Rob J Goedman notifications@github.com wrote:

> @bob-carpenter - Just to be sure, you mean creating/reusing files like bernoulli.data.R and bernoulli.stan (or their json equivalents) and reading back in files like output.csv and possibly a file containing the log probability and gradient values after stan is done? While stan is running one would have to use the new function feature in Stan's Modeling Language?

Yes, reusing files for data, inits, model, etc., and reading back the outputs.

That wouldn't let you interface Julia functions at all.

In order to interface with a function in Julia, we'd need to be able to compute its gradient to do basic HMC and second derivatives in order to do some of the fancier approximate (EP, VB) and point estimates (MML) and MCMC (RHMC) methods we're looking at.

> By the way, I did not express myself very clearly when talking about DSLs. I meant parsers for Stan's ML, Bugs/Jags ML and possibly formulas in R and MixedModels.jl (along the lines of rstanarm).

Why would you want to parse those things on the Julia side? There's the C++ parser in Stan, which builds an abstract syntax tree and checks well-formedness at the syntactic and semantic levels.

> PGM.jl does contain a proposal for a newer, broader modeling and query language, but that was not what I had in mind at this point in time.

I'm curious about the PGM.jl language spec --- we're always looking for ways to improve Stan. Is the proposal somewhere public?

We've also been thinking about something like the R language for specifying generalized linear models. We'd like to be able to add priors in an easy way, but don't really see how to do that.

goedman commented 10 years ago

Hi Bob,

From Stan's point of view you are right (of course).

Given the availability of RStan & PyStan, I wondered what the merits are for Julia users to have Stan as a stand-alone program (versus as a library or a pure Julia version). Reading the results file back into Julia is hardly a problem.

A few more thoughts/answers inserted below.

On May 2, 2014, at 3:01 PM, Bob Carpenter notifications@github.com wrote:

> On May 2, 2014, at 11:18 PM, Rob J Goedman notifications@github.com wrote:
>
>> @bob-carpenter - Just to be sure, you mean creating/reusing files like bernoulli.data.R and bernoulli.stan (or their json equivalents) and reading back in files like output.csv and possibly a file containing the log probability and gradient values after stan is done? While stan is running one would have to use the new function feature in Stan's Modeling Language?
>
> Yes, reusing files for data, inits, model, etc., and reading back the outputs.
>
> That wouldn't let you interface Julia functions at all.

Agreed.

> In order to interface with a function in Julia, we'd need to be able to compute its gradient to do basic HMC and second derivatives in order to do some of the fancier approximate (EP, VB) and point estimates (MML) and MCMC (RHMC) methods we're looking at.

I think this is the crux of the matter. The surface area of such a callback API could get rather convoluted and difficult to maintain/evolve.

The Stan team has taken the route of very significantly enhancing several components of Boost, extending Eigen, developing Agrad for AD, BFGS, etc.

Some of these extensions are part of Julia and its packages. I need to better understand the coverage/overlap in these areas. That takes me back to my initial response.

>> By the way, I did not express myself very clearly when talking about DSLs. I meant parsers for Stan's ML, Bugs/Jags ML and possibly formulas in R and MixedModels.jl (along the lines of rstanarm).
>
> Why would you want to parse those things on the Julia side?

Certainly needed if a pure Julia version needs a different input format, but could also be useful to report results, meta programming, etc.

> There's the C++ parser in Stan, which builds an abstract syntax tree and checks well-formedness at the syntactic and semantic levels.
>
>> PGM.jl does contain a proposal for a newer, broader modeling and query language, but that was not what I had in mind at this point in time.
>
> I'm curious about the PGM.jl language spec --- we're always looking for ways to improve Stan. Is the proposal somewhere public?

The spec is at https://github.com/JuliaStats/PGM.jl

> We've also been thinking about something like the R language for specifying generalized linear models. We'd like to be able to add priors in an easy way, but don't really see how to do that.

This sounds like item #20 in the 'Closed' MCMC.jl issue list. I don't think that was fully answered there as well.


Regards, Rob J. Goedman goedman@icloud.com

bob-carpenter commented 10 years ago

On May 3, 2014, at 8:45 PM, Rob J Goedman notifications@github.com wrote:


> I think this is the crux of the matter. The surface area of such a callback API could get rather convoluted and difficult to maintain/evolve.

Right --- it's not that bad if you think you just need function and partial derivatives, but it involves all the data structures going back and forth.

It's easier to just write new functions in templated C++ or directly in Stan.

> The Stan team has taken the route of very significantly enhancing several components of Boost, extending Eigen, developing Agrad for AD, BFGS, etc.
>
> Some of these extensions are part of Julia and its packages. I need to better understand the coverage/overlap in these areas. That takes me back to my initial response.

We keep meaning to pull many of these things out into standalone tools. Definitely auto-diff --- I think the basic interface is pretty settled there, at least at the functional level if not at the lower-level data structure level.

> By the way, I did not express myself very clearly when talking about DSLs. I meant parsers for Stan's ML, Bugs/Jags ML and possibly formulas in R and MixedModels.jl (along the lines of rstanarm).
>
>> Why would you want to parse those things on the Julia side?
>
> Certainly needed if a pure Julia version needs a different input format, but could also be useful to report results, meta programming, etc.

Meta programming for sure. Implementing a pure Julia version would definitely be ambitious.

...

>> I'm curious about the PGM.jl language spec --- we're always looking for ways to improve Stan. Is the proposal somewhere public?
>
> The spec is at https://github.com/JuliaStats/PGM.jl

Thanks.

goedman commented 10 years ago

After some further discussions with Bob and other folks on the Stan team, I propose to move forward

  1. by introducing a very lightweight Stan.jl (encapsulating the C++ Stan command line version) and
  2. by adding multilevel models to a forked version of MCMC.jl (item 14 in the MCMC.jl/docs/TODO.txt).

Based on requests from users I'll add more functionality to Stan.jl. Over time, either through C APIs and/or structured output, more posterior processing results can be made available, again, likely dependent on the interest level of Julia users in Stan.jl. Modeling will be done through the Stan Modeling Language.

To get a better grasp of the structure of MCMC.jl, I'll first have a go at items 12 (ARS sampler) and possibly 13 (SIR sampler) on the TODO list.

StefanKarpinski commented 10 years ago

Question about Stan: is Stan intended to be used from languages other than C++? If so, shouldn't it have a pure C API, since C++ has no standard ABI, rendering it uncallable from other languages except via extern "C" wrappers?

bob-carpenter commented 10 years ago

Stan was not originally designed to be called from languages other than C++ because all of our original targets, R, Python, and MATLAB support C++ interfaces one way or another. These interface mechanisms all encapsulate some kind of wrapper. So that's another way to consider going for Julia --- build something like Rcpp in R in order to make it easier to wrap C++.

What's an "ABI"?

Also, does anyone have a recommended reference for creating C wrappers out of C++ code? I don't even know the basics.


goedman commented 10 years ago

Given Stefan's question and Bob's answer, it might be useful to expand a bit more on what I'm thinking about for Stan.jl. Nothing new, more of a summary; a bit long, apologies.

For many users Stan is primarily a stand-alone tool to compute the multilevel generalized models that other Bayesian tools could not handle or that would take too long to converge. Out of the box, Stan handles maybe 90% of what it is intended for. New features being added (mostly by the Stan team) to the Stan modeling language, new distributions, new samplers and new/improved tuners (in MCMC.jl terms), and growing experience with how best to use Stan will only increase that percentage.

The command line is not an appealing environment in which to run the tool or to display and further process/analyze the results. The target environments Bob mentions, R and Python (I have not seen MATLAB much on the mailing lists), made the workflow around Stan a lot smoother. With a thin layer of Julia code to easily use Stan from within Julia and import the results (a simple readtable() call), Julia gets similar benefits.

For a few features of interest, R and Python do have a way (Rcpp & Cython) to look 'inside' Stan. Bob mentioned earlier two important reasons why one might want to look inside Stan:

  1. access to the log probability function as constructed by Stan during its compilation step (Stan Modeling Language into C++ code) and
  2. the mapping from the constrained parameter space to the unconstrained space (where the sampling occurs, in the final executable built from the C++ code generated in the first step).

An example of such a transformation is the binomial rate in the range 0 to 1 being transformed via a logit() function to the unconstrained space. For other models and gradients it can get more complicated.
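For the binomial-rate case, the transform and its inverse are one-liners in Julia (a sketch of the mathematical mapping only, not Stan's actual implementation):

```julia
# Map a rate θ ∈ (0, 1) to the unconstrained real line, and back.
logit(θ)    = log(θ / (1 - θ))
invlogit(x) = 1 / (1 + exp(-x))

invlogit(logit(0.25))   # round trip recovers the constrained value 0.25
```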

These features would not come standard in Stan.jl, but could be added in Julia for specific cases. How to do that is explained in the Stan documentation and on the stan-users mailing list. Similarly, I don't think we should reproduce other functionality available in RStan and PyStan, e.g. plotting the results, at least not initially. But I would like to add a convert(MCMCChain, "stan-result") function.

This way, I see the Stan.jl package both as a tool on its own (as a template for using Stan in Julia) and as complementary to MCMC.jl. MCMC.jl (and Julia) is a better environment than C++ to experiment with new algorithms, investigate the performance of algorithms, etc. Stan today is much more mature as a tool for multilevel modeling using HMC/NUTS sampling: more mature in the sense of input error checking, range of models covered, user-experience sharing, documentation, etc.

@bob-carpenter - An API works at the (program) source level; an ABI works at the (binary) compiled-library level. I did include a Stack Overflow link on C-wrapper functionality for C++ in my response to Allen.

dmbates commented 10 years ago

I think there is a certain amount of talking past each other going on here. For example, Stefan mentioned that C++ doesn't have a standard ABI, which I take to mean that you can't determine from the signature of a C++ function or method exactly how it is going to be called in compiled code. This is not an issue that would be important in R, at least, because you couldn't make such a call from R.

If I am writing an interface from R to C code I take objects from R (pointers to SEXPREC structs) and marshal the contents of those objects to be used in a call to the C functions. If I want to return values from the C code I need to provide a way of mapping those values into R structures. Then I must create the corresponding SEXPREC structure, copy the values into this structure and then return it.

Rcpp provides C++ classes that mirror some of the types of objects in R. For example, it defines a NumericVector class which can be instantiated from an SEXP (pointer to an SEXPREC). In some sense it is a set of C++ templated classes that unpack the contents of SEXPs and that create and populate SEXPs. It all boils down to the templated as and wrap functions in Rcpp, for unpacking and packing respectively.

With R you must write compiled code to unpack and pack the R structures. R only knows how to handle SEXPREC structures. It is the passive partner in the exchange. It doesn't come to you, you must come to it.

When interfacing from Julia it is more common to have Julia call the functions from the compiled library directly. The packing and unpacking of structures is done at the Julia level, not in the compiled code, for the most part.

This is why a person experienced with Julia wants to know exactly what the call in the binary code looks like. It is an entirely different approach to interfacing to R. With R the onus is on the writer of the compiled code to receive and return R's data structures. With Julia you usually expect to create the data structures in Julia and call the compiled code directly and this is where things like name mangling get awkward.

Certainly it is possible to write intermediate code with functions defined in

```cpp
extern "C" {
  ...
}
```

blocks and call those from Julia. That's okay but to the Julia programmer it is tedious because the interfacing is taking place in the compiled language.

alyst commented 10 years ago

There is SWIG that can generate C and C++ wrappers for many languages and reduce some tedious coding. Julia is not in the list of supported languages yet. I wonder whether autogenerated wrappers for Julia could look more straightforward than the ones SWIG currently generates for R.

goedman commented 10 years ago

Just a quick update. I made some progress with Stan.jl and also with adding an ARS sampler to MCMC.jl.

Stan.jl is not published but can be cloned from https://github.com/goedman/Stan.jl. Last Sunday's addition of logtargets to the MCChain definition will help my next step, converting the Stan output to an MCChain type.

The ARS sampler is a bit trickier. For now I have chosen to add a boolean to the MCChain type, thus allowing the stats functions to decide whether to include rejected samples. Any feedback is of course welcome. The code is at https://github.com/goedman/MCMC.jl. In any case, adding the ARS sampler was a good introduction to the structure of MCMC.jl.

fredo-dedup commented 10 years ago

Hi Rob, Thanks for contributing to MCMC.jl, I look forward to including your additions!

Just a quick question about ARS (an architecture question, the code itself looks perfectly fine): is the useAllSamples boolean only used in the stats functions? If that is the case I would suggest changing only the stats functions, by checking for the sampler used (isa(sampler, ARS) on the sampler contained in the MCMCChain type). That would limit the changes necessary. If it is a feature of a family of samplers instead of only one, this could be dealt with via a parent abstract type (ARS <: MCMCAcceptOnlySampler <: MCMCSampler).
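The parent-abstract-type design could look roughly like this (modern Julia syntax, where 2014-era code would write `abstract X <: Y`; the type names come from the suggestion above, while `uses_all_samples` is a hypothetical helper, not actual MCMC.jl code):

```julia
abstract type MCMCSampler end
abstract type MCMCAcceptOnlySampler <: MCMCSampler end

struct ARS  <: MCMCAcceptOnlySampler end   # keeps accepted draws only
struct NUTS <: MCMCSampler end             # every draw is a valid sample

# Stats functions dispatch on the family instead of carrying a boolean:
uses_all_samples(::MCMCSampler)           = true
uses_all_samples(::MCMCAcceptOnlySampler) = false
```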

goedman commented 10 years ago

Hi Frédéric,

That is a great suggestion. You are right, useAllSamples is only used in the stats functions once the ARS sampler sets it to false. By default it is true. With your suggestion that can all go. I'll update the forked version accordingly.

At some point I thought I would need it for the SIR sampler as well, but the resampling part of SIR does not fit neatly into the current MCMC.jl structure (I think). Hence, after ARS I will probably focus more on Gibbs sampling, which is a bigger job and more important for what I am after (comparisons between GLM/MixedModels/Stan/Jags/MCMC). And the abstract parent type would in any case be a nicer solution.


bob-carpenter commented 10 years ago

I don't know anything about Julia's architecture, but if this is for thinning the output of a sampler, then I'd think it should be general. The output of thinning a Markov chain is still a Markov chain, and in the case of MCMC, the thinned chain still has the right expectations for derived statistics like means, variances, quantiles, and related plug-in estimates for events (like multiple comparisons).

You can either set a sampler to thin its own outputs, which can be more efficient, as in the case of Stan, or you can thin the unthinned outputs after the fact. You could even have a filter object that thins Markov chains. In Stan, we've just pushed all this down into the samplers themselves for efficiency because we don't need to thin on the outside anywhere.
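Thinning a stored chain on the outside is a one-line slice in Julia (a sketch; as noted, Stan itself thins inside the samplers for efficiency):

```julia
# Keep every k-th draw of a stored chain.
thin(chain, k) = chain[1:k:end]

thin(collect(1:10), 3)   # → [1, 4, 7, 10]
```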


fredo-dedup commented 10 years ago

Good point: isn't ARS thinning (considering only values that have been accepted) similar to the usual thinning (keeping a sample every n samples)? If we follow that idea we could have the ARS sampler send only the accepted samples. Would that make sense, Rob?

@bob-carpenter: in our current architecture we split what you call the sampler into a runner × sampler pair: the sampler just provides the next draw (it is a coroutine) and the runner decides what to do with it (keep all samples, or apply thinning and throw out the burn-in). The distinction was useful to implement serial tempering and sequential tempering more cleanly alongside the classic serial sampling, without having to rewrite the different samplers.

goedman commented 10 years ago

Hi Bob,

Thanks for your input.

The ARS sampler is a suggested addition to MCMC.jl, and samplers in MCMC.jl report back accepted and rejected samples. A rejected sample basically means the chain stays in its current state, and for HMC, NUTS and all existing MCMC.jl samplers all of these are taken along as valid posterior samples. That is not the case for ARS and SIR, and thus I needed a way to discard those when computing the mean, etc., using the functions that come with MCMC.jl. I think this is orthogonal to thinning (which is also required for ARS and SIR).

All Stan related stuff will go into Stan.jl. My next step for Stan.jl is converting Stan's output to an MCMCChain structure (Julia type), kind of a bridge between Stan.jl and MCMC.jl to make comparisons easier. But I must say, after working on it for only a couple of days, Julia's REPL and the IJulia Notebook interface are both appealing environments to run CmdStan!

Rob J. Goedman goedman@icloud.com

On May 20, 2014, at 2:37 PM, Bob Carpenter notifications@github.com wrote:

I don't know anythinng about Julia's architecture, but if this is for thinning the output of a sampler, then I'd think it should be general. The output of thinning a Markov chain is still a Markov chain, and in the case of MCMC, the thinned chain still has the right expectations for derived statistics like means, variances, quantiles, and related plug-in estimates for events (like multiple comparisons).

You can either set a sampler to thin its own outputs, which can be more efficient, as in the case of Stan, or you can thin the unthinned outputs after the fact. You could even have a filter object that thins Markov chains. In Stan, we've just pushed all this down into the samplers themselves for efficiency because we don't need to thin on the outside anywhere.

  • Bob

On May 20, 2014, at 11:07 PM, Frédéric Testard notifications@github.com wrote:

Hi Rob, Thanks for contributing to MCMC.jl, I look forward to including to your additions !

Just a quick question about ARS (an architecture question, the code itself looks perfectly fine) : is the useAllSamples boolean only used in the stats functions ? If that is the case I would suggest changing only the stats functions by checking for the sampler used ( isa(sampler, ARS) on the sampler contained in the MCMCChain type). That would limit the changes necessary. If it is a feature of a family of samplers instead of only one, this could be dealt with a parent Abstract type (ARS <: MCMCAcceptOnlySampler <: MCMCSampler).

— Reply to this email directly or view it on GitHub.


goedman commented 10 years ago

I started out considering that possibility, but rejected it because it would take away the possibility of calculating the acceptance rate (I think).

Rob J. Goedman goedman@icloud.com

On May 20, 2014, at 3:13 PM, Frédéric Testard notifications@github.com wrote:

Good point: isn't ARS thinning (consider only values that have been accepted) similar to the usual thinning (keep a sample every n samples)? If we follow that idea we could have the ARS sampler send only the accepted samples. Would that make sense, Rob?

@bob-carpenter: in our current architecture we split what you call the sampler into a runner x sampler: the sampler just provides the next draw (it is a coroutine) and the runner decides what to do with it (keep all samples, or apply a thinning + throw out the burnin). The distinction was useful to implement serial tempering and sequential tempering more cleanly alongside the classic serial sampling without having to rewrite the different samplers.
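[Editor's note] The runner x sampler split described above can be sketched with a Julia Channel playing the coroutine role. All names here are illustrative toys, not the actual MCMC.jl types:

```julia
# Sampler: only produces draws, one at a time, via a coroutine (Channel).
# Here a trivial random walk stands in for a real MCMC sampler.
function toy_sampler(n)
    Channel{Float64}(0) do ch
        x = 0.0
        for _ in 1:n
            x += randn()
            put!(ch, x)
        end
    end
end

# Runner: decides what to keep — discard `burnin` draws, then keep
# every `thin`-th one. The sampler itself knows nothing about this.
function run_chain(sampler; burnin = 10, thin = 2)
    kept = Float64[]
    for (i, draw) in enumerate(sampler)
        i <= burnin && continue
        (i - burnin) % thin == 0 && push!(kept, draw)
    end
    kept
end

length(run_chain(toy_sampler(50)))   # 20 draws kept
```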


fredo-dedup commented 10 years ago

Yes, and thinning (in the usual sense) would apply to already filtered values... bad idea.

bob-carpenter commented 10 years ago

I'm not an expert on ARS, but I can see two cases:

  1. It's algorithm-internal, in which case if you include them in the chain you'll get incorrect estimates.
  2. It's a standard Metropolis rejection, in which case you have to include the "rejected" draws for correct estimates.

Now you can remove every other draw, or 2 out of every 3, or follow any other regular schedule like that and still have a Markov chain. But you have to count the Metropolis rejections as part of the thinning.

The architecture sounds good --- we did roughly the same thing, though it's an iterator in C++, so we don't have the pleasure of literally co-routining (although if you stand on your head and look at the state of the iterator as the context for a continuation, you can make the concepts line up pretty neatly).

As to tempering, check out Michael Betancourt's latest:

http://andrewgelman.com/2014/05/20/thermodynamic-monte-carlo/

Coming soon to Stan :-)


papamarkou commented 10 years ago

I may be missing something as I only just joined the conversation: why would we need to add a new field when there is already infrastructure supporting the functionality proposed for ARS? In particular, the convention so far has been to add an accept key in the diagnostics dictionary field of MCMCSample which points to a boolean (see for example samplers.jl and HMC.jl). Wouldn't it be the obvious and natural extension to employ the same approach for ARS? It may of course be that I have missed some point in the discussion.
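[Editor's note] The convention described above can be sketched as follows. The MCMCSample layout here is a deliberate simplification of the actual MCMC.jl type, and the accepted helper is hypothetical:

```julia
# Simplified sketch: each sample carries a diagnostics Dict, and
# acceptance is recorded under an :accept key pointing to a boolean.
struct MCMCSample
    value::Vector{Float64}
    diagnostics::Dict{Symbol,Any}
end

a = MCMCSample([0.5], Dict{Symbol,Any}(:accept => true))
r = MCMCSample([0.1], Dict{Symbol,Any}(:accept => false))

# Post-sampling code can then filter on the flag, ARS included:
accepted(samples) = [s for s in samples if s.diagnostics[:accept]]

length(accepted([a, r]))   # 1
```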

papamarkou commented 10 years ago

As for Gibbs sampling, any addition would be great. We only need to discuss a bit in advance how its associated types would fit in the existing scheme. For example, HMC and RMHMC are Gibbs samplers, which would probably dictate some restructuring of the existing code as well.

bob-carpenter commented 10 years ago

What you may want to think about for general design is blocking. Often samplers are constructed by composition from samplers for subsets of variables (usually conditioned on other variables). For instance, BUGS and JAGS choose an appropriate sampler for each parameter node in the graphical model.

HMC/RHMC are instances of Metropolis when viewed the right way, as is Gibbs, but HMC/RHMC are not instances of Gibbs samplers.

The terminology in the literature is confusing. Sometimes "Gibbs sampler" is restricted to where the conditional sampling can be done analytically, as in conjugate models. Sometimes "Gibbs sampling" is construed more generally, allowing other samplers for the conditional distributions. For example, BUGS uses ARS for non-conjugate continuous nodes in the graphical model, whereas JAGS uses slice sampling.

I'd think you'd want to do slice sampling before (or instead of) ARS, but maybe I'm missing some motivation for ARS.


papamarkou commented 10 years ago

I agree that the terminology can be confusing. I am not sure why you claim that RMHMC and HMC cannot be viewed as Gibbs samplers - have a look for example at equations (19) and (20) in the RSS B paper by Girolami and Calderhead (2011), where they present these samplers as Gibbs ones.

papamarkou commented 10 years ago

The slice sampler has already been implemented by a contributor and has been incorporated into the package; open to suggestions on how to improve it.

bob-carpenter commented 10 years ago

Good point. And a great illustration of the confusion this taxonomy can cause when samplers are really compounds.

You're right that you can view the alternation between (R)HMC's sampling of momentum and position as Gibbs (in the non-conjugate sense) with the momentum being an exact update and the position update being Metropolis-like (Hamiltonian dynamics proposes position and then there's an accept/reject step).

You could also view HMC as a Metropolis sampler where you're jointly proposing momentum and position updates. Because the momentum updates are exact (MC, not MCMC) and independent of the position updates (they're sampled analytically as independent unit normals), they cancel out of the rejection stat.

In fact, you can view Gibbs itself as an exact Metropolis with a 100% accept rate. All very confusing from the app design point of view, but helpful when you need to prove properties about samplers.


goedman commented 10 years ago

Hi Theodore,

The ARS sampler is using that same mechanism.

While in Metropolis rejected samples still count as valid posterior draws, in ARS that is not the case, so I needed a way either to communicate that to the post-sampling functions in stats or to drop rejected samples altogether (in the runner). As I wanted to be able to compute the acceptance rate, I opted for the first route.

Frédéric came up with a much cleaner solution, which does not need a new field in the MCMCChain, just comparable updates to the stats functions. I'm trying that over the next 2 days.

I agree with your remark on the Gibbs sampling in your second email. I had come to a similar conclusion as Bob suggested, to take a hard look at blocking MH. ARS to me was just a way to get familiar with the structure of MCMC.jl.

Bob, thanks for jumping in with your experience with, amongst other things, developing and using Stan and the pointer to Michael's paper, very interesting!

Theodore & Frédéric, I know very little about the seqMC part in MCMC.jl. Guess I need to buy the Stephens/Holmes publication?

Rob J. Goedman goedman@icloud.com


fredo-dedup commented 10 years ago

You can just google it. And there are probably many other papers out there giving an introduction to these algorithms.

goedman commented 10 years ago

@scidom @fredo-dedup I have updated the ARS sampler as suggested by Frédéric and it works fine. No additions to MCMCChain required anymore! It just involves the sampler itself and the stats updates: https://github.com/goedman/MCMC.jl. If you are ok with this version, I'll take out some of the extra diagnostics I inserted for testing purposes and make a pull request.

papamarkou commented 10 years ago

Hi @goedman, sure, go ahead with the PR. I will have a look at the code, although it may not be a detailed review - I trust that you are familiar with the sampler, so I will pay more attention to its integration with the rest of the package.

papamarkou commented 9 years ago

I think we can safely close this issue now. @goedman has already written a separate Julia package for calling Stan from Julia.