TuringLang / AbstractMCMC.jl

Abstract types and interfaces for Markov chain Monte Carlo methods
https://turinglang.org/AbstractMCMC.jl
MIT License
87 stars 18 forks source link

Add MCMCSerial #71

Closed cpfiffer closed 3 years ago

cpfiffer commented 3 years ago

I often run into the case where I want to get multiple chains but don't want to go through the hassle of making things threadsafe or available to each process. The common way to do this currently is something like

chains = map(x -> sample(model, spl, n), 1:10) # sample ten chains
chain = reduce(chainscat, chains)

Which is fine, but it is odd that we don't have a convenient way of sampling multiple chains in a basic way.

I added MCMCSerial as a parallelization subtype, so you can now simply call

chain = sample(model, spl, MCMCSerial(), n, 10)

which will sample each subchain in turn and concatenate them. Tests are included.

I'm also open to changing the call here -- it seems to me that this should be the default behavior of something like

sample(model, spl, n, n_chains)

where no parallization method is specified. I haven't implemented the signature above but I can if people think it's a good idea.

codecov[bot] commented 3 years ago

Codecov Report

Merging #71 (48ba9b0) into master (a089476) will increase coverage by 0.08%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #71      +/-   ##
==========================================
+ Coverage   97.48%   97.57%   +0.08%     
==========================================
  Files           7        7              
  Lines         199      206       +7     
==========================================
+ Hits          194      201       +7     
  Misses          5        5              
Impacted Files Coverage Δ
src/AbstractMCMC.jl 100.00% <100.00%> (ø)
src/sample.jl 97.93% <100.00%> (+0.08%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update a089476...48ba9b0. Read the comment docs.

cpfiffer commented 3 years ago

I'm not completely sure, I thought that maybe rather MCMCThreads should be the default - e.g., it is the default context in JuliaFolds and also in other packages such as DataFrames and CSV multithreading is the default option.

Okay, let's leave it be for now -- I don't think we should use threads as the default because we have so little control over the downstream operation. Lots of things are not threadsafe still and MCMC models can have weird things in them that don't lend themselves to thread parallelization well. We'll stick with the "specify a method" thing for now, and then we can gauge later who is using what as the default method.