florianhartig / BayesianTools

General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics
https://cran.r-project.org/web/packages/BayesianTools/index.html

Fitting large chains into physical memory => remove coda chain slot in BayesianOutput #227

Closed · twest820 closed this 2 years ago

twest820 commented 2 years ago

Here's an example memory breakdown for a minimal DEzs MCMC configuration with chains of length 100k, showing the mcmcOut object using 1.2 GB of memory:

library(BayesianTools)
library(tibble)

mcmcOut = runMCMC(bayesianSetup = mcmcSetup, # 57 parameters
                  sampler = "DEzs",
                  settings = list(iterations = 300 * 1000, nrChains = 3))
# mcmcOut[[i]]$sampler, settings, setup, and X are small
tibble(totalMB = object.size(mcmcOut)[1] / 1E6, # 3 populations with 3 chains each
       populationMB = object.size(mcmcOut[[1]])[1] / 1E6, # mostly chain, codaChain, and Z
       chainMB = object.size(mcmcOut[[1]]$chain[[1]])[1] / 1E6, # 9 of these
       codaChainMB = object.size(mcmcOut[[1]]$codaChain[[1]])[1] / 1E6, # 9 of these
       zMB = object.size(mcmcOut[[1]]$Z)[1] / 1E6) # 3 of these
# A tibble: 1 x 5
  totalMB populationMB chainMB codaChainMB   zMB
    <dbl>        <dbl>   <dbl>       <dbl> <dbl>
1   1213.         404.    45.6        45.6  130.

For the range of nrChains recommended with DEzs this implies memory requirements of 12-20 GB for each 1M iterations in the chains, consistent with memory use I'm measuring on other DEzs runs having chains up to length 2M. That suggests 300-500 GB for unthinned r3PG calibrations reaching chain lengths of 25M iterations.
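For reference, the per-chain numbers above are consistent with back-of-envelope arithmetic (a sketch; `chain_bytes` is an illustrative helper, not a BayesianTools function):

```r
# A chain stores one double (8 bytes) per parameter per iteration,
# ignoring object overhead, so the measured chainMB above is expected.
n_params <- 57
chain_bytes <- function(iterations) iterations * n_params * 8

chain_bytes(100e3) / 1e6  # 45.6 MB, matching chainMB in the tibble above
chain_bytes(25e6) / 1e9   # 11.4 GB per unthinned chain at 25M iterations
```

The population total also checks out as roughly 3 × chain + 3 × codaChain + Z ≈ 404 MB, which is where the triplication discussed below comes from.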

Putting aside the use of sensitivity analysis to reduce the number of parameters, I think this raises three considerations within the BayesianTools package:

  1. chain is subject to thinning but contains all iterations by default. At 25M iterations in this 57-parameter example, each unthinned chain instance would approach 12 GB; with 9-12 instances that's 110-150 GB. From code review, it looks to me like a possible workaround for DEzs callers that don't require blocking is to run a modified version of mcmcDEzs.R with the code around pChain removed. I haven't checked other samplers, though.
  2. runMCMC() is hard-coded to populate codaChain for DE and DEzs. A workaround is presumably to copy the function definition from mcmcRun.R in the package source and delete the codaChain = coda::mcmc(out$Draws) bit.
  3. Z can be thinned as discussed in #179. However, a full Z is the same size as chain, meaning 85-90% thinning is needed to fit it on a typical machine with 16 GB of DRAM. High-memory cloud compute instances and placing independent chains on different machines provide partial workarounds. Those still leave the problem of bringing all the chains together to calculate diagnostics, though that's a bit easier to window off disk.
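As a stopgap for point 2, the duplicate coda copies could also be dropped after the fact (a hypothetical sketch, not part of the BayesianTools API; the stand-in list below just mimics the slot layout of a multi-chain output):

```r
# Hypothetical post-hoc workaround: once runMCMC() has returned, drop the
# codaChain slot from each population to roughly halve the chain-related
# footprint. Downstream coda-based methods reading that slot will no
# longer work. A stand-in object mimics the slot layout here.
mcmcOut <- list(list(chain = matrix(0, 5, 3), codaChain = matrix(0, 5, 3)))
for (i in seq_along(mcmcOut)) {
  mcmcOut[[i]]$codaChain <- NULL  # free the duplicate coda copy
}
invisible(gc())  # encourage R to release the freed memory
```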

This leaves me with three questions, I think.

  1. Are there better approaches to reducing the more or less triplicate memory consumption across chain, codaChain, and Z?
  2. Is there some practical upper bound on thin? Using something like thin = 10 isn't ideal from an efficiency standpoint (90% of potential samples get dropped), but I'm not seeing obvious problems with, say, ergodicity.
  3. Is there some alternate, better way of working with large chains than what's considered here?
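On question 3, one partial approach I can imagine (a sketch under assumptions, not tested against the package) is to run the independent chains one at a time with nrChains = 1, persisting each to disk and freeing it before starting the next, so only one chain is ever resident:

```r
# Hypothetical sketch: run independent chains sequentially, writing each
# to disk as it completes so memory use stays at one chain's worth.
for (run in 1:3) {
  out <- runMCMC(bayesianSetup = mcmcSetup,  # as in the example above
                 sampler = "DEzs",
                 settings = list(iterations = 300 * 1000, nrChains = 1))
  saveRDS(out, sprintf("chain_%d.rds", run))  # persist, then free
  rm(out)
  invisible(gc())
}
# Reload later only when computing diagnostics:
# chains <- lapply(1:3, function(run) readRDS(sprintf("chain_%d.rds", run)))
```

This still runs into the cross-chain diagnostics problem noted under point 3 above, but at least bounds peak memory during sampling.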
florianhartig commented 2 years ago

Yes, good points ...

=> Conclusion: rename this issue to "remove coda chain"; this can be implemented immediately.