Here's an example memory breakdown for a minimal DEzs MCMC configuration with chains of length 100k, where the resulting mcmcOut object is using 1.2 GB of memory:
```r
library(BayesianTools)
library(tibble)

mcmcOut = runMCMC(bayesianSetup = mcmcSetup,   # setup with 57 parameters
                  sampler = "DEzs",
                  settings = list(iterations = 300 * 1000, nrChains = 3))

# mcmcOut[[i]]$sampler, settings, setup, and X are small
tibble(totalMB      = object.size(mcmcOut)[1] / 1E6,                     # 3 populations with 3 chains each
       populationMB = object.size(mcmcOut[[1]])[1] / 1E6,                # mostly chain, codaChain, and Z
       chainMB      = object.size(mcmcOut[[1]]$chain[[1]])[1] / 1E6,     # 9 of these
       codaChainMB  = object.size(mcmcOut[[1]]$codaChain[[1]])[1] / 1E6, # 9 of these
       zMB          = object.size(mcmcOut[[1]]$Z)[1] / 1E6)              # 3 of these
# A tibble: 1 x 5
#   totalMB populationMB chainMB codaChainMB   zMB
#     <dbl>        <dbl>   <dbl>       <dbl> <dbl>
# 1   1213.         404.    45.6        45.6  130.
```
For the range of `nrChains` recommended with DEzs, this implies memory requirements of 12-20 GB per 1M iterations in the chains, which is consistent with memory use I'm measuring on other DEzs runs with chains up to 2M in length. That suggests 300-500 GB for unthinned r3PG calibrations reaching chain lengths of 25M iterations.
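For concreteness, here is the back-of-the-envelope extrapolation behind those numbers (a sketch only, assuming memory grows linearly with chain length; `nrChains = 3` as in the run above, with more chains scaling the totals up toward the 20 GB end):

```r
# Linear extrapolation from the 100k-chain measurement above (assumption:
# memory scales proportionally with chain length; nrChains = 3 throughout).
mbAt100k = 1213                          # measured total in MB
gbPer1M  = mbAt100k / 1000 * 10          # ~12 GB per 1M iterations in the chains
gbAt25M  = gbPer1M * 25                  # ~300 GB at 25M iterations
c(gbPer1M = gbPer1M, gbAt25M = gbAt25M)
#> gbPer1M gbAt25M
#>   12.13  303.25
```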
Putting aside the use of sensitivity analysis to reduce the number of parameters, I think this raises three considerations within the BayesianTools package:
1. `chain` is subject to thinning but contains all iterations by default. At 25M iterations in this 57-parameter example, each unthinned `chain` instance would approach 12 GB; with 9-12 instances, that's 110-150 GB. From code review, a possible workaround for DEzs callers that don't require blocking looks to be running a modified version of mcmcDEzs.R with the code around `pChain` removed (thinning at sampling time may be a less invasive alternative; see the sketch after this list). I haven't checked other samplers, though.
2. `runMCMC()` is hard-coded to populate `codaChain` for DE and DEzs. The workaround is presumably to copy the function definition from mcmcRun.R in the package source and delete the `codaChain = coda::mcmc(out$Draws)` bit.
3. `Z` can be thinned as discussed in #179. However, a full `Z` is the same size as `chain` in aggregate, meaning 85-90% thinning is needed to fit it on a typical machine with 16 GB of DRAM. High-DRAM cloud compute instances and placing independent chains on different machines provide partial workarounds, but those still leave the problem of bringing all the chains together to calculate diagnostics, though that's a bit easier to window off disk.
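A less invasive route than editing mcmcDEzs.R, sketched below against the same `mcmcSetup` as above, would be to thin at sampling time. This assumes the documented `thin` entry in `settings` is honored by DEzs and that downstream extraction goes through `getSample()`:

```r
library(BayesianTools)

# Keep only every 10th iteration in the stored chain (assumption: the
# documented `thin` setting is respected by the DEzs sampler).
mcmcOutThinned = runMCMC(bayesianSetup = mcmcSetup,   # same 57-parameter setup as above
                         sampler = "DEzs",
                         settings = list(iterations = 300 * 1000,
                                         nrChains = 3,
                                         thin = 10))

# Extraction then goes through getSample(), with optional further thinning
samples = getSample(mcmcOutThinned, thin = 1)
```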
This leaves me with three questions, I think.
1. Are there better approaches to reducing the more or less triplicate memory consumption among `chain`, `codaChain`, and `Z`?
2. Is there a practical upper bound on `thin`? Using something like `thin = 10` isn't ideal from an efficiency standpoint (90% of potential samples get dropped), but I'm not seeing obvious problems with, say, ergodicity. One way to probe this is sketched below.
3. Is there some alternate, better way of working with large chains than what's considered here?
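For question 2, one empirical check (a sketch only, assuming `coda` is installed and that `getSample()` applies the requested thinning) is to compare effective sample sizes with and without thinning:

```r
library(BayesianTools)
library(coda)

# Effective sample size per parameter, before and after post-hoc thinning;
# autocorrelated draws contribute less than one effective sample each, so a
# small drop in ESS suggests thin = 10 discards mostly redundant samples.
essFull    = effectiveSize(getSample(mcmcOut, coda = TRUE))
essThinned = effectiveSize(getSample(mcmcOut, coda = TRUE, thin = 10))

summary(essThinned / essFull)   # ratios near 1 mean little inferential loss
```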
The point about the Z matrix is a duplicate of https://github.com/florianhartig/BayesianTools/issues/179. I agree there are various ways to address this; one thing I have personally always wondered is whether we need Z at all, given that it is largely identical to the chain. In any case, comments about this should go to #179.
Thinning of the chain is already implemented and I see no way to improve this.
The remaining point is to remove the coda chain. I just had a look at this: I think it was included at a time before getSample supported coda output (at least, I had forgotten about it), and it seems redundant, given that getSample is currently used internally in all places where a coda object is needed.
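For reference, the replacement path looks like this (a sketch, not a package change): the coda object that the stored codaChain currently duplicates can be produced on demand from the sampler output:

```r
library(BayesianTools)

# Convert the sampler output to a coda::mcmc.list on demand instead of
# keeping a pre-built codaChain in memory.
codaOut = getSample(mcmcOut, coda = TRUE)

# Typical downstream uses work directly on the converted object
coda::gelman.diag(codaOut)
plot(codaOut)
```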
=> Conclusion: rename this issue to "remove coda chain"; this can be implemented immediately.