florianhartig / BayesianTools

General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics
https://cran.r-project.org/web/packages/BayesianTools/index.html

Blocked updates #209

Open florianhartig opened 3 years ago

florianhartig commented 3 years ago

Question from a user:

Is it possible to have parameter blocks and update them separately/sequentially? I am considering a model with random effects, and treating all random effects as parameters results in a large parameter vector. I have experimented with the Gibbs option. My random effects are multivariate vectors, and I am trying to see if I can update them as blocks.

florianhartig commented 3 years ago

Yes, this should be possible. See, e.g., option "blockUpdate" in DEzs.
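
For illustration, a minimal sketch of how this is passed via the settings (hypothetical three-parameter setup; the exact fields of blockUpdate are documented in ?DEzs):

library(BayesianTools)

# Hypothetical toy setup, for illustration only
ll <- function(p) sum(dnorm(p, log = TRUE))
setup <- createBayesianSetup(likelihood = ll, lower = rep(-5, 3), upper = rep(5, 3))

# "user" blocking: parameters 1 and 2 form block 1, parameter 3 forms block 2.
# Field names follow ?DEzs (note the package spells the interval "groupIntervall").
settings <- list(iterations = 9000,
                 blockUpdate = list("user", groups = c(1, 1, 2), pSel = 1,
                                    groupStart = 101, groupIntervall = 500))
out <- runMCMC(bayesianSetup = setup, sampler = "DEzs", settings = settings)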

florianhartig commented 3 years ago

From the user:

Thank you. I checked the documentation and the blockUpdate option in DEzs is quite flexible. I do not see this option in other updating functions, such as in Metropolis.

florianhartig commented 3 years ago

Blocking strategies depend on the sampler family. For the DE family, blocking is implemented directly in the sampler.

For the Metropolis sampler, there is the option to supply a user-defined proposal function, which can be used for blocking. See https://rdrr.io/cran/BayesianTools/src/R/proposalGenerator.R
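
For illustration, something along these lines should work (an untested sketch; check ?createProposalGenerator and ?Metropolis for the exact argument semantics):

library(BayesianTools)

# Toy three-parameter setup, for illustration only
ll <- function(p) sum(dnorm(p, log = TRUE))
setup <- createBayesianSetup(likelihood = ll, lower = rep(-5, 3), upper = rep(5, 3))

# Proposal generator that updates only a random subset of parameters per step:
# gibbsProbabilities = relative probabilities of updating 1, 2 or 3 parameters,
# gibbsWeights = per-parameter selection weights
prop <- createProposalGenerator(covariance = rep(0.1, 3),
                                gibbsProbabilities = c(0.5, 0.5, 0),
                                gibbsWeights = c(1, 1, 1))

settings <- list(iterations = 10000, proposalGenerator = prop,
                 optimize = FALSE, adapt = FALSE)
out <- runMCMC(setup, sampler = "Metropolis", settings = settings)

For fixed (rather than randomly selected) blocks, you would write a fully custom proposal function along the lines of the code in the linked file.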

CasLudwig commented 2 years ago

Hi Florian,

I've been looking into blocking parameter updates, as I think that might be really useful for my application. I've mocked up a test below (this is not my actual application). This code simulates data from an ex-Gaussian density and then samples the posterior density of the three parameters (mu, sigma, beta) using DEzs.

When I try running the sampler with blocks based on the correlation between the parameters, the sampler crashes and I'm not sure why (the error suggests an NA/NaN argument, but typically only after 2000+ samples). In any event, I'm more interested in defining my own groups. So in the code below, group 1 = c(mu, sigma); group 2 = beta. My blockUpdate options are then as follows:

blockUpdate = list("user", groups = c(1,1,2), pSel = 1, groupStart = 101, groupIntervall=500)

As I understand it, this should create my two groups; it will only ever update one of the two groups (pSel = 1); blocking starts at sample 101, and one block is updated for a period of 500 samples. The last value is deliberately large so that I can figure out from the traceplot what the blocking is doing.

What I expected to see is that one group of parameters would be stationary for a period of 500 samples, during which the other group would jump around; and vice versa. However, that does not seem to happen in the traceplot below.

I suspect I've misunderstood something and probably the way I've set up the blockUpdate is wrong. Any pointers will be very welcome. Incidentally, do I understand correctly that blocking is not available in DREAMzs?

Best, Cas

[traceplot: traceExample]

library(BayesianTools)
library(brms) # For ex-Gaussian density

N <- 10000                         # number of simulated observations
parmNames <- c("mu", "sigma", "beta")
exGaussParms <- c(0.3, 0.05, 0.2)  # true parameter values
exGaussData <- data.frame(y = rexgaussian(N, exGaussParms[1], exGaussParms[2], exGaussParms[3]))

# Log-likelihood: sum of ex-Gaussian log densities over all observations
ll <- function(p){
  ll <- dexgaussian(exGaussData$y, p[1], p[2], p[3], log = TRUE)
  return(sum(ll))
}

bayesianSetup <- createBayesianSetup(likelihood = ll, 
                                     lower = rep(0.01,3), 
                                     upper = rep(1,3),
                                     names = parmNames,
                                     parallel = TRUE)
settings <- list(iterations = 9000, 
                 burnin=3000, 
                 message = TRUE
)

mcmcOut1 <- runMCMC(bayesianSetup = bayesianSetup,
                    sampler = "DEzs",
                    settings = settings)

# Take a look at the chains and the correlations. Note that mu and beta in particular are highly correlated.
tracePlot(mcmcOut1, start = 2)
correlationPlot(mcmcOut1, start=2)

# Try blocked updating. We define two groups of parameters: (mu, sigma) and beta. 
settings <- list(iterations = 9000, 
                 burnin=3000, 
                 message = TRUE,
                 blockUpdate = list("user", groups = c(1,1,2), pSel = 1, groupStart = 101, groupIntervall=500)
)

mcmcOut2 <- runMCMC(bayesianSetup = bayesianSetup,
                    sampler = "DEzs",
                    settings = settings)

tracePlot(mcmcOut2, start = 2)
correlationPlot(mcmcOut2, start=2)

dev.off()
florianhartig commented 2 years ago

Hi Cas,

thanks for reporting this. I will have to look into it, but I'm sorry to say that it may take more than a few days: time is limited, issues for BT have to be prioritised, and this is not a super central feature.

Just a quick question: are you sure that blocking would be useful for your case, i.e. is an unblocked DEzs inefficient? In my (limited) experience, blocking is mainly useful in practice for blocking fixed / random effects or similar. I'm just curious about the context in which you expect it to be of particular use.

Cheers, Florian

CasLudwig commented 2 years ago

Hi Florian,

Thanks - of course, I totally appreciate time constraints and varying BT priorities!

I'm not entirely sure blocking is what I need. I'm assessing the identifiability of my model with some model recovery simulations (actually, this is still a reduced version of the model I will eventually work with). I find that if I estimate all parameters together, many parameters are biased. However, if I group the parameters in a certain way and estimate each group in turn (holding the other group at its true values), recovery is spot on. That made me think that blocked parameter updates might be useful here, although I appreciate the situation is quite different when you cannot fix one group at the true values...

Best, Cas

florianhartig commented 2 years ago

Hi Cas,

that sounds odd - provided that you run the samplers long enough, all options should result in the same posterior. Blocking could only improve the speed of convergence.

Did you check (using convergence diagnostics) which of the two options has actually converged? It may also be that the blocking option is the wrong one.

In general, you should evaluate MCMCs (and their settings) by the speed of convergence, not by whether they give you the right parameters. That the right parameters are retrieved is not guaranteed in a calibration; the "wrong" parameters may be the right answer (given the way the likelihood / priors are specified).
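
For example, with the objects from your script above (a sketch):

# Gelman-Rubin diagnostics for both runs; psrf values close to 1 indicate convergence
gelmanDiagnostics(mcmcOut1, plot = TRUE)
gelmanDiagnostics(mcmcOut2, plot = TRUE)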

Cheers, F

CasLudwig commented 2 years ago

Hi Florian,

Sorry for the delayed response; I took some time off over Easter and managed to avoid looking at mail.

The model I'm working with involves probability density approximation of the likelihood, and the parameters are strongly correlated. I find that chains tend to get stuck for relatively long periods, so I use pretty long chains (typically 50k samples/chain after burn-in; no blocking). After some thinning, all the convergence statistics and the traceplots look OK, but the parameters are biased.

It is entirely possible that I'm getting these biased estimates because of the nature of the data (simulated data at this point). This is one of the things I need to check. Before that, I thought I'd check whether, and which, parameters can be recovered accurately in groups. In these checks I'm not using the blocking functionality of DEzs; rather, I keep some parameters at their true values and estimate the remaining parameters together (actually, using DREAMzs; roughly as in the sketch below). To me that suggested that blocking might be helpful, but I appreciate there is a big difference between keeping one group of parameters at their true values (as I have done in this recovery simulation) and keeping one group at their current position in the chains (as you would do in a real application).
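
For reference, the fixed-parameter runs look roughly like this (a sketch based on the ex-Gaussian toy example above, not my actual model):

# Sketch: estimate (mu, sigma) with beta held at its true value,
# reusing exGaussData and exGaussParms from the script above
llFixedBeta <- function(p){
  sum(dexgaussian(exGaussData$y, p[1], p[2], exGaussParms[3], log = TRUE))
}
setupFixed <- createBayesianSetup(likelihood = llFixedBeta,
                                  lower = rep(0.01, 2), upper = rep(1, 2),
                                  names = c("mu", "sigma"))
outFixed <- runMCMC(setupFixed, sampler = "DREAMzs",
                    settings = list(iterations = 9000, burnin = 3000))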

From what you're saying, it sounds like blocking is unlikely to resolve the bias in my estimates, in which case I can abandon this line of exploration. I would still be interested to see what happens with this functionality, as anything that helps the speed of convergence would also be very welcome...

Thanks again and best wishes. Cas

florianhartig commented 2 years ago

Hi Cas,

when you say biased, are you looking at the marginal peaks or at the MAP? Note that in the presence of non-linear correlations between parameters, there is a difference; see below.

[image: illustration of the difference between marginal modes and the MAP for correlated parameters]

From what you say, there are a number of possible reasons for the biased estimates you observe, e.g. a sampler that has not truly converged, small-sample bias of the MLE / MAP itself, or looking at marginal modes rather than the joint MAP.

I would exclude these one by one to make sure that the sampler is really the problem. If it is, blocking could be an option, but there are also other means to improve mixing.

CasLudwig commented 2 years ago

Hi Florian,

I'm looking at the marginals, and for the biased parameters most of the marginal density lies either above or below the true value. Incidentally, others who work with the same or similar models also report similar biases in their recovery simulations, although theirs tend to be a bit smaller than the ones I'm getting.

I'm pretty sure the likelihood itself is not biased (I can actually check this for the simple version of the model I work with, and did so at some point), and I have quite a lot of data... I had not quite appreciated the difference between joint and marginal estimates in the presence of non-linear correlations, and I'll look into that. I'll let you know if I gain any further insights!

Best, Cas

florianhartig commented 2 years ago

Hi Cas,

you can get the MAP from the MCMC chain via MAP(). This is an approximate value, as MCMC sampling is not particularly efficient at locating the exact MAP, but it is almost certainly better than using the marginal modes. Check whether using the MAP improves things.
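
For example, with the output object from your script above (a sketch; see ?MAP):

# MAP() returns the sample with the highest posterior density found in the chain
best <- MAP(mcmcOut1)
best$parametersMAP      # joint (MAP) estimate
marginalPlot(mcmcOut1)  # marginal densities, whose peaks may sit elsewhere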

About the likelihood: not sure how you can be certain that the likelihood is not biased, or maybe we misunderstand each other regarding what we mean by bias. What I mean is that the MLE (and thus also the MAP) is usually only asymptotically unbiased. With realistic data sizes, the MLE is often biased, especially for variance parameters.

Cheers, Florian

CasLudwig commented 2 years ago

Hi Florian,

Thanks - will check that MAP() function.

Ah yes, we had misunderstood each other. I meant that the approximate likelihood (through kernel density estimation of simulated data) is an unbiased estimator of the "true" likelihood for a given set of parameters. That is, the simplified version of the model I'm working with has an analytically defined likelihood, and so for that model I could compare the approximate likelihood with its true value. From memory, the distribution of approximate likelihoods (same parameters, but different runs of the likelihood computation) was nicely centred on the true value.
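
In code, that check looked roughly like this (a sketch with a normal model standing in for my actual model, so that the analytic likelihood is available):

# Sketch: is the KDE-based log-likelihood centred on the analytic value?
set.seed(1)
y <- rnorm(200, mean = 0.3, sd = 0.1)              # "observed" data
trueLL <- sum(dnorm(y, 0.3, 0.1, log = TRUE))      # analytic log-likelihood

approxLL <- replicate(100, {
  sim <- rnorm(1e4, 0.3, 0.1)                      # simulate from the model
  dens <- density(sim)                             # kernel density estimate
  sum(log(approx(dens$x, dens$y, xout = y)$y))     # evaluate KDE at the data
})
hist(approxLL); abline(v = trueLL, col = "red")    # distribution vs. true value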

But you're right of course: bias in the MAP (or marginal) estimates may be an issue with realistically sized datasets. At least in this recovery simulation, I should be able to assess this by varying the size of the simulated dataset.

Best, Cas