florianhartig / BayesianTools

General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics
https://cran.r-project.org/web/packages/BayesianTools/index.html
119 stars 29 forks source link

details on how to write a sampler that works on different numbers of parameters #137

Open garyappleseed opened 6 years ago

garyappleseed commented 6 years ago

I'm new to BayesianTools. I want to have a prior that combines two uniform distributions with a number k of Laplace distributions, where k can change from run to run. I set up the density to do this: density = function(par){ d1 = dunif(par[1], -20,20, log =TRUE) d2 = dunif(par[2], -20,20, log =TRUE) d3 = sum(dlaplace(par[-c(1,2)], 0, 45, log =TRUE)) return(d1 + d2 + d3) } Here dlaplace is from the LaplacesDemon package, and par is a vector of any length > 2.

But I don't know how to write the sampler without knowing the length, since the sampler does not take parameters. Can I just make the length a global constant?

florianhartig commented 6 years ago

Hi,

can I ask: what's the purpose of this, i.e. why don't you know how many parameters you have? BT generally assumes you know how many parameters your model has.

side note: to write a double exponential (Laplace) you don't need an extra package, its just 0.5 * dexp(abs(x), lambda)

Best, F

garyappleseed commented 6 years ago

I’m trying BayesianTools to do linear models with shrinkage - similar to Bayesian Lasso - but for residual distributions that do not have closed-form densities and are not in Stan. One is the tweedie, which has its own R package for the density, and another in the Poisson-Inverse Gaussian, which is closed form with modified Bessel functions but these need specialized software as well in that for large x they cannot be calculated in double-precision math - they might need up to 40 or 50 digits.

Unlike lasso, Bayesian shrinkage does not eliminate variables, but a lot can get coefficients near zero. So they can be removed from the design matrix and the model estimated again. This might take several runs to get the right model. So I don’t want to make a new prior every time as the design matrix gets smaller - or sometimes bigger if I put one back in.

I’ve tried it by making the number of columns a global constant, and that seems to work. I don’t know if that will affect parallelization, though. I haven’t tried that yet. I don’t know if I have it automatically or not. I’m using DEzs and the summary says it is running 3 chains for each chain I call, but when I use activity monitor on the Mac it shows that only one core is being used. The quad-core Mac has 4 more virtual cores, so 8 total, so it would be great to be able to make parallel chains..

In Stan I use looic instead of waic. It seems like a better measure. Did you see their papers on Pareto-smoothed importance sampling? I am planning to use it for BayesianTools as well. It uses a separate package, loo, that just needs a matrix with a row for each sample and a column for each data point, with the logliikelihoods of the points in the sample. I haven’t figured out exactly how to make that in BayesianTools, but it must be pretty easy.

Good point about the Laplace. Probably easier that way.

I don’t know enough about making a prior. The example code from the package pdf looks like the sampler puts out a matrix even though there is only one row. Maybe it has to be that way. I did that in a pretty artificial way - see below, where U is the number of columns in the design matrix. But I don’t know if I had to or not or if what I did is good enough. But it gives MCMC output. I’m not an R expert either.

sampler = function(n=1){ d = 1:(U+2) d[1] = runif(n, -20,20) d[2] = runif(n, -20,20) for(j in 3:(U+2)) d[j] = rlaplace(n, 0,0.175) return(t(d)) }

What I really want to do, and do in Stan, is make the Laplace lambda, here 0.175, also stochastic. I am assuming I could do something like d[j] = rlaplace(n, d[1]). Would that work in BayesianTools?

So far it seems like a great package. I’ve only done trial models where I use the gamma distribution for residuals, but it runs pretty fast. The coefficients are more different than I would expect from what I got in Stan for the same model, but I am still working out kinks in it.

Best, Gary

On Jun 7, 2018, at 6:58 AM, Florian Hartig notifications@github.com wrote:

Hi,

can I ask: what's the purpose of this, i.e. why don't you know how many parameters you have? BT generally assumes you know how many parameters your model has.

side note: to write a double exponential (Laplace) you don't need an extra package, its just 0.5 * dexp(abs(x), lambda)

Best, F

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/florianhartig/BayesianTools/issues/137#issuecomment-395380038, or mute the thread https://github.com/notifications/unsubscribe-auth/AjdSt26YbCK-0JWjMV3j1wCW9edGwv8aks5t6QdsgaJpZM4UdWcx.

florianhartig commented 6 years ago

Hi Gary,

thanks for the explanation - lots of topics. I try to respond point by point

  1. The original question: personally, I would not throw out parameters unless you really have a computational / runtime problem. What's the harm in keeping them all in? Moreover, consider that a parameter may be at zero at some point during the MCMC chain, but if the chain is not yet through the burn-in phase (which is in general hard to know) the parameter might still move if other parameter move

  2. If you absolutely have to change parameters, you will have to create a new BayesianSetup. Depending on your R skills, I would either create a new setup and start the DEzs with a z-matrix of the old MCMC (easier), or actually manipulate the sampler output, which contains the setup and the sampler by replacing prior, likelihood, and erasing the respective columns from the MCMC chain, which would allow a true restart of the sampler with less parameters. But again, I'd try to avoid option 2.

  3. Parallelization -> will only work if you switch it on in the setup - did you do that? See help in the setup and vignette. We extended the help in the development version, so if you are using the CRAN version, maybe install the development version. Parallelization will use up to 3 cores per chain for the DEzs, so if you want to use all your cores, you can start several DEzs at the same time and combine them later. Be advised that in particular the MBPs are not designed to run under full load for a long time though.

  4. waic / loo -> why do you need this if you do lasso / shrinkage? This should have taken care of the model selection

  5. prior sampler -> not sure if I understand this correctly. In BT, the sampler is only needed to create start values. Maybe what you want is to have the parameter of the prior be again a parameter, i.e. you want a hyperprior on the parameter? If that is the case, you need to set the hyperprior in the prior, and the rest in the likelihood. See an example for a mixed model, which is essentially the same, here https://github.com/florianhartig/BayesianTools/blob/master/Examples/RandomEffectsModel.md

garyappleseed commented 6 years ago

Ok - I think you can close the original issue,

I don’t need to get rid of variables during the MCMC run. After the run, I take out the variables with parameters close to zero, so I have fewer variables in the design matrix, and so need a new prior for the next run. But this seems to work ok if I set the number of columns in the design matrix to a global variable U, and refer to U in the sampler.

But I use looic to see if leaving out the variables made the fit worse. If so I put some back. Otherwise I look for others to take out. Often taking out the variables with wide ranges and low means for the parameter actually improves looic, I also want to use looic to compare the fit from different residual distributions,

It is a hyper prior I’m looking for. I changed the density and sampler to make the first parameter the variance of the parameters that have Laplace priors. It seems to work.. Code below. Did have to change the likelihood a little.

I tried just parallelization = TRUE. It got some error having to do with logfitness. I don’t really need it at this point as it is pretty fast for what I am doing now, though.

I didn’t see where you input starting values. I tried using best = , in both the createPrior and createBayesianSetup, but if I left out the sampler I got an error about it.

Here are the density and sampler that worked for a hyper prior, called s. The second and third parameters are for the distribution or the constant, and are not shrunk. I like to make the prior of the log uniform for a positive parameter. The first one is the hyper prior for shrinkage:

densitys = function(par){ d1 = dunif(par[1], -5,0, log =TRUE) d2 = dunif(par[2], -20,20, log =TRUE) d3 = dunif(par[3], -20,20, log =TRUE) s = exp(par[1]) d4 = sum(dlaplace(par[-c(1,2,3)], 0, s, log =TRUE)) return(d1 + d2 + d3 + d4) }

samplers = function(n=1){ d = 1:(U+3) d[1] = runif(n, -5,0) d[2] = runif(n, -20,20) d[3] = runif(n, -20,20) s = exp(d[1]) for(j in 4:(U+3)) d[j] = rlaplace(n, 0,s) return(t(d)) }

Thank you for your help on all this, and having patience with a new user.

On Jun 7, 2018, at 5:31 PM, Florian Hartig notifications@github.com wrote:

Hi Gary,

thanks for the explanation - lots of topics. I try to respond point by point

The original question: personally, I would not throw out parameters unless you really have a computational / runtime problem. What's the harm in keeping them all in? Moreover, consider that a parameter may be at zero at some point during the MCMC chain, but if the chain is not yet through the burn-in phase (which is in general hard to know) the parameter might still move if other parameter move

If you absolutely have to change parameters, you will have to create a new BayesianSetup. Depending on your R skills, I would either create a new setup and start the DEzs with a z-matrix of the old MCMC (easier), or actually manipulate the sampler output, which contains the setup and the sampler by replacing prior, likelihood, and erasing the respective columns from the MCMC chain, which would allow a true restart of the sampler with less parameters. But again, I'd try to avoid option 2.

Parallelization -> will only work if you switch it on in the setup - did you do that? See help in the setup and vignette. We extended the help in the development version, so if you are using the CRAN version, maybe install the development version. Parallelization will use up to 3 cores per chain for the DEzs, so if you want to use all your cores, you can start several DEzs at the same time and combine them later. Be advised that in particular the MBPs are not designed to run under full load for a long time though.

waic / loo -> why do you need this if you do lasso / shrinkage? This should have taken care of the model selection

prior sampler -> not sure if I understand this correctly. In BT, the sampler is only needed to create start values. Maybe what you want is to have the parameter of the prior be again a parameter, i.e. you want a hyperprior on the parameter? If that is the case, you need to set the hyperprior in the prior, and the rest in the likelihood. See an example for a mixed model, which is essentially the same, here https://github.com/florianhartig/BayesianTools/blob/master/Examples/RandomEffectsModel.md https://github.com/florianhartig/BayesianTools/blob/master/Examples/RandomEffectsModel.md — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/florianhartig/BayesianTools/issues/137#issuecomment-395571894, or mute the thread https://github.com/notifications/unsubscribe-auth/AjdSt8fTnJg8LrsS4WXtEqoCTztzpXAIks5t6Zu3gaJpZM4UdWcx.

florianhartig commented 6 years ago
  1. looic - OK, I see, this is like the typical frequentist lasso cross-validation. From what I read, this is unusual for a Bayesian analysis, where the penalty would usually be tuned via a hyperprior, and thus no additional check is needed (supposedly), but why not.

  2. Restarting the sampler: you don't have to restart the z-matrix, you can also create a new one for the prior, but if you want to restart, see https://github.com/florianhartig/BayesianTools/issues/79

  3. I would have written everything but the lowest-level priors in the likelihood, but why not, it doesn't really matter if you put this in the likelihood or the prior. However, note you need dexo / rexp