Closed joethorley closed 5 years ago
Yup, I experienced the same problem, which package for jags did you use?
rjags
It works with R2jags which obviously uses rjags...
model <- "model{
a0 ~ dnorm(0, 0.0001)
a1 ~ dnorm(0, 0.0001)
tau ~ dgamma(0.001,0.001)
for (i in 1:100) {
y[i] ~ dnorm(mu[i], tau)
mu[i] <- a0 + a1 * x[i]
}
}"
# make the data and prepare what we need to fit the model
x <- rnorm(100)
y <- 1 + 1.2 * x + rnorm(100)
data <- list("x", "y")
parameters <- c("a0", "a1", "tau")
inits = list(list(a0 = 1, a1=0.5, tau = 1))
library(R2jags)
# First fit
set.seed(121)
samples <- jags(data, inits,
parameters,model.file = textConnection(model),
n.chains = 1, n.iter = 5000, n.burnin = 1, n.thin = 1)
# second fit
set.seed(121) # with set.seed at same value
samples2 <- jags(data, inits,
parameters,model.file = textConnection(model),
n.chains = 1, n.iter = 5000, n.burnin = 1, n.thin = 1)
a0_1 <- samples$BUGSoutput$sims.list$a0
a0_2 <- samples2$BUGSoutput$sims.list$a0
head(cbind(a0_1, a0_2))
modified from https://stackoverflow.com/questions/45872938/how-can-i-set-a-random-seed-using-the-jags-function
I've managed to establish that jags is only replicable when the priors are in the model block (as opposed to the data block)
https://github.com/poissonconsulting/bayesims/blob/master/tests/testthat/test-rjags.R
This is a strong argument for modifying in place in the model block
Wow I didn't think we'd be able to set the seed with set.seed for jags but that now makes our life a lot easier! It would be great to let the user choose which package they want for the analysis thought. This thread shows how to do it with runjags: https://sourceforge.net/p/mcmc-jags/discussion/610037/thread/359a4c90/ but it must be specified in the inits…
I've managed to establish that jags is only replicable when the priors are in the model block (as opposed to the data block) https://github.com/poissonconsulting/bayesims/blob/master/tests/testthat/test-rjags.R This is a strong argument for modifying in place in the model block
Ok so that means that our analyses are reproducible but not the simulated datasets?
Yeah we would definitely want the priors in the model block (this is how I set up my code originally).
I think that by specifying the likelihood and priors separately, it frees the user from having to think of what to put in a model vs data block (it gets taken care of by the package). When running simulations, one often wants to generate the data one way and analyse it another way to assess problems of model misspecifications. I think that it might be confusing for a user who wants to supply bugs code to simulate data (with no prior because they won't be reusing this code to analyse their data) to have to figure out if it should be in a model or data block...
https://stackoverflow.com/questions/52256845/runjags-set-seed-without-prior-for-simulations
I think it would be ok to use a model block to simulate data (as suggested above), as long as we only use 1 iteration and don't provide any inits. Even if the MCMC algorithm provides correlated samples from the likelihood distribution, if we only use one of those samples and do a for loop for however many samples are required, this should ensure the samples are independent.
I've started a discussion on the JAGS source forge site about setting the seed for data block sampling
https://sourceforge.net/p/mcmc-jags/discussion/610036/thread/eb142380a4/#4c9c
I think that we should be able to use a model block to simulate data based on this: https://sourceforge.net/p/mcmc-jags/discussion/610037/thread/fd7f3f7e/#9bf5
That is great!! Which means that we also deal with the problem of replicability
The only question - I have is how can we sure the user is not providing data (and therefore affecting the parameters)?
The only question - I have is how can we sure the user is not providing data (and therefore affecting the parameters)?
Agreed. Potential solutions to think about:
Force the model block to only use forward sampling. But is this possible? E.g. using unload.module or set.factory to disable some/all the samplers
Before simulating data with a model block, pass the JAGS code to a data block for 1 iteration. If an error ensues, do not simulate data with the model block, otherwise, it is safe to simulate with the model block.
Do a search in the BUGS code to make sure that none of the supplied variables arise to the left of a ~
would be ideal
hacky but seems like it should/could work
I like more than 2 the only issue is how to deal with hierarchical models?
e.g. if this is a hierarchical model supplied by the user
X ~ dnorm(mu, sigma) mu ~ dnorm(mu0, sigma0)
then we would expect the user to supply mu0, sigma0 and sigma. If they try to supply mu then we would output an error message because we have "mu ~ " in the code. If they wanted to specify a value for mu, they would have to strip "mu ~ dnorm(mu0, sigma0)" off the code using our remove_prior function. Basically, the user would have to supply BUGS code for data simulation in the form of a likelihood that does not include any priors. Thus if the user ever tries to specify a value for a parameter that is before a "~" we would simply not allow it because for the purpose of data simulation it does not make sense to specify a value for something on which a distribution was defined.
Am I missing something here?
Do 3. If error then stop and print error. Do 2. for 1 iteration. If error then stop and print error. Use a model block for n.iter iterations.
I think you are correct! As you suggest we can just check that no values to the right of a ~ are provided. I think this should be sufficient ie I don't think we don't need to do the additional check you suggest.
do in model block and ensure no ~ are defined. can then use set.seed(). done
I've followed the documentation for jags but for some reason even setting .RNG.seed and .RNG.name in inits doesn't result in replicability when generating data.