audrey-b / sims

An R package to simulate data from JAGS or R Code
https://poissonconsulting.github.io/sims/
Other
1 stars 2 forks source link

How to set the seed for generating data? #3

Closed joethorley closed 5 years ago

joethorley commented 5 years ago

I've followed the documentation for jags but for some reason even setting .RNG.seed and .RNG.name in inits doesn't result in replicability when generating data.

audrey-b commented 5 years ago

Yup, I experienced the same problem, which package for jags did you use?

joethorley commented 5 years ago

rjags

joethorley commented 5 years ago

It works with R2jags which obviously uses rjags...

model <- "model{

  a0 ~ dnorm(0, 0.0001)
  a1 ~ dnorm(0, 0.0001)
  tau ~ dgamma(0.001,0.001)

  for (i in 1:100) {

    y[i] ~ dnorm(mu[i], tau)
    mu[i] <- a0 + a1 * x[i]
  }
}"

# make the data and prepare what we need to fit the model
x <- rnorm(100)
y <- 1 + 1.2 * x + rnorm(100)

data <- list("x", "y")
parameters <- c("a0", "a1", "tau")
inits = list(list(a0 = 1, a1=0.5, tau = 1))

library(R2jags)

# First fit
set.seed(121)
samples <- jags(data, inits, 
  parameters,model.file = textConnection(model),
  n.chains = 1, n.iter = 5000, n.burnin = 1, n.thin = 1)

# second fit
set.seed(121) # with set.seed at same value
samples2 <- jags(data, inits, 
  parameters,model.file = textConnection(model),
  n.chains = 1, n.iter = 5000, n.burnin = 1, n.thin = 1)

a0_1 <- samples$BUGSoutput$sims.list$a0

a0_2 <- samples2$BUGSoutput$sims.list$a0

head(cbind(a0_1, a0_2))

modified from https://stackoverflow.com/questions/45872938/how-can-i-set-a-random-seed-using-the-jags-function

joethorley commented 5 years ago

I've managed to establish that jags is only replicable when the priors are in the model block (as opposed to the data block)

https://github.com/poissonconsulting/bayesims/blob/master/tests/testthat/test-rjags.R

This is a strong argument for modifying in place in the model block

audrey-b commented 5 years ago

Wow I didn't think we'd be able to set the seed with set.seed for jags but that now makes our life a lot easier! It would be great to let the user choose which package they want for the analysis thought. This thread shows how to do it with runjags: https://sourceforge.net/p/mcmc-jags/discussion/610037/thread/359a4c90/ but it must be specified in the inits…

audrey-b commented 5 years ago

I've managed to establish that jags is only replicable when the priors are in the model block (as opposed to the data block) https://github.com/poissonconsulting/bayesims/blob/master/tests/testthat/test-rjags.R This is a strong argument for modifying in place in the model block

Ok so that means that our analyses are reproducible but not the simulated datasets?

Yeah we would definitely want the priors in the model block (this is how I set up my code originally).

I think that by specifying the likelihood and priors separately, it frees the user from having to think of what to put in a model vs data block (it gets taken care of by the package). When running simulations, one often wants to generate the data one way and analyse it another way to assess problems of model misspecifications. I think that it might be confusing for a user who wants to supply bugs code to simulate data (with no prior because they won't be reusing this code to analyse their data) to have to figure out if it should be in a model or data block...

audrey-b commented 5 years ago

https://stackoverflow.com/questions/52256845/runjags-set-seed-without-prior-for-simulations

I think it would be ok to use a model block to simulate data (as suggested above), as long as we only use 1 iteration and don't provide any inits. Even if the MCMC algorithm provides correlated samples from the likelihood distribution, if we only use one of those samples and do a for loop for however many samples are required, this should ensure the samples are independent.

joethorley commented 5 years ago

I've started a discussion on the JAGS source forge site about setting the seed for data block sampling

https://sourceforge.net/p/mcmc-jags/discussion/610036/thread/eb142380a4/#4c9c

audrey-b commented 5 years ago

I think that we should be able to use a model block to simulate data based on this: https://sourceforge.net/p/mcmc-jags/discussion/610037/thread/fd7f3f7e/#9bf5

joethorley commented 5 years ago

That is great!! Which means that we also deal with the problem of replicability

joethorley commented 5 years ago

The only question - I have is how can we sure the user is not providing data (and therefore affecting the parameters)?

audrey-b commented 5 years ago

The only question - I have is how can we sure the user is not providing data (and therefore affecting the parameters)?

Agreed. Potential solutions to think about:

  1. Force the model block to only use forward sampling. But is this possible? E.g. using unload.module or set.factory to disable some/all the samplers

  2. Before simulating data with a model block, pass the JAGS code to a data block for 1 iteration. If an error ensues, do not simulate data with the model block, otherwise, it is safe to simulate with the model block.

  3. Do a search in the BUGS code to make sure that none of the supplied variables arise to the left of a ~

joethorley commented 5 years ago
  1. would be ideal

  2. hacky but seems like it should/could work

  3. I like more than 2 the only issue is how to deal with hierarchical models?

audrey-b commented 5 years ago
  1. It doesn't seem to me that hierarchical models would be a problem

e.g. if this is a hierarchical model supplied by the user

X ~ dnorm(mu, sigma) mu ~ dnorm(mu0, sigma0)

then we would expect the user to supply mu0, sigma0 and sigma. If they try to supply mu then we would output an error message because we have "mu ~ " in the code. If they wanted to specify a value for mu, they would have to strip "mu ~ dnorm(mu0, sigma0)" off the code using our remove_prior function. Basically, the user would have to supply BUGS code for data simulation in the form of a likelihood that does not include any priors. Thus if the user ever tries to specify a value for a parameter that is before a "~" we would simply not allow it because for the purpose of data simulation it does not make sense to specify a value for something on which a distribution was defined.

Am I missing something here?

  1. Assuming I am right about 3., we could use 2. as a "check", i.e.

Do 3. If error then stop and print error. Do 2. for 1 iteration. If error then stop and print error. Use a model block for n.iter iterations.

joethorley commented 5 years ago

I think you are correct! As you suggest we can just check that no values to the right of a ~ are provided. I think this should be sufficient ie I don't think we don't need to do the additional check you suggest.

joethorley commented 5 years ago

do in model block and ensure no ~ are defined. can then use set.seed(). done