florianhartig / BayesianTools

General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics
https://cran.r-project.org/web/packages/BayesianTools/index.html
117 stars 29 forks source link

Writing a hierarchical model - unsure where to assign population-level parameters #270

Open josephlewis opened 6 months ago

josephlewis commented 6 months ago

Hi @florianhartig,

I'm trying to write a hierarchical model where I can estimate both population-level and individual-level parameters but I'm not sure how to do this. Namely, where do I assign the population-level prior? Before the likelihood function and then call the global variable in the likelihood function or within the function? Since this population-level prior will influence the individual-level prior I'm having difficulty with how to have the individual-level prior also be estimated and returned. Apologies for the lack of a reproducible example. Hopefully the provided information is sufficient. Thank you.

For example, this is my hierarchical model attempt. Here I can estimate the b_rate but I'm not sure how to also estimate and return the individual b_exp.

# this is the population-level prior for sd
b_rate <- param[1]
b_exp <- rexp(n = n, rate = b_rate)

# I also want to return the value for each individual, i.e. b1 and b2. so I know how each individual varies whilst also knowing the population-level parameter values so I can predict new groups.
b1 <- truncnorm::rtruncnorm(n = n, a = 0, mean = 3.5, sd = b_exp)
b2 <- truncnorm::rtruncnorm(n = n, a = 0, mean = 3.5, sd = b_exp)

Below are the mocked-up complete pooling and no pooling approaches if these help.

Complete pooling approach example:

density_1 = function(par){
  d1 = dlnorm(par[1], meanlog = log(3.5), sdlog = log(2), log = TRUE)
  d2 = dbeta(x = par[2], shape1 = 5, shape2 = 50, log = TRUE)
  return(d1+d2)
}

sampler_1 = function(n = 1){
  d1 = rlnorm(n, meanlog = log(3.5), sdlog = log(2))
  d2 = rbeta(n, shape1 = 5, shape2 = 50)
  return(cbind(d1, d2))
}

likelihood_1 <- function(param) { 

  b = param[1]
  c = param[2]

  ## process model and calculate log-likelihood
  # The same b and c parameters are used for modelling all the dataset. The process_model() function loops over individual subsets an sums the log-likelihoods.
  ll <- process_model(b,c)

  return(ll)
}

The no-pooling approach example:

density_2 = function(par){
  d1 = dlnorm(par[1], meanlog = log(3.5), sdlog = log(2), log = TRUE)
  d2 = dbeta(x = par[2], shape1 = 5, shape2 = 50, log = TRUE)
  d3 = dlnorm(par[3], meanlog = log(3.5), sdlog = log(2), log = TRUE)
  d4 = dbeta(x = par[4], shape1 = 5, shape2 = 50, log = TRUE)
  return(d1+d2+d3+d4)
}

sampler_2 = function(n = 1){
  d1 = rlnorm(n, meanlog = log(3.5), sdlog = log(2))
  d2 = rbeta(n, shape1 = 5, shape2 = 50)
  d3 = rlnorm(n, meanlog = log(3.5), sdlog = log(2))
  d4 = rbeta(n, shape1 = 5, shape2 = 50)
  return(cbind(d1, d2, d3, d4))
}

likelihood_2 <- function(param) { 

  b1 = param[1]
  c1 = param[2]

  b2 = param[1]
  c2 = param[2]

  ## process model and calculate log-likelihood
  # Different b and c parameters are used for modelling each 'group' of the dataset. The log-likelihoods are summed at the end. The process_model here differs from the complete pooling approach by only looping over one group of the dataset rather than all the groups.
  ll_1 <- process_model(b1,c1)
  ll_2 <- process_model(b2,c2)

  return(ll+ll2)
}