fintzij / stemr

Fit Stochastic Epidemic Models via Bayesian Data Augmentation
https://fintzij.github.io/stemr/
8 stars 6 forks source link

prevalence-based observations #13

Closed chvandorp closed 4 years ago

chvandorp commented 4 years ago

Hi Jon,

Thanks for fixing the issue with lists of data sets. I warned you I found another problem, though.. Here it is: I'm trying to fit a model with one incidence-based and a prevalence-based observation. This did not work, and I think this is because something goes wrong with the prevalence-based observations and indexing of paths. So I tried an example with only a prevalence-based observation. But this resulted in other errors. When I run the script below, I get the error:

Error in stem_measure(emissions = emissions, dynamics = dynamics, messages = TRUE) : 
  object 'incidence_codes_lna' not found

If you (un)comment the lines with (OK)ERROR, you'll see that the script works with incidence-based observations.

Thanks again!

Chris

library(stemr)
library(ggplot2)
library(gridExtra)

N <- 1e5
true_pars <- c(beta=1, gamma=0.5, theta=0.2, N=N)

compartments <- c("S", "I", "R")

rates <- list(
  rate(rate="beta * I/N",
       from="S",
       to="I",
       incidence=FALSE), ## ERROR
#       incidence=TRUE), ## OK
  rate(rate="gamma",
       from="I",
       to="R",
       incidence=FALSE)
)

epsilon <- 1e-4 ## inoculum (fraction of popsize)

## Q: why is this a list??
state_initializer <- list(
  stem_initializer(
    init_states = c(
      S = N * (1-epsilon), 
      I = N * epsilon,
      R = 0
    ),
    fixed = TRUE)
)

# only estimate beta and gamma
par_names <- c("beta", "gamma")
constants <- c(t0=0, true_pars[!names(true_pars) %in% par_names])
params <- true_pars[par_names]

tmax <- 35

# compile the model
dynamics <- stem_dynamics(
  rates = rates,
  parameters = params,
  tmax = tmax,
  state_initializer = state_initializer,
  compartments = compartments,
  constants = constants,
  compile_ode = TRUE,  
  compile_rates = TRUE, 
  compile_lna = TRUE,
  messages = TRUE
)

obstimes.cases <- seq(1,tmax) ## daily death counts

emissions <- list(
  emission(meas_var = "cases",
           distribution    = "poisson",
           emission_params = c("I * theta"), incidence = FALSE, ## ERROR
#           emission_params = c("S2I * theta"), incidence = TRUE, ## OK
           obstimes        = obstimes.cases)
)

measurement_process <- stem_measure(
  emissions = emissions,
  dynamics  = dynamics,
  messages  = TRUE
)

stem_object <- make_stem(
  dynamics = dynamics,
  measurement_process = measurement_process
)

########### simulate data using the MJP ############

sim_mjp <- simulate_stem(
  stem_object=stem_object, 
  method="gillespie", 
  messages=TRUE
)

sim_data <- as.data.frame(sim_mjp$datasets[[1]])

ggplot(data=sim_data, aes(x=time, y=cases)) + geom_point()

####### create a new measurement process using the simulated data #######

measurement_process <- stem_measure(
  emissions = emissions,
  dynamics  = dynamics,
  data      = sim_data[c("time", "cases")],
  messages  = TRUE
)

stem_object <- make_stem(
  dynamics = dynamics,
  measurement_process = measurement_process
)

############## fit model to simulated data #############

priors <- list(
  logprior = function(params_est){return(0.0)}, ## flat prior
  to_estimation_scale = log,
  from_estimation_scale = exp
)

mcmc_kern <- mcmc_kernel(
  parameter_blocks = list(
    parblock(
      pars_nat = par_names,
      pars_est = c("log_beta", "log_gamma"),
      priors = priors,
      alg = "mvnmh",
      sigma = diag(0.01, length(par_names)),
      control = mvnmh_control(stop_adaptation = 2.5e2)
    )
  ), 
  lna_ess_control = lna_control(bracket_update_iter = 50)
)

res <- fit_stem(
  stem_object = stem_object,
  method = "lna",
  mcmc_kern = mcmc_kern,
  thinning_interval = 10,
  iterations = 2e3,
  print_progress = 10
)

posterior = res$results$posterior # list with posterior objects

############ plot trajectories ###############

## NB: only works for incidence-based measurements

num_samples <- dim(posterior$latent_paths)[3]

cases.plot <- ggplot(data=sim_data, aes(x=time, y=cases)) +
  geom_point()

for ( i in seq(num_samples/2, num_samples, 10) ) {
  traj <- as.data.frame(posterior$latent_paths[,,i])
  traj$expect_cases <- traj$S2I * true_pars["theta"]
  cases.plot <- cases.plot + geom_line(data=traj, aes(x=time, y=expect_cases), 
                                       alpha=0.2, color='green')
}

cases.plot
fintzij commented 4 years ago

Hmm, I was able to fix the error in stem_measure. I think there may be some downstream issues though. Might take a few days to track this down.

Thanks for the heads up, really appreciate that you're exploring some use cases that I hadn't worked on!

chvandorp commented 4 years ago

OK good luck! It's quite a complex and ambitious package, so not surprising that some parts don't fully work yet... Thanks for developing this!

fintzij commented 4 years ago

Hey Chris,

Apologies for the delays. I pushed a patch that should fix the issues in simulate_stem with prevalence data. I think it should be good to go for the model in your example above. However, I think this function is long overdue to be refactored and I'm not entirely sure it's bug free. In particular, while I'm confident that epidemic paths are being simulated correctly, there might be issues when simulating datasets conditional on epidemic paths in cases where you have misaligned observation intervals like in the example in your last issue. In these cases, you could solve ODEs or get exact paths via Gillespie simulation, but simulate datasets by hand. I think the inference wrappers should be fine for prevalence data.

Thanks again for letting me know about issues that you've run into. The package is still being actively developed so it's great to have feedback.

Cheers, Jon

chvandorp commented 4 years ago

Fantastic, Jon! I'll soon give it a try with some real data.