allow conditional sampling in 'sim.fmsm()' ?

kkmann commented 1 year ago

Hi,

this is related to #150 - I am trying to sample forward from the predictive distribution of an individual given current state and sojourn time in that state for a multistate model.

sim.fmsm() only seems to support sampling conditional on last state but cannot accept the respective sojourn time ('start' paramter in simulate.flexsurvreg()).

Would this be feasible to add to sim.fmsm()? I suspect the critical lines are around

https://github.com/chjackson/flexsurv-dev/blob/96dfe2ea0a360d4a79996e3b390d7cd27e0ab265/R/mstate.R#L862

kkmann commented 1 year ago

@danielinteractive, you might find this interesting too.

kkmann commented 1 year ago

Here is a suggestion (admittedly super inefficient) of how this could look like. One of the many problems is that it is super slow to not preallocate the return vecors. Maybe chunked pre-allocation could work. Ideally, the functionality could be integrated in the sim.fmsm() function though.

library(flexsurv)

mod_nobos_bos <- flexsurvreg(Surv(years, status) ~ 1, subset=(trans==1),
                             data = bosms3, dist = "weibull")
mod_nobos_death <- flexsurvreg(Surv(years, status) ~ 1, subset=(trans==2),
                               data = bosms3, dist = "weibull")
mod_bos_death <- flexsurvreg(Surv(years, status) ~ 1, subset=(trans==3),
                             data = bosms3, dist = "weibull")

tmat <- rbind(c(NA, 1, 2), c(NA, NA, 3), c(NA, NA, NA))

mdl <- fmsm(mod_nobos_bos, mod_nobos_death, mod_bos_death, trans = tmat)

sim <- function(mdl, newdata = data.frame(..null = NA), start = 0, nsim = 1L) {
  as.numeric(simulate(mdl, newdata = newdata, nsim = nsim)[1, 1:nsim])
}

simulate_fmsm <- function(mdl, state, newdata = data.frame(..null = NA),
                          start = 0, tmax = Inf) {
  tmat <- attr(mdl, "trans")
  terminal_states <- as.integer(which(rowSums(!is.na(tmat)) == 0))
  from <- integer()
  to <- integer()
  t <- numeric()
  tt <- 0
  while (!(state %in% terminal_states) & (tt < tmax)) {
    # sample next state
    from <- c(from, state)
    next_states <- as.integer(which(!is.na(tmat[state, ])))
    next_trans_ids <- tmat[state, next_states]
    # sample from transitions
    times <- numeric(length(next_states))
    for (i in seq_along(next_trans_ids)) {
      trans <- next_trans_ids[i]
      times[i] <- sim(mdl[[i]], newdata = newdata, start = start, nsim = 1L)
    }
    state <- next_states[which.min(times)]
    to <- c(to, state)
    tt <- tt + min(times)
    t <- c(t, tt)
    start <- sqrt(.Machine$double.eps) # future states are censored at 0
  }
  # combine vectors into a data frame and return
  res <- cbind(from, to, t)
  return(res)
}

simulate_fmsm(mdl, state = 1)

chjackson commented 1 year ago

sim.fmsm simulates the entire pathway through a multistate model though. Do I understand correctly that you want to simulate data where each individual spends a time period of at least "start" in every one of their states, so that their sojourn times follow standard parametric distributions that are left-truncated by this time? That seems quite specialised. What is the application?

kkmann commented 1 year ago

No, only the current state ('start' in sim.fmsm) sojourn would be needed. Imagine simulating forward for each individual at an interim analysis in a CT setting. For simulate.flexsurvreg() this is possible via the start parameter (see #150) but sim.fmsm() seems to use a slightly different mechanism for sampling (or I missed the point) and I couldn't find a way to specify the time already spent in the starting state. Also, the 'start' paramter has a different meaning in sim.fmsm(). I think my example code achieves what I want (going directly via simulate.flexsurvreg()) but to me the sim.fmsm() function would be the natural place for this functionality.

Edit: also interesting link to #38.

chjackson / flexsurv

allow conditional sampling in 'sim.fmsm()' ? #152