chjackson / msm

The msm R package for continuous-time multi-state modelling of panel data
https://chjackson.github.io/msm/

Implementing a semi-Markov model using 'msm' package #88

Closed (kumarr14 closed 5 months ago)

kumarr14 commented 9 months ago

Hi Dr. Jackson,

Thank you for all your efforts in making multi-state modeling so accessible for the user.

I am working with my colleague on a project using MSM, and we were hoping to implement a semi-Markov model. Is this possible within the 'msm' package? It seems from the 'msm' documentation it is possible, but it's not clear to us how to implement this type of model. Any guidance would be appreciated.

Thanks, Raj

chjackson commented 9 months ago

This is hard in principle for intermittently-observed data (what msm is designed for) because we don't observe the amount of time that someone has spent in a state, so it's hard to condition on it.

But a semi-Markov model can be implemented in principle as a hidden Markov model, using phase-type models. See the phase.states option to msm, and the rough worked example in this GitHub issue comment. As far as I know, it has not been used much, though. These kinds of models will often be non-identifiable in practice due to lack of information in the data.
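For readers following along, a minimal sketch of what a call with phase.states might look like. It assumes the cav example dataset shipped with msm and an illustrative transition structure; neither comes from this thread, and the choice of state 1 as the phase-type state is arbitrary. As noted above, such a model may be slow to fit or poorly identified.

```r
library(msm)

# Assumption: the 'cav' example data shipped with msm (states 1-3 alive, 4 = death).
# Nonzero entries mark allowed transitions and serve as initial values for the rates.
Q <- rbind(
  c(0,     0.25, 0,     0.25),
  c(0.166, 0,    0.166, 0.166),
  c(0,     0.25, 0,     0.25),
  c(0,     0,    0,     0)
)

# phase.states = 1 gives state 1 a two-phase sojourn distribution, so the
# hazard of leaving state 1 can depend on the time already spent there.
phase_fit <- msm(state ~ years, subject = PTNUM, data = cav,
                 qmatrix = Q, deathexact = 4,
                 phase.states = 1)

print(phase_fit)
```

Convergence warnings or very wide confidence intervals here would be a sign of the identifiability problem described above.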

kumarr14 commented 9 months ago

Thank you for your reply, Dr. Jackson. We certainly don't want to run a model that is non-identifiable in practice.

To put my question in the context of our research question: we are using a multi-state model in a cohort study with follow-up time points at 1 year, 2 years, 5 years, and 10 years after an acute event (head trauma). When we visually inspected our MSM, it appeared that the hazards are not constant over time, so we're unsure about assuming the Markov property.

[Figure: MSMmodel_Hazardplot_2023-12-19]

Another alternative we have discussed, besides the semi-Markov model, would be to fit a separate Markov model to each interval between follow-up visits (i.e., one model for transitions between year 1 and year 2, one for year 2 to year 5, and one for year 5 to year 10), but we weren't sure if that is methodologically appropriate either. Any thoughts you have on how to proceed in this instance would be appreciated.

Thanks, Raj

chjackson commented 9 months ago

I can't tell whether the hazards are constant or not from that plot. It shows the probability of survival until time t, which by definition is a decreasing function of t. Are you confusing it with the probability of surviving until the end of a discrete time interval, conditionally on survival to the start of the interval?
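The distinction can be illustrated with a small base-R calculation. The hazard value below is an arbitrary assumption chosen for illustration, not an estimate from any data. Under a constant hazard, the survival curve still falls over time, while the probability of surviving each equal-length interval, given survival to its start, stays the same.

```r
# Assumed constant hazard, for illustration only
lambda <- 0.1
times <- 0:10

# Unconditional survival S(t) = exp(-lambda * t): decreasing in t
surv <- exp(-lambda * times)

# Probability of surviving each one-year interval, given survival to its start
cond_surv <- surv[-1] / surv[-length(surv)]

print(round(surv, 3))
print(round(cond_surv, 3))
```

A decreasing survival curve is therefore expected even when the hazard is constant, so it cannot by itself show that hazards vary over time.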

Goodness of fit to the data is better assessed with prevalence.msm.
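A rough sketch of this check, assuming the cav example data from msm and an illustrative transition matrix (both are assumptions, not taken from this thread). It compares observed and model-expected numbers of individuals in each state at a grid of times.

```r
library(msm)

# Assumption: 'cav' example data from msm; Q holds allowed transitions and initial values.
Q <- rbind(
  c(0,     0.25, 0,     0.25),
  c(0.166, 0,    0.166, 0.166),
  c(0,     0.25, 0,     0.25),
  c(0,     0,    0,     0)
)
fit <- msm(state ~ years, subject = PTNUM, data = cav,
           qmatrix = Q, deathexact = 4)

# Observed vs expected state occupancy at selected times (years since entry)
prev <- prevalence.msm(fit, times = seq(0, 20, by = 2))
print(prev$Observed)
print(prev$Expected)

# The same comparison as a plot, one panel per state
plot.prevalence.msm(fit, mintime = 0, maxtime = 20)
```

Large, systematic gaps between the observed and expected curves would suggest poor fit.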

Perhaps a time-inhomogeneous model would be sufficient in your case, if the transition rates only depend on time through the time since entering the model: see the course notes. That is simpler than a semi-Markov model.
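One way such a model might be specified is with the pci argument to msm, which makes the transition intensities piecewise constant in the time variable. The sketch below assumes the cav example data, where years is time since entering the study, and a single change point at 5 years. The dataset, transition matrix, and change point are illustrative assumptions, not choices made in this thread.

```r
library(msm)

# Assumption: 'cav' example data from msm; Q holds allowed transitions and initial values.
Q <- rbind(
  c(0,     0.25, 0,     0.25),
  c(0.166, 0,    0.166, 0.166),
  c(0,     0.25, 0,     0.25),
  c(0,     0,    0,     0)
)

# pci = 5: intensities are constant before 5 years and constant, but possibly
# different, from 5 years onwards.
fit_pci <- msm(state ~ years, subject = PTNUM, data = cav,
               qmatrix = Q, deathexact = 4,
               pci = 5)

# Hazard ratios comparing the later period with the earlier one, per transition
hr <- hazard.msm(fit_pci)
print(hr)
```

Hazard ratios clearly different from 1 would indicate that the rates change over time since entry.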

kumarr14 commented 8 months ago

Thank you for the recommendation, Dr. Jackson. We were able to run the time-inhomogeneous Markov model, and the results seem to make sense. We also checked goodness of fit (GOF) with prevalence.msm, and the model appears to fit the data well. Is this visual GOF inspection sufficient to feel confident about the Markov assumption, or are there other empirical checks we can run to be comfortable with it?

chjackson commented 8 months ago

The Markov assumption is hard to assess directly under intermittent observation, because if we don't observe the full process history, it's hard to assess whether and how transitions depend on that history. If you suspect the Markov assumption doesn't hold for one particular state, then you might fit a phase-type model as originally suggested above, and compare likelihoods.
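A hedged sketch of such a comparison, again assuming the cav example data and an illustrative transition matrix, with state 1 arbitrarily chosen as the state suspected of violating the Markov assumption. It uses AIC as one informal way of comparing the two likelihoods; this is an assumption of the sketch, not a method prescribed in this thread.

```r
library(msm)

# Assumption: 'cav' example data from msm; Q holds allowed transitions and initial values.
Q <- rbind(
  c(0,     0.25, 0,     0.25),
  c(0.166, 0,    0.166, 0.166),
  c(0,     0.25, 0,     0.25),
  c(0,     0,    0,     0)
)

# Standard Markov model
markov_fit <- msm(state ~ years, subject = PTNUM, data = cav,
                  qmatrix = Q, deathexact = 4)

# Same model, but with a two-phase sojourn distribution in state 1
phase_fit <- msm(state ~ years, subject = PTNUM, data = cav,
                 qmatrix = Q, deathexact = 4, phase.states = 1)

# Log-likelihoods and AIC for both models (lower AIC is preferred)
print(logLik(markov_fit))
print(logLik(phase_fit))
print(c(markov = AIC(markov_fit), phase_type = AIC(phase_fit)))
```

If the phase-type model does not improve the fit noticeably, the data give little evidence against the Markov assumption for that state, though with intermittent observation they may simply be uninformative.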

As a general check of fit, if one of your states is death and you know the exact time of death, you could also compare Kaplan-Meier estimates of survival with the fitted probabilities of transition to the death state. See help(plot.survfit.msm) for a shortcut to doing this.
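A minimal sketch of that shortcut, assuming the cav example data from msm, where state 4 is death with exact death times. The dataset and transition matrix are illustrative assumptions.

```r
library(msm)

# Assumption: 'cav' example data from msm; Q holds allowed transitions and initial values.
Q <- rbind(
  c(0,     0.25, 0,     0.25),
  c(0.166, 0,    0.166, 0.166),
  c(0,     0.25, 0,     0.25),
  c(0,     0,    0,     0)
)
fit <- msm(state ~ years, subject = PTNUM, data = cav,
           qmatrix = Q, deathexact = 4)

# Overlays the model-based survival curve (from state 1 to the absorbing state)
# on the Kaplan-Meier estimate computed from the observed death times.
plot.survfit.msm(fit, main = "Observed and fitted survival", mark.time = FALSE)
```

Close agreement between the two curves supports the fitted model's description of survival, without directly testing the Markov assumption.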