chjackson / msm

The msm R package for continuous-time multi-state modelling of panel data
https://chjackson.github.io/msm/

Specifying categorical type covariates in msm #80

Closed: poradamc closed this issue 9 months ago.

poradamc commented 11 months ago

Dear Prof. Jackson, I have a couple of general questions about adding covariates to the model.

(1) I have a number of categorical and ordinal variables I'd like to include in my model. I tried treating the ordinal variables as continuous and extracted the hazard ratios; however, it is not clear how to interpret the hazard ratios in this case (e.g., if a variable is on a Likert scale ranging from "Not very likely" to "Very likely", what does an HR greater than 1 actually mean?). Section 2.9 of the documentation explains how to set covariate values for categorical variables. My question, however, is how to assess the effect of a variable with multiple categories. Would I need to assess the effect of each value of that variable separately? Or do I need to introduce n-1 dummy variables, where n is the number of categories?

(2) My original dataset has a number of missing values, so I am using the mice package to perform multiple imputation. However, I am not certain how to pool the results from the imputed datasets. Would it make sense to average the transition intensities for each transition across all the imputed datasets, and to do the same for the hazard ratios, as you would in a regression model? Otherwise, could you point me to any studies that have used multiple imputation together with an msm model?

Thank you, Maria Porada

chjackson commented 11 months ago

These are all just general statistics questions, rather than anything specific to multistate models.

If you treat an ordinal variable as continuous, then the hazard ratio compares two groups that are one category apart: an HR greater than 1 means the transition intensity is higher in the group one category further up the scale. This effect is assumed to be the same for all pairs of adjacent categories.
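To make that interpretation concrete, here is a minimal sketch in Python (used only to show the arithmetic; the model itself would be fitted in R with msm). msm models each transition intensity log-linearly in the covariates, q_rs(x) = q_rs^(0) exp(beta * x), so if a Likert item is coded 1, 2, ..., 5 and entered as a single numeric covariate, two groups k categories apart differ by a hazard ratio of exp(beta * k) = HR^k. The 5-point coding and the value of beta are hypothetical, not taken from any fitted model.

```python
import math

# Hypothetical log hazard ratio for a 5-point Likert item coded 1..5 and
# entered as one numeric covariate (illustrative value, not from any fit).
beta = math.log(1.3)

def hazard_ratio(from_level, to_level, log_hr):
    """HR comparing to_level with from_level under a linear-in-category effect."""
    return math.exp(log_hr * (to_level - from_level))

# Adjacent categories: the same HR for every pair.
print(round(hazard_ratio(1, 2, beta), 3))  # 1.3
print(round(hazard_ratio(4, 5, beta), 3))  # 1.3
# "Not very likely" (1) vs "Very likely" (5): four steps apart, so HR ** 4.
print(round(hazard_ratio(1, 5, beta), 4))  # 2.8561
```

Going down the scale simply inverts the ratio, so hazard_ratio(5, 1, beta) is 1 / 1.3 ** 4.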

As long as your categorical variable is stored as a "factor" in your data, msm will automatically set up the dummy variables you need to include it in the model, just as any other regression function does. To assess whether a model including the categorical variable fits better than a model without it, you could fit both models and then do a likelihood ratio test between them.
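The Python sketch below (for illustration only) shows what "setting up the dummy variables" amounts to, and the arithmetic behind the likelihood ratio test. The helper names, level labels and log-likelihood values are all hypothetical. It assumes the factor gets a separate effect on each allowed transition, which is what msm does when covariates = ~ x is given without constraints, so a factor with n levels adds (n - 1) × (number of allowed transitions) parameters; that count is the degrees of freedom of the test. The chi-squared tail probability is computed from the standard series for the incomplete gamma function so that only the standard library is needed. In practice, msm's lrtest.msm() applied to the two fitted models should do this for you.

```python
import math

def treatment_dummies(values, levels):
    """Indicator (dummy) coding with the first level as reference,
    mirroring R's default treatment contrasts: n levels -> n - 1 columns."""
    others = levels[1:]
    return [[1 if v == lev else 0 for lev in others] for v in values]

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared(df) variable, computed from
    the series for the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def likelihood_ratio_test(loglik_without, loglik_with, extra_params):
    """LR statistic 2 * (difference in log-likelihoods) and its p-value
    for two nested models."""
    stat = 2.0 * (loglik_with - loglik_without)
    return stat, chi2_sf(stat, extra_params)

# Hypothetical 4-level factor -> 3 dummy columns ("low" is the reference).
levels = ["low", "medium", "high", "very high"]
print(treatment_dummies(["low", "high", "very high"], levels))
# [[0, 0, 0], [0, 1, 0], [0, 0, 1]]

# Hypothetical fits: 3 allowed transitions, each with its own 3 dummy effects.
n_transitions = 3
df = (len(levels) - 1) * n_transitions
stat, p = likelihood_ratio_test(-1000.0, -990.0, df)
print(stat, df, round(p, 3))
```

Testing the whole factor at once in this way answers "does this variable matter?" without having to judge each category's hazard ratio separately.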

I am not an expert on multiple imputation, and I don't know of any special considerations for doing it with multistate models. As I understand it, pooling with Rubin's rules works the same way whatever the model: you average the estimate of interest over the imputed datasets.
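A minimal Python sketch of Rubin's rules for a single parameter, assuming you have already collected the estimate and its standard error from each of the m fitted models. Besides averaging the estimates, the rules combine the average within-imputation variance W with the between-imputation variance B as T = W + (1 + 1/m) B, and the pooled standard error is sqrt(T). Pooling hazard ratios (and intensities) on the log scale and exponentiating afterwards is a common choice and is assumed here; the numbers are hypothetical.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Combine one scalar estimate from m imputed datasets with Rubin's rules.
    Returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = mean(estimates)                        # average of the m estimates
    w = mean([se ** 2 for se in std_errors])       # within-imputation variance
    b = variance(estimates)                        # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b                    # total variance
    return q_bar, t

# Hypothetical log hazard ratios (and their SEs) for one transition,
# from m = 5 imputed datasets; not from any real fit.
log_hrs = [0.25, 0.30, 0.22, 0.28, 0.35]
ses = [0.10, 0.11, 0.10, 0.12, 0.11]

q_bar, t = pool_rubin(log_hrs, ses)
se_pooled = math.sqrt(t)
# Back-transform to the HR scale. 1.96 is a large-sample normal approximation;
# Rubin's rules strictly use a t reference distribution.
hr = math.exp(q_bar)
ci = (math.exp(q_bar - 1.96 * se_pooled), math.exp(q_bar + 1.96 * se_pooled))
print(round(hr, 3), tuple(round(x, 3) for x in ci))
```

Simply averaging the estimates gives the right point estimate, but using only the within-imputation standard errors would understate the uncertainty; the B term is what accounts for the missing data.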

poradamc commented 11 months ago

Thank you very much, Prof. Jackson.