chjackson / msm

The msm R package for continuous-time multi-state modelling of panel data
https://chjackson.github.io/msm/
57 stars 16 forks source link

Guidance Request on Addressing Transition Probability Estimation Issue in msm Package #85

Closed Tuotuo6613 closed 8 months ago

Tuotuo6613 commented 9 months ago

description

Hi Prof. Jackson, I would like to express my sincere gratitude for developing the msm package and providing free courses on its usage. Your work has been immensely beneficial to me and many others in the field. I have been using the msm package to fit a disease progression model with three states: 1 (Healthy), 2 (Disabled), and 3 (Deceased). The model allows transitions between states 1 and 2, 2 and 1, 1 and 3, as well as 2 and 3. However, I am encountering persistent issues with estimating the transition probability from state 1 to 3 . Despite adjusting the Q-matrix, the issue persists. I am wondering if this could be due to a limited number of samples for the 1 to 3 state transition? I would greatly appreciate any guidance or advice you could provide. Attached are my data, code, and the hazard plot for your reference. Thank you for taking the time to read my message, and I look forward to any insights you may be able to offer.

data:

image

code:

statetable.msm(state, id, data = longdata01) to from 1 2 3 1 10050 2552 4802 2 357 481 1294 Q <- rbind(c(0, 0.01, 0.01), c(0.01, 0, 0.01), c(0, 0, 0)) model1 <- msm(state~age, subject = id, data =longdata01,center = FALSE, qmatrix = Q, death = 3, control = list(reltol = 1e-8, maxit = 1000, fnscale = 100000)) summary(model1) $prevalences $prevalences$Observed State 1 State 2 State 3 Total 0 208 0 0 208 5.6 1242 34 17 1293 11.2 1775 112 254 2141 16.8 1769 125 659 2553 22.4 1649 230 1448 3327 28 1697 363 2799 4859 33.6 786 283 4230 5299 39.2 513 133 5398 6044 44.8 102 66 5947 6115 50.4 6 3 6086 6095 56 0 0 6096 6096

$prevalences$Expected State 1 State 2 State 3 Total 0 208.000000 0.000000 0.0000 208 5.6 532.606020 270.617659 489.7763 1293 11.2 400.089268 237.889911 1503.0208 2141 16.8 219.823453 133.598569 2199.5780 2553 22.4 132.305251 80.669336 3114.0254 3327 28 89.273843 54.458351 4715.2678 4859 33.6 44.982848 27.442162 5226.5750 5299 39.2 23.705906 14.462140 6005.8320 6044 44.8 11.081745 6.760594 6097.1577 6115 50.4 5.103464 3.113450 6086.7831 6095 56 2.358392 1.438775 6092.2028 6096

$prevalences$Observed percentages State 1 State 2 State 3 0 100.00000000 0.00000000 0.000000 5.6 96.05568445 2.62954370 1.314772 11.2 82.90518449 5.23120037 11.863615 16.8 69.29103016 4.89620055 25.812769 22.4 49.56417193 6.91313496 43.522693 28 34.92488166 7.47067298 57.604445 33.6 14.83298736 5.34063031 79.826382 39.2 8.48775645 2.20052945 89.311714 44.8 1.66802944 1.07931316 97.252657 50.4 0.09844135 0.04922067 99.852338 56 0.00000000 0.00000000 100.000000

$prevalences$Expected percentages State 1 State 2 State 3 0 100.00000000 0.00000000 0.00000 5.6 41.19149421 20.92943995 37.87907 11.2 18.68702791 11.11115884 70.20181 16.8 8.61039768 5.23300308 86.15660 22.4 3.97671329 2.42468697 93.59860 28 1.83728838 1.12077281 97.04194 33.6 0.84889315 0.51787435 98.63323 39.2 0.39222214 0.23928094 99.36850 44.8 0.18122232 0.11055754 99.70822 50.4 0.08373198 0.05108203 99.86519 56 0.03868753 0.02360195 99.93771

$hazard NULL

$hazard.scale NULL

attr(,"class") [1] "summary.msm"

model2 <- msm(state~age, subject = id, data =longdata01,center = FALSE,

  • qmatrix = Q, death = 3,
  • covariates = ~qishi2,
  • control = list(reltol = 1e-8, maxit = 1000, fnscale = 100000)) summary(model2) $prevalences $prevalences$Observed State 1 State 2 State 3 Total 0 208 0 0 208 5.6 1242 34 17 1293 11.2 1775 112 254 2141 16.8 1769 125 659 2553 22.4 1649 230 1448 3327 28 1695 363 2799 4857 33.6 785 283 4228 5296 39.2 512 133 5395 6040 44.8 102 66 5943 6111 50.4 6 3 6082 6091 56 0 0 6092 6092

$prevalences$Expected State 1 State 2 State 3 Total 0 208.000000 0.000000 0.0000 208 5.6 543.339984 266.865326 482.7947 1293 11.2 416.145690 236.316743 1488.5376 2141 16.8 229.167696 133.018483 2190.8138 2553 22.4 139.462971 80.435753 3107.1013 3327 28 96.812675 54.895307 4705.2920 4857 33.6 50.563339 28.172051 5217.2646 5296 39.2 28.079515 15.335922 5996.5846 6040 44.8 13.964462 7.482737 6089.5528 6111 50.4 6.899563 3.632028 6080.4684 6091 56 3.445888 1.784910 6086.7692 6092

$prevalences$Observed percentages State 1 State 2 State 3 0 100.00000000 0.000000 0.000000 5.6 96.05568445 2.629544 1.314772 11.2 82.90518449 5.231200 11.863615 16.8 69.29103016 4.896201 25.812769 22.4 49.56417193 6.913135 43.522693 28 34.89808524 7.473749 57.628166 33.6 14.82250755 5.343656 79.833837 39.2 8.47682119 2.201987 89.321192 44.8 1.66912126 1.080020 97.250859 50.4 0.09850599 0.049253 99.852241 56 0.00000000 0.000000 100.000000

$prevalences$Expected percentages State 1 State 2 State 3 0 100.00000000 0.00000000 0.00000 5.6 42.02165385 20.63923631 37.33911 11.2 19.43697760 11.03768067 69.52534 16.8 8.97640799 5.21028137 85.81331 22.4 4.19185365 2.41766617 93.39048 28 1.99326076 1.13023074 96.87651 33.6 0.95474582 0.53194961 98.51330 39.2 0.46489263 0.25390599 99.28120 44.8 0.22851354 0.12244702 99.64904 50.4 0.11327471 0.05962942 99.82710 56 0.05656415 0.02929925 99.91414

$hazard $hazard$qishi2 HR L U State 1 - State 2 1.2894486 1.220837503 1.361916 State 1 - State 3 0.1043775 0.003037333 3.586914 State 2 - State 1 0.7783026 0.622040005 0.973820 State 2 - State 3 0.9939162 0.938083903 1.053071

$hazard.scale [1] 1

attr(,"class") [1] "summary.msm"

hazard plot:

image
chjackson commented 9 months ago

It's not clear to me why you think there is a problem - what exactly is the quantity being plotted, and how it was it produced from the msm output? If there are covariates in the model, then when you produce any output such as a transition probability, you should think about what covariate values you want the output for.

I can see a wide confidence interval for the estimate of the effect of the qishi2 covariate on the 1-3 transition intensity. So presumably there is no information in the data about that effect, and you might consider removing that parameter from the model (see https://chjackson.github.io/msm/msmcourse/covariates.html#choices-for-modelling-covariates ).

Tuotuo6613 commented 9 months ago

Thank you very very much for your prompt and helpful response.:) The variable 'qishi2' in my analysis is a binary variable representing an impact on health status, which I am keenly interested in studying. My objective is to compare the health state transition probabilities between groups with high and low 'qishi2' values. To this end, I have created a plot to visualize these comparisons.

However, I have encountered a challenge. After experimenting with various covariates, I noticed that all of them, on the 1-3 transition intensity, yield very wide confidence intervals. This has left me somewhat perplexed and led me to suspect that there might be an underlying issue in my approach or analysis. Perhaps I should remove all covariates from the 1-3 transition process?

I would greatly appreciate any insights or suggestions you might have regarding this matter. Below, I have included the plotting code I used for your reference:

model31 <- msm(state~age, subject = id, data =longdata01,center = FALSE, qmatrix = Q, death = 3,

  • covariates = ~age+ qishi2+edu+juzhudi+a1+duanlian+hs_manbing+hunyin, control = list(reltol = 1e-8, maxit = 1000, fnscale = 100000))

par(bg = "white")

transitions <- matrix(c(1, 2,

  • 1, 3,
  • 2, 1,
  • 2, 3), ncol = 2, byrow = TRUE)

    qishi2=0(blue)

    lty_values <- c(4, 3, 1, 5) haz0 <- hazards(model31,

  • b.covariates = list(age=0,qishi2=0,juzhudi=1.5,a1=1.5,duanlian=1.5,edu=1.2,hs_manbing=0.7,hunyin=1.7),
  • no.years = 35, max.haz = 1, age.shift = -65,
  • trans = transitions,
  • col = rep("blue", nrow(transitions)),
  • lty = lty_values,
  • LEGEND = FALSE)

par(new = TRUE)

qishi2=1(black)

haz1 <- hazards(model31,

  • b.covariates = list(age=0,qishi2=1,juzhudi=1.5,a1=1.5,duanlian=1.5,edu=1.2,hs_manbing=0.7,hunyin=1.7),
  • no.years = 35, max.haz = 1, age.shift = -65,
  • trans = transitions,
  • col = rep("black", nrow(transitions)),
  • lty = lty_values,
  • LEGEND = FALSE) legend_labels_transitions <- c("1->2", "1->3", "2->1", "2->3") legend_labels_groups <- c("low", "high") legend("topright",
  • legend = c(legend_labels_transitions, legend_labels_groups),
  • lty = c(lty_values, rep(NA, 2)),
  • pch = c(rep(NA, 4), 15, 15),
  • col = c(rep("blue", 4), "blue", "black"),
  • cex = 0.8)
image
chjackson commented 9 months ago

hazards isn't a function in the msm package, so it's still not clear. Though given what you have shown, I suspect it would be wise to remove the covariate from the 1-3 transition rate. This is not necessarily making a strong assumption - because these are continuous-time models. The covariate will still affect the probability of transitioning between 1 and 3 over an interval of time, because it is still allowed to affect the 1-2 and 2-3 rates.

Tuotuo6613 commented 9 months ago

Oh, my apologies, indeed that is a function from the ELECT package. Thank you very much for taking the time to answer my question . I appreciate your help and will continue to study this content carefully. Wishing you a pleasant day :) thank you again 🙏☕️