Question: How to assess or use the model

Derry-L commented 1 year ago

Dear Chris, Before my question, please accept my sincere gratitude for all your work on the 'msm' package and its user manual, which have helped me so much on my ongoing project. After solving all the "convergence failure", "numerical overflow", and "Hessian matrix not positive definite" problems by reading the manual and the git issues, I have finally reached the stage of assessing the model. I mainly have 2 questions: (Please forgive me if my questions are "stupid", since I really am weak in statistics.)

(My model is fitted as a normal multi-state Markov model with no misclassification. The msm function ran successfully without returning any errors or warnings, thanks again to your effort in building msm and guiding its users.)

1 assessing the whole model

> pearson.msm(mydata.msm)

     stat p df.lower p.lower df.upper p.upper
 671.4378 0      129       0      192       0

I don't know how to interpret the output.

How do I know whether stat (671) means "significant" (good?) or not?
Why are p, p.lower, and p.upper all zero? Are they genuinely close to zero (<0.05), so that I can say the model is doing well, or did something go wrong in the model so that the p-values were set to 0?
If df means degrees of freedom, why is df a range instead of a single value? It's worth noting that all the observation times in my data are fixed, such as day 1, day 4, day 7, day 14, and day 21. I know the manual says the data should be observations of the process at arbitrary times which differ between individuals. Sadly, I still use this method because I don't know another way to assess my model, either as a whole or for each covariate.

2 using the model

Let's assume my model assesses well in the derivation cohort.

How can I apply it to a new individual to see the probable progression of their disease? (For example, by drawing a graph showing the probability of each state as time goes by.)
How can I validate my model in an independent cohort?

3 sharing of experience to all other users

This part is for other users who might also have been bothered by warnings given by msm(). I would like to share my experience of solving the "convergence failure", "numerical overflow", and "Hessian matrix not positive definite" problems, though it is all already written in the manual. To build a multi-state Markov model with no misclassification well, the most important thing, at least for my data, is to define the truly possible transitions. In other words, place the 0 values in the Q matrix thoughtfully. Setting the right 0s matters more than changing any control parameters, because it sets the model's most basic shape. Be sure you know which type your data is, so that the right form of Q matrix can be given, or another form of Markov model can be applied. For more details, read section 2.12, 'Convergence failure', of the manual. (It's a perfect manual indeed!)

Sorry to bother you, and I look forward to your kind reply! Derry Liu 2023.3.6

chjackson commented 1 year ago

The Pearson test is a test of the hypothesis that the data were generated by the fitted model. A high test statistic with a low p-value means that the hypothesis is rejected. Personally, I've never found this test very useful. In practice, all models are wrong - so as the sample size gets bigger, the p-value will always get lower. Then if the hypothesis is rejected, it doesn't tell you what is wrong with the model, or how far the model's predictions are from the truth.

Check the help page for pearson.msm for the technical details of the output.
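To see why the printed p-values show as 0, here is a minimal sketch in base R. It assumes that p.lower and p.upper are upper-tail chi-squared probabilities evaluated at df.lower and df.upper, as the column names suggest; the help page for pearson.msm is the authority on how they are actually computed. The numbers are copied from the output quoted above.

```r
# Values copied from the pearson.msm output quoted in this thread.
stat <- 671.4378
df_lower <- 129
df_upper <- 192

# Upper-tail chi-squared probabilities at each bound on the degrees of freedom
# (assumption: this mirrors how p.lower and p.upper are defined).
p_lower <- pchisq(stat, df = df_lower, lower.tail = FALSE)
p_upper <- pchisq(stat, df = df_upper, lower.tail = FALSE)

print(c(p.lower = p_lower, p.upper = p_upper))
```

Under that assumption both probabilities are positive but extremely small, so they round to 0 when printed: the hypothesis that the data were generated by the fitted model is rejected at either bound.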

I find the prevalence plots (prevalence.msm) to be more helpful. In your case, there is no problem if everyone is observed at the same time. That's actually beneficial, because it makes the "observed prevalences" in this method easier to compute accurately.
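As a minimal sketch of this check, the code below uses the CAV example from the msm manual rather than the poster's data. The starting values in `twoway4.q` and the choice of times are assumptions made for illustration only.

```r
library(msm)

# Allowed transitions for the CAV example; nonzero entries are only
# starting values for the optimiser, zeros mark impossible transitions.
twoway4.q <- rbind(c(0,     0.25, 0,     0.25),
                   c(0.166, 0,    0.166, 0.166),
                   c(0,     0.25, 0,     0.25),
                   c(0,     0,    0,     0))
cavsex.msm <- msm(state ~ years, subject = PTNUM, data = cav,
                  qmatrix = twoway4.q, deathexact = 4, covariates = ~ sex)

# Observed and model-expected state prevalences at a grid of times.
prev <- prevalence.msm(cavsex.msm, times = seq(0, 20, by = 2))
print(prev[["Observed percentages"]])
print(prev[["Expected percentages"]])

# One panel per state, observed against expected over time.
plot.prevalence.msm(cavsex.msm, mintime = 0, maxtime = 20)
```

Large, systematic gaps between the observed and expected curves for a state suggest where the model fits poorly, which is information the Pearson statistic alone does not give.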

Most output functions (e.g. pmatrix.msm) have an argument covariates that you can supply to make predictions for individuals with specific characteristics.
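For the "graph of state probabilities over time" asked about earlier, one possible sketch is to call pmatrix.msm on a grid of times and plot one row of each result. The model is again the CAV example from the msm manual; the starting state (state 1), the covariate value (sex = 1), and the time grid are assumptions chosen for illustration.

```r
library(msm)

# CAV example model from the msm manual (starting values are illustrative).
twoway4.q <- rbind(c(0,     0.25, 0,     0.25),
                   c(0.166, 0,    0.166, 0.166),
                   c(0,     0.25, 0,     0.25),
                   c(0,     0,    0,     0))
cavsex.msm <- msm(state ~ years, subject = PTNUM, data = cav,
                  qmatrix = twoway4.q, deathexact = 4, covariates = ~ sex)

times <- seq(0.5, 10, by = 0.5)

# For each time t, take row 1 of P(t): the probabilities of occupying each
# state at time t for an individual who starts in state 1 with sex = 1.
probs <- t(sapply(times, function(t) {
  unclass(pmatrix.msm(cavsex.msm, t = t, covariates = list(sex = 1)))[1, ]
}))

matplot(times, probs, type = "l", lty = 1, col = 1:4,
        xlab = "Years", ylab = "Probability of state")
legend("topright", legend = paste("State", 1:4), lty = 1, col = 1:4)
```

Changing the row index picks a different starting state, and the `covariates` list can hold any combination of the covariates in the fitted model.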

There isn't a specific function in the package for validating against a dataset other than the one used to fit the model. You could code this yourself by making a prediction for each of the distinct covariate values observed in the external dataset, then calculating a weighted average, weighted by the frequency of each covariate pattern.
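One way this weighted average could be coded is sketched below. The data frame `ext` is a hypothetical stand-in for an external cohort (its values are placeholders, not real data), and the prediction horizon of 5 years and starting state 1 are likewise assumptions. The fitted model is the CAV example from the msm manual.

```r
library(msm)

# CAV example model from the msm manual (starting values are illustrative).
twoway4.q <- rbind(c(0,     0.25, 0,     0.25),
                   c(0.166, 0,    0.166, 0.166),
                   c(0,     0.25, 0,     0.25),
                   c(0,     0,    0,     0))
cavsex.msm <- msm(state ~ years, subject = PTNUM, data = cav,
                  qmatrix = twoway4.q, deathexact = 4, covariates = ~ sex)

# Hypothetical external cohort: one row per person, holding the covariate
# used in the model. Replace with the real validation data.
ext <- data.frame(sex = c(0, 0, 0, 1, 0, 1, 0, 0))

# Frequency of each distinct covariate pattern in the external cohort.
patterns <- table(ext$sex)
weights <- as.numeric(patterns) / sum(patterns)
values <- as.numeric(names(patterns))

# Predicted state distribution at 5 years, starting from state 1,
# for each covariate pattern (one column per pattern).
pred <- sapply(values, function(s) {
  unclass(pmatrix.msm(cavsex.msm, t = 5, covariates = list(sex = s)))[1, ]
})

# Frequency-weighted average prediction for the external cohort.
expected <- as.vector(pred %*% weights)
print(round(expected, 3))
```

The vector `expected` can then be set against the proportions of the external cohort actually observed in each state at the same time point.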

Thanks for sharing your experience - I agree with this!

Derry-L commented 1 year ago

Thank you very much for your detailed and quick reply! Best wishes!😄

Derry-L commented 1 year ago

Sorry to bother you again. A new problem came to mind while I was simplifying my model. How can I check a covariate's effect on staying in one state? I have been deleting covariates for which every hazard ratio confidence interval includes 1. But I suddenly realized that maybe they shouldn't be deleted, because they may have an effect on staying in a certain state. Take the "cav" data in the manual, for example:

> hazard.msm(cavsex.msm)
$sex
                            HR            L            U
State 1 - State 2 0.5632779042 3.333382e-01 9.518320e-01
State 1 - State 4 1.1289701413 6.261976e-01 2.035418e+00
State 2 - State 1 1.2905853501 4.916004e-01 3.388139e+00
State 2 - State 3 1.0765518296 5.193868e-01 2.231408e+00
State 2 - State 4 0.0003804824 7.241465e-65 1.999137e+57
State 3 - State 2 1.0965531163 1.345395e-01 8.937364e+00
State 3 - State 4 2.4135379727 1.176293e+00 4.952139e+00

The output shows only the effect on transitions between states, not the effect on staying in one state (e.g., state 1 - state 1). So, should I delete a covariate for which every CI includes 1, or keep it in case it has an effect on staying in one state? If it might have such an effect, how can I check it?

Best wishes. Derry Liu 2023.3.6

chjackson commented 1 year ago

If a covariate increases the risk of moving from state r to a different state, then that automatically implies that it reduces the risk of staying in state r, through how the model is parameterised.
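A small numeric illustration of this point, in base R. In a continuous-time Markov model each row of the intensity matrix sums to zero, so the diagonal entry is minus the sum of the intensities of leaving the state, and the mean sojourn time in state r is -1/q_rr. The intensities and the hazard ratio below are hypothetical numbers chosen for illustration, not estimates from any dataset.

```r
# Hypothetical baseline intensities out of state 1 in a 3-state model.
q12 <- 0.10
q13 <- 0.05
hr12 <- 2  # hypothetical hazard ratio of a covariate on the 1 -> 2 transition

# The diagonal entry is minus the sum of the off-diagonal entries in the row.
q11_baseline <- -(q12 + q13)
q11_covariate <- -(q12 * hr12 + q13)

# Mean sojourn time in state 1 is -1 / q11.
sojourn_baseline <- -1 / q11_baseline
sojourn_covariate <- -1 / q11_covariate

print(c(baseline = sojourn_baseline, with_covariate = sojourn_covariate))
```

Doubling the 1 -> 2 intensity shortens the mean stay in state 1 from about 6.7 to 4 time units, without any separate "staying" parameter in the model.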

Derry-L commented 1 year ago

So in the following example, is it acceptable to keep admission_Ascites in the model because it might increase the risk of staying in state r, given that it doesn't significantly increase the risk of any transition? I'm not sure I'm right, because I don't know whether this is both necessary and sufficient: does showing no significant increase in any transition mean an increased risk of staying?

$admission_Ascites
                         HR         L        U
State 1 - State 2 1.0314815 0.4828285 2.203586
State 2 - State 1 0.5195860 0.1812957 1.489112
State 2 - State 3 1.3347384 0.6097087 2.921931
State 3 - State 1 1.4003416 0.7873194 2.490675
State 3 - State 2 1.8187937 0.5126175 6.453175
State 3 - State 4 0.7441341 0.3830849 1.445464
State 3 - State 5 1.0104385 0.3493113 2.922854
State 4 - State 3 0.8796667 0.2232397 3.466291
State 4 - State 5 0.6931040 0.3338162 1.439095

chjackson commented 1 year ago

If a covariate doesn't affect the risk of moving from a state, that implies that it doesn't affect how long somebody stays in the state.
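To look at the time spent in each state directly, one option is to compare the estimated mean sojourn times at different covariate values with sojourn.msm. This is a sketch on the CAV example from the msm manual, not on the admission_Ascites model above; the covariate values 0 and 1 for sex are chosen for illustration.

```r
library(msm)

# CAV example model from the msm manual (starting values are illustrative).
twoway4.q <- rbind(c(0,     0.25, 0,     0.25),
                   c(0.166, 0,    0.166, 0.166),
                   c(0,     0.25, 0,     0.25),
                   c(0,     0,    0,     0))
cavsex.msm <- msm(state ~ years, subject = PTNUM, data = cav,
                  qmatrix = twoway4.q, deathexact = 4, covariates = ~ sex)

# Mean sojourn time in each transient state, with confidence limits,
# evaluated at each value of the covariate.
s0 <- sojourn.msm(cavsex.msm, covariates = list(sex = 0))
s1 <- sojourn.msm(cavsex.msm, covariates = list(sex = 1))
print(s0)
print(s1)
```

If the hazard ratios for every transition out of a state are close to 1, the two sets of sojourn times for that state will be close as well, since they are computed from the same intensities.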

Derry-L commented 1 year ago

Oh yes of course! My bad XD. Thanks a lot for your patience!! It's touching that my silly questions can be answered by excellent researchers like you. This encourages me a lot. I hope that one day I'll be sitting on the Cambridge campus and taking your courses myself! Though it's hard to get there from a Chinese college, I'll definitely do my best.

Best wishes.
