Closed AlexisZr closed 1 year ago
There is no way to do this, because it fundamentally does not make sense. The Cox regression model is used to perform G-computation, as described in the documentation (see ?surv_direct
). This entails that predictions for subject-specific survival probabilities are made, conditional on the value of variable
. If variable
is not included in the model, this cannot work.
But your issue is a very valid one, you want to include a continuous predictor in the Cox model and want to display the adjusted effect of this continuous predictor using something similar to a Kaplan-Meier curve. I developed a different R-package specifically for this purpose called contsurvplot
(https://cran.r-project.org/package=contsurvplot), which does exactly what you truly want without the need for arbitrary categorisation. How it works is documented in my preprint on the subject (https://arxiv.org/abs/2208.04644v1), which has been recently accepted by "Epidemiology" (just so you know it's already peer-reviewed).
If you have any questions on this, I am happy to help.
Hey Robin thanks for the prompt response. Actually the arbitrary categorization that I introduce in the issue is quite meangiful for my dataset, as it allows me to categorize the patients in two groups (coined High and low expression level). Is the contsurvplot
offering a specific function that allows me to depict one or more groups based on a categorical variable that is not included in the cox model ? ggadjustedcurves
used to have a similar functionality according to the survminer
cheatsheet which apparently is not supported any more (last figure in the cheatsheet).
Thanks again in advance for your prompt response.
Categorising continuous variables is almost never a good idea. I know that it is a convenient way to obtain two survival curves (one for the "high" and one for the "low" level), but it also tends to introduce bias and usually leads to a significant loss of power, as described in the preprint I linked. This gets more obvious if you think about the assumption you are making when you create that grouping variable. You assume that the survival probability of someone with a prot_expr
value of prot_cutpoint
- 0.0000000001 is completely different than the survival probability of someone with a prot_expr
value of exactly prot_cutpoint
, while you also assume that there is no change between a prot_expr
of prot_cutpoint
+ 10 or prot_cutpoint
+ 100 (I am making up values since I don't know what it measures, but you get the point).
The only reason to dichotomize prot_expr
would be if you have an actual very good theoretical justification for why the survival probability suddenly changes at prot_cutpoint
and stays constant elsewhere, which is very rarely the case. Usually continuous have smooth relationships with the survival probability, which can be captured using something like a Cox model.
You have two options:
1.) Categorise prot_expr
and use that categorised variable in your cox model instead of the continuous version. You can then use adjustedsurv
or ggadjustedcurves
just as you are used to. Again, this only makes sense if categorising makes sense.
2.) Use the model you currently have (which could also be changed to include some non-linear effects of prot_expr
using splines or polynomial terms) and use the contsurvplot
package to directly depict the effect of prot_expr
as a continuous variable on the time-to-event outcome.
In my opinion, the second option is much better, but the decision is up to you.
Yes thanks for the delineation. I forgot to mention that the categorical dichotomization of the continous variable prot_expr
was based on the assumption that there is a certain protein expression level (measured in mean fluorescent intensity) that allows us to discriminate between patients with lower and higher survivability. Indeed if you perform the cox regression model utilizing the categorical variable, namely expr_lvl
instead of the prot_expr
( The prot_cutpoint
is estimated with surv_cutpoint
) there is a statistical significance. Which later on allows us to hypothesize that patients with high expression and lets say lower survivability are a better target group for early treatment. Thats why I wanted to know if that contsurvplot offers this option.
Following your suggestion, contsurvplot (specifically plot_surv_area()
) indeed offers a great way to visualize the correlation between protein expression level and survivability. Is there a way to specify the breaks in the bins ?
e.g. 2 bins
(min_exp,prot_cutpoint],[prot_cutpoint,max_exp)
Yes you can do that. Simply pass c(min_exp,prot_cutpoint,max_exp)
to the horizon
argument and set discrete=TRUE
(to get a better plot legend) and it should work fine.
The function effectively conveys the message that I intend to present. I greatly appreciate your valuable input and the ingenuity of your software package. However, I have a more theoretical question (kinda escapes the contsurvplot
visualization) that arose while working with your plotting package. Would it be possible to contact you via email to discuss this matter further and avoid overextending this completed issue?
Of course. Just send me an e-mail to robin.denz@rub.de and I will try to answer your questions to the best of my ability. I will close this issue now.
I have a coxph model e.g. :
Generate a grouping variable based on the protein expression (prot_expr) - high vs low expression
I want the
variable
argument to be theexpr_lvl
:Unfortunatelly when I run the plot command I get this error:
Is there a way to bypass this issue ?