RobinDenz1 / adjustedCurves

An R-Package to estimate and plot confounder-adjusted survival curves (single event survival data) and confounder-adjusted cumulative incidence functions (data with competing risks) using various methods.
https://robindenz1.github.io/adjustedCurves/
GNU General Public License v3.0
36 stars 4 forks source link

`adjustedsurv()` : Is it possible the grouping factor (i.e. variable argument) to not be part of the cox-regression model ? #9

Closed AlexisZr closed 1 year ago

AlexisZr commented 1 year ago

I have a coxph model e.g. :

fit <- coxph(Surv(survival_months, died) ~ prot_expr + gender + disease_onset + age_at_onset, data = df)

Generate a grouping variable based on the protein expression (prot_expr) - high vs low expression

min_exp <- min(df$prot_expr)
max_exp <- max(df$prot_expr)
df$expr_lvl <- cut(df$prot_expr, c(min_exp,prot_cutpoint,max_exp))

I want the variable argument to be the expr_lvl :

survival_est <- adjustedsurv(data=df ,outcome_model= fit,ev_time="survival_months", event="died",variable ="expr_lvl",method="direct",conf_int=TRUE)

Unfortunatelly when I run the plot command I get this error:

Error in check_inputs_adjustedsurv(data = data, variable = variable, ev_time = ev_time,  : 
  'variable' has to be included in the cox-regression model.

Is there a way to bypass this issue ?

RobinDenz1 commented 1 year ago

There is no way to do this, because it fundamentally does not make sense. The Cox regression model is used to perform G-computation, as described in the documentation (see ?surv_direct). This entails that predictions for subject-specific survival probabilities are made, conditional on the value of variable. If variable is not included in the model, this cannot work.

But your issue is a very valid one, you want to include a continuous predictor in the Cox model and want to display the adjusted effect of this continuous predictor using something similar to a Kaplan-Meier curve. I developed a different R-package specifically for this purpose called contsurvplot (https://cran.r-project.org/package=contsurvplot), which does exactly what you truly want without the need for arbitrary categorisation. How it works is documented in my preprint on the subject (https://arxiv.org/abs/2208.04644v1), which has been recently accepted by "Epidemiology" (just so you know it's already peer-reviewed).

If you have any questions on this, I am happy to help.

AlexisZr commented 1 year ago

Hey Robin thanks for the prompt response. Actually the arbitrary categorization that I introduce in the issue is quite meangiful for my dataset, as it allows me to categorize the patients in two groups (coined High and low expression level). Is the contsurvplot offering a specific function that allows me to depict one or more groups based on a categorical variable that is not included in the cox model ? ggadjustedcurves used to have a similar functionality according to the survminer cheatsheet which apparently is not supported any more (last figure in the cheatsheet). Thanks again in advance for your prompt response.

RobinDenz1 commented 1 year ago

Categorising continuous variables is almost never a good idea. I know that it is a convenient way to obtain two survival curves (one for the "high" and one for the "low" level), but it also tends to introduce bias and usually leads to a significant loss of power, as described in the preprint I linked. This gets more obvious if you think about the assumption you are making when you create that grouping variable. You assume that the survival probability of someone with a prot_expr value of prot_cutpoint - 0.0000000001 is completely different than the survival probability of someone with a prot_expr value of exactly prot_cutpoint, while you also assume that there is no change between a prot_expr of prot_cutpoint + 10 or prot_cutpoint + 100 (I am making up values since I don't know what it measures, but you get the point).

The only reason to dichotomize prot_expr would be if you have an actual very good theoretical justification for why the survival probability suddenly changes at prot_cutpoint and stays constant elsewhere, which is very rarely the case. Usually continuous have smooth relationships with the survival probability, which can be captured using something like a Cox model.

You have two options:

1.) Categorise prot_expr and use that categorised variable in your cox model instead of the continuous version. You can then use adjustedsurv or ggadjustedcurves just as you are used to. Again, this only makes sense if categorising makes sense.

2.) Use the model you currently have (which could also be changed to include some non-linear effects of prot_expr using splines or polynomial terms) and use the contsurvplot package to directly depict the effect of prot_expr as a continuous variable on the time-to-event outcome.

In my opinion, the second option is much better, but the decision is up to you.

AlexisZr commented 1 year ago

Yes thanks for the delineation. I forgot to mention that the categorical dichotomization of the continous variable prot_expr was based on the assumption that there is a certain protein expression level (measured in mean fluorescent intensity) that allows us to discriminate between patients with lower and higher survivability. Indeed if you perform the cox regression model utilizing the categorical variable, namely expr_lvl instead of the prot_expr ( The prot_cutpoint is estimated with surv_cutpoint ) there is a statistical significance. Which later on allows us to hypothesize that patients with high expression and lets say lower survivability are a better target group for early treatment. Thats why I wanted to know if that contsurvplot offers this option.

AlexisZr commented 1 year ago

Following your suggestion, contsurvplot (specifically plot_surv_area()) indeed offers a great way to visualize the correlation between protein expression level and survivability. Is there a way to specify the breaks in the bins ? e.g. 2 bins

(min_exp,prot_cutpoint],[prot_cutpoint,max_exp)
RobinDenz1 commented 1 year ago

Yes you can do that. Simply pass c(min_exp,prot_cutpoint,max_exp) to the horizon argument and set discrete=TRUE (to get a better plot legend) and it should work fine.

AlexisZr commented 1 year ago

The function effectively conveys the message that I intend to present. I greatly appreciate your valuable input and the ingenuity of your software package. However, I have a more theoretical question (kinda escapes the contsurvplot visualization) that arose while working with your plotting package. Would it be possible to contact you via email to discuss this matter further and avoid overextending this completed issue?

RobinDenz1 commented 1 year ago

Of course. Just send me an e-mail to robin.denz@rub.de and I will try to answer your questions to the best of my ability. I will close this issue now.