CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License
Fixed coefficient for a covariate #974

Open nyousefi2020 opened 4 years ago

nyousefi2020 commented 4 years ago

Hi, thanks a lot for this useful and easy-to-use package! I'm doing some sensitivity analysis on my survival model, and for that I need to fix the coefficient of one covariate (a confounder/factor). That is, I do not need to estimate the hazard ratio for that variable, but I still need to account for it with a known ratio. Can you please advise how I can do this using lifelines? Thanks

CamDavidsonPilon commented 4 years ago

Hi @nyousefi2020,

I hope I can help. Can you tell me what regression model you are using?

nyousefi2020 commented 4 years ago

Thanks. I use Cox Proportional Hazard model (cph()).

CamDavidsonPilon commented 4 years ago

Unfortunately that's not available in lifelines yet. To consider adding it as a feature later, though, it would help me to understand your problem better. Can you explain how you might perform the sensitivity analysis after fixing the hazard? Also, how do you select which variables to fix and which not to fix?

nyousefi2020 commented 4 years ago

Sure. To check whether we have considered all the relevant variables in the model, we assume that there is an unmeasured variable which is correlated with the treatment variable (the variable of interest) and also affects the outcome through the hazard model. So, we generate an unmeasured variable 'U' based on a formula and add it to the model. However, we don't want to estimate its hazard ratio; we would like to manually set a range of hazard ratios (e.g. -2 to +2) to see what happens to the coefficients of the other variables, especially the treatment variable. I hope that explains it. It is now a popular topic in causal inference. Here are some references for more details: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3073860/ https://link.springer.com/article/10.1007/s10654-013-9770-6 https://arxiv.org/abs/1908.01444
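For readers who want to see what "fixing the coefficient" would mean mechanically, here is a minimal sketch. It does not use lifelines (which, per the reply above, does not offer this); instead it writes the Cox partial likelihood by hand with NumPy and SciPy and adds the fixed term as an offset. The data are simulated, all variable names are hypothetical, the range -2 to +2 is read as a coefficient (log hazard ratio) scale, and tied event times are assumed not to occur.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_partial_likelihood(beta, X, offset, durations, events):
    """Cox negative log partial likelihood with a fixed offset (assumes no tied times)."""
    order = np.argsort(-durations)            # longest follow-up first
    eta = (X @ beta + offset)[order]          # linear predictor, fixed part included
    log_risk = np.logaddexp.accumulate(eta)   # log-sum-exp over each subject's risk set
    return -np.sum((eta - log_risk)[events[order] == 1])

# Simulated data: treatment A and a hypothetical confounder U correlated with A
rng = np.random.default_rng(0)
n = 2000
A = rng.binomial(1, 0.5, n)
U = rng.binomial(1, np.where(A == 1, 0.6, 0.3))
true_beta_A, true_gamma_U = 0.5, 1.0
T = rng.exponential(1.0 / np.exp(true_beta_A * A + true_gamma_U * U))
C = rng.exponential(2.0, n)                   # independent censoring times
durations = np.minimum(T, C)
events = (T <= C).astype(int)

# Only A has a free coefficient; U enters through the offset gamma * U
X = A.reshape(-1, 1).astype(float)
estimates = {}
for gamma in np.linspace(-2.0, 2.0, 5):
    res = minimize(neg_log_partial_likelihood, x0=np.zeros(1),
                   args=(X, gamma * U, durations, events), method="BFGS")
    estimates[float(gamma)] = float(res.x[0])  # treatment coefficient at this fixed gamma

for gamma, beta_A in estimates.items():
    print(f"fixed coefficient for U = {gamma:+.1f} -> estimated coefficient for A = {beta_A:.3f}")
```

Each pass holds the coefficient of U at one value and re-estimates only the treatment coefficient, which is the sweep described in the comment above.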

pzivich commented 4 years ago

Hey @nyousefi2020, it sounds like you want to apply quantitative bias analysis (maybe a probabilistic bias analysis). From the approaches that I know, you don't need to set the hazard ratio to be a constant within the regression itself. AFAIK the linked papers don't mention the approach you suggest. I would be especially careful with regard to the direct/indirect/total effect paper, since that is a much different estimand than in most survival analyses. The good news is that there are approaches for unmeasured confounder bias analysis that should be possible to implement in lifelines. These methods only require simulation of a variable.

Suppose there is an unmeasured binary confounder, U. What you need is to assign the probability of U within strata of the treatment A. The probability of U should differ between strata of A. Next you will need to stipulate the relationship between U and the time-to-event. This is less easy to do for survival sensitivity analyses, so instead you can use the combination of disease and exposure to determine the probability of having the unknown confounder (see Lash & Fink citation). After generating U, you would run the Cox PH model with U included. You would not specify its parameter within the model; it should be allowed to vary in the procedure. This process is commonly extended by drawing each bias parameter from a distribution instead of using a single value. Many people use a trapezoidal distribution in quantitative bias analyses, but the choice is up to you.

I would recommend Lash & Fink 2003 for an applied example. They also include SAS code, which could probably be adapted for your purposes. The link to the code in the paper is dead, but you can pull the SAS code from https://sites.google.com/site/biasanalysis/multiple-bias-model-lash-tl-fink-a (lines 248-260 are the relevant parts).

For applied examples of this approach, see Albert et al. 2012, and Corrao et al. 2014.

nyousefi2020 commented 4 years ago

Thanks @pzivich! I'm a real beginner in survival analysis and the related material, just started using lifelines last week. The references are really helpful. I have actually created the U variable. What I did was to generate the binary U variable for the strata of the treatment and also relate it to death/censored. Do you think this is enough? But then I thought I should also fix the hazard ratio, but I was obviously wrong. Thanks again for helping me out. Nasrin

pzivich commented 4 years ago

No problem! For a simple bias analysis, inclusion of one variable (particularly a variable that is thought to be an important confounder) is good enough. It is often more than most analyses include. You can build more complex quantitative bias analyses (like including U1, U2, ..., Un), but these often require many assumptions about the relationships between the U's, which may limit their applicability. Additionally, these analyses all build on the basic building block of a single U. So I think the approach you describe is a good start. The book "Applying Quantitative Bias Analysis to Epidemiologic Data" is a good resource for general bias analyses. However, you can find most parts of the book as individual papers (like the Lash & Fink paper).

If you related U to the exposure and survival time, that should be sufficient for the purposes of the analysis. The key part of the bias analysis is the assumed distributions placed on the relationships between U, the exposure, and the outcome. The distributions used will determine the utility of the analysis. Unfortunately, you depend on previously published literature to choose them. If there isn't a lot of good information, the distributions you pull the parameters from should be more variable. While more general, Lash et al. 2014 may be a helpful resource.

There may be some sensitivity analyses where the U parameter is held constant, but I am unfamiliar with any such approach. For a probabilistic bias analysis with distributions, you want to let that parameter vary. This is because the particular realization of U can differ between Monte Carlo runs. Allowing those parameters to vary gives you a distribution of hazard ratios under possible values for U. It 'bakes in' the uncertainty regarding the true values of U in the sample.

nyousefi2020 commented 4 years ago

Thanks for the great information. I will read more to learn the best way to perform this analysis.