CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License

Concordance index and k-fold cross-validation, increasing the resolution of the timeline kwarg #1134

Closed jwayne2978 closed 2 years ago

jwayne2978 commented 4 years ago

I am using the CoxPHFitter and am trying to do k-fold cross-validation. My code looks like the following.

cph_spline = CoxPHFitter(penalizer=0.1, baseline_estimation_method='spline', n_baseline_knots=5)
scores = k_fold_cross_validation(cph_spline, X, duration_col='T', event_col='E', k=10, scoring_method='concordance_index', fitter_kwargs={'step_size': 0.5, 'timeline': list(range(0, 6501, 1))})

However, I still get warnings.

C:\Continuum\anaconda3\lib\site-packages\lifelines\fitters\__init__.py:2295: ApproximationWarning: Approximating using predict_survival_function. To increase accuracy, try using or increasing the resolution of the timeline kwarg in .fit(..., timeline=timeline)

The API documentation, https://lifelines.readthedocs.io/en/latest/fitters/regression/CoxPHFitter.html#lifelines.fitters.coxph_fitter.CoxPHFitter.fit, does not seem to list a timeline argument. Question 1: Any ideas on how to get rid of this warning?

Additionally, I noticed that when I specify baseline_estimation_method='spline' or baseline_estimation_method='piecewise' for CoxPHFitter, I do not get a concordance value with model.print_summary(). The attribute model.concordance_index_ does not exist when I specify these baseline estimation methods. Question 2: Why do these parametric models not have a concordance index?

Question 3: Lastly, in the situation where we do get a concordance index value (e.g. baseline_estimation_method='breslow'), what is actually being used to compute the ranking? Is it the median survival time?

MorzHT commented 2 years ago

I was wondering if you have managed to solve this problem. If so, any suggestions would be deeply appreciated.

CamDavidsonPilon commented 2 years ago

So I looked into this: it's not attached to the model because prediction is too slow. That is, we need to compute a survival function for each subject, then compute a median, and for large data sizes, this is too slow to be satisfying.
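To illustrate the "survival function, then median" step and why the warning mentions the resolution of the timeline, here is a minimal pure-Python sketch. It is not lifelines' implementation: the helper names (median_from_survival, exponential_survival) and the exponential survival curve are assumptions chosen only for illustration. The median is read off a discrete grid, so a coarser grid gives a cruder approximation.

```python
import math


def median_from_survival(timeline, survival):
    """Return the first grid time where S(t) <= 0.5, or inf if never reached.

    Hypothetical helper for illustration; not part of lifelines.
    """
    for t, s in zip(timeline, survival):
        if s <= 0.5:
            return t
    return math.inf


def exponential_survival(timeline, rate):
    """Evaluate S(t) = exp(-rate * t) on the grid (an assumed toy model)."""
    return [math.exp(-rate * t) for t in timeline]


rate = 0.01
true_median = math.log(2) / rate  # about 69.3 for this toy model

coarse = list(range(0, 501, 50))  # low-resolution timeline
fine = list(range(0, 501, 1))     # high-resolution timeline

coarse_median = median_from_survival(coarse, exponential_survival(coarse, rate))
fine_median = median_from_survival(fine, exponential_survival(fine, rate))

print(coarse_median, fine_median, round(true_median, 1))
```

With one subject this is cheap, but repeating it for every row of a large dataset on a fine grid is the cost described above.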

You can still compute the concordance index yourself - see the code snippet in this section: https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#concordance-index
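For readers wondering what "computing it yourself" involves, the following is a rough pure-Python sketch of a Harrell-style concordance index. It is an illustration under simplifying assumptions (tied durations are skipped, and the O(n^2) loop is not optimized), not the code lifelines uses; in practice lifelines.utils.concordance_index does this for you. The function name concordance_index_sketch is hypothetical. As I understand the lifelines convention, higher predicted scores should mean longer survival, so a hazard-type score would need to be negated before being passed in.

```python
def concordance_index_sketch(durations, predicted_scores, events):
    """Fraction of comparable pairs that the scores rank in the right order.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter duration than subject j. Higher score = longer survival.
    Hypothetical helper for illustration; ties in duration are ignored.
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the "earlier" one
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if predicted_scores[i] < predicted_scores[j]:
                    concordant += 1.0
                elif predicted_scores[i] == predicted_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


durations = [5, 10, 15, 20]
events = [1, 1, 0, 1]  # third subject is censored

good_scores = [6, 9, 30, 18]           # e.g. predicted median survival times
bad_scores = [-s for s in good_scores]  # the same ranking, reversed
flat_scores = [1, 1, 1, 1]              # uninformative predictions

print(concordance_index_sketch(durations, good_scores, events))
print(concordance_index_sketch(durations, bad_scores, events))
print(concordance_index_sketch(durations, flat_scores, events))
```

Only the ordering of the scores matters, which is why either a predicted median or a negated hazard-type score can serve as the ranking.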

CamDavidsonPilon commented 2 years ago

w.r.t the warning: not much you can do - it's a result of using a semi-parametric model!