CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License

include median confidence interval in estimates? #277

Open louridas opened 7 years ago

louridas commented 7 years ago

Hello,

Would it make sense to include median confidence intervals in the estimates? For example, in the object returned from KaplanMeierFitter(), after fitting, one could do:

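```python
import numpy as np
from lifelines import KaplanMeierFitter

# example data for illustration only: 500 uncensored exponential durations
kmf = KaplanMeierFitter().fit(np.random.exponential(2, size=500))
median_lifespan = kmf.median_survival_time_

# get the confidence interval of the median
# start by finding the median on the timeline
# (Index.get_loc(..., method='nearest') was removed in pandas 2.0; get_indexer replaces it)
median_ci_idx = kmf.confidence_interval_.index.get_indexer([median_lifespan],
                                                           method='nearest')[0]
# then get the confidence interval at the point of the median
median_ci = kmf.confidence_interval_.iloc[median_ci_idx]
# draw horizontal lines at the two band values found above and take the first
# time the KM curve falls to each; the upper band value is reached earlier
# (lower limit), the lower band value later (upper limit)
lcl = (kmf.survival_function_['KM_estimate']
       <= median_ci.loc['KM_estimate_upper_0.95']).idxmax()
ucl = (kmf.survival_function_['KM_estimate']
       <= median_ci.loc['KM_estimate_lower_0.95']).idxmax()
```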

This is essentially what R does, as explained in the documentation for print.survfit.
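To try the approach above without fitting a model, here is a minimal pure-pandas sketch. The helper names `first_time_at_or_below` and `median_ci_horizontal` are hypothetical (not part of lifelines), the curves are made-up numbers rather than real estimates, and it assumes the KM estimate and both bands share one timeline index. It also guards against a pitfall of the bare `idxmax` trick: on an all-False Series, `idxmax` returns the first label instead of signalling that the level is never reached.

```python
import numpy as np
import pandas as pd


def first_time_at_or_below(curve: pd.Series, level: float) -> float:
    """First index label at which `curve` is <= `level`; inf if it never gets there."""
    hits = curve <= level
    # A bare idxmax() on an all-False Series would silently return the
    # first label, so check explicitly that the level is ever reached.
    if not hits.any():
        return np.inf
    return hits.idxmax()


def median_ci_horizontal(sf: pd.Series, lower: pd.Series, upper: pd.Series):
    """Hypothetical helper: median of `sf`, then where `sf` falls to the band values at the median."""
    median = first_time_at_or_below(sf, 0.5)
    if np.isinf(median):
        # the curve never reaches 0.5, so the median (and its interval) is undefined
        return median, np.nan, np.nan
    # assumes `lower` and `upper` are indexed by the same timeline as `sf`
    lcl = first_time_at_or_below(sf, upper.loc[median])  # upper band value is reached earlier
    ucl = first_time_at_or_below(sf, lower.loc[median])  # lower band value is reached later
    return median, lcl, ucl


# Made-up curves on a toy timeline (illustration only, not real estimates).
timeline = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
sf = pd.Series([1.0, 0.9, 0.7, 0.45, 0.3, 0.2], index=timeline)
lower = pd.Series([1.0, 0.75, 0.5, 0.25, 0.12, 0.06], index=timeline)
upper = pd.Series([1.0, 0.98, 0.88, 0.72, 0.55, 0.45], index=timeline)
print(median_ci_horizontal(sf, lower, upper))
```

Returning `inf` / `nan` for an unreached level is a design choice for this sketch; raising an error would be equally reasonable.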

CamDavidsonPilon commented 7 years ago

There is a util function in lifelines to make this easy:

```python
from lifelines.utils import median_survival_times
print(median_survival_times(kmf.confidence_interval_))
# output (a DataFrame):
#                           0.5
# KM_estimate_upper_0.95  530.0
# KM_estimate_lower_0.95  468.0
```
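The output above gives, for each confidence band, the time at which that band reaches 0.5. A rough pure-pandas sketch of that computation, assuming "reaches 0.5" means the first timeline point where the curve is at or below 0.5 (our reading, not a statement of lifelines' exact internals). `crossing_times` is a hypothetical name and the band values are made up:

```python
import numpy as np
import pandas as pd


def crossing_times(curves: pd.DataFrame, q: float = 0.5) -> pd.Series:
    """For each column, the first index label at which the curve is <= q (inf if never)."""
    def first_crossing(s: pd.Series) -> float:
        hits = s <= q
        return hits.idxmax() if hits.any() else np.inf
    # DataFrame.apply works column by column, giving one time per band
    return curves.apply(first_crossing)


# Made-up confidence bands on a toy timeline (illustration only).
ci = pd.DataFrame(
    {
        "KM_estimate_lower_0.95": [1.0, 0.75, 0.5, 0.25, 0.12, 0.06],
        "KM_estimate_upper_0.95": [1.0, 0.98, 0.88, 0.72, 0.55, 0.45],
    },
    index=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
)
print(crossing_times(ci))
```

The lower band drops faster, so it yields the lower time limit, and the upper band yields the upper limit, matching the 468 / 530 pattern in the output above.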

What I think would be more interesting is adding a __repr__ function to a fitted KaplanMeierFitter in which this could be expressed.

louridas commented 7 years ago

Thanks!

In fact, this is a different interpretation of the median confidence interval than the one used in print.survfit. In lifelines we get the median of the lower and the upper confidence interval functions; in print.survfit we get the median of the KM function and then find the intersections of the horizontal line with the two confidence interval functions.
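Written out, the two readings described in this comment are as follows, assuming each "crossing" means the first timeline point at which a curve falls to or below the given level. The notation is ours: $\hat S$ is the KM estimate, $\hat S_L$ and $\hat S_U$ are the lower and upper 95% bands, and the second line follows this comment's description of print.survfit rather than R's source.

```latex
% lifelines: median_survival_times(kmf.confidence_interval_)
\hat m_{\mathrm{lo}} = \min\{t : \hat S_L(t) \le 0.5\}, \qquad
\hat m_{\mathrm{hi}} = \min\{t : \hat S_U(t) \le 0.5\}

% horizontal-line reading (as described above for print.survfit)
\hat m = \min\{t : \hat S(t) \le 0.5\}, \qquad
\hat m_{\mathrm{lo}} = \min\{t : \hat S(t) \le \hat S_U(\hat m)\}, \qquad
\hat m_{\mathrm{hi}} = \min\{t : \hat S(t) \le \hat S_L(\hat m)\}
```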

I am not a statistician so I do not know which of the two most people would mean when they talk about confidence intervals of the median.

robertcv commented 3 years ago

median_survival_times doesn't work like that anymore in the current release. Was this functionality moved somewhere else?

CamDavidsonPilon commented 3 years ago

@robertcv what's not working? Can you provide an example or code snippet?

robertcv commented 3 years ago

In your previous comment you point to median_survival_times as a way of getting the 95% confidence interval for the median survival time. In the current version of the package, this function only returns the median survival time and does not include the interval. I worked around this by implementing the confidence interval myself, similarly to what the OP described.

CamDavidsonPilon commented 3 years ago

I'm not sure I'm seeing the same thing as you:

```python
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.utils import median_survival_times

kmf = KaplanMeierFitter().fit(np.random.exponential(2, size=50000))
print(median_survival_times(kmf.confidence_interval_))
```

returns two values, 1.383272 & 1.418151

robertcv commented 3 years ago

I am sorry, now I see why it didn't work for me. The function behaves differently depending on its input. I used median_survival_times(kmf) instead of median_survival_times(kmf.confidence_interval_) and only got the median time. I would suggest updating the documentation for this function: the parameter description uses "or", so I assumed it made no difference whether I passed the fitted model or the confidence interval DataFrame.