CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License
2.34k stars 553 forks source link

Suggestion: Multiple Kaplan Meier confidence interval options #424

Open javadnoorb opened 6 years ago

javadnoorb commented 6 years ago

Greenwood CI for KM estimator does not work very well with highly censored data and also has asymptotic properties, so doesn't work well for small sample sizes. After playing around with lifelines and getting many NaNs for CIs of highly censored data, I ended up using "Beta Product Confidence Procedure for Right Censored Data" from the following paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3769999/

There is an R package to calculate this (which also includes four other CIs): https://cran.r-project.org/web/packages/bpcp/index.html

I'm a Python user so I wrapped this in rpy2. I understand that introduction of new libraries, wrappers or statistical methods may not be appealing, but just checking if there is any interest to incorporate this or any other CIs into lifelines?

Here's a sample of code that can create these CIs:

from rpy2.robjects import pandas2ri,r
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector
from rpy2.rinterface import RRuntimeError

utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1);
packnames = ['bpcp']
names_to_install = [x for x in packnames if not rpackages.isinstalled(x)]
if len(names_to_install) > 0:
    utils.install_packages(StrVector(names_to_install))

pandas2ri.activate()

bpcp = rpackages.importr('bpcp')

def bpcp_survival(df,days_to_succeed = 5*365.): 
    try:
        bfit = bpcp.bpcp(df['TherapyToLastDay'],df['Vital Status'],Delta=0);
    except RRuntimeError:
        return pd.Series(index=['survival','lower','upper'])
    surv_results = pandas2ri.ri2py(bpcp.StCI(bfit,days_to_succeed))
    return surv_results[['survival','lower','upper']].iloc[0]
tueda commented 8 months ago

For some comparative reason, I needed the "traditional" Greenwood's formula (which made me create a monkey patch for the _bounds method with the current code base). It would be nice if the user could choose optional methods for the confidence intervals.