CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License
2.37k stars, 560 forks

Cross-validation doesn't work with ExponentialFitter/WeibullFitter/KaplanMeierFitter #571

Open hcarlens opened 5 years ago

hcarlens commented 5 years ago

Is this by design?

The issue appears to be that, e.g., KaplanMeierFitter.fit() takes T and E as Series, rather than a DataFrame plus the names of the T and E columns (which is how CoxPHFitter seems to work).

E.g., when I try to run:

ef = ExponentialFitter()
print(np.mean(k_fold_cross_validation(ef, data, duration_col='T', event_col='E')))

the error is:

fit() got an unexpected keyword argument 'duration_col'
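Cross-validating a univariate model only requires fitting on the training folds and scoring the held-out fold. As a rough illustration, here is a minimal pure-Python sketch that does this by hand for an exponential model. It assumes the closed-form maximum-likelihood rate (observed events divided by total observed time) and scores each held-out fold with the right-censored exponential log-likelihood. The helper names (fit_exponential, exponential_log_likelihood, k_fold_exponential_cv) and the list-of-dicts data layout are hypothetical and are not part of lifelines.

```python
import math
import random


def fit_exponential(durations, events):
    """Closed-form MLE of an exponential rate: number of events / total observed time."""
    n_events = sum(events)
    if n_events == 0:
        raise ValueError("need at least one observed event to estimate a positive rate")
    return n_events / sum(durations)


def exponential_log_likelihood(rate, durations, events):
    """Right-censored exponential log-likelihood: sum over subjects of E*log(rate) - rate*T."""
    return sum((math.log(rate) if e else 0.0) - rate * t for t, e in zip(durations, events))


def k_fold_exponential_cv(rows, duration_col, event_col, k=5, seed=0):
    """Hypothetical helper: mean held-out log-likelihood per observation for each of k folds.

    `rows` is a list of dicts; `duration_col` and `event_col` name the keys holding T and E.
    """
    rows = list(rows)
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    random.Random(seed).shuffle(rows)  # fixed seed -> reproducible folds
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        # Train on every fold except the i-th one.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        rate = fit_exponential([r[duration_col] for r in train], [r[event_col] for r in train])
        ll = exponential_log_likelihood(
            rate, [r[duration_col] for r in test], [r[event_col] for r in test]
        )
        scores.append(ll / len(test))  # per observation, so unequal fold sizes stay comparable
    return scores


if __name__ == "__main__":
    # Made-up illustrative data: T = duration, E = 1 if the event was observed, 0 if censored.
    example = [
        {"T": 5.0, "E": 1}, {"T": 8.0, "E": 0}, {"T": 3.0, "E": 1},
        {"T": 12.0, "E": 1}, {"T": 7.0, "E": 0}, {"T": 2.0, "E": 1},
    ]
    scores = k_fold_exponential_cv(example, duration_col="T", event_col="E", k=3)
    print(sum(scores) / len(scores))
```

Higher (less negative) scores mean the held-out durations are more probable under the fitted rate, so a loop like this could compare such a baseline against another model scored the same way.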

CamDavidsonPilon commented 5 years ago

This is by design; at least, I never anticipated using those univariate fitters in a prediction setting. I'm not sure whether I want to support that; I need to consider it more.

hcarlens commented 5 years ago

OK, interesting. Would you mind explaining the intended use of these univariate fitters, or is there a part of the docs that covers that? I thought they could be useful as baselines for prediction.

CamDavidsonPilon commented 5 years ago

The UnivariateFitters are more like a summary statistic than a prediction model (e.g., a histogram vs. a random forest). That said, you can still get a baseline that looks a lot like a KaplanMeierFitter: if you fit CoxPHFitter on a DataFrame that contains only the T and E columns (no covariates), its baseline survival curve will be almost identical to the Kaplan-Meier survival curve.

from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.datasets import load_rossi

rossi = load_rossi()

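kmf = KaplanMeierFitter()
kmf.fit(rossi['week'], rossi['arrest'])  # univariate API: durations and event flags as Series
ax = kmf.plot()

cph = CoxPHFitter()
cph.fit(rossi[['week', 'arrest']], duration_col='week', event_col='arrest')  # note that I'm giving the model no additional covariates
cph.baseline_survival_.plot(ax=ax)  # overlay the Cox baseline on the Kaplan-Meier axes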

[Screenshot (2018-12-12): the Kaplan-Meier survival curve and the CoxPHFitter baseline survival curve plotted on the same axes]
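One way to see why the two curves nearly coincide, assuming the Cox baseline hazard is estimated with Breslow's method: with no covariates every subject's partial hazard is exp(0) = 1, so the baseline cumulative hazard reduces to the Nelson-Aalen estimate H(t) = sum over event times t_i <= t of d_i / n_i (d_i events among n_i subjects at risk), and the baseline survival is exp(-H(t)). Kaplan-Meier instead uses the product of (1 - d_i / n_i) over the same event times. Because 1 - x <= exp(-x), with near equality for small x, the curves agree closely while many subjects are at risk and separate mostly in the tail. The sketch below computes both estimates in pure Python; the function name survival_estimates is hypothetical and not a lifelines API.

```python
import math
from collections import Counter


def survival_estimates(durations, events):
    """Return (time, kaplan_meier, exp(-nelson_aalen)) at each distinct event time.

    Subjects censored at time t are still counted as at risk at t (the usual convention).
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # everyone with this duration leaves the risk set after t
    at_risk = len(durations)
    km, cum_hazard = 1.0, 0.0
    rows = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            km *= 1.0 - d / at_risk          # Kaplan-Meier product-limit step
            cum_hazard += d / at_risk        # Nelson-Aalen increment
            rows.append((t, km, math.exp(-cum_hazard)))
        at_risk -= leaving[t]
    return rows


if __name__ == "__main__":
    for t, km, na in survival_estimates([1, 2, 3, 4], [1, 1, 1, 1]):
        print(t, round(km, 4), round(na, 4))
```

For example, with four subjects who all have the event at times 1, 2, 3 and 4, the last Kaplan-Meier value is 0 while exp(-H) is about 0.12, which illustrates the divergence in the tail.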