Open hcarlens opened 5 years ago
This is by design; at least, I never anticipated using those univariate fitters in a prediction setting. I'm not sure yet whether I want to support that. I need to consider it more.
Ok, interesting. Would you mind explaining the intended use of these univariate fitters, or is there a part of the docs that does that? I thought it could be useful to use them as baselines for prediction.
The UnivariateFitters are more like a summary statistic than a prediction model (e.g. a histogram vs. a random forest). That said, you can still build a baseline that looks a lot like a KaplanMeierFitter: if you pass CoxPHFitter().fit a DataFrame that is empty except for the T and E columns, the baseline survival curve will be almost identical to the Kaplan-Meier survival curve.
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.datasets import load_rossi
rossi = load_rossi()
kmf = KaplanMeierFitter()
kmf.fit(rossi['week'], rossi['arrest'])
ax = kmf.plot()
cph = CoxPHFitter()
cph.fit(rossi[['week', 'arrest']], 'week', 'arrest') # note that I'm giving the model no additional covariates
cph.baseline_survival_.plot(ax=ax)
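To see why the two curves would be close but not exactly equal, here is a rough pure-Python sketch. It assumes the covariate-free Cox baseline behaves like a Breslow-type estimate, i.e. exp(-H(t)) where H is the Nelson-Aalen cumulative hazard; that is an assumption about the estimator, not something stated in this thread. The function name and the toy data are made up for illustration.

```python
import math

def km_and_exp_nelson_aalen(durations, events):
    """Return (event_times, km_survival, exp_na_survival) on the same grid."""
    event_times = sorted({t for t, e in zip(durations, events) if e})
    km, cum_hazard = 1.0, 0.0
    km_curve, na_curve = [], []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        hazard = deaths / at_risk
        km *= 1.0 - hazard        # Kaplan-Meier product-limit step
        cum_hazard += hazard      # Nelson-Aalen cumulative hazard step
        km_curve.append(km)
        na_curve.append(math.exp(-cum_hazard))
    return event_times, km_curve, na_curve

# Toy data (hypothetical): 100 subjects, everything after t = 60 is censored.
durations = list(range(1, 101))
events = [1 if (t % 3 != 0 and t <= 60) else 0 for t in durations]

times, km, na = km_and_exp_nelson_aalen(durations, events)
max_gap = max(abs(a - b) for a, b in zip(km, na))
print(f"largest gap between the two curves: {max_gap:.4f}")
```

Because 1 - h <= exp(-h), the exponential form never falls below the Kaplan-Meier curve, and the gap stays small as long as each risk set is large relative to the number of events at that time. It can widen in the tail, where few subjects remain at risk.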
Is this by design?
The issue appears to be that e.g. KaplanMeierFitter.fit() takes T and E as series rather than a dataframe with column references for T and E (which is how CoxPHFitter seems to work).
E.g. when I try to run:
ef = ExponentialFitter()
print(np.mean(k_fold_cross_validation(ef, data, duration_col='T', event_col='E')))
The error is: fit() got an unexpected keyword argument 'duration_col'
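One possible way around the signature mismatch is a thin adapter that accepts column names and forwards the two columns to a fit(durations, event_observed) method. The sketch below is only an illustration: ColumnFitAdapter and ToyExponentialFitter are hypothetical names, the toy fitter is a stand-in rather than the lifelines class, and a dict of lists stands in for a DataFrame (column lookup by name works the same way). It addresses the fit() call only; whatever the cross-validation helper needs for scoring is not covered here and has not been tried against lifelines.

```python
class ToyExponentialFitter:
    """Stand-in (NOT lifelines): exponential MLE, mean = total time / events."""

    def fit(self, durations, event_observed=None):
        durations = list(durations)
        if event_observed is None:
            events = [1] * len(durations)   # treat every subject as observed
        else:
            events = list(event_observed)
        self.mean_ = sum(durations) / sum(events)
        return self


class ColumnFitAdapter:
    """Hypothetical wrapper giving a univariate fitter a column-based fit()."""

    def __init__(self, fitter):
        self.fitter = fitter

    def fit(self, df, duration_col, event_col=None, **kwargs):
        durations = df[duration_col]
        events = df[event_col] if event_col is not None else None
        # Forward to the wrapped fitter's (durations, event_observed) signature.
        self.fitter.fit(durations, event_observed=events, **kwargs)
        return self


# Toy data (hypothetical), using the same column names as the snippet above.
data = {"T": [2.0, 4.0, 6.0, 8.0], "E": [1, 0, 1, 1]}
adapter = ColumnFitAdapter(ToyExponentialFitter())
adapter.fit(data, duration_col="T", event_col="E")
print(adapter.fitter.mean_)
```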