CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License

Modeling late entries in CPH #1178

Status: Open. Opened by ah-sadek 3 years ago.

ah-sadek commented 3 years ago

When modeling late entries with kmf (a KaplanMeierFitter), the survival function is more 'pessimistic' about customer survival, which makes sense.

kmf.fit(data["Duration"], event_observed=data["Observed"], entry=data["W"], label='modeling late entries')

[image: resulting survival function]
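Why late entries make the Kaplan-Meier curve more pessimistic can be seen in a minimal from-scratch sketch. This is not lifelines' implementation. It assumes durations and entry times are measured from the same origin and uses the convention that a subject is at risk at time t only when entry < t <= duration. The function name `km_survival` and the toy data are made up for illustration.

```python
def km_survival(durations, events, entry=None):
    """Kaplan-Meier estimate with optional left truncation (late entry).

    Returns (time, survival) pairs at each distinct event time. A subject is
    counted as at risk at time t only if entry < t <= duration.
    """
    if entry is None:
        # Ignoring late entry: everyone is treated as observed from time 0.
        entry = [0.0] * len(durations)
    event_times = sorted({d for d, e in zip(durations, events) if e})
    surv, curve = 1.0, []
    for t in event_times:
        # Risk set: entered before t and still under observation at t.
        n_at_risk = sum(1 for d, w in zip(durations, entry) if w < t <= d)
        deaths = sum(1 for d, e in zip(durations, events) if e and d == t)
        surv *= 1.0 - deaths / n_at_risk
        curve.append((t, surv))
    return curve


# Made-up toy data: durations and entry times share the same time origin.
durations = [5, 6, 6, 8, 10, 12]
events = [1, 1, 0, 1, 1, 0]
entry = [0, 4, 0, 5, 7, 0]

naive = km_survival(durations, events)             # late entries ignored
truncated = km_survival(durations, events, entry)  # late entries modeled
for (t, s_n), (_, s_t) in zip(naive, truncated):
    print(f"t={t:>2}  ignoring entry: {s_n:.3f}  with entry: {s_t:.3f}")
```

The truncated risk set is always a subset of the naive one while the death counts stay the same, so every factor 1 - d/n can only shrink. The curve with late entries therefore never lies above the curve that ignores them.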

However, when modeling late entries with cph (a CoxPHFitter), we see the opposite effect: the survival function is more optimistic than when late entries are ignored.

cph.fit(data, duration_col='Duration', event_col='Observed', entry_col='W')

[image: resulting survival function]

You can also refer to my repo here: https://github.com/ah-sadek/CustomerAnalytics

ah-sadek commented 3 years ago

It is worth noting that you mentioned a bug in how the survival function is calculated for a Cox model with late entries: the coefficient estimates account for late entries, but the survival function calculation does not.
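A minimal sketch of what that mismatch could look like, assuming the survival curve comes from a Breslow-type baseline cumulative hazard (an assumption here, not a statement about lifelines internals). At each event time the baseline hazard increment is the number of deaths divided by the sum of exp(beta * x) over the risk set. If late entries are left out of the risk-set definition, subjects count as at risk before they entered, the denominators grow, the hazard shrinks, and the survival curve rises. That would be consistent with the more optimistic Cox curve reported above. The coefficient `beta = 0.4`, the function names, and the toy data are all hypothetical.

```python
import math


def breslow_baseline_cumhaz(durations, events, x, beta, entry=None):
    """Breslow estimate of the baseline cumulative hazard H0(t) at event times.

    With `entry`, the risk set at t is restricted to entry < t <= duration;
    with entry=None, every subject is counted as at risk from time 0.
    """
    n = len(durations)
    if entry is None:
        entry = [0.0] * n
    cumhaz, out = 0.0, []
    for t in sorted({d for d, e in zip(durations, events) if e}):
        deaths = sum(1 for d, e in zip(durations, events) if e and d == t)
        denom = sum(math.exp(beta * x[j]) for j in range(n)
                    if entry[j] < t <= durations[j])
        cumhaz += deaths / denom
        out.append((t, cumhaz))
    return out


def conditional_survival(cumhaz, beta, x_new):
    """S(t | x) = exp(-H0(t) * exp(beta * x)) at each event time."""
    return [(t, math.exp(-h * math.exp(beta * x_new))) for t, h in cumhaz]


# Made-up toy data and an assumed coefficient.
durations = [5, 6, 6, 8, 10, 12]
events = [1, 1, 0, 1, 1, 0]
entry = [0, 4, 0, 5, 7, 0]
x = [1, 0, 1, 0, 1, 0]
beta = 0.4  # hypothetical, e.g. a coefficient estimated with late entry modeled

h0_entry = breslow_baseline_cumhaz(durations, events, x, beta, entry)
h0_naive = breslow_baseline_cumhaz(durations, events, x, beta)  # entry ignored
s_entry = conditional_survival(h0_entry, beta, x_new=0)
s_naive = conditional_survival(h0_naive, beta, x_new=0)
for (t, se), (_, sn) in zip(s_entry, s_naive):
    print(f"t={t:>2}  with entry: {se:.3f}  entry ignored: {sn:.3f}")
```

With the same coefficient, the version that ignores entry can only produce a survival curve at or above the one that respects it, because its risk sets are supersets.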

CamDavidsonPilon commented 3 years ago

Thanks @ah-sadek!