CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License

Handling of observations with birth_time==death_time #1604

Status: Open. user799595 opened this issue 2 months ago.

user799595 commented 2 months ago
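import lifelines

# Subject 1 enters at t=1 and dies at t=1; subject 2 enters at t=0 and is censored at t=2.
kmf = lifelines.KaplanMeierFitter()
kmf.fit([1, 2], event_observed=[1, 0], entry=[1, 0])
print(kmf.survival_function_)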

Expected:

          KM_estimate
timeline             
0.0               1.0
1.0               0.5
2.0               0.5

Actual:

          KM_estimate
timeline             
0.0               1.0
1.0               0.0
2.0               0.0
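For comparison, the Expected table is what a textbook product-limit estimate S(t) = product over event times t_j <= t of (1 - d_j / n_j) gives if a subject counts as at risk at time t whenever entry <= t <= duration, so a subject that enters at t=1 is in the risk set for a death at t=1. The sketch below is a minimal from-scratch version of that convention using NumPy. `km_delayed_entry` is a hypothetical helper, not part of lifelines, and the inclusive `entry <= t` rule is an assumption about what the expected output intends.

```python
import numpy as np


def km_delayed_entry(durations, events, entry):
    """Hypothetical helper (not part of lifelines): Kaplan-Meier with delayed entry.

    Assumes the inclusive convention: subject i is at risk at time t
    when entry[i] <= t <= durations[i]. Assumes entry <= duration for every subject.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    entry = np.asarray(entry, dtype=float)

    # Evaluate at every entry or exit time, like the timeline in the tables above.
    timeline = np.unique(np.concatenate([entry, durations]))
    survival = 1.0
    estimates = []
    for t in timeline:
        at_risk = np.sum((entry <= t) & (durations >= t))  # n_j, entrants at t included
        deaths = np.sum((durations == t) & events)         # d_j, observed events at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
        estimates.append((float(t), float(survival)))
    return estimates


print(km_delayed_entry([1, 2], events=[1, 0], entry=[1, 0]))
```

With the inclusive rule this prints `[(0.0, 1.0), (1.0, 0.5), (2.0, 0.5)]`, matching the Expected table: at t=1 both subjects are at risk and one dies.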

I've read https://github.com/CamDavidsonPilon/lifelines/issues/497 and the corresponding comment in the lifelines source:

        # Why subtract entrants like this? see https://github.com/CamDavidsonPilon/lifelines/issues/497
        # specifically, we kill people, compute the ratio, and then "add" the entrants.
        # This can cause a problem if there are late entrants that enter but population=0, as
        # then we have log(0 - 0). We later ffill to fix this.
        # The only exception to this rule is the first period, where entrants happen _prior_ to deaths.

But I can't wrap my head around what this is saying. How could entrants not happen prior to deaths? If I have an observation with birth_time==death_time, does that mean it died before it was born?
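One way to read the quoted comment is as a per-period bookkeeping order: at each time on the timeline, deaths are divided by the current population, exits are removed, and only then are that time's entrants added, except at the first time, where entrants are added before deaths. The toy below is a hypothetical re-creation of that ordering based only on the comment's wording; it is not lifelines' actual code. Under this ordering, a subject whose entry time equals its death time is not yet in the population when its own death is counted.

```python
def km_deaths_before_entrants(durations, events, entry):
    """Toy model of the ordering described in the quoted comment (NOT lifelines' code)."""
    timeline = sorted(set(entry) | set(durations))
    population = 0
    survival = 1.0
    estimates = []
    for k, t in enumerate(timeline):
        entrants = sum(1 for e in entry if e == t)
        deaths = sum(1 for d, ev in zip(durations, events) if d == t and ev)
        exits = sum(1 for d in durations if d == t)  # deaths plus censorings at t

        if k == 0:
            population += entrants  # first period: entrants come before deaths
        if deaths > 0 and population > 0:
            # "kill people, compute the ratio" (skip instead of dividing by zero)
            survival *= 1 - deaths / population
        population -= exits
        if k > 0:
            population += entrants  # ...and then "add" the entrants
        estimates.append((t, survival))
    return estimates


print(km_deaths_before_entrants([1, 2], events=[1, 0], entry=[1, 0]))
```

This prints `[(0, 1.0), (1, 0.0), (2, 0.0)]`, the same values as the Actual table: at t=1 the population is only subject 2, so one death out of one drives the estimate to 0. Moving the `population += entrants` step before the death ratio in every period instead gives 1 - 1/2 = 0.5 at t=1, the Expected value.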

I thought that the likelihood is

CamDavidsonPilon commented 5 days ago

This is an interesting issue, and I want to agree with your expected case. However, I'm also inclined to reject the case birth_time==death_time as pathological to lifelines. Based on that highlighted comment, it sounds like an entry time is effectively treated as birth_time + epsilon: after the first period, entrants are added only after that period's deaths are counted. So if you want a true birth_time==death_time, you would add a small epsilon to the death time:

import lifelines

# Push the first subject's death a hair past its entry time.
kmf = lifelines.KaplanMeierFitter()
kmf.fit([1+1e-10, 2], event_observed=[1, 0], entry=[1, 0])
print(kmf.survival_function_)
          KM_estimate
timeline
0.0               1.0
1.0               1.0
1.0               0.5
2.0               0.5

This is terrible and not at all how I expect users to fix this. I'll have to think more about this.
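If the workaround is needed in the meantime, the epsilon shift can at least be applied mechanically rather than by editing durations by hand. `nudge_same_time_deaths` below is a hypothetical helper, not part of lifelines; it only moves durations that equal their entry time, and `eps=1e-10` simply mirrors the value used above.

```python
import numpy as np


def nudge_same_time_deaths(durations, entry, eps=1e-10):
    """Hypothetical helper (not part of lifelines).

    Returns a copy of `durations` in which every duration equal to its entry
    time is pushed `eps` later, reproducing the manual epsilon workaround.
    """
    durations = np.asarray(durations, dtype=float).copy()
    entry = np.asarray(entry, dtype=float)
    same_time = durations == entry  # rows with entry time == exit time
    durations[same_time] += eps
    return durations


durations = nudge_same_time_deaths([1, 2], entry=[1, 0])
print(durations[0] > 1.0, durations[1] == 2.0)

# The adjusted durations can then be passed to lifelines as usual:
#   kmf = lifelines.KaplanMeierFitter()
#   kmf.fit(durations, event_observed=[1, 0], entry=[1, 0])
```

This prints `True True`: only the first subject, whose entry and death times coincide, is shifted.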