CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License

Hazard rates (dividing by timedelta, censored times) #1592

Open shilet opened 10 months ago

shilet commented 10 months ago

When I use the `NelsonAalenFitter` to determine the hazard rate, I came across the following two issues:

  1. When calculating the instantaneous hazard, the time between two events is not taken into account. Normally $h_0(t_j) = \frac{d_j}{n_j T_j}$, where $d_j$ is the number of deaths at $t_j$, $n_j$ the number at risk, and $T_j = t_{j+1} - t_j$ the time between $t_j$ and $t_{j+1}$.
  2. The hazard rate is also evaluated at censored times, where it is zero. To my knowledge, hazard rates are not evaluated at censored times.
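
To make item 1 concrete, here is a minimal pure-Python sketch of the estimator I have in mind, evaluated only at event times. It assumes $T_j$ is measured to the next *event* time (censored times only shrink the risk set) and leaves the hazard undefined after the last event; `event_time_hazard` is a hypothetical helper, not part of lifelines.

```python
from typing import List, Optional, Tuple


def event_time_hazard(
    durations: List[float], events: List[int]
) -> List[Tuple[float, int, int, Optional[float], Optional[float]]]:
    """Return (t_j, d_j, n_j, T_j, h_j) for each distinct event time t_j.

    h_j = d_j / (n_j * T_j), with T_j = t_{j+1} - t_j measured to the next
    event time (an assumption). Censored-only times get no row; they only
    reduce the risk set. The last event time has no T_j, so h_j is None.
    """
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    rows = []
    for j, t in enumerate(event_times):
        # deaths exactly at t
        d = sum(1 for ti, e in zip(durations, events) if ti == t and e == 1)
        # subjects still at risk just before t
        n = sum(1 for ti in durations if ti >= t)
        if j + 1 < len(event_times):
            width = event_times[j + 1] - t
            h = d / (n * width)
        else:
            width, h = None, None  # no following event: interval undefined
        rows.append((t, d, n, width, h))
    return rows


durations = [2, 4, 6, 7, 8]
events = [1, 1, 0, 1, 1]
for row in event_time_hazard(durations, events):
    print(row)
```

On this data it gives 0.1 on [2, 4), 1/12 on [4, 7) and 0.5 on [7, 8), and no value at the censored time 6.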

For a simple example:

```python
import pandas as pd
from lifelines import NelsonAalenFitter

data = {
    'duration': [2, 4, 6, 7, 8],
    'event': [1, 1, 0, 1, 1],
}

df = pd.DataFrame(data)
naf = NelsonAalenFitter()
naf.fit(df['duration'], df['event'])
naf.plot_hazard(bandwidth=1, ci_show=False, label='NA hazard lifelines')
```
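
For comparison, a pure-Python tabulation (my own, not lifelines' code) of the Nelson-Aalen increments $d/n$ at every distinct observed time for this data. The censored time $t = 6$ contributes an increment of exactly zero. I am assuming the smoothed hazard drawn by `plot_hazard` is built from increments like these, which would be consistent with the zero at censored times; `nelson_aalen_increments` is a hypothetical helper.

```python
from typing import List, Tuple


def nelson_aalen_increments(
    durations: List[float], events: List[int]
) -> List[Tuple[float, int, int, float]]:
    """(t, d, n, d/n) at every distinct observed time, censored ones included."""
    rows = []
    for t in sorted(set(durations)):
        d = sum(1 for ti, e in zip(durations, events) if ti == t and e == 1)
        n = sum(1 for ti in durations if ti >= t)
        rows.append((t, d, n, d / n))  # d == 0 at censored-only times
    return rows


rows = nelson_aalen_increments([2, 4, 6, 7, 8], [1, 1, 0, 1, 1])
cumulative = 0.0
for t, d, n, inc in rows:
    cumulative += inc  # cumulative hazard H(t)
    print(f"t={t}: d={d}, n={n}, dH={inc:.3f}, H={cumulative:.3f}")
```

The increments come out as 0.2, 0.25, 0.0, 0.5 and 1.0, and none of them is divided by the gap to the next time.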

The same problem occurs in `CoxPHFitter`: the baseline hazard is not divided by the time between events, and it is also evaluated at censored times.
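
A similar sketch for the Cox baseline hazard, assuming a Breslow-type estimator with a single covariate `x` and an already-fitted coefficient `beta` (both hypothetical inputs here): at each event time, $d_j$ is divided by the summed risk scores $\sum_{i \in R(t_j)} e^{x_i \beta}$ and by $T_j$, and censored-only times are skipped. With `beta = 0` it reduces to $d_j / (n_j T_j)$. `breslow_interval_hazard` is a hypothetical helper, not a lifelines function.

```python
import math
from typing import List, Optional, Tuple


def breslow_interval_hazard(
    durations: List[float],
    events: List[int],
    x: List[float],
    beta: float,
) -> List[Tuple[float, Optional[float]]]:
    """(t_j, h0_j) at event times, h0_j = d_j / (sum_{at risk} exp(x*beta) * T_j).

    Assumes one covariate with a given coefficient `beta`, Breslow-style
    risk-set weighting, and T_j measured to the next event time.
    """
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    out: List[Tuple[float, Optional[float]]] = []
    for j, t in enumerate(event_times):
        d = sum(1 for ti, e in zip(durations, events) if ti == t and e == 1)
        # summed risk scores of everyone still at risk just before t
        risk = sum(math.exp(xi * beta) for ti, xi in zip(durations, x) if ti >= t)
        if j + 1 < len(event_times):
            out.append((t, d / (risk * (event_times[j + 1] - t))))
        else:
            out.append((t, None))  # no following event: interval undefined
    return out


print(breslow_interval_hazard([2, 4, 6, 7, 8], [1, 1, 0, 1, 1], [0, 1, 0, 1, 0], math.log(2)))
```

With `x = [0, 1, 0, 1, 0]` and `beta = log 2` the risk scores are 1 or 2, giving 1/14, 1/18 and 1/3 on the three closed intervals.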

I was also wondering why the hazard rate is not calculated from the Kaplan-Meier event table, as this seems to me the most straightforward implementation.
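
A sketch of what a Kaplan-Meier-based calculation could look like. `km_event_table` builds rows of (time, observed, censored, at_risk) by hand; as far as I know, `KaplanMeierFitter.event_table` in lifelines exposes comparable `observed`, `censored` and `at_risk` columns, but both helpers here are hypothetical and work on plain lists. The hazard on $[t_j, t_{j+1})$ is taken as $(1 - S(t_j)/S(t_{j-1}))/T_j$ with $S(t_0) = 1$, which equals $d_j/(n_j T_j)$.

```python
from typing import List, Optional, Tuple

Row = Tuple[float, int, int, int]  # (time, observed, censored, at_risk)


def km_event_table(durations: List[float], events: List[int]) -> List[Row]:
    """One row per distinct time, loosely mirroring a Kaplan-Meier event table."""
    rows = []
    for t in sorted(set(durations)):
        observed = sum(1 for ti, e in zip(durations, events) if ti == t and e == 1)
        censored = sum(1 for ti, e in zip(durations, events) if ti == t and e == 0)
        at_risk = sum(1 for ti in durations if ti >= t)
        rows.append((t, observed, censored, at_risk))
    return rows


def hazard_from_event_table(
    table: List[Row],
) -> List[Tuple[float, float, Optional[float]]]:
    """(t_j, S(t_j), h_j) at event rows only; censored-only rows are dropped."""
    event_rows = [r for r in table if r[1] > 0]
    out: List[Tuple[float, float, Optional[float]]] = []
    s_prev = 1.0
    for j, (t, observed, _censored, at_risk) in enumerate(event_rows):
        s = s_prev * (1 - observed / at_risk)  # Kaplan-Meier product step
        if j + 1 < len(event_rows):
            width = event_rows[j + 1][0] - t
            out.append((t, s, (1 - s / s_prev) / width))
        else:
            out.append((t, s, None))  # last event: interval undefined
        s_prev = s
    return out


table = km_event_table([2, 4, 6, 7, 8], [1, 1, 0, 1, 1])
for row in hazard_from_event_table(table):
    print(row)
```

The censored row at $t = 6$ still lowers `at_risk` for later rows but never receives a hazard value of its own.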