derrynknife / SurPyval

A Python package for survival analysis. The most flexible survival analysis package available. SurPyval can work with arbitrary combinations of observed, censored, and truncated data. SurPyval can also fit distributions with 'offsets' with ease, for example the three parameter Weibull distribution.
https://surpyval.readthedocs.io/en/latest/index.html
MIT License
48 stars 5 forks source link

[Joss review] Need tests for Kaplan Meier and other non-parametrics #18

Closed CamDavidsonPilon closed 3 years ago

CamDavidsonPilon commented 3 years ago

The KM is by far the most widely used model, and most important non-parametric model. I'd like to see sufficient tests for it included.

https://github.com/openjournals/joss-reviews/issues/3484

derrynknife commented 3 years ago

I've created tests for the Nelson-Aalen, Kaplan-Meier, and Fleming-Harrington, and was working on the Turnbull, and my first step was to plagiarise lifelines.

I did some looking and I can't convince myself that the NPMLE implementation in lifelines is correct. Am I missing something?

If we take the midpoint for the test example in lifelines and treat it like a non-intervally censored example we get substantialy different results:

import surpyval as surv
import numpy as np

left = [1, 8, 8, 7, 7, 17, 37, 46, 46, 45]
right = [7, 8, 10, 16, 14, np.inf, 44, np.inf, np.inf, np.inf]

x = np.vstack([left, right])

c = (~np.isfinite(x[1, :])).astype(int)
x = np.where(np.isfinite(x.mean(axis=0)), x.mean(axis=0), x[0, :])

print(surpyval.KaplanMeier.fit(x=x, c=c).R)
"""
[0.9   0.8   0.7   0.6   0.5   0.5   0.375 0.375 0.375]
"""

The lifelines test compares this to:

[0.16667016, 0.33332984, 0.125, 0.375]

And it's inverse is:

[0.83332984, 0.66667016, 0.875     , 0.625     ]

The solution that lifelines compares to doesn't align to the Kaplan-Meier solution if the midpoints are used. The lifelines npmle seems to have the correct value of the survival function as the CDF for the last value. Am I getting something wrong?

CamDavidsonPilon commented 3 years ago

Can you share the lifelines code used?

derrynknife commented 3 years ago

This is the test in lifelines:

def test_npmle():
    left, right = [1, 8, 8, 7, 7, 17, 37, 46, 46, 45], [7, 8, 10, 16, 14, np.inf, 44, np.inf, np.inf, np.inf]
    npt.assert_allclose(npmle(left, right, verbose=True)[0], np.array([0.16667016, 0.33332984, 0.125, 0.375]), rtol=1e-4)
CamDavidsonPilon commented 3 years ago

So npmle is a low-level function, and it needs to be paired with reconstruct_survival_function:

from lifelines.fitters.npmle import *
results = npmle(left, right)
sf = reconstruct_survival_function(*results)

This is equivalent to the (high-level API) KaplanMeierFitter().fit_interval_censoring(left, right)

derrynknife commented 3 years ago

Thanks, that makes more sense, and sorry to be a pest, but I've implemented surpyval differently to avoid the following scenario.

alpha = 20
beta = 15
x = np.random.weibull(beta, 100) * alpha
n, xx = np.histogram(x, bins=20)
x = np.vstack([xx[0:-1], xx[1::]])

kmf.fit_interval_censoring(x[0, :]+1e-12, x[1, :]).plot()
ax = plt.gca()
kmf.fit_interval_censoring(x[0, :], x[1, :]).plot(color='red', ax=ax)

This results in catastrophic differences in the estimates. It seems that the left edge of some intervals are considered to be failing in the previous interval. This is behaviour that I wouldn't expect and therefore comparisons between lifelines and surpyval are different.

Are you sure the Turnbull intervals in lifelines are correct?

derrynknife commented 3 years ago

Tests for Kaplan-Meier, Nelson-Aalen, and Fleming-Harrington included in f3a264a955aed0fe563dc912c07e81ceb2b12666.

Still working on Turnbull tests as per the thread above.

CamDavidsonPilon commented 3 years ago

Are you sure the Turnbull intervals in lifelines are correct?

No lol - I only know they are correct up-to the unit tests in lifelines.

The data you presented is an interesting case I'll have to explore more. I would suggest comparing results to icenreg in R if you can. (Seeing your written tests now - I suggest comparing vs R's implementations as well as lifelines)

derrynknife commented 3 years ago

Tests for Turnbull complete agains R's icenReg. Complete's nonparametric tests. See 4086c12b8a6ccc225cfab3eee4450f8a549a6510

derrynknife commented 3 years ago

No lol - I only know they are correct up-to the unit tests in lifelines.

No probs. SurPyval doesn't actually find the intervals, it just creates an interval for between every observed value and every truncated value and brute forces the solution. This implementation will result in a few additional calculations but it makes the code more readable and was easier to develop.