CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License

Failure function plotting #250

Closed bennoleslie closed 7 years ago

bennoleslie commented 8 years ago

I'd like to be able to plot the failure function F(t) rather than the survival function S(t), as it is a more informative way of looking at the data in my use case.

It's not clear what the best way of doing this would be with the current API. I ended up modifying the fit method of KaplanMeierFitter:

--- a/lifelines/fitters/kaplan_meier_fitter.py
+++ b/lifelines/fitters/kaplan_meier_fitter.py
@@ -20,7 +20,7 @@ class KaplanMeierFitter(UnivariateFitter):
     """

     def fit(self, durations, event_observed=None, timeline=None, entry=None, label='KM_estimate',
-            alpha=None, left_censorship=False, ci_labels=None):
+            alpha=None, left_censorship=False, ci_labels=None, survival=True):
         """
         Parameters:
           duration: an array, or pd.Series, of length n -- duration subject was observed for
@@ -63,8 +63,10 @@ class KaplanMeierFitter(UnivariateFitter):
                 raise StatError("""There are too few early truncation times and too many events. S(t)==0 for all t>%.1f. Recommend BreslowFlemingHarringtonFitter.""" % ix)

         # estimation
-        setattr(self, estimate_name, pd.DataFrame(np.exp(log_survival_function), columns=[self._label]))
-        self.__estimate = getattr(self, estimate_name)
+        self.__estimate = pd.DataFrame(np.exp(log_survival_function), columns=[self._label])
+        if not survival:
+            self.__estimate = 1 - self.__estimate
+        setattr(self, estimate_name, self.__estimate)
         self.confidence_interval_ = self._bounds(cumulative_sq_[:, None], alpha, ci_labels)
         self.median_ = median_survival_times(self.__estimate)

Is there a more straightforward way of achieving this?

CamDavidsonPilon commented 8 years ago

The most basic way is to not modify anything and do:

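from lifelines import KaplanMeierFitter
kmf = KaplanMeierFitter()
kmf.fit(T, E)  # T: observed durations, E: event-observed flags
(1 - kmf.survival_function_).plot()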

If you wanted to use the internal lifelines plotting library, then I'm afraid it's a bit more complicated.
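To make the relationship F(t) = 1 - S(t) concrete, here is a minimal sketch that does not use lifelines at all. The helper names `kaplan_meier` and `failure_function` and the toy data are made up for illustration; this is a plain product-limit estimate, not the library's implementation.

```python
from collections import Counter


def kaplan_meier(durations, event_observed):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    # Count observed events (not censored observations) at each time.
    deaths = Counter(t for t, e in zip(durations, event_observed) if e)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


def failure_function(curve):
    """Complement a survival curve: F(t) = 1 - S(t)."""
    return [(t, 1.0 - s) for t, s in curve]


# Hypothetical toy data: 1 = event observed, 0 = censored.
T = [1, 2, 2, 3, 4]
E = [1, 1, 0, 1, 0]

survival_curve = kaplan_meier(T, E)
failure_curve = failure_function(survival_curve)
```

With a fitted `kmf`, the one-liner `1 - kmf.survival_function_` plays the role of `failure_function` here.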

bennoleslie commented 8 years ago

That is neat, I hadn't realised that this object was directly plottable like that.

Unfortunately, though, I'm now quite attached to all the niceties of the lifelines plotting library, so I'm not sure I can see a good alternative that wouldn't give them up.

The next best that I came up with is this:

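        from lifelines import plotting

        class Failure:
            """Wrap a fitted KaplanMeierFitter and expose F(t) = 1 - S(t)."""
            def __init__(self, kmf):
                self.failure_function_ = 1 - kmf.survival_function_
                # Complementing keeps the column labels, so the "upper" column
                # now holds the lower bound of F(t), and vice versa.
                self.confidence_interval_ = 1 - kmf.confidence_interval_
                self.plot = plotting.plot_estimate(self, 'failure_function_')

        Failure(kmf).plot()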

Which seems like it works, but I'm not 100% sure about the 1 - kmf.confidence_interval_ part.
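On the `1 - kmf.confidence_interval_` question, a small sketch of the arithmetic may help. Complementing an interval for S(t) gives a valid interval for F(t), but the bounds trade places: the lower bound of F is one minus the upper bound of S. The helper `complement_interval` and the numbers below are hypothetical, chosen only to show the swap.

```python
def complement_interval(lower, upper):
    """Map a confidence interval for S(t) to one for F(t) = 1 - S(t)."""
    # Subtracting from 1 reverses the ordering, so the bounds swap roles.
    return 1.0 - upper, 1.0 - lower


# Hypothetical interval for S(t) at some time t.
s_lower, s_upper = 0.5, 0.75

# Elementwise complement, as "1 - confidence_interval_" would do per column:
naive_lower, naive_upper = 1.0 - s_lower, 1.0 - s_upper  # labels now reversed

# Complement with the bounds put back in order:
f_lower, f_upper = complement_interval(s_lower, s_upper)
```

So the shaded band should cover the same region either way, but after an elementwise `1 - ...` the column labelled "lower" would hold the larger value. Anything that reads the bounds by name would need the columns swapped or relabelled.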

Is there any interest in having the KaplanMeierFitter class directly fit the failure function rather than the survival function, or is that simply not a useful thing in general?

CamDavidsonPilon commented 7 years ago

I think it's a small jump for the user to create this. However, adding it to KaplanMeierFitter would complicate the API further, so I am voting to close this. Thanks for the issue, though!