CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License

Incompatibility with pandas 2.X computing restricted mean over survival function #1578

Closed Batalex closed 6 months ago

Batalex commented 7 months ago

Hi, thank you kindly for your work on lifelines. Here is my issue:

Env

Package    Version
lifelines  0.27.8
pandas     2.1.3

Current behavior

from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

T = [1, 2, 3, 4, 10]
kmf = KaplanMeierFitter().fit(T)
restricted_mean_survival_time(kmf.survival_function_, t=10, return_variance=True)
Traceback (most recent call last):
  File "lifelines/repro_bug.py", line 5, in <module>
    restricted_mean_survival_time(kmf.survival_function_, t=10, return_variance=True)
  File "lifelines/utils/__init__.py", line 252, in restricted_mean_survival_time
    sq = _expected_value_of_survival_squared_up_to_t(model_or_survival_function, t)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lifelines/utils/__init__.py", line 314, in _expected_value_of_survival_squared_up_to_t
    sf = sf.append(pd.DataFrame([1], index=[0], columns=sf.columns)).sort_index()
         ^^^^^^^^^
  File "venv/Lib/site-packages/pandas/core/generic.py", line 6204, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?

I know that the recommended call is restricted_mean_survival_time(kmf, ...), but sometimes we need to compute the restricted mean survival time (RMST) from a survival function obtained from somewhere other than a lifelines model.
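For context, the quantities involved can be written out. With S the survival function and t the restriction time, the identities below are the standard ones for a non-negative random variable; the notation is an assumption for illustration and is not taken from the lifelines documentation. Judging by the traceback, the failure occurs while computing the second quantity, which is only needed when return_variance=True.

```latex
\begin{aligned}
\mathrm{RMST}(t) &= \mathbb{E}[\min(T, t)] = \int_0^t S(u)\,du \\
\mathbb{E}[\min(T, t)^2] &= 2 \int_0^t u\,S(u)\,du \\
\operatorname{Var}[\min(T, t)] &= \mathbb{E}[\min(T, t)^2] - \mathrm{RMST}(t)^2
\end{aligned}
```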

Possible solution

# lifelines/utils/__init__.py - L312

     if isinstance(model_or_survival_function, pd.DataFrame):
         sf = model_or_survival_function.loc[:t]
-        sf = sf.append(pd.DataFrame([1], index=[0], columns=sf.columns)).sort_index()
+        sf = pd.concat((sf, pd.DataFrame([1], index=[0], columns=sf.columns))).sort_index()
         sf_tau = sf * sf.index.values[:, None]
         return 2 * trapz(y=sf_tau.values[:, 0], x=sf_tau.index)
     elif isinstance(model_or_survival_function, lifelines.fitters.UnivariateFitter):
         # lifelines model

This change also keeps compatibility with pandas 1.X, since pd.concat is available in both major versions. I can open a PR if you would like.
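To check the idea outside of lifelines, here is a minimal standalone sketch of the patched DataFrame branch. The function name `second_moment_up_to_t` and the sample curve `external_sf` are hypothetical, made up for illustration; the trapezoidal rule is written out by hand so the sketch does not depend on a particular SciPy or NumPy helper.

```python
import numpy as np
import pandas as pd


def second_moment_up_to_t(survival_function: pd.DataFrame, t: float) -> float:
    """Approximate 2 * integral_0^t u * S(u) du with the trapezoidal rule.

    Hypothetical stand-in for the patched branch; not part of the lifelines API.
    """
    sf = survival_function.loc[:t]
    # pd.concat replaces DataFrame.append, which pandas 2.0 removed.
    origin = pd.DataFrame([1.0], index=[0], columns=sf.columns)
    sf = pd.concat((sf, origin)).sort_index()

    times = sf.index.to_numpy(dtype=float)
    weighted = sf.iloc[:, 0].to_numpy(dtype=float) * times  # u * S(u)
    # Trapezoidal rule: interval width times the average of the two endpoints.
    area = np.sum(np.diff(times) * (weighted[1:] + weighted[:-1]) / 2.0)
    return float(2.0 * area)


# Made-up survival curve that does not come from a lifelines model.
external_sf = pd.DataFrame(
    {"S": [0.8, 0.6, 0.4, 0.2, 0.0]}, index=[1, 2, 3, 4, 10]
)
second_moment = second_moment_up_to_t(external_sf, t=10)
```

Because only pd.concat and sort_index are used, the same sketch should run unchanged on pandas 1.X and 2.X. Note that the trapezoidal rule treats the curve as piecewise linear between the supplied time points, as the original branch does.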

CamDavidsonPilon commented 6 months ago

This is great! Thanks for the fix suggestion - I'll implement it in the upcoming release.