CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License

[need_help] Difference between R 'survival' and Python 'lifelines' #1613

Closed: gustavoalcantara closed this issue 4 months ago

gustavoalcantara commented 4 months ago

Hello everyone. I hope you're well.

I'm conducting a survival analysis using two different libraries: the R package survival (run in RStudio) and the Python package lifelines.

The issue is that when I perform the calculation in Python, my survival curves tend towards 0. In R, they appear normal. Here's the difference between the two graphs:

[Image: survival curves from R's survival]

[Image: survival curves from Python's lifelines]

My Python code looks like this:

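import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

# x is a pandas DataFrame with 'time', 'condition' and 'staging' columns
kmf = KaplanMeierFitter()
ax = plt.subplot(111)  # draw every staging group on the same axes

for staging in x['staging'].unique():
    staging_data = x[x['staging'] == staging]
    # event_observed: 1 = death observed, 0 = censored (still alive)
    kmf.fit(staging_data['time'], event_observed=staging_data['condition'], label=str(staging))
    kmf.plot(ax=ax)

plt.show()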

Could you please help me? I'd like to use the Python package because of the integrations it provides with some of the infrastructure I use.

Thank you very much!

CamDavidsonPilon commented 4 months ago

Hi @gustavoalcantara, double-check the data you're putting into kmf.fit and make sure it matches what survival is getting. The data looks different in the two graphs: most noticeably, the survival graph has a group with no deaths (the blue line), whereas the lifelines graph has nothing like that.
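One quick way to follow this advice, sketched here under assumptions, is to tabulate events and censored observations per group before fitting anything. A group with zero events should produce a flat Kaplan-Meier curve at 1.0, like the blue line in the survival plot. The toy DataFrame below is hypothetical and only reuses the column names from the question ('time', 'condition', 'staging'), with 'condition' coded as 1 = death.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the question:
# 'time' = follow-up duration, 'condition' = event indicator (1 = death),
# 'staging' = group label.
x = pd.DataFrame({
    "time":      [5, 8, 12, 3, 9, 15, 7, 11],
    "condition": [1, 0, 1, 1, 1, 0, 0, 0],
    "staging":   ["I", "I", "I", "II", "II", "II", "III", "III"],
})

# Per group: number of subjects, number of events (1s), number censored (0s).
summary = x.groupby("staging")["condition"].agg(n="size", events="sum")
summary["censored"] = summary["n"] - summary["events"]
print(summary)
```

Running the same tabulation on the data passed to R and to lifelines makes a mismatch (for example, a group that has no events in one tool but many in the other) visible before any plotting.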

gustavoalcantara commented 4 months ago

Hi @CamDavidsonPilon. Thanks for the answer. I think the problem is my input: to encode the censored/death event, I used 1 for alive and 0 for death. Thanks for the help. I'll fix that and send it to you again. Thanks!
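lifelines reads event_observed as 1 (or True) = death observed and 0 (or False) = censored, so with the 1 = alive coding every surviving subject is counted as a death, which pushes the curves toward 0. The sketch below illustrates the effect with a small hand-written product-limit (Kaplan-Meier) estimator instead of lifelines itself, so that it runs without the library; the function name kaplan_meier and the data are hypothetical.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Product-limit estimate: returns [(t, S(t))] at each distinct event time.

    events[i] == 1 means the death was observed at durations[i];
    events[i] == 0 means the subject was censored (still alive) at that time.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    exits = Counter(durations)  # everyone leaving the risk set at each time
    at_risk = len(durations)
    survival, curve = 1.0, []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical data coded like the question: 1 = alive (censored), 0 = death.
time = [2, 4, 4, 6, 8, 10]
condition = [1, 1, 0, 1, 1, 1]

wrong = kaplan_meier(time, condition)                   # treats "alive" as death
right = kaplan_meier(time, [1 - c for c in condition])  # 1 = death, 0 = censored
print(wrong[-1][1], right[-1][1])
```

With the reversed coding the estimate ends at 0.0, while the corrected coding leaves it at 0.8. In the lifelines loop from the question, the equivalent fix is to pass event_observed=1 - staging_data['condition'], or to recode the column once before the loop.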

gustavoalcantara commented 4 months ago

@CamDavidsonPilon The problem was me and my input, lol. I'm going to close this issue, OK? Many thanks!