CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License

baseline_survival in CoxTimeVaryingFitter #1145

Open Valery2511 opened 3 years ago

Valery2511 commented 3 years ago

Hello! I am translating a Cox model from STATA to Python and found that the baseline survival in STATA differs from the baseline survival in Python. (I started with a simple Cox model without regressors; in STATA this is "stcox, estimate".) I calculated the baseline survival manually, and the result coincides with what STATA produces (Excel file "BS_test", sheet "Est_baseline", column «I»). BS_test.xlsx

Code in Python:

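###### Imports:
import pandas as pd
from lifelines import CoxTimeVaryingFitter
from lifelines.utils import to_long_format

###### Data download:
df_model = pd.read_excel('BS_test.xlsx', sheet_name = 'Data')
df_model = df_model[['id', 'event', 'months']]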

###### Data preparation:
df_model = to_long_format(df_model, duration_col = "months")
df_model['start'] = df_model['stop']
df_model['stop'] = df_model['start']+1

###### Run the model:
ctv = CoxTimeVaryingFitter()
ctv.fit(df_model, id_col ='id', event_col='event', start_col = 'start', stop_col = 'stop')
ctv.print_summary()

###### Calculate baseline survival:
ctv.baseline_survival_

[image: baseline_survival]
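
To make the data preparation step above concrete, here is a small sketch in plain Python on made-up rows (hypothetical values, not taken from BS_test.xlsx). It assumes that `to_long_format` sets `start` to 0 and `stop` to the duration, and then applies the two reassignments from the snippet; `to_long_sketch` and `shift_interval` are illustrative helper names, not lifelines functions.

```python
# Made-up subjects; the real data lives in BS_test.xlsx.
rows = [
    {"id": 1, "event": 1, "months": 3},
    {"id": 2, "event": 0, "months": 5},
]

def to_long_sketch(row, duration_col):
    """Assumed behaviour of to_long_format: start = 0, stop = duration."""
    out = {k: v for k, v in row.items() if k != duration_col}
    out["start"] = 0
    out["stop"] = row[duration_col]
    return out

def shift_interval(row):
    """The two reassignments from the post: start <- stop, then stop <- start + 1."""
    out = dict(row)
    out["start"] = out["stop"]
    out["stop"] = out["start"] + 1
    return out

prepared = [shift_interval(to_long_sketch(r, "months")) for r in rows]
for r in prepared:
    print(r)
```

Under these assumptions each subject ends up with a single interval running from `months` to `months + 1`, which may be worth checking against the time scale used on the STATA side.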

Then I tried to calculate the baseline survival through the baseline hazard and the results again coincided with the STATA/ manually (excel).

Code in Python:

###### Calculate baseline survival from the hazard function:
# Work on a copy so the fitted model's attribute is not modified
result = ctv.baseline_cumulative_hazard_.copy()

# Previous cumulative hazard value (0 before the first event time)
result['shif'] = result['baseline hazard'].shift(1)
result['shif'] = result['shif'].fillna(0)
# One minus the hazard increment at each event time
result['baseline_hazard_no_cum'] = 1 - (result['baseline hazard'] - result['shif'])
# Running product of the factors above
result['baseline survival_from_hazard'] = result['baseline_hazard_no_cum'].cumprod()

result.head()

[image: base_line_from_hazard]
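
The manual calculation above multiplies the factors (1 - hazard increment) over the event times. As a rough illustration, the sketch below applies that product to made-up cumulative hazard values (hypothetical, not from BS_test.xlsx) and puts it next to exp(-cumulative hazard), another common way of turning a cumulative hazard into a survival curve. Which of the two, if either, `baseline_survival_` or STATA's `basesurv` uses is an assumption to verify against their documentation or source.

```python
import math

# Hypothetical cumulative baseline hazard at successive event times
# (made-up numbers, not taken from BS_test.xlsx).
cumulative_hazard = [0.10, 0.25, 0.45]

def product_limit_survival(cum_hazard):
    """Survival as the running product of (1 - hazard increment)."""
    survival, previous, running = [], 0.0, 1.0
    for h in cum_hazard:
        running *= 1.0 - (h - previous)
        previous = h
        survival.append(running)
    return survival

def exponential_survival(cum_hazard):
    """Survival as exp(-cumulative hazard)."""
    return [math.exp(-h) for h in cum_hazard]

pl = product_limit_survival(cumulative_hazard)
ex = exponential_survival(cumulative_hazard)
for h, a, b in zip(cumulative_hazard, pl, ex):
    print(f"H={h:.2f}  product={a:.4f}  exp(-H)={b:.4f}")
```

Because 1 - x is smaller than exp(-x) for any positive x, the two curves do not coincide, and the gap grows with the size of the hazard increments.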

Can you please tell me why the baseline survival in Python (`ctv.baseline_survival_`) may differ from the baseline survival calculated manually (Excel) and in STATA?

Valery2511 commented 3 years ago

Just in case, I also attach the STATA code:

stcox, estimate
predict baseline_survival, basesurv

stefan-de commented 3 years ago

Did you have a look at the parametrization of the CTVM in STATA/R compared with that of lifelines? I suspect that is where you could find the reason why ...