Result Discrepancy between lifelines CoxTimeVaryingFitter and R survival

CamDavidsonPilon / lifelines

Survival analysis in Python

MIT License

import pandas as pd
from lifelines import CoxTimeVaryingFitter
# Counting-process (start, stop] data with one binary covariate x
test2 = pd.DataFrame(dict(
    start=[1, 2, 5, 2, 1, 7, 3, 4, 8, 8],
    stop=[2, 3, 6, 7, 8, 9, 9, 9, 14, 17],
    event=[1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    x=[1, 0, 0, 1, 0, 1, 1, 1, 0, 0],
))
s_model = CoxTimeVaryingFitter(penalizer=0)
s_model.fit(test2, event_col='event', start_col='start', stop_col='stop',
            formula=' ~ x', show_progress=False, robust=False)
print(s_model.baseline_cumulative_hazard_)

library(survival)
test2 <- list(start = c(1, 2, 5, 2, 1, 7, 3, 4, 8, 8),
              stop  = c(2, 3, 6, 7, 8, 9, 9, 9, 14, 17),
              event = c(1, 1, 1, 1, 1, 1, 1, 0, 0, 0),
              x     = c(1, 0, 0, 1, 0, 1, 1, 1, 0, 0))
cox_pp_00 <- coxph(Surv(start, stop, event) ~ x, test2,
                   robust = FALSE, ties = 'efron', method = 'efron')
basehaz(cox_pp_00, centered = FALSE)
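Both snippets fit the same (start, stop] data, but the data contain one tied event time (two events at t = 9), and the R call explicitly requests Efron ties. The tie convention affects the baseline-hazard increment at tied times, not only the coefficient. As an illustration, here is a minimal pure-Python sketch of the Breslow and Efron baseline cumulative hazards for counting-process data at a fixed coefficient. It is not code from lifelines or survival: the helper name baseline_cumhaz is hypothetical, and the coefficient is an assumption back-solved from the first row of the basehaz output quoted in the reply below (beta = log(1/0.5052761 - 1), roughly -0.0211).

```python
import math

# Counting-process rows from the issue: (start, stop, event, x)
ROWS = [
    (1, 2, 1, 1), (2, 3, 1, 0), (5, 6, 1, 0), (2, 7, 1, 1), (1, 8, 1, 0),
    (7, 9, 1, 1), (3, 9, 1, 1), (4, 9, 0, 1), (8, 14, 0, 0), (8, 17, 0, 0),
]


def baseline_cumhaz(rows, beta, ties="efron", x_ref=0.0):
    """Hypothetical helper: cumulative baseline hazard at each distinct event time.

    A row is at risk at time t when start < t <= stop. x_ref is the covariate
    value the curve is evaluated at (0.0 = uncentered, mean of x = centered).
    """
    event_times = sorted({stop for _, stop, event, _ in rows if event})
    cumhaz, out = 0.0, []
    for t in event_times:
        # Relative risk exp(beta * (x - x_ref)) of every row at risk at t
        risk = [math.exp(beta * (x - x_ref)) for start, stop, _, x in rows if start < t <= stop]
        # Relative risk of the rows with an event exactly at t
        died = [math.exp(beta * (x - x_ref)) for _, stop, event, x in rows if event and stop == t]
        s, s_died, d = sum(risk), sum(died), len(died)
        if ties == "breslow":
            # Breslow: all d tied events share the full risk-set denominator
            increment = d / s
        else:
            # Efron: the k-th tied event (k = 0..d-1) removes k/d of the tied events' total risk
            increment = sum(1.0 / (s - (k / d) * s_died) for k in range(d))
        cumhaz += increment
        out.append((t, cumhaz))
    return out


# Assumption: at t = 2 the risk set is one row with x = 1 and one with x = 0,
# so H(2) = 1 / (exp(beta) + 1); solve for beta from the R output's first row.
beta = math.log(1 / 0.5052761 - 1)
efron = baseline_cumhaz(ROWS, beta, ties="efron")
breslow = baseline_cumhaz(ROWS, beta, ties="breslow")
for (t, h_efron), (_, h_breslow) in zip(efron, breslow):
    print(f"t={t:>2}  efron={h_efron:.7f}  breslow={h_breslow:.7f}")
```

Under these assumptions the Efron column should track the basehaz(cox_pp_00, centered=FALSE) table at every event time, while the Breslow column agrees everywhere except t = 9, where it is smaller. The sketch reports only event times, so the R rows at t = 14 and t = 17 (no new events) have no counterpart.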

For what it's worth, the apparent R/Python mismatch may have the same source as the R/Stata mismatch described here.

I mention it only because, when I switched basehaz(cox_pp_00, centered=FALSE) in your MWE to basehaz(cox_pp_00, centered=TRUE), the output was identical to the centered=FALSE output:

> basehaz(cox_pp_00, centered=FALSE)
     hazard time
1 0.5052761    2
2 0.8409462    3
3 1.0434840    6
4 1.2974621    7
5 1.5514402    8
6 2.0066161    9
7 2.0066161   14
8 2.0066161   17
> basehaz(cox_pp_00, centered=TRUE)
     hazard time
1 0.5052761    2
2 0.8409462    3
3 1.0434840    6
4 1.2974621    7
5 1.5514402    8
6 2.0066161    9
7 2.0066161   14
8 2.0066161   17

To me, the identical output suggests that centered isn't touching the part of survival's behavior that gives rise to the R/Stata differences, which means that behavior is still a candidate explanation for the R/Python discrepancy.
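For reference, basehaz(centered=TRUE) is meant to return the cumulative hazard for a subject at the mean covariate values, i.e. H0(t) * exp(beta * x_mean). When beta * x_mean is not zero, the two tables should therefore differ by a constant factor. A short arithmetic check, as a sketch under two assumptions that are not stated in the thread: beta is back-solved from the first row of the centered=FALSE output, and survival centers at the row mean of x (0.5 here).

```python
import math

# Uncentered cumulative baseline hazard, copied from basehaz(cox_pp_00, centered=FALSE) above
TIMES = [2, 3, 6, 7, 8, 9, 14, 17]
H_UNCENTERED = [0.5052761, 0.8409462, 1.0434840, 1.2974621,
                1.5514402, 2.0066161, 2.0066161, 2.0066161]

# Assumption: at t = 2 the risk set is one row with x = 1 and one with x = 0,
# so H(2) = 1 / (exp(beta) + 1); solve that for beta.
beta = math.log(1 / H_UNCENTERED[0] - 1)

# Assumption: centering uses the row mean of x from the MWE data
x = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0]
x_mean = sum(x) / len(x)

# Cumulative hazard for a subject at x_mean: H0(t) * exp(beta * x_mean)
scale = math.exp(beta * x_mean)
h_centered = [h * scale for h in H_UNCENTERED]

for t, h0, hc in zip(TIMES, H_UNCENTERED, h_centered):
    print(f"t={t:>2}  uncentered={h0:.7f}  expected centered={hc:.7f}")
```

Under those assumptions the factor is about 0.99, so a centered table would sit roughly 1% below the uncentered one at every row; getting identical numbers instead fits the point that centered isn't behaving as one might expect here.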

Result Discrepancy between lifelines CoxTimeVaryingFitter and R survival #1600