CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License
2.37k stars 560 forks source link

Using weighted data in proportional_hazard_test() for CoxPH #996

Open aisling-om opened 4 years ago

aisling-om commented 4 years ago

I've been comparing CoxPH results for R's Survival and Lifelines, and I've noticed huge differences for the output of the test for proportionality when I use weights instead of repeated rows. The hazard ratio estimate and CI's are very close, but the proportionality chisq is very different.

I've attached a csv (txt because Github) with sample data.

The R (Survival) code is:

> hr_data <- lapply(hr, as.numeric) #only need this when using weights
> res.cox <- coxph(Surv(daysToEvent, hasOutcome) ~ cohort, data = hr_data, weights = count, robust = FALSE)
> x <- cox.zph(res.cox, transform = "km")
> summary(res.cox)
> print(x)

The Lifelines code is:

cph = CoxPHFitter()

cph.fit(
        df=hr_df, # hr data as a dataframe
        duration_col="daysToEvent",
        event_col="hasOutcome",
        weights_col="count",
        show_progress=False,
        robust=False,
    )

hazard_ratio = cph.hazard_ratios_.at["cohort"]
 lower_ci = math.exp(cph.confidence_intervals_.values[0][0])
 upper_ci = math.exp(cph.confidence_intervals_.values[0][1])
proportional_test = proportional_hazard_test(cph, hr_df, time_transform="km")

For the attached data, using weights, I get from Lifelines:

"hazardRatio": 0.5070670424582165,
        "lower_ci": 0.2374970008403611,
        "upper_ci": 1.0826115051454892
"chisq": 6.030892236208823,
        "dof": 1.0,
        "p": 0.014057625681369732

from R:

       exp(coef) exp(-coef) lower .95 upper .95
cohort    0.5071      1.972    0.2375     1.083

       chisq df       p
cohort  30.3  1 3.7e-08

Whereas using a row per entry and no weights, I get Lifelines:

"hazardRatio": 0.48501838000070696,
        "lower_ci": 0.22532011968933832,
        "upper_ci": 1.0440382743576246
"chisq": 29.78133316357846,
        "dof": 1.0,
        "p": 4.836262532790787e-08

R:

       exp(coef) exp(-coef) lower .95 upper .95
cohort     0.485      2.062    0.2253     1.044

              chisq df       p
cohort  31.7  1 1.8e-08

So the hazard ratio values and errors are in good agreement, but the chi-square for proportionality is way off when using weights in Lifelines (6 vs 30). thanks. hr.txt

aisling-om commented 4 years ago

The numbers given above are from 22.4, but 24.4 only changes things very slightly.

CamDavidsonPilon commented 4 years ago

Thanks for the detailed issue @aongus, I'll look into this asap

aisling-om commented 4 years ago

hi @CamDavidsonPilon have you had any chance to look into this?

CamDavidsonPilon commented 4 years ago

ack sorry, it's a high priority but am stuck on it. I haven't made much progress, unfortunately.

aisling-om commented 4 years ago

thanks for giving it a shot!

CamDavidsonPilon commented 4 years ago

Hi @aongus, I've dug a bit into this recently, and the problem may be due to R changing their algorithm recently for computing these values, see https://github.com/CamDavidsonPilon/lifelines/issues/997#issuecomment-652567848

aisling-om commented 4 years ago

Hi @CamDavidsonPilon , thanks for figuring this out. I can see how these numbers will be different from different regressors/implementations.

I guess tho from my perspective the more immediate issue was that using weighted vs unweighted data produced totally different results.