CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License

How to calculate a stratified logrank test with lifelines ? #1234

Open MatGodRho opened 3 years ago

CamDavidsonPilon commented 3 years ago

Hm, I don't think you can, sorry @MatGodRho. FYI, there is a trend to move away from log-rank tests though, see comments here: https://discourse.datamethods.org/t/when-is-log-rank-preferred-over-univariable-cox-regression/2344

dbikiel commented 3 years ago

What if you build a univariate CoxPH with the needed stratification as strata? The p-value for the variable should be similar to the stratified logrank test. Am I wrong?

CamDavidsonPilon commented 3 years ago

This is what is recommended ^ The p-values might be different (the two tests measure different things), but in both cases you are testing for a detectable difference in hazards.

MatGodRho commented 3 years ago

Thanks guys! I am trying to reproduce a result from a colleague, who is working in R.

How about getting the logrank test statistic for each stratum first and then taking the mean of these test statistics? I obtain something very close to the results of my colleague with this technique if I exclude any test statistic of 0 in the mean. But the result is not exactly the same.

I tested the univariate CoxPH with the strata option but I did not obtain the same p-value as my colleague.

dbikiel commented 3 years ago

In Kleinbaum and Klein (Survival Analysis, 2012, p. 77), they say:

"The stratified log rank test is another variation of the log rank test. With this test the summed observed minus expected scores O - E are calculated within strata of each group and then summed across strata."

You may have to tweak Cam's logrank test function to do exactly this. Check the book; it has very well explained examples.
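The recipe in the quote can be sketched in plain Python for the two-group case. This is a minimal illustration, not lifelines code: the function name `stratified_logrank` and its arguments are hypothetical, and the variance term is the usual hypergeometric log-rank variance, which the quoted passage does not spell out. It may not match R's output exactly when the data or tie handling differ.

```python
import math
from collections import defaultdict


def stratified_logrank(durations, events, groups, strata):
    """Two-group stratified log-rank test (hypothetical helper, not lifelines).

    events are 1 (observed) or 0 (censored). Returns (test_statistic, p_value).
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two groups")
    reference = labels[0]

    # Split the rows by stratum so risk sets never mix strata.
    by_stratum = defaultdict(list)
    for t, e, g, s in zip(durations, events, groups, strata):
        by_stratum[s].append((t, e, g))

    total_o_minus_e = 0.0
    total_variance = 0.0
    for rows in by_stratum.values():
        event_times = sorted({t for t, e, _ in rows if e})
        for time in event_times:
            at_risk = [(t, e, g) for t, e, g in rows if t >= time]
            n = len(at_risk)
            n1 = sum(1 for _, _, g in at_risk if g == reference)
            d = sum(1 for t, e, _ in at_risk if e and t == time)
            d1 = sum(1 for t, e, g in at_risk
                     if e and t == time and g == reference)
            # Observed minus expected events in the reference group.
            total_o_minus_e += d1 - d * n1 / n
            # Hypergeometric variance (assumed; undefined when n == 1).
            if n > 1:
                total_variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if total_variance == 0:
        raise ValueError("no informative event times")
    statistic = total_o_minus_e ** 2 / total_variance
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Toy data, made up purely for illustration.
stat, p = stratified_logrank(
    durations=[5, 8, 3, 9, 4, 7],
    events=[1, 1, 1, 0, 1, 1],
    groups=["a", "b", "a", "b", "a", "b"],
    strata=["x", "x", "x", "y", "y", "y"],
)
print(stat, p)
```

Note that the O - E scores and variances are summed across strata before the statistic is formed, which is different from averaging per-stratum test statistics.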

On the other hand, and just to check my previous comment: for the univariate CoxPH approach you need two variables (in addition to duration and event), something like this (not tested...)

    from lifelines import CoxPHFitter

    # Keep only the duration, event, covariate, and stratification columns.
    df = df[['duration', 'event', 'variable', 'subset']]
    cph = CoxPHFitter()
    # Stratify on 'subset'; 'variable' is the only covariate in the model.
    cph.fit(df, 'duration', 'event', strata=['subset'], formula='variable')
    strat_pvalue = cph.summary['p'].values[0]

orah1998 commented 2 years ago

Hi, sorry for reopening this discussion. I saw that you talked about the p_value and test_statistic results of the logrank test. I'm struggling to understand what those mean, and I'd be glad for a short explanation!

CamDavidsonPilon commented 2 years ago

Hi @orah1998, I'd suggest using a forum like crossvalidated.com to ask or search for questions about what statistical values mean. These concepts aren't unique to lifelines; they are used throughout statistics.
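For readers who land here with the same question, a rough illustration of how the two numbers relate may help. Under the null hypothesis of no difference between two groups, the log-rank test statistic is approximately chi-square distributed with 1 degree of freedom, and the p-value is the probability of seeing a statistic at least that large. The helper name `chi2_sf_1dof` below is hypothetical and only covers the two-group case.

```python
import math


def chi2_sf_1dof(statistic):
    """P(X >= statistic) for X ~ chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Larger test statistics correspond to smaller p-values.
for stat in (0.5, 3.84, 10.0):
    print(stat, chi2_sf_1dof(stat))
```

A statistic near 3.84 corresponds to a p-value of about 0.05, the conventional threshold.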