Open CamDavidsonPilon opened 4 years ago
I checked. The proportional_hazard_test results (test statistic and p-value) are the same irrespective of which transform I use. Perhaps something is accidentally hard-coded in the backend?
Hm, that behaviour sounds strange, but it must be data-specific. I've been looking into this function recently and have seen differences between transforms. I'll investigate further, however.
I can upload my code if needed. Let me know.
That would be appreciated! It would be nice to understand the behaviour more.
Here you go (Link to the R results I attempted to mimic: http://www.sthda.com/english/wiki/cox-model-assumptions)
# Importing packages
import lifelines
import pandas as pd
from lifelines import datasets
# Importing dataset
lung_dataset = datasets.load_lung()
# Cox PH
lung_coxPH = lifelines.CoxPHFitter().fit(duration_col = 'time', event_col = 'status', df = lung_dataset.drop(['inst', 'ph.ecog', 'ph.karno', 'pat.karno', 'meal.cal'],axis = 1).dropna())
# Proportional Hazard Testing
# KM transform
lifelines.statistics.proportional_hazard_test(lung_coxPH, lung_dataset.drop(['inst', 'ph.ecog', 'ph.karno', 'pat.karno', 'meal.cal'], axis = 1).dropna(), transform = 'km').print_summary()
# Rank transform
lifelines.statistics.proportional_hazard_test(lung_coxPH, lung_dataset.drop(['inst', 'ph.ecog', 'ph.karno', 'pat.karno', 'meal.cal'], axis = 1).dropna(), transform = 'rank').print_summary()
# Result obtained in both cases
# time_transform | rank
# null_distribution | chi squared
# degrees_of_freedom | 1
# transform | km
# test_name | proportional_hazard_test
# test_statistic p
# age 2.35 0.13
# sex 4.86 0.03
# wt.loss 0.76 0.38
The event column in lung_dataset is "1" for censored and "2" for dead. With your code, all the events would be True. Please add the line below to your code:
lung_dataset["status"] = lung_dataset["status"] - 1
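To see why the recoding matters, here is a minimal sketch on a made-up status list (not the real lung data). It assumes the event column is interpreted by truthiness, as the comment above describes:

```python
# Toy status codes in the lung convention: 1 = censored, 2 = dead.
# These values are invented for illustration, not taken from the dataset.
status = [1, 2, 2, 1, 2]

# Interpreted as booleans, every non-zero code counts as an observed event.
events_raw = [bool(s) for s in status]

# Subtracting 1 maps censored -> 0 (False) and dead -> 1 (True).
events_recoded = [bool(s - 1) for s in status]

print(events_raw)      # [True, True, True, True, True]
print(events_recoded)  # [False, True, True, False, True]
```

Without the recoding, no observation is treated as censored, which changes both the fitted model and the test statistics.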
# Results obtained with the above update
# test_statistic p
# age 0.67 0.41
# sex 2.17 0.14
# wt.loss 0.03 0.86
Still not exactly the same as the results from R.
@taoxu2016 is correct, and another change needs to be made: transform should read time_transform:
lifelines.statistics.proportional_hazard_test(lung_coxPH, lung_dataset.drop(['inst', 'ph.ecog', 'ph.karno', 'pat.karno', 'meal.cal'], axis = 1).dropna(), time_transform = 'km').print_summary()
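The summary printed earlier lists both "time_transform | rank" and "transform | km", which is consistent with the misspelled keyword being accepted as an extra keyword argument while the default transform was used. A minimal sketch of that pitfall follows; ph_test_stub is a hypothetical stand-in written for this illustration, not lifelines code:

```python
def ph_test_stub(time_transform="rank", **kwargs):
    """Hypothetical stand-in, not lifelines code: unknown keywords are
    collected in **kwargs and never influence the transform that is used."""
    return {"time_transform": time_transform, **kwargs}


wrong_km = ph_test_stub(transform="km")        # misspelled keyword
wrong_rank = ph_test_stub(transform="rank")    # misspelled keyword
right_km = ph_test_stub(time_transform="km")   # intended keyword

print(wrong_km["time_transform"])    # rank (the default; "km" was ignored)
print(wrong_rank["time_transform"])  # rank
print(right_km["time_transform"])    # km
```

Both misspelled calls end up using the same default transform, which would explain identical results for "km" and "rank".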
So I dug deeply into this problem.
In version 3.0 of survival, released 2019-11-06, a new, more accurate version of cox.zph was introduced. It avoids the assumption that the variance matrices do not vary much over time.
This also explains why when I wrote this function for lifelines (late 2018), all my tests that compared lifelines with R were working fine, but now are giving me trouble.
I have no plans at this time to update this function to use the more accurate version. My attitudes towards the PH assumption have changed in the meantime.
A follow-up on this: I was cross-referencing R's **old** cox.zph calculations (< survival 3, before the routine was updated in 2019) with check_assumptions()'s output, using the rossi example from lifelines' documentation...
...and I'm finding the output doesn't match. I used Stata (which still uses the PH test approximation) to verify that nothing odd was occurring with survival::cox.zph's calculations.
// R (trimmed)
Call:
coxph(formula = Surv(week, arrest) ~ age + fin + mar + paro,
data = py$rossi, ties = "breslow")
n= 432, number of events= 114
coef exp(coef) se(coef) z Pr(>|z|)
age -0.06656 0.93560 0.02104 -3.163 0.00156 **
fin -0.34626 0.70733 0.19017 -1.821 0.06863 .
mar -0.49298 0.61080 0.37431 -1.317 0.18783
paro -0.17357 0.84065 0.19253 -0.902 0.36729
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
// Python (trimmed)
coef exp(coef) se(coef) coef lower 95% coef upper 95%
covariate
age -0.067 0.935 0.021 -0.108 -0.025
fin -0.346 0.707 0.190 -0.719 0.026
mar -0.494 0.610 0.374 -1.227 0.240
paro -0.174 0.840 0.193 -0.551 0.204
// Stata
Cox regression with Breslow method for ties
No. of subjects = 432 Number of obs = 432
No. of failures = 114
Time at risk = 19,809
LR chi2(4) = 21.24
Log likelihood = -665.06224 Prob > chi2 = 0.0003
------------------------------------------------------------------------------
_t | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | -.0665648 .0210428 -3.16 0.002 -.1078079 -.0253217
fin | -.3462649 .1901694 -1.82 0.069 -.7189901 .0264602
mar | -.4929785 .3743093 -1.32 0.188 -1.226611 .2406542
paro | -.1735749 .1925273 -0.90 0.367 -.5509215 .2037717
------------------------------------------------------------------------------
// R
# A tibble: 8 x 4
covar tFF chisq p
<chr> <chr> <dbl> <dbl>
1 age km 7.07 0.00782
2 age rank 6.91 0.00857
3 fin km 0.00000363 0.998
4 fin rank 0.000932 0.976
5 mar km 2.20 0.138
6 mar rank 2.14 0.143
7 paro km 0.0473 0.828
8 paro rank 0.0355 0.851
// Python
test_statistic p -log2(p)
age km 6.79 0.01 6.77
rank 7.21 0.01 7.11
fin km 0.00 0.95 0.07
rank 0.00 0.96 0.06
mar km 2.09 0.15 2.75
rank 2.28 0.13 2.93
paro km 0.03 0.87 0.20
rank 0.04 0.85 0.24
// Stata
Test of proportional-hazards assumption
Time function: 1 - Kaplan-Meier estimate
--------------------------------------------------------
| rho chi2 df Prob>chi2
-------------+------------------------------------------
age | -0.21256 7.07 1 0.0078
fin | -0.00018 0.00 1 0.9985
mar | 0.13497 2.20 1 0.1377
paro | -0.02002 0.05 1 0.8279
-------------+------------------------------------------
Global test | 8.12 4 0.0874
--------------------------------------------------------
Test of proportional-hazards assumption
Time function: Rank of analysis time
--------------------------------------------------------
| rho chi2 df Prob>chi2
-------------+------------------------------------------
age | -0.21008 6.91 1 0.0086
fin | 0.00283 0.00 1 0.9756
mar | 0.13304 2.14 1 0.1434
paro | -0.01735 0.04 1 0.8505
-------------+------------------------------------------
Global test | 7.92 4 0.0946
--------------------------------------------------------
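As a sanity check on the tables above, the per-covariate p-values can be recomputed from the reported chi-squared statistics with 1 degree of freedom. This is a minimal sketch using only the standard library; the inputs are the rounded statistics shown above, so the results only approximately match the printed p-values:

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


# Rounded statistics copied from the tables above (age covariate, km transform).
p_stata_km = chi2_sf_1df(7.07)      # Stata / old cox.zph
p_lifelines_km = chi2_sf_1df(6.79)  # lifelines

print(round(p_stata_km, 4))
print(round(p_lifelines_km, 4))
print(round(-math.log2(p_lifelines_km), 2))  # lifelines' -log2(p) column
```

The first value should land close to the 0.0078 that Stata reports, and the last close to the 6.77 in the lifelines table. That suggests the three tools agree on the null distribution and differ only in the test statistic itself.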
from lifelines.datasets import load_rossi
from lifelines import CoxPHFitter
rossi = load_rossi()
cph = CoxPHFitter()
# Simplified model spec from readme ex (fewer covars, no strata var)
cph.fit(rossi, 'week', event_col='arrest', formula="age + fin + mar + paro")
cph.print_summary(model="untransformed variables", decimals=3)
cph.check_assumptions(rossi, advice=False, p_value_threshold=0.05)
## IMPORTANT: installed survival version must be <= 2.44-1
library(dplyr)
library(tibble)
library(reticulate)
#*******************************************************************************
# MUST be survival 2.44-1 or earlier for the approximation of T&G's score test
tempDir <- "C:/Users/Public/Documents/" # temporary directory for survival 2.44.1 install
remotes::install_version("survival", version = "2.44-1", lib = tempDir)
library(survival, lib.loc = tempDir)
# Load data via Python
repl_python()
from lifelines.datasets import load_rossi
rossi = load_rossi()
exit
# Estm model ----
mod <- coxph(Surv(week, arrest) ~ age + fin + mar + paro,
data=py$rossi, ties="breslow")
summary(mod) # this spec's output now matches Python
# PH test ----
tFFs <- c("km", "rank")
## Approx ====
zph.appr <- lapply(tFFs,
function(x){
cox.zph(mod, transform=x)$table %>%
data.frame %>%
rownames_to_column(var = "covar") %>%
mutate(tFF = x,
dispOrd = row_number(),
tFFOrd = which(tFFs==x))
})
## (Output in a way that mirrors Python, for sanity's sake)
zph.appr <- do.call(bind_rows, zph.appr) %>%
arrange(dispOrd, tFFOrd) %>%
select(covar, tFF, chisq, p) %>%
filter(covar!="GLOBAL") %>%
tibble
## Print ====
zph.appr
python
from sfi import *
from lifelines.datasets import load_rossi
rossi = load_rossi()
# Throw back into Stata
Data.setObsTotal(len(rossi))
varlist = ["week", "arrest", "age", "fin", "mar", "paro"]
for x in varlist:
    Data.addVarInt(x)
    Data.store(x, None, rossi[x], None)
end
// Estm model
stset week, fail(arrest)
stcox age fin mar paro, nohr breslow
// PH test
estat phtest, detail km
estat phtest, detail rank
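For readers unfamiliar with the two time functions, the sketch below computes them on a tiny made-up dataset. It follows the labels in the Stata output ("1 - Kaplan-Meier estimate" and "Rank of analysis time"). The function names are my own, and the real implementations in lifelines, R, and Stata may differ in details such as tie handling:

```python
def km_transform(durations, events):
    """1 - Kaplan-Meier survival estimate at each subject's duration."""
    event_times = sorted({t for t, e in zip(durations, events) if e})
    surv, surv_at = 1.0, {}
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        surv *= 1.0 - deaths / at_risk
        surv_at[t] = surv
    out = []
    for d in durations:
        # Survival at the latest event time at or before d (1.0 if none yet).
        past = [t for t in event_times if t <= d]
        out.append(1.0 - (surv_at[past[-1]] if past else 1.0))
    return out


def rank_transform(durations):
    """Rank of each duration (1 = smallest); assumes no tied durations."""
    order = sorted(durations)
    return [order.index(d) + 1 for d in durations]


# Toy data, invented for illustration: 1 = event observed, 0 = censored.
durations = [2, 3, 5, 7, 11]
events = [1, 0, 1, 1, 0]

print([round(v, 3) for v in km_transform(durations, events)])
print(rank_transform(durations))
```

Both transforms are monotone in time but spaced differently, so the resulting test statistics should usually be close without being identical, as in the R and Stata tables above.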
Hi @MetzgerSK - thanks for the (very) detailed report. I'll look into this soon. Apologies that this is occurring. I'm relieved that a previous-me did write tests for this function, but that was on a different dataset. I'll review why the rossi dataset is different, building off what you've shown here.
I haven't yet dug into this, but my suspicion is that the results are due to how ties are handled. rossi has lots of ties, whereas the testing dataset I used has none.
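A quick way to check whether a dataset has tied event times is to count how often each event time occurs. This is a minimal sketch on invented toy data; tied_event_times is a helper name made up for this illustration:

```python
from collections import Counter


def tied_event_times(durations, events):
    """Return {time: count} for event times shared by more than one subject."""
    counts = Counter(t for t, e in zip(durations, events) if e)
    return {t: n for t, n in counts.items() if n > 1}


# Toy data, invented for illustration (not the rossi values).
durations = [5, 5, 8, 12, 12, 12, 20]
events = [1, 1, 0, 1, 1, 0, 1]

print(tied_event_times(durations, events))  # {5: 2, 12: 2}
```

The same helper should work on DataFrame columns, for example rossi["week"] and rossi["arrest"], since it only iterates over the two sequences.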
Possibly. I did quickly check the (unscaled) Schoenfelds out of lifelines' compute_residuals() and survival 2.44-1's resid() for the rossi data, using the models from my original MWE.
rossi (ties): The Schoenfeld values are close for a given t-covariate profile pairing, but R and Python don't match.
rossi_dup (no ties + only one censored data point at t = 52, the largest duration in the toy data): R's and Python's Schoenfelds match.
Not definitive, but it's suggestive.
rossi_dup:
from lifelines.datasets import load_rossi
rossi = load_rossi()
rossi_dup = rossi.sort_values(by=['week', 'age', 'race', 'wexp', 'mar', 'paro', 'prio'])
# ^ quick attempt to get unique sort order
rossi_dup.drop_duplicates(subset = "week", keep = 'first', inplace = True)
Possible solution: https://github.com/CamDavidsonPilon/lifelines/issues/997#issuecomment-652567848