Open JayGuAtGitHub opened 5 years ago
Hey there, this isn’t on the short-term roadmap. I’m curious about your use cases for wanting other tie methods, though. From what I’ve read, Efron is fast and gives a much better approximation than Breslow. In fact, I’ve seen commentary complaining that SPSS's default is Breslow and not Efron.
The same issue applies to SAS: it defaults to Breslow rather than Efron. R's "exact" method is a discrete-time logistic model, not the exact probabilities (SAS offers both discrete and exact). It yields odds ratios, not hazard ratios.
Discrete as an option may be a nice addition to lifelines. It would allow for discrete time data with the CPH model.
Many thanks for your reply, and sorry for responding so late.
For now we have decided to use [lifelines] with [Efron]. We checked some papers, and several of them do say that [Efron] is better.
We also tested it in R and found that the results are very different. We are wondering whether [lifelines] fits the model in a [multiple variable] way?
> I’m curious about your use cases for wanting other tie methods, though.
We are working with doctors and some statisticians to build an automated data analysis system, and Cox regression is an important node in it. In the past we did this work manually in SPSS and R, and both returned the same result. So I was wondering whether [lifelines] can do the same thing.
> The same issue applies to SAS: it defaults to Breslow rather than Efron. R's "exact" method is a discrete-time logistic model, not the exact probabilities (SAS offers both discrete and exact). It yields odds ratios, not hazard ratios.

Does that mean I can't reproduce the same result in R?
R (using Efron's tie method, which is the default) and lifelines should give the same results, and if they do not, I would be very curious! If you see differences, please post the code you are using.
So, SAS's Efron, R's Efron, and lifelines's Efron should all produce the same results.
SAS's Breslow and R's Breslow should produce the same results.
SAS's Discrete and R's Exact should produce the same results. Note that this method is for discrete time (not continuous time, like the other methods). Additionally, it produces odds ratios, not hazard ratios. This is a result of the partial likelihood function. This method should only be used if you are really in a discrete setting.
SAS's Exact is not available in any other software that I am aware of. Rather than using the Breslow or Efron approximations, it calculates the exact probability. It takes a lot of calculations and is time-consuming. Not much is gained by using Exact over Efron. It really is only feasible for a small number of ties.
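To make the difference between the two approximations concrete, here is a minimal sketch of the Breslow and Efron log partial likelihoods for a single covariate at a fixed coefficient. The function name `partial_loglik` and the toy data are made up for illustration; this is not how lifelines, R, or SAS implement it internally, only the textbook formulas written out in plain Python.

```python
import math
from collections import defaultdict


def partial_loglik(times, events, x, beta, ties="efron"):
    """Log partial likelihood for one covariate x at coefficient beta.

    Hypothetical helper for illustration only.
    """
    risk = [math.exp(beta * xi) for xi in x]

    # Group the indices of subjects who had the event by their event time.
    deaths = defaultdict(list)
    for i, (t, e) in enumerate(zip(times, events)):
        if e:
            deaths[t].append(i)

    loglik = 0.0
    for t, tied in deaths.items():
        d = len(tied)
        # Everyone whose observed time is >= t is still at risk at t.
        at_risk = sum(r for r, tj in zip(risk, times) if tj >= t)
        tied_risk = sum(risk[i] for i in tied)
        loglik += beta * sum(x[i] for i in tied)
        for l in range(d):
            if ties == "breslow":
                # Breslow: the full risk set is reused for every tied event.
                denom = at_risk
            elif ties == "efron":
                # Efron: remove a fraction l/d of the tied subjects' risk.
                denom = at_risk - (l / d) * tied_risk
            else:
                raise ValueError("ties must be 'breslow' or 'efron'")
            loglik -= math.log(denom)
    return loglik


# Toy data: two subjects share the event time 1, so the methods disagree.
tied_times, tied_events, tied_x = [1, 1, 2, 3], [1, 1, 1, 0], [0.5, -1.0, 0.3, 2.0]
# Toy data without ties: both methods reduce to the same expression.
plain_times, plain_events, plain_x = [1, 2, 3, 4], [1, 1, 0, 1], [0.5, -1.0, 0.3, 2.0]

for name in ("breslow", "efron"):
    print(name, partial_loglik(tied_times, tied_events, tied_x, 0.4, ties=name))
```

With no tied event times the two functions coincide, which is why the choice of tie method only matters for data such as weekly follow-up where many events share a time.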
I think we have now found the problem, but we are still not sure whether we are using lifelines correctly.
We use load_rossi as our test data.
When I fit the model like this:
cph.fit(data, duration_col="week", event_col="arrest", show_progress=True)
where data is just the output of load_rossi(),
it returns:
covariate | coef | exp(coef) | se(coef) | z | p | lower 0.95 | upper 0.95 |
---|---|---|---|---|---|---|---|
fin | -0.37942 | 0.684257 | 0.191379 | -1.98256 | 0.047416 | -0.75452 | -0.00433 |
age | -0.05744 | 0.944181 | 0.021999 | -2.61093 | 0.00903 | -0.10055 | -0.01432 |
race | 0.3139 | 1.368753 | 0.307992 | 1.01918 | 0.308117 | -0.28975 | 0.917554 |
wexp | -0.1498 | 0.860883 | 0.212225 | -0.70584 | 0.48029 | -0.56575 | 0.266157 |
mar | -0.4337 | 0.648105 | 0.381861 | -1.13576 | 0.256057 | -1.18214 | 0.314732 |
paro | -0.08487 | 0.918631 | 0.195757 | -0.43355 | 0.664614 | -0.46855 | 0.298805 |
prio | 0.091498 | 1.095814 | 0.028648 | 3.193868 | 0.001404 | 0.035349 | 0.147646 |
In R, I run it like this:
library(survival)
y <- Surv(time=coxdata$week, event=coxdata$arrest)
a <- coxph(y ~ age, data=coxdata, ties="efron")
and it returns:
covariate | coef | exp(coef) | se(coef) | z | p | lower 0.95 | upper 0.95 |
---|---|---|---|---|---|---|---|
age | -0.07284 | 0.929745 | 0.02079 | -3.50392 | 0.000458 | -0.11359 | -0.0321 |
Then we removed all columns except age, week, and arrest and ran lifelines again, and it returned the same result as R!
That's why I asked:
> We are wondering whether [lifelines] fits the model in a [multiple variable] way?
Seems no big issue so far. Thanks a lot for your work and this package.
One small question: how does the [multiple variable] fit actually work?
Are there plans to add more support for [tie_method] in CoxPHFitter?
In fact, I tried to analyze some data, but the result differs from what I got in SPSS. I also tried R and found that if I set ties="breslow" (from the options ties=c("efron","breslow","exact")), it returns the same result as SPSS.
So I suppose a [tie_method] option would provide the same ability. Am I right? And is there any plan to implement it?
Many thx!