Valery2511 opened this issue 4 years ago.
Hi @Valery2511. Off the top of my head, I'm not sure. I do have tests in the project comparing the lifelines baseline against R's baseline, so I will have to do some digging.
Are you comfortable sharing the dataset you are using? That would help me find the source of the difference.
@CamDavidsonPilon, thank you for answering! I attached the dataset at the beginning of the question, but just in case, here it is again: https://github.com/CamDavidsonPilon/lifelines/files/5305589/test.xlsx
@CamDavidsonPilon, maybe you will be interested: I am also attaching a file that compares the baseline functions not only from R and Python but also from STATA (on the same dataset, test.xlsx): Compare_baseline_functions_R_Python_STATA.xlsx
Interestingly, all three programs give slightly different results :)
#### Data download:
```stata
import excel "test.xlsx", sheet("Sheet1") firstrow
sort id months
```
#### Run the model:
```stata
stset months, id(id) failure(event)
stcox var_const, efron
```
#### Calculate baseline cumulative hazard function:
```stata
predict S0_hazard_cumulative, basechazard
```
#### Calculate baseline survival function:
```stata
predict S0_baseline_survivor, basesurv
```
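One mechanical reason survival columns can disagree even when the hazard increments agree is the formula used to turn those increments into a survival curve. The sketch below assumes a list of baseline hazard increments at successive event times (the values are made up, not taken from test.xlsx) and compares the exponential form `S0(t) = exp(-H0(t))` with the product-limit form `S0(t) = ∏(1 - dH0)`. Which form Stata, R, or lifelines uses by default is not asserted here; that should be checked in each program's documentation.

```python
import math


def survival_exp(increments):
    """Baseline survival as S0(t) = exp(-H0(t)), H0 = running sum of increments."""
    out, cum = [], 0.0
    for dh in increments:
        cum += dh
        out.append(math.exp(-cum))
    return out


def survival_product(increments):
    """Baseline survival as the product-limit form S0(t) = prod(1 - dH0)."""
    out, s = [], 1.0
    for dh in increments:
        s *= 1.0 - dh
        out.append(s)
    return out


# Hypothetical hazard increments at three successive event times.
increments = [0.1, 0.2, 0.05]
print([round(v, 4) for v in survival_exp(increments)])      # [0.9048, 0.7408, 0.7047]
print([round(v, 4) for v in survival_product(increments)])  # [0.9, 0.72, 0.684]
```

Because `exp(-x) >= 1 - x`, for increments below 1 the exponential form never falls below the product-limit form, and the gap widens as the increments grow (for example, late in follow-up when the risk set is small).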
Hello - I'm having a similar issue. I've fit the same model with lifelines' CoxTimeVaryingFitter and with R's survival::coxph function. The coefficients are the same (or close enough), but the baseline cumulative hazards are quite different. This post caught my eye, and I was wondering whether anything ever came of it.
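Matching coefficients combined with different baseline cumulative hazards is consistent with the programs estimating the baseline hazard differently when event times are tied. The sketch below computes, for a single covariate and a given coefficient, the Breslow increment `d / Σ exp(βx)` and an Efron-style tie-corrected increment at each event time. It uses `(start, stop]` intervals so it also covers the counting-process layout used with CoxTimeVaryingFitter. The function name and toy data are hypothetical, the baseline is taken at `x = 0`, and which estimator each package actually applies should be confirmed in its documentation (for R, the `ctype` argument of `survfit.coxph`).

```python
import math
from collections import defaultdict


def baseline_increments(stops, events, x, beta, starts=None):
    """Breslow and Efron-style baseline hazard increments for one covariate.

    Subject i is at risk at time t when starts[i] < t <= stops[i]; with
    starts=None every record is treated as starting at -infinity, i.e.
    ordinary right-censored data. The baseline corresponds to x = 0.
    Returns {event_time: (breslow_dH, efron_dH)}.
    """
    if starts is None:
        starts = [float("-inf")] * len(stops)
    risk = [math.exp(beta * xi) for xi in x]

    # Group the subjects that have an event at each distinct time.
    deaths = defaultdict(list)
    for i, (t, e) in enumerate(zip(stops, events)):
        if e:
            deaths[t].append(i)

    out = {}
    for t in sorted(deaths):
        at_risk = sum(r for r, a, b in zip(risk, starts, stops) if a < t <= b)
        tied = deaths[t]
        d = len(tied)
        tied_risk = sum(risk[i] for i in tied)
        breslow = d / at_risk
        # Efron: remove a growing fraction of the tied subjects' risk.
        efron = sum(1.0 / (at_risk - (k / d) * tied_risk) for k in range(d))
        out[t] = (breslow, efron)
    return out


# Toy data (hypothetical): two events tied at t = 2.
stops = [1, 2, 2, 3, 4]
events = [1, 1, 1, 0, 1]
x = [0.5, 1.0, 0.0, 2.0, 1.5]
inc = baseline_increments(stops, events, x, beta=0.05295)
for t, (b, e) in inc.items():
    print(t, round(b, 4), round(e, 4))
```

With no ties the two increments coincide, so a discrepancy between programs that appears mainly at tied event times points toward the tie correction. For the counting-process layout, pass the interval start times as `starts`.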
Hi @PortlandMichelle. Hard to say what might be going on without seeing the dataset used. Can you reproduce it with a small synthetic dataset?
Hello - TBH I discovered the difference using a dataset that I can't share (due to confidentiality concerns), so I replicated it with the data provided in this post by the original poster. I get the same results they did (shown in their original screenshot at the top of this post). That's why I was hoping there had been some analysis of that dataset back in 2020 that helped illuminate the issue. If you're able to investigate now, it would be great if we could both use the dataset from the original question.
Hello! I am translating a Cox model from R to Python and found that the baseline functions computed in R differ from those computed in Python. Based on the data from the file test.xlsx, the results are as follows:
This seemed strange to me, since (if I understand correctly) the lifelines library is modeled on the corresponding R functions.
At the same time, the overall Cox model results and the coefficient on the regressor `var_const` turned out to be the same in R and Python (coef[var_const] = 0.05295).
Could you please tell me why the baseline functions might differ between R and Python?
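Another candidate worth ruling out first is the covariate value at which each program reports its baseline. R's `basehaz()` returns the cumulative hazard at the covariate means by default (`centered = TRUE`), while Stata's `basechazard` and `basesurv` correspond to covariates equal to zero; the convention used by the installed lifelines version should be checked in its documentation. Under proportional hazards, `H(t | x) = H0(t) exp(βx)`, so a baseline can be moved from one reference value to another by a constant factor. The sketch below uses the coefficient 0.05295 reported above; the helper name, the mean of `var_const`, and the hazard values are hypothetical placeholders, not numbers from test.xlsx.

```python
import math


def rereference_cumhaz(cumhaz, beta, x_from, x_to):
    """Move a baseline cumulative hazard from covariate value x_from to x_to.

    Under proportional hazards H(t | x) = H0(t) * exp(beta * x), hence
    H(t | x_to) = H(t | x_from) * exp(beta * (x_to - x_from)).
    """
    factor = math.exp(beta * (x_to - x_from))
    return [h * factor for h in cumhaz]


beta = 0.05295                 # coefficient on var_const reported in the thread
mean_var_const = 10.0          # hypothetical placeholder for the sample mean
centered = [0.01, 0.03, 0.07]  # hypothetical cumulative hazard at the mean

# Express the baseline at var_const = 0 instead of at its mean.
at_zero = rereference_cumhaz(centered, beta, x_from=mean_var_const, x_to=0.0)
survival_at_zero = [math.exp(-h) for h in at_zero]
print(at_zero)
print(survival_at_zero)
```

If the ratio of two programs' cumulative hazards is the same at every time point, a different reference value is the likely explanation; a ratio that changes over time suggests a difference in the estimator itself, such as tie handling.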
Code in R:
Code in Python: