CamDavidsonPilon / lifelines

Survival analysis in Python
lifelines.readthedocs.org
MIT License
2.35k stars 557 forks source link

includes RMST, difference in RMST and confidence intervals #1526

Open bayesfactor opened 1 year ago

bayesfactor commented 1 year ago

Implements RMST, difference in RMST from the R package: https://cran.r-project.org/package=survRM2

Includes RMST of one population (point estimate and confidence interval), difference in RMST of 2 populations (point estimate, p-value, and confidence intervals). Addresses #821

bayesfactor commented 1 year ago

@CamDavidsonPilon, I would very much appreciate your review (and approval, conditional on acceptance). My organization is really struggling without a sound method to generate confidence intervals on a difference in 2 survival populations.

CamDavidsonPilon commented 1 year ago

Hi @bayesfactor! Thanks for the PR (sorry about the delay).

We have an lifelines.utils.restricted_mean_survival_time function now - have you compared against that? We don't have a difference_ though, so that helps.

bayesfactor commented 1 year ago

@CamDavidsonPilon, thank you for your response. I'm sorry, I overlooked lifelines.utils.restricted_mean_survival_time. I now validated that the 2 implementations produce the same point estimate on a test set. My newer implementation provides an estimate of standard error, and lower/upper confidence intervals based on that standard error, which I think is a nice feature. Perhaps you or I should put SE, LCI, and UCI into your original implementation of RMST. If you agree, please let me know if you would rather do it or ask me to.

As you said, the main point of this pull request is to provide a confidence interval on difference in RMST, which is a new feature as far as I know.

bayesfactor commented 1 year ago

@CamDavidsonPilon, I'm hoping to hear back from you about whether you or I should include the other outputs from RMST, and therefore how to proceed with the difference in RMST (including confidence intervals) functionality. Can you please let me know what is your preference?

bayesfactor commented 10 months ago

I fixed an issue with RMST where, if the followup interval is greater than the last observed event, the confidence intervals were NaN. The latest commit resolves that issue and makes some minor refactoring for efficiency.

@CamDavidsonPilon, I'm hoping to hear back from you about whether you or I should include the other outputs from RMST, and therefore how to proceed with the difference in RMST (including confidence intervals) functionality. Can you please let me know what is your preference?

CamDavidsonPilon commented 10 months ago

Hi @bayesfactor,

Thanks for keeping up with this! Can you explain how this works for parametric fitters? In my head, to compute the AUC of a parametric fitter, some scipy.integrate.quad procedure is necessary.

bayesfactor commented 10 months ago

Great point, this won't work for all fitters. The current implementation depends on a fitter.event_table, so it only works for fitters that have an event_table property. Using data from event_table, the code does a numerical integration: https://github.com/bayesfactor/lifelines/blob/107da27a637cc2ae4125666b8ea2b3e217f2c699/lifelines/statistics.py#L392

I don't know how often people use other fitters; I could a) modify the code to check if event_table exists, and if not, compute it b) modify the existing utils.restricted_mean_survival_time to optionally return the RMST variance needed for the calculation of a confidence interval in the difference in RMST https://github.com/CamDavidsonPilon/lifelines/blob/c9b136b1d1d0e1f30eb42f6599e1d31e8b1aa303/lifelines/utils/__init__.py#L209

CamDavidsonPilon commented 10 months ago

I think we can combine the functions! Have a global restricted_mean_survival_time that, based on the fitter, chooses an implementation. The current implementation for computing the variance for KMF's is imprecise compared to yours.

bayesfactor commented 10 months ago

upon further thought, should we push one step further upstream and give all fitters an event_table? It seems like a handy attribute and that making fitters more similar would be beneficial.

bayesfactor commented 10 months ago

Unless I'm missing something, my last commit moved the event_table property into the BaseFitter class so that all fits get an event_table property. This passes tests -- is it correct and kosher?