Open bayesfactor opened 1 year ago
@CamDavidsonPilon, I would very much appreciate your review (and approval, conditional on acceptance). My organization is really struggling without a sound method to generate confidence intervals on a difference in 2 survival populations.
Hi @bayesfactor! Thanks for the PR (sorry about the delay).
We have an lifelines.utils.restricted_mean_survival_time
function now - have you compared against that? We don't have a difference_
though, so that helps.
@CamDavidsonPilon, thank you for your response. I'm sorry, I overlooked lifelines.utils.restricted_mean_survival_time
. I now validated that the 2 implementations produce the same point estimate on a test set. My newer implementation provides an estimate of standard error, and lower/upper confidence intervals based on that standard error, which I think is a nice feature. Perhaps you or I should put SE, LCI, and UCI into your original implementation of RMST. If you agree, please let me know if you would rather do it or ask me to.
As you said, the main point of this pull request is to provide a confidence interval on difference in RMST, which is a new feature as far as I know.
@CamDavidsonPilon, I'm hoping to hear back from you about whether you or I should include the other outputs from RMST, and therefore how to proceed with the difference in RMST (including confidence intervals) functionality. Can you please let me know what is your preference?
I fixed an issue with RMST where, if the followup interval is greater than the last observed event, the confidence intervals were NaN. The latest commit resolves that issue and makes some minor refactoring for efficiency.
@CamDavidsonPilon, I'm hoping to hear back from you about whether you or I should include the other outputs from RMST, and therefore how to proceed with the difference in RMST (including confidence intervals) functionality. Can you please let me know what is your preference?
Hi @bayesfactor,
Thanks for keeping up with this! Can you explain how this works for parametric fitters? In my head, to compute the AUC of a parametric fitter, some scipy.integrate.quad
procedure is necessary.
Great point, this won't work for all fitters. The current implementation depends on a fitter.event_table
, so it only works for fitters that have an event_table
property. Using data from event_table
, the code does a numerical integration:
https://github.com/bayesfactor/lifelines/blob/107da27a637cc2ae4125666b8ea2b3e217f2c699/lifelines/statistics.py#L392
I don't know how often people use other fitters; I could
a) modify the code to check if event_table
exists, and if not, compute it
b) modify the existing utils.restricted_mean_survival_time
to optionally return the RMST variance needed for the calculation of a confidence interval in the difference in RMST
https://github.com/CamDavidsonPilon/lifelines/blob/c9b136b1d1d0e1f30eb42f6599e1d31e8b1aa303/lifelines/utils/__init__.py#L209
I think we can combine the functions! Have a global restricted_mean_survival_time
that, based on the fitter, chooses an implementation. The current implementation for computing the variance for KMF's is imprecise compared to yours.
upon further thought, should we push one step further upstream and give all fitters an event_table
? It seems like a handy attribute and that making fitters more similar would be beneficial.
Unless I'm missing something, my last commit moved the event_table
property into the BaseFitter class so that all fits get an event_table
property. This passes tests -- is it correct and kosher?
Implements RMST, difference in RMST from the R package: https://cran.r-project.org/package=survRM2
Includes RMST of one population (point estimate and confidence interval), difference in RMST of 2 populations (point estimate, p-value, and confidence intervals). Addresses #821