flexsurvspline errors/warnings/poor-fits when all data is interval censored

chjackson / flexsurv

The flexsurv R package for flexible parametric survival and multi-state modelling

http://chjackson.github.io/flexsurv/


Issue #18 (closed 11 months ago)

jwdink commented 7 years ago

The flexsurvspline function can't determine boundary knots when all data are interval-censored. (Looking at the code, it appears to choose the boundary knots as the minimum and maximum 'death' times. But with Surv(type='interval'), if every event code is either 0 (right-censored) or 3 (interval-censored), there are no death times, so the boundary knots come out as -Inf, Inf.)
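The infinite boundary knots are consistent with how base R treats empty vectors: min() and max() of a zero-length vector return Inf and -Inf (with a warning) instead of stopping with an error. A minimal illustration, assuming the knot range is computed from a vector of exact event times that is empty whenever every observation has event code 0 or 3 (the data below are made up for the illustration):

```r
# Event codes for Surv(type = "interval"):
# 0 = right-censored, 1 = exact event, 2 = left-censored, 3 = interval-censored
status <- c(0, 3, 3, 0, 3)
time   <- c(5, 2, 4, 9, 7)

# Exact event ("death") times are those with status 1; here there are none
death_times <- time[status == 1]

# min()/max() of an empty vector warn and return Inf / -Inf rather than erroring
rng <- suppressWarnings(c(min(death_times), max(death_times)))
print(rng)  # [1]  Inf -Inf  -> both ends of the knot range are infinite
```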

When bknots is specified manually, the fit is sometimes good, but small changes to bknots can lead to extremely poor or unpredictable fits.

I've attached some code with a minimal reproducible example: flexsurvspline-interval-data.R.zip

chjackson commented 7 years ago

Thanks for the report. I think the automatic choice of knots when all data points are censored is fixed in the latest commit: I've simply used the min/max censoring times instead of the death times. Finding valid initial values can still be problematic in this kind of situation, but I think your example works now.
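The fallback described above can be sketched as follows. This is a hypothetical helper, not flexsurv's internal code; it assumes the knots live on the log-time scale (as in the knots = log(...) calls later in this thread) and that, with no exact event times, the observed right-censoring times and both endpoints of each censoring interval are pooled before taking the min/max.

```r
# Hypothetical helper (NOT flexsurv's internal code): boundary knots on the
# log-time scale from exact event times if any exist, otherwise from the
# observed censoring times.
boundary_knots_sketch <- function(time, time2, status) {
  event_times <- time[status == 1]
  if (length(event_times) > 0) {
    t <- event_times
  } else {
    # Assumption: right-censored rows (status 0) contribute `time`;
    # interval-censored rows (status 3) contribute both endpoints
    t <- c(time, time2[status == 3])
  }
  t <- t[is.finite(t) & t > 0]  # log() needs positive, finite times
  log(range(t))                 # c(log(min), log(max))
}

# Made-up example: no exact events, only right- and interval-censored rows
time   <- c(5, 1.5, 3.5, 9, 6.5)
time2  <- c(5, 2,   4,   9, 7)
status <- c(0, 3,   3,   0, 3)
bk <- boundary_knots_sketch(time, time2, status)
print(exp(bk))  # boundary knots back on the time scale: 1.5 and 9
```

Using range() on the pooled times keeps the knots finite as long as at least one positive, finite time is observed.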

With the weird-fit example, it looks like you're specifying a lowest knot that's higher than the lowest observed time. Theoretically I'm not sure that should be a problem: it's still a valid model, just linear outside the knots. But something in the code may be assuming that the boundary knots enclose all the data. I'll have a closer look.
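A quick way to see whether a manual choice of bknots falls in the situation described above is to compare the boundary knots with the range of the log observed times. The function below is a hypothetical diagnostic, not part of flexsurv, and it assumes bknots is given on the log-time scale as a length-2 vector (lower, upper).

```r
# Hypothetical diagnostic (NOT part of flexsurv): do the boundary knots,
# given on the log-time scale, enclose all positive finite observed times?
bknots_enclose_data <- function(bknots, times) {
  lt <- log(times[is.finite(times) & times > 0])
  c(lower_ok = bknots[1] <= min(lt),
    upper_ok = bknots[2] >= max(lt))
}

times <- c(0.5, 1, 2, 4, 12, 20)  # made-up observed times

ok_full <- bknots_enclose_data(log(c(0.5, 20)), times)
print(ok_full)  # both TRUE: knots span the data

ok_high <- bknots_enclose_data(log(c(1, 20)), times)
print(ok_high)  # lower_ok FALSE: time 0.5 lies below the lowest knot
```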

jwdink commented 7 years ago

Thanks for the speedy response.

A quick note for your closer look: even with automatically determined boundary knots, odd fits still seem to happen. For example, with the data from the attached script, you can try:

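```r
library(survival)  # Surv(), survfit()
library(flexsurv)  # flexsurvspline(); lines() method for fitted models

# df_example comes from the attached script (columns `time` and `event`).
# Events are coded as interval-censored between time - 0.999 and time;
# non-events as right-censored at `time`.
fit0 <- flexsurvspline(
  formula = Surv(time = time - ifelse(as.logical(event), .999, 0),
                 time2 = time,
                 event = ifelse(as.logical(event), 3, 0),
                 type = 'interval') ~ 1,
  data = df_example,
  knots = log(c(4, 12))  # internal knots on the log-time scale
)

# Nonparametric estimate for the same interval-censored data, then the spline fit
plot(survfit(Surv(time = time - ifelse(as.logical(event), .999, 0),
                  time2 = time,
                  event = ifelse(as.logical(event), 3, 0),
                  type = 'interval') ~ 1,
             data = df_example))
lines(fit0)
```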

This gives an odd fit, whereas changing it to knots=log(4) yields a good fit. Using automatically determined knots instead (e.g., k=2) produces an error.

chjackson commented 11 months ago

Just tidying up open issues; not such a speedy response this time :) I think this is working as expected in the current (development) version. Internal knots for interval-censored data are now chosen automatically from quantiles of the interval midpoints together with the event times. Finding initial values occasionally fails, depending on the knot locations, but that is a common problem with flexsurvspline and is not specific to interval censoring.
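The knot rule described above can be sketched roughly as follows. This is a hypothetical illustration, not flexsurv's code. It assumes that midpoints are taken on the original time scale before logging, that the k internal knots sit at equally spaced quantiles (the rule flexsurvspline documents for uncensored log times), and that boundary knots come from the range of the same pooled times. Right- and left-censored rows are simply ignored here.

```r
# Hypothetical sketch (NOT flexsurv's code): pool exact event times with the
# midpoints of interval-censored observations, log them, and place k internal
# knots at equally spaced quantiles.
choose_knots_sketch <- function(time, time2, status, k) {
  event_times <- time[status == 1]
  ic <- status == 3
  midpoints <- (time[ic] + time2[ic]) / 2  # assumption: midpoint on time scale
  lt <- log(c(event_times, midpoints))
  probs <- seq(0, 1, length.out = k + 2)[-c(1, k + 2)]  # drop 0 and 1
  list(
    knots  = unname(quantile(lt, probs)),
    bknots = range(lt)  # assumption: boundary knots from the same pooled times
  )
}

# Made-up data: three exact events and two interval-censored observations
time   <- c(2, 4, 8, 1, 5)
time2  <- c(2, 4, 8, 3, 7)
status <- c(1, 1, 1, 3, 3)
kn <- choose_knots_sketch(time, time2, status, k = 1)
print(exp(kn$knots))   # median of pooled times (2, 4, 8, 2, 6): 4
print(exp(kn$bknots))  # 2 and 8
```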

As a bonus, I think the plot() method now produces the Kaplan-Meier curve by itself with interval-censored data.