chjackson / flexsurv

The flexsurv R package for flexible parametric survival and multi-state modelling
http://chjackson.github.io/flexsurv/

Flexsurvspline: initial value in 'vmmin' is not finite #5

Closed (jrdnmdhl closed 9 years ago)

jrdnmdhl commented 9 years ago

Running flexsurvspline often results in errors in the call to optim:

Error in optim(method = "BFGS", par = c(-1.4245564322429, 3.2657103215548,  : 
  initial value in 'vmmin' is not finite

I'm guessing this has to do with the process for generating initial values, though supplying some initial values of my own didn't help.

This can be repro'd using the following code:

library(flexsurv)  # Surv() comes from the survival package, which flexsurv depends on
testdata <- read.csv("problem.csv")
flexsurvspline(Surv(times, flags) ~ 1, data = testdata, scale = "hazard", k = 2)

CSV file can be found here: http://s000.tinyupload.com/index.php?file_id=01183619756868535710

Side note: thanks for flexsurv, it is great.

chjackson commented 9 years ago

Your dataset has four extreme outlying observations of (censored) survival times > 72, when the remaining survival times are less than about 18. It works when these four observations are removed, and by playing with the data I managed to get initial values that worked for k=2.

fs <- flexsurvspline(Surv(times,flags)~1,data=testdata,scale="hazard",k=2, inits=c(-2, 3, 0.7, -0.4))
plot(fs)

But it still doesn't fit the later portion of the data very well. I'm guessing that it's impossible to identify a well-fitting spline model for this data: the required curvature may be too great and the parameter estimation impossible. It might be possible in theory by specifying lots of knots and fixing some of the parameters. But I'd find it more satisfactory to first explain why those four observations are so big.

From a software point of view, it'd be nice if it tried a bit harder or helped the user in these cases, but it's hard to deal with every potential unusual dataset!

jrdnmdhl commented 9 years ago

Indeed. Thankfully, this is simulated data for the purposes of testing a curve fitting process that I am putting together. I wanted challenging data, though perhaps this is a bit too challenging for a spline model of this class.

For some context, I am trying to do bootstrapping of fits to several related time-to-event distributions so that I can account for correlations in the parameters across those distributions (e.g. the correlation between progression-free survival and overall survival in oncology). In doing so, I need to be quite confident that flexsurvspline will run to completion on all bootstrap replicates.

Do you have any suggested workarounds for these kinds of failures if and when they occur?

chjackson commented 9 years ago

Wrapping each fit in try() would at least allow the simulation loop to continue if one iteration fails. Though that assumes the failures are either very infrequent, or a random sample (in some sense...) of all cases.
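
A minimal sketch of that pattern, assuming a hypothetical helper `boot_fits()` (not part of flexsurv) that takes the data and any fitting function. It resamples rows with replacement, wraps each fit in try(), and reports how many replicates failed so the failure rate can be checked rather than silently ignored:

```r
# Hypothetical helper, for illustration only: bootstrap a fitting function
# and keep going when an individual replicate throws an error.
boot_fits <- function(data, fit_fun, n_boot = 200) {
  fits <- vector("list", n_boot)
  ok <- logical(n_boot)
  for (b in seq_len(n_boot)) {
    # Resample rows with replacement
    resample <- data[sample(nrow(data), replace = TRUE), , drop = FALSE]
    # try() returns an object of class "try-error" instead of stopping the loop
    fit <- try(fit_fun(resample), silent = TRUE)
    if (!inherits(fit, "try-error")) {
      fits[b] <- list(fit)
      ok[b] <- TRUE
    }
  }
  list(fits = fits[ok], n_failed = sum(!ok))
}

# Possible usage with the model from this thread (requires flexsurv and testdata):
# out <- boot_fits(testdata, function(d)
#   flexsurvspline(Surv(times, flags) ~ 1, data = d, scale = "hazard", k = 2))
# out$n_failed  # number of replicates that errored
```

Returning the failure count alongside the fits makes it easier to judge whether the "very infrequent" assumption holds for a given dataset.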

jrdnmdhl commented 9 years ago

Thanks for the suggestion. I've had to do that before when bootstrapping the generalized gamma (due to its convergence problems), and I suppose I can do the same here as long as the failures are infrequent. For the purposes of estimating uncertainty, I doubt the failing samples are random. I don't see why dropping them would affect the mean of the samples, but since more extreme observations tend to be the problematic ones, I would imagine that dropping them tends to underestimate uncertainty.

In any case, I don't know that there is any better solution to this than that.

Thanks for your prompt responses!

Geoff-Holmes commented 6 years ago

I was stumped for a while by the same error on what seemed perfectly reasonable survival data with no outliers. Eventually I realised that two zero survival values (relating to death during surgery, where surgery was the start point for survival) were causing the problem, and with those removed all was fine.
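
One plausible explanation, offered as an assumption rather than something confirmed in this thread: the spline model works on the log of time, and log(0) is -Inf, so a zero survival time can make the starting likelihood non-finite. A small sketch of a pre-fit check, using a hypothetical helper `find_bad_times()` and made-up illustrative times:

```r
# Hypothetical helper, for illustration only: return the indices of survival
# times that are zero, negative, missing, or otherwise non-finite.
find_bad_times <- function(times) {
  which(!is.finite(times) | times <= 0)
}

# Made-up example values, not taken from any dataset in this thread
times <- c(0, 2.5, 0, 7.1, 12)
bad <- find_bad_times(times)
bad              # positions of the problematic times
log(times[bad])  # log of a zero time is -Inf
```

Whether to drop such rows or shift them by a small positive amount is a modelling decision that depends on what a zero time means in the data.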