chjackson / flexsurv

The flexsurv R package for flexible parametric survival and multi-state modelling
http://chjackson.github.io/flexsurv/
55 stars 27 forks source link

Log logistic and Gompertz - warning messages (not converging) #74

Closed tku23 closed 1 year ago

tku23 commented 4 years ago

Hi there,

I am new to R and have attempted my first use of flexsurvreg (which is amazing!). However, I encountered the below warnings for log logistic and Gompertz. Had no issue with the other functions and they fit the data very well (exponential, Weibull, lognormal and generalised gamma). Could anyone kindly provide any advice on how I can overcome this issue?

I have even tried the control = fnscale…, but it didn't work. When I tried plotting the curves using lines( ), it looks way off from the original data and something just doesn't seem right.

Much appreciated and thank you

> rm(list=ls(all=TRUE))
> library(survival)
> library("flexsurv")
> data<- read.csv("TKIPFS.csv")
> attach(data)
> data
   start_time_event start_time_censor end_time_event end_time_censor n_events
1            0.0001           0.37505           0.75           10000        3
2            0.7500           1.12500           1.50           10000        9
3            1.5000           1.87500           2.25           10000        5
4            2.2500           2.62500           3.00           10000        9
5            3.0000           3.37500           3.75           10000        2
6            3.7500           4.12500           4.50           10000       17
7            4.5000           4.87500           5.25           10000        7
8            5.2500           5.62500           6.00           10000       15
9            6.0000           6.37500           6.75           10000       10
10           6.7500           7.12500           7.50           10000       10
11           7.5000           7.87500           8.25           10000       10
12           8.2500           8.62500           9.00           10000       14
13           9.0000           9.37500           9.75           10000       20
14           9.7500          10.12500          10.50           10000        4
15          10.5000          10.87500          11.25           10000       17
16          11.2500          11.62500          12.00           10000        2
17          12.0000          12.37500          12.75           10000        7
18          12.7500          13.12500          13.50           10000        1
19          13.5000          13.87500          14.25           10000       10
20          14.2500          14.62500          15.00           10000        2
21          15.0000          15.37500          15.75           10000       13
22          15.7500          16.12500          16.50           10000        2
23          16.5000          16.87500          17.25           10000        3
24          17.2500          17.62500          18.00           10000        3
25          18.0000          18.37500          18.75           10000        3
26          18.7500          19.12500          19.50           10000        1
27          19.5000          19.87500          20.25           10000        2
28          20.2500          20.62500          21.00           10000        3
29          21.0000          21.37500          21.75           10000        0
30          21.7500          22.12500          22.50           10000        1
31          22.5000          22.87500          23.25           10000        1
32          23.2500          23.62500          24.00           10000        1
33          24.0000          24.37500          24.75           10000        0
34          24.7500          25.12500          25.50           10000        0
35          25.5000          25.87500          26.25           10000        0
36          26.2500          26.62500          27.00           10000        0
   n_censors
1          3
2          3
3          3
4          3
5          0
6          0
7          0
8          0
9          0
10         0
11         0
12         0
13         1
14         1
15         1
16         1
17         2
18         2
19         2
20         2
21         5
22         5
23         5
24         5
25         5
26         5
27         5
28         5
29         1
30         1
31         1
32         1
33         1
34         1
35         1
36         1
> times_start <-c(  rep(start_time_censor, n_censors), rep(start_time_event, n_events) )
> times_end <-c(  rep(end_time_censor, n_censors), rep(end_time_event, n_events)  )
> model <- flexsurvreg(Surv(times_start, times_end, type="interval2")~1, dist="llogis")
**Warning message:
In flexsurvreg(Surv(times_start, times_end, type = "interval2") ~  :
  Optimisation has probably not converged to the maximum likelihood - Hessian is not positive definite.** 
> model
Call:
flexsurvreg(formula = Surv(times_start, times_end, type = "interval2") ~ 
    1, dist = "llogis")

Estimates: 
       est    L95%   U95%   se   
shape   1.08     NA     NA     NA
scale  10.04     NA     NA     NA

N = 279,  Events: 0,  Censored: 279
Total time at risk: 2860.5
Log-likelihood = -838.6827, df = 2
AIC = 1681.365

> model <- flexsurvreg(Surv(times_start, times_end, type="interval2")~1, dist="gompertz") 
**Warning messages:
1: In log(pmax - pmin) : NaNs produced
2: In log(pmax - pmin) : NaNs produced**
> model
Call:
flexsurvreg(formula = Surv(times_start, times_end, type = "interval2") ~ 
    1, dist = "gompertz")

Estimates: 
       est       L95%      U95%      se      
shape  1.00e-03  1.00e-03  1.00e-03  2.13e-08
rate   7.70e-04  2.06e-04  2.87e-03  5.17e-04

N = 279,  Events: 0,  Censored: 279
Total time at risk: 2860.5
Log-likelihood = -1544.013, df = 2
AIC = 3092.026

> model <- flexsurvreg(Surv(times_start, times_end, type="interval2")~1, dist="llogis", control=list(fnscale=-1600))
**Warning message:
In flexsurvreg(Surv(times_start, times_end, type = "interval2") ~  :
  Optimisation has probably not converged to the maximum likelihood - Hessian is not positive definite.** 
> model
Call:
flexsurvreg(formula = Surv(times_start, times_end, type = "interval2") ~ 
    1, dist = "llogis", control = list(fnscale = -1600))

Estimates: 
       est    L95%   U95%   se   
shape  11.36     NA     NA     NA
scale   1.06     NA     NA     NA

N = 279,  Events: 0,  Censored: 279
Total time at risk: 2860.5
Log-likelihood = -6478.852, df = 2
AIC = 12961.7

> model <- flexsurvreg(Surv(times_start, times_end, type="interval2")~1, dist="llogis", control=list(fnscale=1600))
**Warning message:
In flexsurvreg(Surv(times_start, times_end, type = "interval2") ~  :
  Optimisation has probably not converged to the maximum likelihood - Hessian is not positive definite.** 
> model
Call:
flexsurvreg(formula = Surv(times_start, times_end, type = "interval2") ~ 
    1, dist = "llogis", control = list(fnscale = 1600))

Estimates: 
       est    L95%   U95%   se   
shape   1.08     NA     NA     NA
scale  10.04     NA     NA     NA

N = 279,  Events: 0,  Censored: 279
Total time at risk: 2860.5
Log-likelihood = -838.6827, df = 2
AIC = 1681.365

> 
tku23 commented 4 years ago

Hi there, Just wanting to provide an update. I've managed to solve Gompertz by using method="Nelder-Mead". I came across the optim page and wanted to know why does flexsurvreg uses BFGS as the default? Will this "bias" the results by doing Nelder-Mead for one distribution and BFGS for the rest?

For log-logistic, I still couldn't solve it despite trying all the different methods. Do you have any suggestions I could try? Thank you.

I found this "If 'false' convergence is reported with a non-positive-definite Hessian, then consider tightening the tolerance criteria for convergence" - how do I tighten the tolerance criteria? https://rdrr.io/cran/flexsurv/man/flexsurvreg.html

chjackson commented 4 years ago

There isn't necessarily going to be a solution. The likelihood surface that it is trying to maximise might just be too flat, so that there is no meaningful maximum. This usually happens when the data don't contain sufficient information about the parameters of the model.

BFGS is used as a default because it makes use of the gradient of the log-likelihood, which makes optimisation faster. In well-behaved problems, all optimisers should give the same answer. If they don't, then the answer with the higher likelihood (lower minus log likelihood) should be preferred.

Go to help(optim) as the manual page suggests - see "reltol" and "abstol". For example, flexsurvreg(..., control=list(reltol=1e-16), though this might not make any difference.