chjackson / flexsurv

The flexsurv R package for flexible parametric survival and multi-state modelling
http://chjackson.github.io/flexsurv/

error "invalid survival times for this distribution" for all interval censored data #62

Open tdhock opened 5 years ago

tdhock commented 5 years ago

Hi, I am using flexsurv via the following code:

library(flexsurv)
library(penaltyLearning)
library(survival)
data(neuroblastomaProcessed, package="penaltyLearning")
X.mat <- neuroblastomaProcessed$feature.mat[, c("log.n", "log.hall")]
y.mat <- neuroblastomaProcessed$target.mat
train.df <- data.frame(X.mat, y.mat)
fit.survival <- survival::survreg(
  Surv(min.L, max.L, type="interval2") ~ log.n + log.hall,
  train.df, dist="gaussian")
fit.survival
fit.flex <- flexsurv::flexsurvreg(
  Surv(exp(min.L), exp(max.L), type="interval2") ~ log.n + log.hall,
  data=train.df,
  dist="lnorm")

I was expecting that flexsurvreg would estimate the same model as survival::survreg. Instead, I got an error on my system:

> fit.survival <- survival::survreg(
+   Surv(min.L, max.L, type="interval2") ~ log.n + log.hall,
+   train.df, dist="gaussian")
> fit.survival
Call:
survival::survreg(formula = Surv(min.L, max.L, type = "interval2") ~ 
    log.n + log.hall, data = train.df, dist = "gaussian")

Coefficients:
(Intercept)       log.n    log.hall 
 -2.5470812   0.9339951   1.0142676 

Scale= 0.5408448 

Loglik(model)= -199.3   Loglik(intercept only)= -547.4
    Chisq= 696.08 on 2 degrees of freedom, p= <2e-16 
n= 3418 
> fit.flex <- flexsurv::flexsurvreg(
+   Surv(exp(min.L), exp(max.L), type="interval2") ~ log.n + log.hall,
+   data=train.df,
+   dist="lnorm")
Error in (function (formula, data, weights, subset, na.action, dist = "weibull",  : 
  Invalid survival times for this distribution
> 
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] flexsurv_1.1.1             icenReg_2.0.9             
[3] coda_0.19-2                Rcpp_1.0.0                
[5] survival_2.42-3            penaltyLearning_2018.09.04
[7] data.table_1.11.8          namedCapture_2019.02.25   

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2 pillar_1.3.0       compiler_3.5.1     plyr_1.8.4        
 [5] bindr_0.1.1        iterators_1.0.11   tools_3.5.1        magic_1.5-9       
 [9] tibble_1.4.2       gtable_0.2.0       lattice_0.20-35    pkgconfig_2.0.2   
[13] rlang_0.3.0.1      Matrix_1.2-14      foreach_1.5.1      mvtnorm_1.0-8     
[17] bindrcpp_0.2.2     dplyr_0.7.8        grid_3.5.1         tidyselect_0.2.5  
[21] mstate_0.2.11      deSolve_1.22       glue_1.3.0         R6_2.3.0          
[25] tidyr_0.8.2        ggplot2_3.1.0      purrr_0.2.5        magrittr_1.5      
[29] scales_1.0.0       codetools_0.2-15   splines_3.5.1      assertthat_0.2.0  
[33] abind_1.4-7        colorspace_1.4-0   quadprog_1.5-5     geometry_0.3-6    
[37] muhaz_1.2.6.1      lazyeval_0.2.1     munsell_0.5.0      crayon_1.3.4      
> 

Is this because interval censored data are NOT supported? All of the outputs in these data are interval/left/right censored (there are no un-censored outputs).

If interval censored data are supported, then is this a bug? Any known fixes/work-arounds?

chjackson commented 5 years ago

The error happens when trying to find initial values for the flexsurvreg fit. It does this by calling survreg(..., dist="lognormal") on the natural-scale survival times. This results in the invalid survival times error you see. I'm not sure whether or not survreg is supposed to work on data that are interval censored from 0 to Inf....

You can work around this by supplying initial values: flexsurvreg(..., inits=c(1,1,0,0),...) works for me on this example.

I should probably work around this too - perhaps by calling survreg(..., dist="gaussian") on the log times as you did.
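
The two ideas above (supplying `inits`, and fitting a Gaussian model to the log times) can be combined. The following is a minimal sketch under stated assumptions, not the fix adopted in the package: it simulates hypothetical toy data (`toy.df`, `lo`, `hi` and `x` are made-up names, not the neuroblastoma data), fits `survreg(..., dist="gaussian")` on the log scale, and passes the estimates to `flexsurvreg` as `inits`. It assumes the `inits` order for `dist="lnorm"` is meanlog, sdlog, then covariate effects, which is consistent with the shape of `inits=c(1,1,0,0)` for the two-covariate model in this thread.

```r
library(survival)

set.seed(1)
n <- 200
x <- rnorm(n)
log.time <- 1 + 0.5 * x + 0.5 * rnorm(n)  # latent log survival times (hypothetical)
threshold <- rnorm(n, mean = 1)            # one inspection point per subject (hypothetical)

# Every observation is either left or right censored at its threshold (log scale).
toy.df <- data.frame(
  x = x,
  lo = ifelse(log.time < threshold, -Inf, threshold),
  hi = ifelse(log.time < threshold, threshold, Inf))

# Gaussian fit on the log times, as in the original report.
fit.gauss <- survreg(
  Surv(lo, hi, type = "interval2") ~ x,
  toy.df, dist = "gaussian")

# Assumed parameter order for dist="lnorm": meanlog, sdlog, covariate effects.
inits <- unname(c(coef(fit.gauss)[1], fit.gauss$scale, coef(fit.gauss)[-1]))

# Only attempt the flexsurv fit when the package is installed.
if (requireNamespace("flexsurv", quietly = TRUE)) {
  fit.flex <- flexsurv::flexsurvreg(
    Surv(exp(lo), exp(hi), type = "interval2") ~ x,
    data = toy.df, dist = "lnorm", inits = inits)
  print(fit.flex)
}
```

Starting from the Gaussian estimates rather than fixed values such as `c(1,1,0,0)` is only a convenience; either way, supplying `inits` is what avoids the automatic initial-value step described above.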

tdhock commented 5 years ago

Thanks for the work-around; it works fine if inits is specified.

Sorry for the confusion. Actually, there are no data that are interval censored from 0 to Inf (that would be meaningless). Each observation is either left censored or right censored, though. Yes, survreg is supposed to work in this case.
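
To see how the same limits are coded on the two scales, here is a small sketch using only `survival::Surv`. The two rows are hypothetical, and whether this coding difference is exactly what triggers the error inside flexsurv 1.1.1 is an assumption rather than something confirmed in this thread.

```r
library(survival)

# Hypothetical limits on the log scale: one left-censored row, one right-censored row.
lo <- c(-Inf, 1)
hi <- c(2, Inf)

log.surv <- Surv(lo, hi, type = "interval2")
nat.surv <- Surv(exp(lo), exp(hi), type = "interval2")

# Surv status codes: 0 = right censored, 1 = event,
# 2 = left censored, 3 = interval censored.
log.status <- as.numeric(log.surv[, "status"])
nat.status <- as.numeric(nat.surv[, "status"])
print(rbind(log.scale = log.status, natural.scale = nat.status))

# The natural-scale lower limit of the first row is exactly 0, and log(0) is -Inf.
print(log(exp(lo[1])))
```

If `Surv` behaves as its documentation describes, the first row is coded as left censored on the log scale but as interval censored with a lower limit of 0 on the natural scale, because `exp(-Inf)` is a finite value. A log-transforming fit would then meet a non-finite log time for that row.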

(In reply to Chris Jackson's comment of Fri, Apr 5, 2019 at 12:52 PM, quoted in full above.)