chjackson / flexsurv

The flexsurv R package for flexible parametric survival and multi-state modelling
http://chjackson.github.io/flexsurv/
53 stars 28 forks source link

flexsurvspline has optimization issues given LTRC data #110

Closed asondhi closed 2 years ago

asondhi commented 2 years ago

When fitting regression models given left truncated survival data, and using the truncation time as a regressor (for example, to test for dependent LT flexibly), flexsurvspline tends to have issues. Example below gives the error:

Error in optim(method = "BFGS", par = c(gamma0 = -1.91428179330847, gamma1 = 0.475527958657091,  : 
  initial value in 'vmmin' is not finite
library(survival)
library(tidyverse)
library(flexsurv)
library(simsurv)

# Simulate left-truncated + right-censored survival data
set.seed(534)

n <- 2000
entry_rate <- 0.6

# Simulate cohort entry times  
cov <- data.frame(id = 1:n) %>%
  mutate(entry = rexp(n(), rate = entry_rate))

# Simulate dependently truncated Weibull survival data
dat <- simsurv(dist = "weibull",
               lambdas = 0.3, 
               gammas = 2.5, 
               betas = c(entry = log(1.03)), 
               x = cov)

# Full data; induce random censoring
data_full <- merge(dat, cov) %>%
  mutate(status = rbinom(n(), 1, 0.8),
         is_lt = eventtime < entry)

# Truncated data
data_lt <- data_full %>%
  filter(!is_lt)

# Fit flexible models for survival conditional on entry using flexsurvreg/flexsurvspline
m_weibull <- flexsurv::flexsurvreg(Surv(entry, eventtime, status) ~ entry, data = data_lt, dist="weibullPH")
m_spline <- flexsurv::flexsurvspline(Surv(entry, eventtime, status) ~ entry, data = data_lt, k = 2)
chjackson commented 2 years ago

This should be fixed now in https://github.com/chjackson/flexsurv-dev/commit/ec4f4297dde9312e773ce211d867a8bd87b10ff9 . Thanks for the report.

The issue was that there are two different heuristics that flexsurvspline uses to determine initial values for optimisation. If the first procedure doesn't work, then it falls back to the second procedure. However when checking if the procedure worked, it only tested for a -log-likelihood of Inf. In this case, it gave a log-likelihood of NaN. It now checks for NaN as well.

asondhi commented 2 years ago

wow thank you so much for the quick fix!! :) much appreciated!