chjackson / flexsurv

The flexsurv R package for flexible parametric survival and multi-state modelling
http://chjackson.github.io/flexsurv/
53 stars 28 forks source link

Competing risks: Joint likelihood versus cause-specific likelihoods #125

Closed mattwarkentin closed 2 years ago

mattwarkentin commented 2 years ago

Hi @chjackson,

I have a pretty simple multi-state model where participants start in a healthy state and can only transition into one of two states (death from cause-of-interest or death from any other cause). Pretty simple competing risks setup. I have read through your multi-state vignette (and some other papers), and I understand that you can either fit a single model call for the joint likelihood, if you restructure/stack your data, or you can fit separate cause-specific models and then combined them to form the full multi-state model.

I am wondering if I have done this correctly in the toy example below. I am trying to obtain the "same" prediction for each approach (i.e. five-year risk of death due to cause-of-interest), but I am not sure if this is correct. Hoping to get your insights. For the joint likelihood approach, I am not confident I have stacked the data into the necessary format, and I am not sure if I am making predictions the correct way using this joint model.

library(tidyverse)
library(riskRegression)
library(flexsurv)

# Setup Data ----
orig <- 
  sampleData(1000) %>% 
  as_tibble() %>% 
  mutate(
    id = 1:n(),
    cause1 = if_else(event==1, 1L, 0L),
    cause2 = if_else(event==2, 1L, 0L)
  ) %>% 
  select(id, time, event, cause1, cause2)

orig
#> # A tibble: 1,000 × 5
#>       id   time event cause1 cause2
#>    <int>  <dbl> <dbl>  <int>  <int>
#>  1     1 15.4       0      0      0
#>  2     2  4.11      2      0      1
#>  3     3  6.40      0      0      0
#>  4     4  0.607     1      1      0
#>  5     5  3.87      0      0      0
#>  6     6  0.914     1      1      0
#>  7     7  6.16      0      0      0
#>  8     8  6.38      2      0      1
#>  9     9  6.75      0      0      0
#> 10    10  4.48      1      1      0
#> # … with 990 more rows

stacked <- 
  orig %>% 
  pivot_longer(
    cols = c(cause1, cause2),
    names_prefix = 'cause',
    names_to = 'trans',
    values_to = 'status'
  )

stacked
#> # A tibble: 2,000 × 5
#>       id   time event trans status
#>    <int>  <dbl> <dbl> <chr>  <int>
#>  1     1 15.4       0 1          0
#>  2     1 15.4       0 2          0
#>  3     2  4.11      2 1          0
#>  4     2  4.11      2 2          1
#>  5     3  6.40      0 1          0
#>  6     3  6.40      0 2          0
#>  7     4  0.607     1 1          1
#>  8     4  0.607     1 2          0
#>  9     5  3.87      0 1          0
#> 10     5  3.87      0 2          0
#> # … with 1,990 more rows

# Joint Likelihood ----

joint_model <- flexsurvreg(
  formula = Surv(time, status) ~ trans,
  data = stacked,
  dist = 'weibull'
)

# Separate Cause-Specific Models ----

m1 <- flexsurvreg(
  formula = Surv(time, event==1) ~ 1,
  data = orig,
  dist = 'weibull'
)

m2 <- flexsurvreg(
  formula = Surv(time, event==2) ~ 1,
  data = orig,
  dist = 'weibull'
)

# Make Predictions ----

tmat <- rbind(c(NA,1,2), c(NA,NA,NA), c(NA,NA,NA))

sep_models <- fmsm(m1, m2, trans = tmat)

pmatrix.fs(sep_models, trans = tmat, t = 5)[1, 3]
#> [1] 0.1638662

pmatrix.fs(joint_model, trans = tmat, t = 5)[1, 3]
#> [1] 0.2067477
chjackson commented 2 years ago

In your implementation of the "joint" model, the covariate representing the transition only affects the Weibull scale parameter and not the shape parameter. So it will make the accelerated failure time assumption, and assume the shape is the same for every transition. Your "separate" models will have different shape and scale parameters per transition. So to mimic that in a single model fit, just do Surv(time, status) ~ trans + shape(trans) or add anc = list(shape = ~trans).

mattwarkentin commented 2 years ago

Thanks for the clarification, @chjackson. Much appreciated.