MSKCC-Epi-Bio / tidycmprsk

https://mskcc-epi-bio.github.io/tidycmprsk
GNU Affero General Public License v3.0

Extract estimates for "non-event" status with `tidycmprsk::tidy()` #114

Closed. uriahf closed this issue 1 month ago.

uriahf commented 2 months ago

Hey, this is a follow-up question to https://github.com/MSKCC-Epi-Bio/tidycmprsk/issues/113:

Is it possible to extract the estimate for the "non-event" status at different time points?

If I understand correctly, in {cmprsk} the estimate for "censor" is 0 unless the fixed time horizon is set to the maximum observed time. This relates to administrative censoring, and at any other time point the estimates do not add up to 1 (in the example below, only the 24-month column sums to 1).

In {tidycmprsk}, there is no estimate for censoring:

```r
library(magrittr)

# tidycmprsk

tidycmprsk::cuminc(
  survival::Surv(ttdeath, death_cr) ~ 1, 
  tidycmprsk::trial) |> 
  tidycmprsk::tidy(times = c(12, 24)) 
#> # A tibble: 4 × 12
#>    time outcome    estimate std.error conf.low conf.high n.risk n.event n.censor
#>   <dbl> <chr>         <dbl>     <dbl>    <dbl>     <dbl>  <int>   <dbl>    <dbl>
#> 1    12 death fro…    0.06     0.0168   0.0327    0.0989    177      12        0
#> 2    24 death fro…    0.285    0.0320   0.224     0.349      88      45       88
#> 3    12 death oth…    0.055    0.0162   0.0291    0.0927    177      11        0
#> 4    24 death oth…    0.275    0.0317   0.215     0.338      88      44       88
#> # ℹ 3 more variables: cum.event <dbl>, cum.censor <dbl>, time_max <dbl>

# cmprsk

cmprsk::cuminc(
  ftime = tidycmprsk::trial$ttdeath,
  fstatus = tidycmprsk::trial$death_cr
) |> 
  cmprsk::timepoints(
    times = c(12, 24)
  ) %$%
  est 
#>                         12    24
#> 1 censor             0.000 0.440
#> 1 death from cancer  0.060 0.285
#> 1 death other causes 0.055 0.275
```

Can I interpret one minus the sum of the event estimates as the estimate for the non-event status?

```r
# tidycmprsk-hack
original_tidycmprsk_output <- tidycmprsk::cuminc(
  survival::Surv(ttdeath, death_cr) ~ 1, 
  tidycmprsk::trial) |> 
  tidycmprsk::tidy(times = c(12, 24)) |> 
  dplyr::select(time, outcome, estimate) 

original_tidycmprsk_output |> 
  dplyr::bind_rows(
    original_tidycmprsk_output |>
      dplyr::group_by(time) |> 
      dplyr::summarise(estimate = 1 - sum(estimate)) |> 
      dplyr::mutate(outcome = "non-event")
  )
#> # A tibble: 6 × 3
#>    time outcome            estimate
#>   <dbl> <chr>                 <dbl>
#> 1    12 death from cancer     0.06 
#> 2    24 death from cancer     0.285
#> 3    12 death other causes    0.055
#> 4    24 death other causes    0.275
#> 5    12 non-event             0.885
#> 6    24 non-event             0.440
```
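One way to sanity-check this hack, sketched here under assumptions: with a single starting state, the Aalen-Johansen probability of still being event-free equals the all-cause Kaplan-Meier estimate, which is also one minus the sum of the cumulative incidences. The sketch uses the {survival} package directly on hypothetical toy data (not `tidycmprsk::trial`): a factor status whose first level is the censoring code gives a multi-state fit, and a logical "any event" status gives the ordinary Kaplan-Meier curve. The names `toy`, `times`, `non_event` and `km_surv` are illustrative only.

```r
library(survival)

# Hypothetical toy data: follow-up time and a factor status whose first
# level is the censoring code (same layout as tidycmprsk::trial$death_cr)
toy <- data.frame(
  time   = c(2, 3, 3, 5, 6, 8, 9, 11, 12, 12),
  status = factor(
    c("cancer", "other", "censor", "cancer", "censor",
      "other", "cancer", "censor", "other", "censor"),
    levels = c("censor", "cancer", "other")
  )
)
times <- c(4, 10)

# Multi-state (Aalen-Johansen) fit: pstate has one column per state
ms_fit <- survfit(Surv(time, status) ~ 1, data = toy)
ms_sum <- summary(ms_fit, times = times)
event_cols <- match(c("cancer", "other"), ms_fit$states)
cif <- ms_sum$pstate[, event_cols, drop = FALSE]

# "Non-event" estimate via the hack: one minus the sum of the CIFs
non_event <- 1 - rowSums(cif)

# All-cause Kaplan-Meier: any death counts as the event
km_fit <- survfit(Surv(time, status != "censor") ~ 1, data = toy)
km_surv <- summary(km_fit, times = times)$surv

all.equal(unname(non_event), km_surv)
```

The same comparison can be run on `tidycmprsk::trial` by swapping in `ttdeath` and `death_cr`.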
ddsjoberg commented 2 months ago

It doesn't look like that is a feature we included in the package. I think your suggestion of using one minus the sum of the event probabilities is reasonable. But with competing risks there is no guarantee that the event probabilities sum to one, so it will probably not match the result from cmprsk (but it will be close!).
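Why one minus the sum of the event probabilities behaves like a non-event estimate can be seen from the Aalen-Johansen recursion. The base-R sketch below uses hypothetical toy data and a hypothetical helper `aalen_johansen()`, written for illustration rather than taken from {tidycmprsk} or {cmprsk}. At each event time it increments each cause's cumulative incidence by S(t-) * d_k / n, while the all-cause event-free survival S takes the usual Kaplan-Meier step. The increments across causes add up to exactly S(t-) - S(t), so the CIFs plus S stay at 1 even though the CIFs alone do not.

```r
# Hypothetical toy data: "censor" marks a censored observation
time  <- c(2, 3, 3, 5, 6, 8, 9, 11, 12, 12)
cause <- c("cancer", "other", "censor", "cancer", "censor",
           "other", "cancer", "censor", "other", "censor")

# Hypothetical helper: hand-rolled Aalen-Johansen CIFs for one group
aalen_johansen <- function(time, cause, causes = c("cancer", "other")) {
  event_times <- sort(unique(time[cause != "censor"]))
  surv <- 1                                    # all-cause S(t-), starts at 1
  cif  <- setNames(numeric(length(causes)), causes)
  out  <- data.frame()
  for (t in event_times) {
    n_risk <- sum(time >= t)                   # at risk just before t
    d_k    <- vapply(causes, function(k) sum(time == t & cause == k), numeric(1))
    cif    <- cif + surv * d_k / n_risk        # CIF_k jumps by S(t-) * d_k / n
    surv   <- surv * (1 - sum(d_k) / n_risk)   # Kaplan-Meier step for any event
    out    <- rbind(out, data.frame(time = t, as.list(cif), non_event = surv))
  }
  out
}

res <- aalen_johansen(time, cause)
res$total <- res$cancer + res$other + res$non_event  # equals 1 at every time
res
```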