ddsjoberg / gtsummary

Presentation-Ready Data Summary and Analytic Result Tables
http://www.danieldsjoberg.com/gtsummary
Other
1.05k stars 124 forks source link

Bug Report: add_n() for time dependent coxph() #1625

Closed jaromilfrossard closed 6 months ago

jaromilfrossard commented 7 months ago

Thank you for creating gtsummary!

Currently, in a time-dependent survival::coxph(), the n from add_n() describes the number of observation, instead of the number of id, which might be misleading. Indeed the same statistical model might have different number of observation depending on its data structure used in survival::coxph(). Currently, the number of "id" is not exported in the coxph model, which I suggest to add survival::coxph() https://github.com/therneau/survival/issues/250.

library(survival)
library(gtsummary)

set.seed(42)
n=30
df <- 
  data.frame(
    id = seq_len(n),
    eos_time = round(runif(n,100,200)),
    eos_event = rbinom(n,1,prob=.2),
    event_01 = round(runif(n,20,40)),
    event_02 = round(runif(n,60,80)),
    grp = rep(c(1,0),each= n/2)) 

df$eos_time2 <- df$eos_time
df$eos_time2[df$eos_event==0]<-NA

df_tv <-
  tmerge(df[,c("id","grp")],df[,c("id","eos_time")],id=id,tstop=eos_time) |> 
  tmerge(data2=df[,c("id","event_01","event_02")],event_01 = event(event_01),event_01 = event(event_02),id=id) |> 
  tmerge(data2=df[,c("id","eos_time2")],eos_event = event(eos_time2),id=id)

# same model different n!
# n=30
m <- coxph(Surv(eos_time,eos_event)~grp,data=df)
summary(m)
m |> 
  tbl_regression() |> 
  add_n()

# n=90
m_tv <- coxph(Surv(time = tstart,time2 =tstop,eos_event)~grp,data=df_tv,id=id)
summary(m_tv)
m_tv |> 
  tbl_regression() |> 
  add_n()
larmarange commented 7 months ago

Should be implemented in broom.helpers. Let me explore it

larmarange commented 7 months ago

A quick comment. So far, broom.helpers computes for coxph models a number of observations, a number of event and exposure time globally and for each level of a categorical variable.

This is the current documentation of broom.helpers::tidy_add_n():

For Cox models (survival::coxph()), an individual could be coded with several observations (several rows). n_obs will correspond to the weighted number of observations which could be different from the number of individuals. tidy_add_n() will also compute a (weighted) number of events (n_event) according to the definition of the survival::Surv() object. Exposure time is also returned in exposure column. It is equal to the (weighted) sum of the time variable if only one variable time is passed to survival::Surv(), and to the (weighted) sum of time2 - time if two time variables are defined in survival::Surv().

from: https://larmarange.github.io/broom.helpers/reference/tidy_add_n.html

Currently, the number of observations could be added in gtsummary() with add_n() and the number of events with add_nevent(). There is no function in gtsummary for adding exposure time.

I have quickly drafted some code to compute, in addition to the number of observations, the number of individuals n_ind: https://github.com/larmarange/broom.helpers/pull/251

It has to be noted, for categorical variable, that if there are two observations of the same individual for modality A and one observation for modality B, for now we are applied 2/3 to A and 1/3 to B (such situation occurs with time-varying co-factors). Sampling weights are also taken into account.

I am not sure if the number of individuals should replace the number of observations. They are not exactly the same thing.

However, it would be nice to have therefore a way to display n_obs or n_ind depending on user preference, and/or a way to also display exposure time. Eventually with a more generic version of add_n() allwoing to indicate a glue text with the stats to be displayed?

@ddsjoberg What do you think? Should we try to be more generic? Should we have distinction between the number of observations the number of individuals? or change the default behaviour for cox models?

library(survival)
library(gtsummary)
library(broom.helpers)
#> 
#> Attachement du package : 'broom.helpers'
#> Les objets suivants sont masqués depuis 'package:gtsummary':
#> 
#>     all_continuous, all_contrasts

set.seed(42)
n=30
df <- 
  data.frame(
    id = seq_len(n),
    eos_time = round(runif(n,100,200)),
    eos_event = rbinom(n,1,prob=.2),
    event_01 = round(runif(n,20,40)),
    event_02 = round(runif(n,60,80)),
    grp = factor(rep(c(1,0),each= n/2))
  )

df$eos_time2 <- df$eos_time
df$eos_time2[df$eos_event==0]<-NA

df_tv <-
  tmerge(df[,c("id","grp")],df[,c("id","eos_time")],id=id,tstop=eos_time) |> 
  tmerge(data2=df[,c("id","event_01","event_02")],event_01 = event(event_01),event_01 = event(event_02),id=id) |> 
  tmerge(data2=df[,c("id","eos_time2")],eos_event = event(eos_time2),id=id) |> 
  dplyr::rename(identif = id)

m <- coxph(Surv(eos_time,eos_event)~grp,data=df)
m |> 
  model_get_n()
#> # A tibble: 3 × 5
#>   term        n_obs n_ind n_event exposure
#>   <chr>       <dbl> <dbl>   <dbl>    <dbl>
#> 1 (Intercept)    30    30       7     4843
#> 2 grp1           15    15       4     2420
#> 3 grp0           15    15       3     2423

m_tv <- coxph(Surv(time = tstart,time2 =tstop,eos_event)~grp,data=df_tv,id=identif)
m_tv |> 
  model_get_n()
#> # A tibble: 3 × 5
#>   term        n_obs n_ind n_event exposure
#>   <chr>       <dbl> <dbl>   <dbl>    <dbl>
#> 1 (Intercept)    90    30       7     4843
#> 2 grp1           45    15       4     2420
#> 3 grp0           45    15       3     2423

Created on 2024-04-11 with reprex v2.1.0

ddsjoberg commented 7 months ago

@larmarange we had a similar request very early on in the gtsummary life. I think on the surface it it a super reasonable request. But I found it quickly became tricky to handle this for all types of models

We identified a number of situations where the N users want to report is different for the same model type, and we opted to keep it simple and just report the number of observation because this is something that was clear. Users can pretty easily calculate tabulations for whatever N is appropriate for their scenario in tbl_summary() and merge that table with the regression summary.

I am a bit apprehensive for being able to solve this correctly in all situations....

larmarange commented 7 months ago

@ddsjoberg I completely agree that we cannot cover all cases and I do not plan to cover all of them in broom.helpers.

However, considering that some of the code is very similar between add_n.tbl_regression() and add_nevents.tbl_regression(), maybe we could consider to add to add_n.tbl_regression() two arguements: one to let the user to select another column returned by tidy_plus_plus() and one to customize the header of the col.

ddsjoberg commented 7 months ago

Those are good points! Sounds good to me. Nice!

ddsjoberg commented 6 months ago

I think I can close this as it's handled in broom.helpers. Please re-open if I need to update something in gtsummary

larmarange commented 6 months ago

There is still the question of upgrading add_n() to select a specific column of the tibble returned by broom.helpers, and refactoring add_nevent as a specific case of add_n().

But considering the current work on v2 of gtsummary, it could maybe be postponed after the release of v2?