Closed jaromilfrossard closed 6 months ago
Should be implemented in broom.helpers
. Let me explore it
A quick comment. So far, broom.helpers
computes for coxph models a number of observations, a number of event and exposure time globally and for each level of a categorical variable.
This is the current documentation of broom.helpers::tidy_add_n()
:
For Cox models (survival::coxph()), an individual could be coded with several observations (several rows). n_obs will correspond to the weighted number of observations which could be different from the number of individuals. tidy_add_n() will also compute a (weighted) number of events (n_event) according to the definition of the survival::Surv() object. Exposure time is also returned in exposure column. It is equal to the (weighted) sum of the time variable if only one variable time is passed to survival::Surv(), and to the (weighted) sum of time2 - time if two time variables are defined in survival::Surv().
from: https://larmarange.github.io/broom.helpers/reference/tidy_add_n.html
Currently, the number of observations could be added in gtsummary()
with add_n()
and the number of events with add_nevent()
. There is no function in gtsummary
for adding exposure time.
I have quickly drafted some code to compute, in addition to the number of observations, the number of individuals n_ind
: https://github.com/larmarange/broom.helpers/pull/251
It has to be noted, for categorical variable, that if there are two observations of the same individual for modality A and one observation for modality B, for now we are applied 2/3 to A and 1/3 to B (such situation occurs with time-varying co-factors). Sampling weights are also taken into account.
I am not sure if the number of individuals should replace the number of observations. They are not exactly the same thing.
However, it would be nice to have therefore a way to display n_obs
or n_ind
depending on user preference, and/or a way to also display exposure time. Eventually with a more generic version of add_n()
allwoing to indicate a glue text with the stats to be displayed?
@ddsjoberg What do you think? Should we try to be more generic? Should we have distinction between the number of observations the number of individuals? or change the default behaviour for cox models?
library(survival)
library(gtsummary)
library(broom.helpers)
#>
#> Attachement du package : 'broom.helpers'
#> Les objets suivants sont masqués depuis 'package:gtsummary':
#>
#> all_continuous, all_contrasts
set.seed(42)
n=30
df <-
data.frame(
id = seq_len(n),
eos_time = round(runif(n,100,200)),
eos_event = rbinom(n,1,prob=.2),
event_01 = round(runif(n,20,40)),
event_02 = round(runif(n,60,80)),
grp = factor(rep(c(1,0),each= n/2))
)
df$eos_time2 <- df$eos_time
df$eos_time2[df$eos_event==0]<-NA
df_tv <-
tmerge(df[,c("id","grp")],df[,c("id","eos_time")],id=id,tstop=eos_time) |>
tmerge(data2=df[,c("id","event_01","event_02")],event_01 = event(event_01),event_01 = event(event_02),id=id) |>
tmerge(data2=df[,c("id","eos_time2")],eos_event = event(eos_time2),id=id) |>
dplyr::rename(identif = id)
m <- coxph(Surv(eos_time,eos_event)~grp,data=df)
m |>
model_get_n()
#> # A tibble: 3 × 5
#> term n_obs n_ind n_event exposure
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 30 30 7 4843
#> 2 grp1 15 15 4 2420
#> 3 grp0 15 15 3 2423
m_tv <- coxph(Surv(time = tstart,time2 =tstop,eos_event)~grp,data=df_tv,id=identif)
m_tv |>
model_get_n()
#> # A tibble: 3 × 5
#> term n_obs n_ind n_event exposure
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 90 30 7 4843
#> 2 grp1 45 15 4 2420
#> 3 grp0 45 15 3 2423
Created on 2024-04-11 with reprex v2.1.0
@larmarange we had a similar request very early on in the gtsummary life. I think on the surface it it a super reasonable request. But I found it quickly became tricky to handle this for all types of models
We identified a number of situations where the N users want to report is different for the same model type, and we opted to keep it simple and just report the number of observation because this is something that was clear. Users can pretty easily calculate tabulations for whatever N is appropriate for their scenario in tbl_summary()
and merge that table with the regression summary.
I am a bit apprehensive for being able to solve this correctly in all situations....
@ddsjoberg I completely agree that we cannot cover all cases and I do not plan to cover all of them in broom.helpers
.
However, considering that some of the code is very similar between add_n.tbl_regression()
and add_nevents.tbl_regression()
, maybe we could consider to add to add_n.tbl_regression()
two arguements: one to let the user to select another column returned by tidy_plus_plus()
and one to customize the header of the col.
Those are good points! Sounds good to me. Nice!
I think I can close this as it's handled in broom.helpers. Please re-open if I need to update something in gtsummary
There is still the question of upgrading add_n() to select a specific column of the tibble returned by broom.helpers, and refactoring add_nevent as a specific case of add_n().
But considering the current work on v2 of gtsummary, it could maybe be postponed after the release of v2?
Thank you for creating gtsummary!
Currently, in a time-dependent
survival::coxph()
, then
fromadd_n()
describes the number of observation, instead of the number of id, which might be misleading. Indeed the same statistical model might have different number of observation depending on its data structure used insurvival::coxph()
. Currently, the number of "id" is not exported in the coxph model, which I suggest to addsurvival::coxph()
https://github.com/therneau/survival/issues/250.