ddsjoberg / gtsummary

Presentation-Ready Data Summary and Analytic Result Tables
http://www.danieldsjoberg.com/gtsummary
Other
1.04k stars 114 forks source link

Feature request: Re-order rows for merged mixed models tables that include random effects component #1567

Open strengejacke opened 10 months ago

strengejacke commented 10 months ago

I think when merging mixed models tables, it would make sense to sepatate the fixed from the random effects part. This is how it currently looks like in gtsummary (you see that residual SD is at the bottom for the first model, but the first "coefficient" for the second model, which is confusing as well):

library(lme4)
#> Loading required package: Matrix
library(gtsummary)
data("sleepstudy")
data("efc", package = "ggeffects")
efc$e42dep <- datawizard::to_factor(efc$e42dep)
efc$cluster <- as.factor(efc$e15relat)
m1 <- lmer(neg_c_7 ~ c160age + c161sex + e42dep + (1 | cluster), data = efc)
m2 <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

tbl_merge(
  list(
    tbl_regression(m1, tidy_fun = broom.mixed::tidy),
    tbl_regression(m2, tidy_fun = broom.mixed::tidy)
  )
)
Characteristic Table 1 Table 2
Beta 95% CI1 Beta 95% CI1
carer' age 0.00 -0.03, 0.02

carer's gender 0.47 -0.09, 1.0

elder's dependency



    independent

    slightly dependent 1.3 0.28, 2.2

    moderately dependent 2.6 1.7, 3.6

    severely dependent 4.3 3.3, 5.2

cluster.sd__(Intercept) 0.70


Residual.sd__Observation 3.6
26
Days

10 7.4, 13
Subject.sd__(Intercept)

25
Subject.cor__(Intercept).Days

0.07
Subject.sd__Days

5.9
1 CI = Confidence Interval

Created on 2023-11-06 with reprex v2.0.2

Would be nice if it would look something like shown here: https://strengejacke.github.io/sjPlot/articles/tab_mixed.html#mixed-models-summaries-as-html-table

Is this possible and something you would consider?

ddsjoberg commented 10 months ago

Thank you @strengejacke ! Yes, this is possible, but honestly, it's not a cute solution. The merging is a generic function for any gtsummary table (mostly commonly not regression summaries) and doesn't make distinctions between the fixed and random effects of the model. To get all the fixed effects together, we'd just re-arrange the rows in the underlying data frame.

library(gtsummary)
library(lme4)
#> Loading required package: Matrix

data("sleepstudy")
data("efc", package = "ggeffects")
efc$e42dep <- datawizard::to_factor(efc$e42dep)
efc$cluster <- as.factor(efc$e15relat)
m1 <- lmer(neg_c_7 ~ c160age + c161sex + e42dep + (1 | cluster), data = efc)
m2 <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

tbl <- 
  tbl_merge(
    list(
      tbl_regression(m1, tidy_fun = broom.mixed::tidy),
      tbl_regression(m2, tidy_fun = broom.mixed::tidy)
    )
  ) |> 
  # move random components to the bottom of the table (those labels always contain "__")
  modify_table_body(
    \(x) x |> dplyr::arrange(grepl(pattern = "__", x = label))
  )

image

Created on 2023-11-06 with reprex v2.0.2

But an update that would group and keep the estimates together would be great, and something I think @larmarange has been thinking about (i.e. the multicomponent models). Joseph, let's connect soon and discuss some potential upgrades!

strengejacke commented 10 months ago

As a follow (or I can open another issue, if you like), I was looking for "more readable" names of the random effects, maybe similar to what parameters::model_parameters() returns for mixed models. I realize there's a tidy_fun argument, but it seems to be working from broom::tidy() only, or maybe it's rather for "internal" use - or am I missing something?

library(gtsummary)
library(lme4)
#> Loading required package: Matrix

data("sleepstudy")
data("efc", package = "ggeffects")
efc$e42dep <- datawizard::to_factor(efc$e42dep)
efc$cluster <- as.factor(efc$e15relat)
m1 <- lmer(neg_c_7 ~ c160age + c161sex + e42dep + (1 | cluster), data = efc)

tbl_regression(m1, tidy_fun = parameters::model_parameters)
#> ✖ There was an error calling `tidy_fun()`. Most likely, this is because the
#> function supplied in `tidy_fun=` was misspelled, does not exist, is not
#> compatible with your object, or was missing necessary arguments (e.g. `conf.level=` or `conf.int=`). See error message below.
#> Error:
#> ! Error in model_parameters.merMod(x = new("lmerMod", resp =
#>   new("lmerResp", : argument "model" is missing, with no default
#> Backtrace:
#>      ▆
#>   1. ├─gtsummary::tbl_regression(m1, tidy_fun = parameters::model_parameters)
#>   2. ├─gtsummary:::tbl_regression.lmerMod(m1, tidy_fun = parameters::model_parameters)
#>   3. │ └─gtsummary:::tbl_regression.default(...)
#>   4. │   └─gtsummary:::tidy_prep(...)
#>   5. │     └─... %>% ...
#>   6. ├─rlang::eval_tidy(.)
#>   7. ├─broom.helpers::tidy_plus_plus(...)
#>   8. │ └─model %>% ...
#>   9. └─broom.helpers::tidy_and_attach(...)
#>  10.   └─base::tryCatch(...)
#>  11.     └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  12.       └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  13.         └─value[[3L]](cond)
#>  14.           └─base::tryCatch(...)
#>  15.             └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  16.               └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  17.                 └─value[[3L]](cond)
#>  18.                   └─cli::cli_abort(as.character(e), call = NULL)
#>  19.                     └─rlang::abort(...)
tbl_regression(m1, tidy_fun = function(x, ...) parameters::model_parameters(model = x, ...))
#> Error in `dplyr::left_join()`:
#> ! Join columns in `x` must be present in the data.
#> ✖ Problem with `term`.
#> Backtrace:
#>      ▆
#>   1. ├─gtsummary::tbl_regression(...)
#>   2. ├─gtsummary:::tbl_regression.lmerMod(...)
#>   3. │ └─gtsummary:::tbl_regression.default(...)
#>   4. │   └─gtsummary:::tidy_prep(...)
#>   5. │     └─... %>% ...
#>   6. ├─rlang::eval_tidy(.)
#>   7. ├─broom.helpers::tidy_plus_plus(...)
#>   8. │ └─res %>% tidy_identify_variables(quiet = quiet) %>% ...
#>   9. ├─broom.helpers::tidy_add_contrasts(.)
#>  10. │ └─broom.helpers::tidy_get_model(x)
#>  11. ├─broom.helpers::tidy_identify_variables(., quiet = quiet)
#>  12. │ └─x %>% dplyr::left_join(variables_list, by = "term")
#>  13. ├─dplyr::left_join(., variables_list, by = "term")
#>  14. └─dplyr:::left_join.data.frame(., variables_list, by = "term")
#>  15.   └─dplyr:::join_mutate(...)
#>  16.     └─dplyr:::join_cols(...)
#>  17.       └─dplyr:::check_join_vars(by$x, x_names, by$condition, "x", error_call = error_call)
#>  18.         └─rlang::abort(bullets, call = error_call)
tbl_regression(m1, tidy_fun = function(x, ...) insight::standardize_names(parameters::model_parameters(model = x, ...), style = "broom"))
#> Error in if (result$label %in% c("Beta", "exp(Beta)")) {: argument is of length zero

Created on 2023-11-06 with reprex v2.0.2

ddsjoberg commented 10 months ago

I would have expected the last call to run without error. tbl_regression() often uses the parameters package to do the tidying with a wrapper that ultimately does something very similar to what you have. Do you mind posting the reprex there? Or I can do it later as well.

library(gtsummary)
library(lme4)
#> Loading required package: Matrix

data("sleepstudy")
data("efc", package = "ggeffects")
efc$e42dep <- datawizard::to_factor(efc$e42dep)
efc$cluster <- as.factor(efc$e15relat)
m1 <- lmer(neg_c_7 ~ c160age + c161sex + e42dep + (1 | cluster), data = efc)

tbl_regression(m1, tidy_fun = broom.helpers::tidy_parameters)
#> Error in if (result$label %in% c("Beta", "exp(Beta)")) {: argument is of length zero

Created on 2023-11-06 with reprex v2.0.2

larmarange commented 10 months ago

Hi.

After exploring a little, I have the feeling that there are two areas of discussion:

@strengejacke for your information, tbl_regression() relies on broom.helpers::tidy_plus_plus() which is in charge of tidying the results of the model and adding additional information (like identification of the variable name, level name for categorical factors, identifying the type of terms, etc.).

Regarding mixed models, by default, tbl_regression() use broom.mixed::tidy() as tidier function and force effects = "mixed", cf. https://github.com/ddsjoberg/gtsummary/blob/30ef7175b9cc935ddcfd41f7c9c2418f3ee52cd8/R/tbl_regression_methods.R#L89 The way it is currently implemented is problematic because it does not check if the effects is part of .... Therefore, a syntax like tbl_regression(m2, effects = "all") will fails.

The default tidier used by tidy_plus_plus() is tidy_with_broom_or_parameters(), cf. https://github.com/larmarange/broom.helpers/blob/637a9b095d02d4263dfac2795099705f938cbc12/R/custom_tidiers.R#L81 For mixed models, it is charging broom.mixed and use tidy() by default. By default, tidy_with_broom_or_parameters() returns both fixed and random effects. We could update broom.helpers to return only fixed effects by default unless effects is passed. In that case, the specific methods defined for tbl_regression() in gtsummary could be removed (it would simplify the code).

Here, is the structure of the tibble returned by tidy()

> tidy(m2)
# A tibble: 6 × 6
  effect   group    term                  estimate std.error statistic
  <chr>    <chr>    <chr>                    <dbl>     <dbl>     <dbl>
1 fixed    NA       (Intercept)           251.          6.82     36.8 
2 fixed    NA       Days                   10.5         1.55      6.77
3 ran_pars Subject  sd__(Intercept)        24.7        NA        NA   
4 ran_pars Subject  cor__(Intercept).Days   0.0656     NA        NA   
5 ran_pars Subject  sd__Days                5.92       NA        NA   
6 ran_pars Residual sd__Observation        25.6        NA        NA 

The type of effect (fixed or random) is stored in the effect column. Additionnal information is stored in group.

There has been discussion about terms disambiguation (because in some cases two identical values are used for terms column): https://github.com/larmarange/broom.helpers/pull/90 & https://github.com/larmarange/broom.helpers/issues/98

Therefore, tidy_plus_plus() disambiguate terms, by prefixed terms with the value of group.

> tidy_plus_plus(m2) |> select(term, original_term, var_type, effect, group, var_label)
# A tibble: 5 × 6
  term                          original_term                 var_type   effect   group    var_label                    
  <chr>                         <chr>                         <chr>      <chr>    <chr>    <chr>                        
1 Days                          Days                          continuous fixed    NA       Days                         
2 Subject.sd__(Intercept)       Subject.sd__(Intercept)       ran_pars   ran_pars Subject  Subject.sd__(Intercept)      
3 Subject.cor__(Intercept).Days Subject.cor__(Intercept).Days ran_pars   ran_pars Subject  Subject.cor__(Intercept).Days
4 Subject.sd__Days              Subject.sd__Days              ran_pars   ran_pars Subject  Subject.sd__Days             
5 Residual.sd__Observation      Residual.sd__Observation      ran_pars   ran_pars Residual Residual.sd__Observation 

Random effects are also identified as such in var_type.

So far, there is no additional work done regarding var_label. Labelling could eventually be improved in broom.helpers. But it will require to clearly define that would be expected with different scenarios. I'm open to any suggestion.

Regarding model_parameters, we cannot use directly model_parameters() because the column names are inconsistent with tidy()

> parameters::model_parameters(m2) |> as_tibble()
# A tibble: 6 × 11
  Parameter            Coefficient    SE    CI  CI_low CI_high     t df_error         p Effects Group     
  <chr>                      <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>    <int>     <dbl> <chr>   <chr>     
1 (Intercept)             251.     6.82   0.95 238.    265.    36.8       174  4.37e-84 fixed   ""        
2 Days                     10.5    1.55   0.95   7.42   13.5    6.77      174  1.88e-10 fixed   ""        
3 SD (Intercept)           24.7    5.84   0.95  15.6    39.3   NA          NA NA        random  "Subject" 
4 SD (Days)                 5.92   1.25   0.95   3.92    8.95  NA          NA NA        random  "Subject" 
5 Cor (Intercept~Days)      0.0656 0.319  0.95  -0.509   0.600 NA          NA NA        random  "Subject" 
6 SD (Observations)        25.6    1.51   0.95  22.8    28.7   NA          NA NA        random  "Residual"

Instead, you could used broom.helpers::tidy_parameters() who proceed to name standardisation. cf. https://github.com/larmarange/broom.helpers/blob/637a9b095d02d4263dfac2795099705f938cbc12/R/custom_tidiers.R#L20

> tidy_parameters(m2)
                  term     estimate std.error conf.level    conf.low   conf.high statistic df.error      p.value effect    group
1          (Intercept) 251.40510485 6.8245967       0.95 237.9354568 264.8747529 36.838090      174 4.372791e-84  fixed         
2                 Days  10.46728596 1.5457896       0.95   7.4163742  13.5181977  6.771481      174 1.882719e-10  fixed         
3       SD (Intercept)  24.74065799 5.8362639       0.95  15.5816976  39.2832779        NA       NA           NA random  Subject
4            SD (Days)   5.92213766 1.2480355       0.95   3.9182819   8.9507889        NA       NA           NA random  Subject
5 Cor (Intercept~Days)   0.06555124 0.3185881       0.95  -0.5090677   0.5997529        NA       NA           NA random  Subject
6    SD (Observations)  25.59179572 1.5080110       0.95  22.8004400  28.7248846        NA       NA           NA random Residual

Three comments. (1) for random effects, term names are not consistent with tidy() and it would require to be taken into account if we plan to improve variable labelling; (ii) the effect column is equal to "random" instead of "ran_pars". It is not ney taken into account and should be considered in https://github.com/larmarange/broom.helpers/blob/637a9b095d02d4263dfac2795099705f938cbc12/R/tidy_identify_variables.R#L90 (3) for fixed terms, group contains an empty character instead of NA, resulting in the current error when trying to use tidy_parameters() with ybl_regression(). change to be fixed at https://github.com/larmarange/broom.helpers/blob/637a9b095d02d4263dfac2795099705f938cbc12/R/tidy_disambiguate_terms.R#L56

In summary there is room for improvement regarding labelling random effects in broom.helpers (but expected results should be clarified); some bugs should be fixed in broom.helpers for allowing the use of tidy_parameters(); and we could better harmonized default tidier between gtsummary and broom.helpers for mixed models.

larmarange commented 10 months ago

Regarding taking into account groups of terms in tbl_regression()

Regarding the initial question about merging, it is part of a more global discussion on how to deal with groups of terms with tbl_regression(), cf. #1540

So far, these terms groups are not handled by tbl_regression(), except for multinomial models, where they are listed in a long form. There is no native support for a wide tabulation, although an experimental multinom_pivot_wider() has been proposed, cf. #1552

To be more generic, we could consider supporting "terms groups" or "components" (term to be discussed) within tbl_regression():

strengejacke commented 10 months ago

Thanks for the answers! Just a quick comment, will read more thoroughly later:

Regarding model_parameters, we cannot use directly model_parameters() because the column names are inconsistent with tidy()

Using insight::standardize_names(<data frame>, type = "broom") will rename columns in a data frame to follow broom-conventions (and vice versa, insight::standardize_names(<data frame>, type = "easystats") will rename into easystats-conventions).

larmarange commented 10 months ago

Using insight::standardize_names(<data frame>, type = "broom") will rename columns in a data frame to follow broom-conventions (and vice versa, insight::standardize_names(<data frame>, type = "easystats") will rename into easystats-conventions).

This is exactly what tidy_parameters() does. ;-)

strengejacke commented 10 months ago

We have rather simple wrappers around the gt package to print tables to HTML, but I like the flexibility and the particular "grammar" design from gtsummary, that's why I'm asking. Not sure if this helps you to give some "inspiration", but this is what parameters currently does:

library(glmmTMB)
library(parameters)
model1 <- glmmTMB(
  count ~ spp + mined + cover + (1 + cover | site),
  ziformula = ~mined + (1 | site),
  family = poisson(),
  data = Salamanders
)
#> Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
#> problem; singular convergence (7). See vignette('troubleshooting'),
#> help('diagnose')
model_parameters(model1, effects = "all", component = "all") |> print_html()
Model Summary
Parameter Coefficient SE 95% CI z p
Fixed Effects (Count Model)
(Intercept) -0.06 0.20 (-0.44, 0.33) -0.28 0.776
spp (PR) -1.28 0.24 (-1.74, -0.81) -5.36 < .001
spp (DM) 0.27 0.14 (2.26e-04, 0.54) 1.96 0.050
spp (EC-A) -0.61 0.20 (-1.01, -0.22) -3.09 0.002
spp (EC-L) 0.66 0.13 (0.41, 0.91) 5.12 < .001
spp (DES-L) 0.62 0.13 (0.37, 0.87) 4.93 < .001
spp (DF) 0.07 0.15 (-0.22, 0.36) 0.49 0.621
mined (no) 1.08 0.18 (0.72, 1.44) 5.91 < .001
cover -0.15 0.09 (-0.32, 0.02) -1.71 0.087
Fixed Effects (Zero-Inflation Component)
(Intercept) 1.23 0.35 (0.55, 1.91) 3.54 < .001
mined (no) -2.48 0.47 (-3.40, -1.55) -5.22 < .001
Random Effects Variances
SD (Intercept: site) 0.11
SD (cover: site) 0.20
Cor (Intercept~cover: site) -1.00
Random Effects (Zero-Inflation Component)
SD (Intercept: site) 0.85
model1 <- glmmTMB(
  count ~ spp + mined + cover + (1 + cover | site),
  ziformula = ~mined + (1 | site),
  family = poisson(),
  data = Salamanders
)
model2 <- glmmTMB(
  count ~ spp + mined + DOP + (1 | site),
  ziformula = ~ mined + DOP + (1 + DOP | site),
  family = poisson(),
  data = Salamanders
)
mp1 <- model_parameters(model1, effects = "all", component = "all", ci_random = 0.95)
mp2 <- model_parameters(model2, effects = "all", component = "all", ci_random = 0.95)
compare_parameters(list(mp1, mp2)) |> print_html()
Parameter Model 1 Model 2
Fixed Effects

(Intercept)

-0.06
(-0.44, 0.33)

-0.16
(-0.59, 0.28)

spp (PR)

-1.28
(-1.74, -0.81)

-1.26
(-1.73, -0.79)

spp (DM)

0.27
(2.26e-04, 0.54)

0.27
(2.53e-04, 0.54)

spp (EC-A)

-0.61
(-1.01, -0.22)

-0.60
(-0.99, -0.20)

spp (EC-L)

0.66
(0.41, 0.91)

0.67
(0.41, 0.92)

spp (DES-L)

0.62
(0.37, 0.87)

0.62
(0.38, 0.87)

spp (DF)

0.07
(-0.22, 0.36)

0.08
(-0.21, 0.37)

mined (no)

1.08
(0.72, 1.44)

1.07
(0.66, 1.48)

cover

-0.15
(-0.32, 0.02)

DOP

0.05
(-0.03, 0.14)

Fixed Effects (Zero-Inflation Component)

(Intercept)

1.23
(0.55, 1.91)

1.10
(0.37, 1.83)

minedno

-2.48
(-3.40, -1.55)

-2.30
(-3.32, -1.29)

DOP

0.20
(-0.09, 0.49)

Random Effects

SD (Intercept: site)

0.11
(0.03, 0.38)

0.27
(0.16, 0.46)

Cor (Intercept~cover: site)

-1.00
(-1.00, 1.00)

SD (cover: site)

0.20
(0.10, 0.43)

Random Effects (Zero-Inflation Component)

SD (Intercept: site)

0.85
(0.52, 1.37)

0.82
(0.50, 1.35)

SD (DOP: site)

0.11
(4.88e-03, 2.32)

Cor (Intercept~DOP: site)

-1.00
(-1.00, 1.00)

Created on 2023-11-07 with reprex v2.0.2

strengejacke commented 10 months ago

Three comments. (1) for random effects, term names are not consistent with tidy() and it would require to be taken into account if we plan to improve variable labelling; (ii) the effect column is equal to "random" instead of "ran_pars". It is not ney taken into account and should be considered in

Yes, I didn't mean to use parameters or label names, this was just an example how labels could look like. The original term names returned by tidy() can be preserved, maybe just have a function that "cleans" labels during printing?

larmarange commented 10 months ago

Quick note: https://github.com/larmarange/broom.helpers/pull/239 fixes the bug with tidy_parameters() and mixed models.

larmarange commented 10 months ago

Regarding the question of variable labels for random effects: a big feature of tbl_regression() and tidy_plus_plus() for classic models is the identification of the variables, the type of contrasts and reference level for categorical variables, the identification of interactions and the generation of proper and readable labels.

The same level of work and generation of readable labels is not done yet for the random effects. This is an open question on how to better such random effects and generate proper labels.

ddsjoberg commented 10 months ago

This is great! @strengejacke the tables are gorgeous.