bwiernik opened 3 years ago
Do you have an example?
```r
lme4::lmer(mpg ~ 1 + (1 | cyl) + (1 | gear), data = mtcars) |>
  parameters::parameters() |>
  insight::format_table()
#> boundary (singular) fit: see ?isSingular
#> Parameter Coefficient SE 95% CI t(28) p Effects
#> 1 (Intercept) 20.50 3.38 [13.57, 27.42] 6.07 < .001 fixed
#> 2 SD (Intercept) 5.76 random
#> 3 SD (Intercept) 0.00 random
#> 4 SD (Observations) 3.22 random
#> Group
#> 1
#> 2 cyl
#> 3 gear
#> 4 Residual
```
Created on 2021-07-26 by the reprex package (v2.0.0)
Should be:
```r
lme4::lmer(mpg ~ 1 + (1 | cyl) + (1 | gear), data = mtcars) |>
  parameters::parameters() |>
  insight::format_table()
#> boundary (singular) fit: see ?isSingular
#> Parameter Coefficient SE 95% CI t(28) p Effects
#> 1 (Intercept) 20.50 3.38 [13.57, 27.42] 6.07 < .001 fixed
#> 2 SD (Intercept: cyl) 5.76 random
#> 3 SD (Intercept: gear) 0.00 random
#> 4 SD (Observations) 3.22 random
#> Group
#> 1
#> 2 cyl
#> 3 gear
#> 4 Residual
```
Created on 2021-07-26 by the reprex package (v2.0.0)
OK, I see. However, there is a reason for this output, mainly "compatibility" with broom.mixed, where the terms also have the same names. I think this is something people rely on when they switch from broom to parameters. Maybe @vincentarelbundock can say something in this regard?
```r
lme4::lmer(mpg ~ 1 + (1 | cyl) + (1 | gear), data = mtcars) |>
  broom.mixed::tidy()
#> boundary (singular) fit: see ?isSingular
#> # A tibble: 4 x 6
#> effect group term estimate std.error statistic
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 fixed <NA> (Intercept) 20.5 3.38 6.07
#> 2 ran_pars cyl sd__(Intercept) 5.76 NA NA
#> 3 ran_pars gear sd__(Intercept) 0 NA NA
#> 4 ran_pars Residual sd__Observation 3.22 NA NA
```
Created on 2021-07-26 by the reprex package (v2.0.0)
I can see two arguments here, with the second one being more convincing (to me).
First (and least important), in an ideal world, I think terms would be unique identifiers. Unfortunately, that’s not always possible. Here we have an example, but an even clearer one is multinomial logit, where all terms are repeated for all levels of the outcome.
From the perspective of `modelsummary`, one minor problem is that the default table looks terrible when there are duplicate terms (but take note of the useful warning):
```r
library(modelsummary)
mod <- lme4::lmer(mpg ~ 1 + (1 | cyl) + (1 | gear), data = mtcars)
#> boundary (singular) fit: see ?isSingular
modelsummary(mod, gof_omit = ".*")
#> Warning in modelsummary(mod, gof_omit = ".*"): The table includes duplicate
#> term names. This can happen when `coef_map` or `coef_rename` are misused. This
#> can also happen when a model produces "grouped" terms, such as in multinomial
#> logit or gamlss models. You may want to call `get_estimates(model)` to see
#> how estimates are labelled internally, and use the `group` argument of the
#> `modelsummary` function.
```
| | Model 1 |
|---|---|
| (Intercept) | 20.495 |
| | (3.379) |
| SD (Intercept) | 0.000 |
| | 5.761 |
| SD (Observations) | 1.795 |
The good news is that, as the warning indicated, we can use the `group` argument of `modelsummary` to get a perfectly fine table here:
```r
modelsummary(mod, group = term + group ~ model, gof_omit = ".*")
```
| | | Model 1 |
|---|---|---|
| (Intercept) | | 20.495 |
| | | (3.379) |
| SD (Intercept) | cyl | 5.761 |
| | gear | 0.000 |
| SD (Observations) | Residual | 1.795 |
This brings me to the second (and best) argument: in a package that is mostly designed to extract raw data and make it accessible programmatically, I feel it would be a bad idea to combine metadata in a single column. The `term` and `group` columns represent different “kinds” of information, so they should stay separate. I can see a good argument for combining them in a `print` method, but not in the extraction function.
At least for my package, combining the columns would cause some minor headaches, because the `group` argument strategy shown above would no longer work nicely.
In insight's case, the package both extracts data and formats it for display and rendering. `format_table()`, `print_html()`, and `print_md()` are functions for the latter.
As long as there is a way to get the combined columns we show in `print_html()` in a data frame format (which is what `format_table()` produces), for use with other table packages or other purposes, I don't really have a preference either way. How about a separate function to do the consolidation? That function could then drop the `Group` column.
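A separate consolidation function could look roughly like the base R sketch below. `merge_group_column()` is a hypothetical name and does not exist in insight or parameters. The toy data frame only mimics the columns shown in the reprex above. The rule that the `Residual` row is left alone is an assumption taken from the "Should be" output.

```r
# Hypothetical helper, NOT part of insight or parameters: fold the Group
# column into the Parameter label, so "SD (Intercept)" in group "cyl"
# becomes "SD (Intercept: cyl)". Fixed-effect rows (empty Group) and the
# "Residual" row are left unchanged (assumed rule, see "Should be" above).
merge_group_column <- function(x, drop_group = FALSE) {
  stopifnot(all(c("Parameter", "Group") %in% names(x)))
  param <- as.character(x$Parameter)
  grp <- as.character(x$Group)
  grp[is.na(grp)] <- ""
  # Rewrite only rows tied to a real grouping factor whose label ends
  # in ")", e.g. "SD (Intercept)".
  rows <- nzchar(grp) & grp != "Residual" & grepl("\\)$", param)
  param[rows] <- paste0(sub("\\)$", "", param[rows]), ": ", grp[rows], ")")
  x$Parameter <- param
  # Keep Group by default so it can still be used for filtering.
  if (drop_group) x$Group <- NULL
  x
}

# Toy data frame copying the layout of the format_table() output above;
# values are typed in by hand, not computed from a model.
tab <- data.frame(
  Parameter   = c("(Intercept)", "SD (Intercept)", "SD (Intercept)", "SD (Observations)"),
  Coefficient = c(20.50, 5.76, 0.00, 3.22),
  Effects     = c("fixed", "random", "random", "random"),
  Group       = c("", "cyl", "gear", "Residual"),
  stringsAsFactors = FALSE
)

# Parameter labels become "(Intercept)", "SD (Intercept: cyl)",
# "SD (Intercept: gear)" and "SD (Observations)".
merged <- merge_group_column(tab)
merged$Parameter
```

The sketch only covers the lme4 layout shown here. A real implementation would need to handle every model class that can produce a group column.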
And in typical fashion, I write a novel-length answer completely missing the point ;)
Thanks for the correction!
I think I also missed the point. modelsummary relies on the data frame returned by `parameters::model_parameters()`, not on the formatted version from `format_table()`, so indeed, we could have an option (or do it by default) to merge those two columns. However, this is not quite straightforward to implement, because different model objects can have group columns, and each of them needs to be handled differently.
How is it done in `print_html()` or `print()`?
Step 1:
Step 2:
And `insight::format_table()` is called somewhere between these two steps.
This might be easier than expected, since we store an attribute when the parameters data frame includes random parameters from mixed models (see Step 1).
One step in the prettifying process that seems to be skipped by `format_table()` is combining the `Parameter` and `Group` columns. This is done in the `print_*()` functions, but not in `format_table()`. The `Group` column could be kept (a user could drop it manually if desired) or dropped. Keeping it might be useful for users who want to filter or split tables.
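The base R sketch below illustrates why keeping `Group` could help. It uses a hand-typed toy data frame that mimics the "Should be" table (merged `Parameter` labels with `Group` still present) and shows filtering and splitting with `subset()` and `split()`. No insight function is involved.

```r
# Toy table with merged Parameter labels but Group kept, mirroring the
# "Should be" output above (values typed in by hand).
tab <- data.frame(
  Parameter   = c("(Intercept)", "SD (Intercept: cyl)", "SD (Intercept: gear)", "SD (Observations)"),
  Coefficient = c(20.50, 5.76, 0.00, 3.22),
  Effects     = c("fixed", "random", "random", "random"),
  Group       = c("", "cyl", "gear", "Residual"),
  stringsAsFactors = FALSE
)

# Filter: keep only the variance component for one grouping factor.
cyl_rows <- subset(tab, Group == "cyl")

# Split: one sub-table per Group value (fixed effects have Group == "").
by_group <- split(tab, tab$Group)
names(by_group)
```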