easystats / insight

:crystal_ball: Easy access to model information for various model objects
https://easystats.github.io/insight/
GNU General Public License v3.0

format_table() should combine Group and Parameter columns like print_*() #406

Open bwiernik opened 3 years ago

bwiernik commented 3 years ago

One step in the prettifying process that format_table() seems to skip is combining the Parameter and Group columns. The print_*() functions do this, but format_table() does not. The Group column could either be kept (a user could drop it manually if desired) or dropped. Keeping it might be useful for users who want to filter or split tables.

strengejacke commented 3 years ago

Do you have an example?

bwiernik commented 3 years ago
lme4::lmer(mpg ~ 1 + (1 | cyl) + (1 | gear), data = mtcars) |> 
    parameters::parameters() |> 
    insight::format_table()
#> boundary (singular) fit: see ?isSingular
#>           Parameter Coefficient   SE         95% CI t(28)      p Effects
#> 1       (Intercept)       20.50 3.38 [13.57, 27.42]  6.07 < .001   fixed
#> 2    SD (Intercept)        5.76                                   random
#> 3    SD (Intercept)        0.00                                   random
#> 4 SD (Observations)        3.22                                   random
#>      Group
#> 1         
#> 2      cyl
#> 3     gear
#> 4 Residual

Created on 2021-07-26 by the reprex package (v2.0.0)

Should be:

lme4::lmer(mpg ~ 1 + (1 | cyl) + (1 | gear), data = mtcars) |> 
    parameters::parameters() |> 
    insight::format_table()
#> boundary (singular) fit: see ?isSingular
#>                 Parameter Coefficient   SE         95% CI t(28)      p Effects
#> 1             (Intercept)       20.50 3.38 [13.57, 27.42]  6.07 < .001   fixed
#> 2     SD (Intercept: cyl)        5.76                                   random
#> 3    SD (Intercept: gear)        0.00                                   random
#> 4       SD (Observations)        3.22                                   random
#>      Group
#> 1         
#> 2      cyl
#> 3     gear
#> 4 Residual

Created on 2021-07-26 by the reprex package (v2.0.0)

strengejacke commented 3 years ago

Ok, I see. However, there's a reason for this output, mainly "compatibility" with broom.mixed, where the terms also share the same names. I think this is something people rely on when they switch from broom to parameters. Maybe @vincentarelbundock can say something in this regard?

lme4::lmer(mpg ~ 1 + (1 | cyl) + (1 | gear), data = mtcars) |> 
  broom.mixed::tidy()
#> boundary (singular) fit: see ?isSingular
#> # A tibble: 4 x 6
#>   effect   group    term            estimate std.error statistic
#>   <chr>    <chr>    <chr>              <dbl>     <dbl>     <dbl>
#> 1 fixed    <NA>     (Intercept)        20.5       3.38      6.07
#> 2 ran_pars cyl      sd__(Intercept)     5.76     NA        NA   
#> 3 ran_pars gear     sd__(Intercept)     0        NA        NA   
#> 4 ran_pars Residual sd__Observation     3.22     NA        NA

Created on 2021-07-26 by the reprex package (v2.0.0)

vincentarelbundock commented 3 years ago

I can see two arguments here, with the second one being more convincing (to me).

First (and less important), in an ideal world, I think terms would be unique identifiers. Unfortunately, that’s not always possible. Here we have an example, but an even clearer one is multinomial logit, where all terms are repeated for all levels of the outcome.

From the perspective of modelsummary, one minor problem is that the default table looks terrible when there are duplicate terms (but take note of the useful warning):

library(modelsummary)

mod <- lme4::lmer(mpg ~ 1 + (1 | cyl) + (1 | gear), data = mtcars)
#> boundary (singular) fit: see ?isSingular

modelsummary(mod, gof_omit = ".*")

#> Warning in modelsummary(mod, gof_omit = ".*"): The table includes duplicate
#> term names. This can happen when `coef_map` or `coef_rename` are misused. This
#> can also happen when a model produces "grouped" terms, such as in multinomial
#> logit or gamlss models. You may want to call `get_estimates(model)` to see
#> how estimates are labelled internally, and use the `group` argument of the
#> `modelsummary` function.
|                   | Model 1 |
|-------------------|---------|
| (Intercept)       | 20.495  |
|                   | (3.379) |
| SD (Intercept)    | 0.000   |
|                   | 5.761   |
| SD (Observations) | 1.795   |

The good news is that, as the warning indicated, we can use modelsummary’s group argument to get a perfectly fine table here:

modelsummary(mod, group = term + group ~ model, gof_omit = ".*")
|                   |          | Model 1 |
|-------------------|----------|---------|
| (Intercept)       |          | 20.495  |
|                   |          | (3.379) |
| SD (Intercept)    | cyl      | 5.761   |
|                   | gear     | 0.000   |
| SD (Observations) | Residual | 1.795   |

This brings me to the second (and better) argument: in a package that is mostly designed to extract raw data and make it accessible programmatically, I feel it would be a bad idea to combine metadata in a single column. The term and group columns represent different “kinds” of information, so they should stay separate. I can see a good argument for combining them in a print method, but not in the extraction function.

At least for my package, combining the columns would cause some minor headaches because the group argument strategy shown above would no longer work nicely.
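To illustrate the point about keeping the two columns separate, here is a minimal base R sketch. The data frame is a hand-built stand-in whose values are copied from the reprex output earlier in the thread (it is not produced by fitting a model here), and the object names `params`, `random_sd` and `by_group` are only illustrative.

```r
# Hand-built stand-in for the parameters table shown earlier in the thread.
params <- data.frame(
  Parameter   = c("(Intercept)", "SD (Intercept)", "SD (Intercept)",
                  "SD (Observations)"),
  Coefficient = c(20.50, 5.76, 0.00, 3.22),
  Effects     = c("fixed", "random", "random", "random"),
  Group       = c("", "cyl", "gear", "Residual"),
  stringsAsFactors = FALSE
)

# Parameter on its own is not a unique identifier (the value is > 0) ...
anyDuplicated(params$Parameter)

# ... but Parameter and Group together are (the value is 0).
anyDuplicated(params[c("Parameter", "Group")])

# With Group kept as its own column, filtering and splitting need no
# string parsing of a combined label.
random_sd <- params[params$Effects == "random", ]
by_group  <- split(random_sd$Coefficient, random_sd$Group)
by_group[["cyl"]]
```

If the group were folded into the Parameter label, the same split would first have to recover the group name from a string such as "SD (Intercept: cyl)".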

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.0 (2021-05-18)
#>  os       macOS Big Sur 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_CA.UTF-8
#>  ctype    en_CA.UTF-8
#>  tz       America/Toronto
#>  date     2021-07-26
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version    date       lib source
#>  assertthat       0.2.1      2019-03-21 [1] standard (@0.2.1)
#>  backports        1.2.1      2020-12-09 [1] CRAN (R 4.1.0)
#>  bayestestR       0.10.0     2021-05-31 [1] CRAN (R 4.1.0)
#>  boot             1.3-28     2021-05-03 [1] CRAN (R 4.1.0)
#>  broom            0.7.8      2021-06-24 [1] CRAN (R 4.1.0)
#>  checkmate        2.0.0      2020-02-06 [1] CRAN (R 4.1.0)
#>  cli              3.0.1      2021-07-17 [1] CRAN (R 4.1.0)
#>  colorspace       2.0-2      2021-06-24 [1] CRAN (R 4.1.0)
#>  crayon           1.4.1      2021-02-08 [1] CRAN (R 4.1.0)
#>  DBI              1.1.1      2021-01-15 [1] standard (@1.1.1)
#>  digest           0.6.27     2020-10-24 [1] CRAN (R 4.1.0)
#>  dplyr            1.0.7      2021-06-18 [1] CRAN (R 4.1.0)
#>  ellipsis         0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate         0.14       2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi            0.5.0      2021-05-25 [1] CRAN (R 4.1.0)
#>  fs               1.5.0      2020-07-31 [1] CRAN (R 4.1.0)
#>  generics         0.1.0      2020-10-31 [1] CRAN (R 4.1.0)
#>  glue             1.4.2      2020-08-27 [1] CRAN (R 4.1.0)
#>  highr            0.9        2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools        0.5.1.1    2021-01-22 [1] CRAN (R 4.1.0)
#>  httr             1.4.2      2020-07-20 [1] CRAN (R 4.1.0)
#>  insight          0.14.2     2021-06-22 [1] CRAN (R 4.1.0)
#>  kableExtra       1.3.4      2021-02-20 [1] CRAN (R 4.1.0)
#>  knitr            1.33       2021-04-24 [1] CRAN (R 4.1.0)
#>  languageserver * 0.3.10     2021-04-20 [1] CRAN (R 4.1.0)
#>  lattice          0.20-44    2021-05-02 [1] CRAN (R 4.1.0)
#>  lifecycle        1.0.0      2021-02-15 [1] CRAN (R 4.1.0)
#>  lme4             1.1-27.1   2021-06-22 [1] CRAN (R 4.1.0)
#>  magrittr         2.0.1      2020-11-17 [1] CRAN (R 4.1.0)
#>  MASS             7.3-54     2021-05-03 [1] CRAN (R 4.1.0)
#>  Matrix           1.3-3      2021-05-04 [1] CRAN (R 4.1.0)
#>  minqa            1.2.4      2014-10-09 [1] CRAN (R 4.1.0)
#>  modelsummary   * 0.8.1.9000 2021-07-25 [1] local
#>  munsell          0.5.0      2018-06-12 [1] CRAN (R 4.1.0)
#>  nlme             3.1-152    2021-02-04 [1] CRAN (R 4.1.0)
#>  nloptr           1.2.2.2    2020-07-02 [1] CRAN (R 4.1.0)
#>  parameters       0.14.0     2021-05-29 [1] CRAN (R 4.1.0)
#>  performance      0.7.3      2021-07-21 [1] CRAN (R 4.1.0)
#>  pillar           1.6.1      2021-05-16 [1] CRAN (R 4.1.0)
#>  pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr            0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache          0.15.0     2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3      1.8.1      2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo             1.24.0     2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils          2.10.1     2020-08-26 [1] CRAN (R 4.1.0)
#>  R6               2.5.0      2020-10-28 [1] CRAN (R 4.1.0)
#>  Rcpp             1.0.7      2021-07-07 [1] CRAN (R 4.1.0)
#>  reprex           2.0.0      2021-04-02 [1] standard (@2.0.0)
#>  rlang            0.4.11     2021-04-30 [1] CRAN (R 4.1.0)
#>  rmarkdown        2.9        2021-06-15 [1] CRAN (R 4.1.0)
#>  rstudioapi       0.13       2020-11-12 [1] CRAN (R 4.1.0)
#>  rvest            1.0.0      2021-03-09 [1] CRAN (R 4.1.0)
#>  scales           1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
#>  sessioninfo      1.1.1      2018-11-05 [1] CRAN (R 4.1.0)
#>  stringi          1.7.3      2021-07-16 [1] CRAN (R 4.1.0)
#>  stringr          1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
#>  styler           1.4.1      2021-03-30 [1] CRAN (R 4.1.0)
#>  svglite          2.0.0      2021-02-20 [1] CRAN (R 4.1.0)
#>  systemfonts      1.0.2      2021-05-11 [1] CRAN (R 4.1.0)
#>  tables           0.9.6      2020-09-22 [1] CRAN (R 4.1.0)
#>  tibble           3.1.3      2021-07-23 [1] CRAN (R 4.1.0)
#>  tidyr            1.1.3      2021-03-03 [1] CRAN (R 4.1.0)
#>  tidyselect       1.1.1      2021-04-30 [1] CRAN (R 4.1.0)
#>  utf8             1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs            0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
#>  viridisLite      0.4.0      2021-04-13 [1] CRAN (R 4.1.0)
#>  webshot          0.5.2      2019-11-22 [1] CRAN (R 4.1.0)
#>  withr            2.4.2      2021-04-18 [1] CRAN (R 4.1.0)
#>  xfun             0.24       2021-06-15 [1] CRAN (R 4.1.0)
#>  xml2             1.3.2      2020-04-23 [1] CRAN (R 4.1.0)
#>  yaml             2.2.1      2020-02-01 [1] CRAN (R 4.1.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
```

bwiernik commented 3 years ago

In insight's case, the package both extracts data and formats them for display and rendering. format_table(), print_html(), and print_md() are functions for the latter.

So long as there is a way to get combined columns like we show in print_html() in a data frame format (what format_table() produces) for use with other tabling packages or purposes, I don't really have a preference one way or the other. How about another function to do the consolidation? That function could then drop the Group column.
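A separate consolidation function along these lines could look roughly like the base R sketch below. The name `combine_group_parameter()` and its `drop_group` argument are hypothetical and not part of insight or parameters. The labelling rule (insert the group before the closing parenthesis, and leave fixed effects and the residual row alone) is inferred from the "Should be" output above. The input is a hand-built stand-in for that table.

```r
# Hypothetical helper, not an existing insight/parameters function.
combine_group_parameter <- function(x, drop_group = FALSE) {
  stopifnot(is.data.frame(x), all(c("Parameter", "Group") %in% names(x)))
  x$Parameter <- as.character(x$Parameter)
  group <- as.character(x$Group)

  # Label only rows that belong to a named grouping factor and whose label
  # ends in ")". Fixed effects (empty Group) and the residual row are skipped.
  to_label <- !is.na(group) & nzchar(group) & group != "Residual" &
    grepl("\\)$", x$Parameter)

  # "SD (Intercept)" with group "cyl" becomes "SD (Intercept: cyl)".
  x$Parameter[to_label] <- paste0(
    sub("\\)$", "", x$Parameter[to_label]), ": ", group[to_label], ")"
  )

  if (drop_group) x$Group <- NULL
  x
}

# Stand-in for the table from the reprex (values copied from the thread).
params <- data.frame(
  Parameter   = c("(Intercept)", "SD (Intercept)", "SD (Intercept)",
                  "SD (Observations)"),
  Coefficient = c(20.50, 5.76, 0.00, 3.22),
  Group       = c("", "cyl", "gear", "Residual"),
  stringsAsFactors = FALSE
)

combined <- combine_group_parameter(params)
combined$Parameter
```

Calling it with `drop_group = TRUE` removes the Group column after the labels are built. The default keeps it, so users can still filter or split on it.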

vincentarelbundock commented 3 years ago

And in typical fashion, I write a novel-length answer completely missing the point ;)

Thanks for the correction!

strengejacke commented 3 years ago

I think I also missed the point. modelsummary relies on the data frame returned by parameters::model_parameters(), not the formatted version from format_table(), so indeed, we could merge those two columns, either as an option or by default. However, this is not entirely straightforward to implement, because different model objects can have group columns, and each needs to be handled in a different way.

bwiernik commented 3 years ago

How is it done in print_html() or print() ?

strengejacke commented 3 years ago

Step 1:

https://github.com/easystats/parameters/blob/cd6efa97c980f54883684971092edeea08981205/R/format.R#L65-L76

Step 2:

https://github.com/easystats/parameters/blob/cd6efa97c980f54883684971092edeea08981205/R/format.R#L98-L101

And insight::format_table() is called somewhere between these two steps.

This might be easier than expected, since we store an attribute when the parameters data frame includes random parameters from mixed models (see step 1).
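As a rough illustration of the attribute-based idea, the base R sketch below gates the merge on a flag stored on the data frame. Everything in it is an assumption made for illustration: the attribute name `has_random`, the helpers `mark_random()` and `format_with_groups()`, and the split into "relabel before formatting, drop Group afterwards". The linked parameters code may organise the two steps differently, and `formatter` is only a stand-in for the insight::format_table() call.

```r
# Hypothetical: flag a parameters-like data frame that has random effects.
mark_random <- function(x) {
  attr(x, "has_random") <- "Effects" %in% names(x) &&
    any(x$Effects == "random")
  x
}

# Hypothetical wrapper: merge the columns only when the flag is set.
format_with_groups <- function(x, formatter = identity) {
  has_random <- isTRUE(attr(x, "has_random"))

  if (has_random) {
    # Before formatting: fold the group name into the random-effects labels.
    rows <- x$Effects == "random" & x$Group != "Residual"
    x$Parameter[rows] <- paste0(
      sub("\\)$", "", x$Parameter[rows]), ": ", x$Group[rows], ")"
    )
  }

  out <- formatter(x)  # stand-in for the formatting call between the steps

  # After formatting: the Group column is now redundant, so drop it.
  if (has_random) out$Group <- NULL
  out
}

# Stand-in for the table from the reprex (values copied from the thread).
params <- data.frame(
  Parameter   = c("(Intercept)", "SD (Intercept)", "SD (Intercept)",
                  "SD (Observations)"),
  Coefficient = c(20.50, 5.76, 0.00, 3.22),
  Effects     = c("fixed", "random", "random", "random"),
  Group       = c("", "cyl", "gear", "Residual"),
  stringsAsFactors = FALSE
)

merged    <- format_with_groups(mark_random(params))
untouched <- format_with_groups(params)  # no flag, so nothing is merged
```

Gating on a stored flag means tables without random parameters pass through unchanged, which sidesteps some of the per-model special cases mentioned earlier in the thread.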