datalorax / equatiomatic

Convert models to LaTeX equations
https://datalorax.github.io/equatiomatic/
Creative Commons Attribution 4.0 International

replacing `{broom}` and `{broom.mixed}` tidiers with `{parameters}` package to reduce no. of dependencies #152

Open IndrajeetPatil opened 3 years ago

IndrajeetPatil commented 3 years ago

Before making a PR related to this, I was wondering if you would be open to this. If you agree, I will open a PR.

rationale

parameters (https://easystats.github.io/parameters/) has far fewer dependencies and can handle pretty much every model that broom and broom.mixed combined support. It also offers features not in broom (e.g., robust SEs and standardization).

dependency calculations

tools::package_dependencies(c("broom", "broom.mixed", "parameters"), recursive = TRUE)
#> $broom
#>  [1] "backports"    "dplyr"        "ellipsis"     "generics"     "glue"        
#>  [6] "methods"      "purrr"        "rlang"        "stringr"      "tibble"      
#> [11] "tidyr"        "ggplot2"      "lifecycle"    "magrittr"     "R6"          
#> [16] "tidyselect"   "utils"        "vctrs"        "pillar"       "digest"      
#> [21] "grDevices"    "grid"         "gtable"       "isoband"      "MASS"        
#> [26] "mgcv"         "scales"       "stats"        "withr"        "stringi"     
#> [31] "fansi"        "pkgconfig"    "cpp11"        "graphics"     "nlme"        
#> [36] "Matrix"       "splines"      "cli"          "crayon"       "utf8"        
#> [41] "farver"       "labeling"     "munsell"      "RColorBrewer" "viridisLite" 
#> [46] "tools"        "lattice"      "colorspace"  
#> 
#> $broom.mixed
#>  [1] "broom"        "coda"         "dplyr"        "methods"      "nlme"        
#>  [6] "purrr"        "stringr"      "tibble"       "tidyr"        "backports"   
#> [11] "ellipsis"     "generics"     "glue"         "rlang"        "ggplot2"     
#> [16] "lattice"      "lifecycle"    "magrittr"     "R6"           "tidyselect"  
#> [21] "utils"        "vctrs"        "pillar"       "graphics"     "stats"       
#> [26] "stringi"      "fansi"        "pkgconfig"    "cpp11"        "grDevices"   
#> [31] "digest"       "grid"         "gtable"       "isoband"      "MASS"        
#> [36] "mgcv"         "scales"       "withr"        "cli"          "crayon"      
#> [41] "utf8"         "tools"        "Matrix"       "splines"      "farver"      
#> [46] "labeling"     "munsell"      "RColorBrewer" "viridisLite"  "colorspace"  
#> 
#> $parameters
#> [1] "bayestestR" "datawizard" "insight"    "graphics"   "methods"   
#> [6] "stats"      "utils"

Created on 2021-11-03 by the reprex package (v2.0.1)
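To see how much of that list is made up of packages that already ship with R, the recursive dependencies can be filtered against the base packages. This is a minimal sketch that reuses the `parameters` list printed above; the helper `non_base()` and the hard-coded `base_pkgs` vector are illustrative assumptions, not part of any package.

```r
# Packages with priority "base" in a standard R installation (typed by hand here)
base_pkgs <- c(
  "base", "compiler", "datasets", "graphics", "grDevices", "grid", "methods",
  "parallel", "splines", "stats", "stats4", "tcltk", "tools", "utils"
)

# Hypothetical helper: keep only the dependencies that must actually be installed
non_base <- function(deps) setdiff(deps, base_pkgs)

# Recursive dependencies of `parameters`, copied from the output above
parameters_deps <- c(
  "bayestestR", "datawizard", "insight", "graphics", "methods", "stats", "utils"
)

non_base(parameters_deps)
#> [1] "bayestestR" "datawizard" "insight"
```

The same helper can be applied to each element of the list returned by `tools::package_dependencies()`, e.g. with `lapply(deps, non_base)`.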

example with merMod

library(lme4)
#> Loading required package: Matrix
library(magrittr)
library(parameters)

lmer_mod <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

broom.mixed::tidy(lmer_mod, effects = "fixed")
#> # A tibble: 2 x 5
#>   effect term        estimate std.error statistic
#>   <chr>  <chr>          <dbl>     <dbl>     <dbl>
#> 1 fixed  (Intercept)    251.       6.82     36.8 
#> 2 fixed  Days            10.5      1.55      6.77

parameters::standardize_names(parameters::model_parameters(lmer_mod), style = "broom") %>%
  tibble::as_tibble()
#> # A tibble: 2 x 9
#>   term  estimate std.error conf.level conf.low conf.high statistic df.error
#>   <chr>    <dbl>     <dbl>      <dbl>    <dbl>     <dbl>     <dbl>    <int>
#> 1 (Int…    251.       6.82       0.95   238.       265.      36.8       174
#> 2 Days      10.5      1.55       0.95     7.44      13.5      6.77      174
#> # … with 1 more variable: p.value <dbl>

example with lm

lm_mod <- lm(Reaction ~ Days, sleepstudy)

broom::tidy(lm_mod)
#> # A tibble: 2 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)    251.       6.61     38.0  2.16e-87
#> 2 Days            10.5      1.24      8.45 9.89e-15

parameters::standardize_names(parameters::model_parameters(lm_mod), style = "broom") %>%
  tibble::as_tibble()
#> # A tibble: 2 x 9
#>   term  estimate std.error conf.level conf.low conf.high statistic df.error
#>   <chr>    <dbl>     <dbl>      <dbl>    <dbl>     <dbl>     <dbl>    <int>
#> 1 (Int…    251.       6.61       0.95   238.       264.      38.0       178
#> 2 Days      10.5      1.24       0.95     8.02      12.9      8.45      178
#> # … with 1 more variable: p.value <dbl>

Created on 2021-02-18 by the reprex package (v1.0.0)

datalorax commented 3 years ago

I like the general idea but this would be a massive change and I'm not sure it's worth it. A lot of the current codebase depends on the output from broom looking exactly as it does now, so it would require considerable refactoring. For example, the lme4::lmer() code depends on having the effect column to delineate between fixed and random effects.
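As a rough illustration of the kind of dependence described here (this is not equatiomatic's actual code), downstream logic can split a tidied mixed model on the `effect` column. The data frame below is typed by hand from the `broom.mixed::tidy()` output for the `sleepstudy` model shown elsewhere in this thread, so the sketch runs with base R only.

```r
# Hand-typed subset of a broom.mixed-style tidy output (values from this thread)
tidied <- data.frame(
  effect   = c("fixed", "fixed", "ran_pars", "ran_pars", "ran_pars", "ran_pars"),
  group    = c(NA, NA, "Subject", "Subject", "Subject", "Residual"),
  term     = c("(Intercept)", "Days", "sd__(Intercept)",
               "cor__(Intercept).Days", "sd__Days", "sd__Observation"),
  estimate = c(251, 10.5, 24.7, 0.0656, 5.92, 25.6),
  stringsAsFactors = FALSE
)

# Code written against this layout relies on the exact labels in `effect`
fixed_part  <- tidied[tidied$effect == "fixed", ]
random_part <- tidied[tidied$effect == "ran_pars", ]

nrow(fixed_part)
#> [1] 2
nrow(random_part)
#> [1] 4
```

Any replacement tidier would need to provide an equivalent column (or be adapted to one) for logic like this to keep working.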

The other thing that worries me a little bit is just that broom is a really established package with considerable support around maintaining it. I've never really looked into parameters. It looks like it's pretty well maintained too. But it would still worry me a bit.

So I guess I'm leaning toward no thanks, but I'm happy to engage in the conversation a bit more.

IndrajeetPatil commented 3 years ago

> For example, the lme4::lmer() code depends on having the effect column to delineate between fixed and random effects.

Hmm, that's a fair point. This is indeed a context where the parameters output won't exactly line up with the broom.mixed output, and this is a good enough reason to currently not make this switch.

> The other thing that worries me a little bit is just that broom is a really established package with considerable support around maintaining it.

As someone who has contributed to both of these packages, I can vouch for the rigor and speed at which parameters is maintained (it is < 2 years old and already supports more models than broom and broom.mixed combined) and, in a few years, it will be as well-established as broom was at its age. 😉

> So I guess I'm leaning toward no thanks, but I'm happy to engage in the conversation a bit more.

We can revisit this when parameters starts to behave the same way as broom.mixed when it comes to random effects, since the switch would then require minimal refactoring.

datalorax commented 3 years ago

Sounds good to me. Thanks.

IndrajeetPatil commented 3 years ago

The outputs for mixed-effects models from parameters (GitHub version) now also line up with broom.mixed output, with a few differences in naming schemas for terms, but that should be easy to adjust to.

No pressure at all to take this further; just wanted to log where things stand right now. 🙂

library(lme4)
library(broom.mixed)
library(tibble)
library(parameters)

options(tibble.width = Inf)

mod <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# `broom.mixed` output --------------------------------

tidy(mod)
#> # A tibble: 6 x 6
#>   effect   group    term                  estimate std.error statistic
#>   <chr>    <chr>    <chr>                    <dbl>     <dbl>     <dbl>
#> 1 fixed    <NA>     (Intercept)           251.          6.82     36.8 
#> 2 fixed    <NA>     Days                   10.5         1.55      6.77
#> 3 ran_pars Subject  sd__(Intercept)        24.7        NA        NA   
#> 4 ran_pars Subject  cor__(Intercept).Days   0.0656     NA        NA   
#> 5 ran_pars Subject  sd__Days                5.92       NA        NA   
#> 6 ran_pars Residual sd__Observation        25.6        NA        NA

# `parameters` output ---------------------------------
# (with further modifications to match `broom` conventions)

model_parameters(mod, effects = "all") %>%
  standardize_names(style = "broom") %>%
  as_tibble()
#> # A tibble: 6 x 11
#>   term                 estimate std.error conf.level conf.low conf.high
#>   <chr>                   <dbl>     <dbl>      <dbl>    <dbl>     <dbl>
#> 1 (Intercept)           251.         6.82       0.95   238.       265. 
#> 2 Days                   10.5        1.55       0.95     7.44      13.5
#> 3 SD (Observations)      25.6       NA          0.95    NA         NA  
#> 4 SD (Intercept)         24.7       NA          0.95    NA         NA  
#> 5 SD (Days)               5.92      NA          0.95    NA         NA  
#> 6 Cor (Intercept~Days)    0.256     NA          0.95    NA         NA  
#>   statistic df.error    p.value effect group     
#>       <dbl>    <int>      <dbl> <chr>  <chr>     
#> 1     36.8       174  4.54e-297 fixed  ""        
#> 2      6.77      174  1.27e- 11 fixed  ""        
#> 3     NA          NA NA         random "Residual"
#> 4     NA          NA NA         random "Subject" 
#> 5     NA          NA NA         random "Subject" 
#> 6     NA          NA NA         random "Subject"

Created on 2021-03-05 by the reprex package (v1.0.0)

datalorax commented 3 years ago

Okay, I appreciate it. I'm hoping to come back to work on some bugs and things here in the next couple weeks. I suppose we could use the GitHub version as a dependency for now and then wait until they push to CRAN before our next release.

IndrajeetPatil commented 2 years ago

Just wanted to post another reprex, this time with CRAN versions of both packages.

As far as I can see, there are just two (IMO) minor differences, but not sure how much difference it makes to your code:

library(lme4)
#> Loading required package: Matrix
library(broom.mixed)
library(tibble)
library(parameters)

options(tibble.width = Inf)

mod <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# `broom.mixed` output --------------------------------

tidy(mod)
#> # A tibble: 6 x 6
#>   effect   group    term                  estimate std.error statistic
#>   <chr>    <chr>    <chr>                    <dbl>     <dbl>     <dbl>
#> 1 fixed    <NA>     (Intercept)           251.          6.82     36.8 
#> 2 fixed    <NA>     Days                   10.5         1.55      6.77
#> 3 ran_pars Subject  sd__(Intercept)        24.7        NA        NA   
#> 4 ran_pars Subject  cor__(Intercept).Days   0.0656     NA        NA   
#> 5 ran_pars Subject  sd__Days                5.92       NA        NA   
#> 6 ran_pars Residual sd__Observation        25.6        NA        NA

# `parameters` output ---------------------------------
# (with further modifications to match `broom` conventions)

model_parameters(mod, effects = "all") %>%
  standardize_names(style = "broom") %>%
  as_tibble()
#> # A tibble: 6 x 11
#>   term                          estimate std.error conf.level conf.low conf.high
#>   <chr>                            <dbl>     <dbl>      <dbl>    <dbl>     <dbl>
#> 1 (Intercept)                   251.          6.82       0.95   238.       265. 
#> 2 Days                           10.5         1.55       0.95     7.42      13.5
#> 3 SD (Intercept)                 24.7        NA          0.95    NA         NA  
#> 4 SD (Days)                       5.92       NA          0.95    NA         NA  
#> 5 Cor (Intercept~Days: Subject)   0.0656     NA          0.95    NA         NA  
#> 6 SD (Observations)              25.6        NA          0.95    NA         NA  
#>   statistic df.error   p.value effect group     
#>       <dbl>    <int>     <dbl> <chr>  <chr>     
#> 1     36.8       174  4.37e-84 fixed  ""        
#> 2      6.77      174  1.88e-10 fixed  ""        
#> 3     NA          NA NA        random "Subject" 
#> 4     NA          NA NA        random "Subject" 
#> 5     NA          NA NA        random "Subject" 
#> 6     NA          NA NA        random "Residual"

Created on 2021-11-03 by the reprex package (v2.0.1)
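The remaining naming differences in `term` and `effect` could be bridged with a small adapter. The following is only a sketch under stated assumptions: `to_broom_mixed_style()` is a hypothetical helper (it exists in neither package), the regular expressions cover just the term patterns visible in the output above, and the input is typed by hand from that output so the example runs with base R alone.

```r
# Hand-typed copy of the term/effect columns from the `parameters` output above
pars <- data.frame(
  term   = c("(Intercept)", "Days", "SD (Intercept)", "SD (Days)",
             "Cor (Intercept~Days: Subject)", "SD (Observations)"),
  effect = c("fixed", "fixed", "random", "random", "random", "random"),
  stringsAsFactors = FALSE
)

# Hypothetical adapter: rename terms and effect labels to the broom.mixed style
to_broom_mixed_style <- function(x) {
  term <- x$term
  # "SD (Observations)" -> "sd__Observation"
  term <- sub("^SD \\(Observations\\)$", "sd__Observation", term)
  # "SD (Intercept)" -> "sd__(Intercept)" (broom.mixed keeps the parentheses)
  term <- sub("^SD \\(Intercept\\)$", "sd__(Intercept)", term)
  # "SD (Days)" -> "sd__Days"
  term <- sub("^SD \\((.*)\\)$", "sd__\\1", term)
  # "Cor (Intercept~Days: Subject)" -> "cor__(Intercept).Days"
  term <- sub("^Cor \\(Intercept~([^:)]+)(: [^)]*)?\\)$",
              "cor__(Intercept).\\1", term)
  x$term <- term
  # parameters labels random-effect rows "random"; broom.mixed uses "ran_pars"
  x$effect <- ifelse(x$effect == "random", "ran_pars", x$effect)
  x
}

to_broom_mixed_style(pars)$term
#> [1] "(Intercept)"           "Days"                  "sd__(Intercept)"
#> [4] "sd__Days"              "cor__(Intercept).Days" "sd__Observation"
```

Row order still differs between the two outputs, so code that indexes rows by position would need to match on `term` instead.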

datalorax commented 2 years ago

Thanks. Just to be clear, the parameters package handles the models that broom and broom.mixed handle, correct?

IndrajeetPatil commented 2 years ago

Yes, you can see the list of supported models using this function:

insight::supported_models()
#>   [1] "aareg"             "afex_aov"          "AKP"              
#>   [4] "Anova.mlm"         "aov"               "aovlist"          
#>   [7] "Arima"             "averaging"         "bamlss"           
#>  [10] "bamlss.frame"      "bayesQR"           "bayesx"           
#>  [13] "BBmm"              "BBreg"             "bcplm"            
#>  [16] "betamfx"           "betaor"            "betareg"          
#>  [19] "BFBayesFactor"     "bfsl"              "BGGM"             
#>  [22] "bife"              "bifeAPEs"          "bigglm"           
#>  [25] "biglm"             "blavaan"           "blrm"             
#>  [28] "bracl"             "brglm"             "brmsfit"          
#>  [31] "brmultinom"        "btergm"            "censReg"          
#>  [34] "cgam"              "cgamm"             "cglm"             
#>  [37] "clm"               "clm2"              "clmm"             
#>  [40] "clmm2"             "clogit"            "coeftest"         
#>  [43] "complmrob"         "confusionMatrix"   "coxme"            
#>  [46] "coxph"             "coxph.penal"       "coxr"             
#>  [49] "cpglm"             "cpglmm"            "crch"             
#>  [52] "crq"               "crqs"              "crr"              
#>  [55] "dep.effect"        "DirichletRegModel" "drc"              
#>  [58] "eglm"              "elm"               "epi.2by2"         
#>  [61] "ergm"              "feglm"             "feis"             
#>  [64] "felm"              "fitdistr"          "fixest"           
#>  [67] "flexsurvreg"       "gam"               "Gam"              
#>  [70] "gamlss"            "gamm"              "gamm4"            
#>  [73] "garch"             "gbm"               "gee"              
#>  [76] "geeglm"            "glht"              "glimML"           
#>  [79] "glm"               "Glm"               "glmm"             
#>  [82] "glmmadmb"          "glmmPQL"           "glmmTMB"          
#>  [85] "glmrob"            "glmRob"            "glmx"             
#>  [88] "gls"               "gmnl"              "HLfit"            
#>  [91] "htest"             "hurdle"            "iv_robust"        
#>  [94] "ivFixed"           "ivprobit"          "ivreg"            
#>  [97] "lavaan"            "lm"                "lm_robust"        
#> [100] "lme"               "lmerMod"           "lmerModLmerTest"  
#> [103] "lmodel2"           "lmrob"             "lmRob"            
#> [106] "logistf"           "logitmfx"          "logitor"          
#> [109] "LORgee"            "lqm"               "lqmm"             
#> [112] "lrm"               "manova"            "MANOVA"           
#> [115] "margins"           "maxLik"            "mclogit"          
#> [118] "mcmc"              "mcmc.list"         "MCMCglmm"         
#> [121] "mcp1"              "mcp12"             "mcp2"             
#> [124] "med1way"           "mediate"           "merMod"           
#> [127] "merModList"        "meta_bma"          "meta_fixed"       
#> [130] "meta_random"       "metaplus"          "mhurdle"          
#> [133] "mipo"              "mira"              "mixed"            
#> [136] "MixMod"            "mixor"             "mjoint"           
#> [139] "mle"               "mle2"              "mlm"              
#> [142] "mlogit"            "mmlogit"           "model_fit"        
#> [145] "multinom"          "mvord"             "negbinirr"        
#> [148] "negbinmfx"         "ols"               "onesampb"         
#> [151] "orm"               "pgmm"              "plm"              
#> [154] "PMCMR"             "poissonirr"        "poissonmfx"       
#> [157] "polr"              "probitmfx"         "psm"              
#> [160] "Rchoice"           "ridgelm"           "riskRegression"   
#> [163] "rjags"             "rlm"               "rlmerMod"         
#> [166] "RM"                "rma"               "rma.uni"          
#> [169] "robmixglm"         "robtab"            "rq"               
#> [172] "rqs"               "rqss"              "Sarlm"            
#> [175] "scam"              "selection"         "sem"              
#> [178] "SemiParBIV"        "semLm"             "semLme"           
#> [181] "slm"               "speedglm"          "speedlm"          
#> [184] "stanfit"           "stanmvreg"         "stanreg"          
#> [187] "summary.lm"        "survfit"           "survreg"          
#> [190] "svy_vglm"          "svyglm"            "svyolr"           
#> [193] "t1way"             "tobit"             "trimcibt"         
#> [196] "truncreg"          "vgam"              "vglm"             
#> [199] "wbgee"             "wblm"              "wbm"              
#> [202] "wmcpAKP"           "yuen"              "yuend"            
#> [205] "zcpglm"            "zeroinfl"          "zerotrunc"

Created on 2021-11-03 by the reprex package (v2.0.1)
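Rather than scanning the list by eye, specific model classes can be checked against it programmatically. A minimal sketch: the vector `classes_needed` is an illustrative assumption (not a statement of what equatiomatic requires), and the check is skipped when insight is not installed.

```r
# Illustrative set of model classes to look up (an assumption for this example)
classes_needed <- c("lm", "glm", "lmerMod")

if (requireNamespace("insight", quietly = TRUE)) {
  supported <- insight::supported_models()
  # Named logical vector: TRUE where the class appears in the supported list
  print(setNames(classes_needed %in% supported, classes_needed))
} else {
  message("insight is not installed; skipping the check")
}
```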

datalorax commented 2 years ago

Thanks, I'll play around with this in a bit.

IndrajeetPatil commented 2 years ago

Cool!

The documentation can be found here: https://easystats.github.io/parameters/

strengejacke commented 2 years ago

> group column strings are surrounded in ""

Only in the printed output. That's because parameters uses an empty string in the "group" column for fixed effects, while broom.mixed uses NA. When a character column contains empty strings, tibble prints its values surrounded by double quotes.
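The difference matters beyond printing if downstream code tests `group` with `is.na()`. A small base R sketch, using a hand-typed copy of the `group` column from the `parameters` output in this thread, shows the mismatch and one way to normalize it:

```r
# `group` as returned in the parameters-based output (empty string for fixed effects)
group <- c("", "", "Subject", "Subject", "Subject", "Residual")

# A check written for the broom.mixed convention finds no fixed-effect rows
sum(is.na(group))
#> [1] 0

# Convert empty strings to NA to match the broom.mixed convention
group_na <- replace(group, group == "", NA_character_)
sum(is.na(group_na))
#> [1] 2
```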

McCartneyAC commented 1 year ago

I would find this helpful--easystats is quickly becoming a huge part of my workflow and it would open up a huge number of classes to switch to {parameters} instead.