easystats / parameters

:bar_chart: Computation and processing of models' parameters
https://easystats.github.io/parameters/
GNU General Public License v3.0
427 stars 36 forks source link

catching up with `broom` tidiers #336

Closed IndrajeetPatil closed 3 years ago

IndrajeetPatil commented 3 years ago

After switching completely to parameters, I realized that although there are a number of models parameters covers that broom + broom.mixed don't, there are also models which are in vice versa category. I just wanted to list them so that we can cover them in the future. Needless to say, this is low-priority.

(If any of these are covered already, let me know!)

strengejacke commented 3 years ago

I think some of those already work with parameters.

strengejacke commented 3 years ago

mediation::mediate() would be nice, though broom does not process these models properly:

library(mediation)
#> Loading required package: MASS
#> Loading required package: Matrix
#> Loading required package: mvtnorm
#> Loading required package: sandwich
#> mediation: Causal Mediation Analysis
#> Version: 4.5.0
data(jobs)

b <- lm(job_seek ~ treat + econ_hard + sex + age, data=jobs)
c <- lm(depress2 ~ treat + job_seek + econ_hard + sex + age, data=jobs)
m1 <- mediate(b, c, sims=50, treat="treat", mediator="job_seek")

jobs$job_disc <- as.factor(jobs$job_disc)
b.ord <- polr(job_disc ~ treat + econ_hard + sex + age, data=jobs,
              method="probit", Hess=TRUE)
d.bin <- glm(work1 ~ treat + job_disc + econ_hard + sex + age, data=jobs,
             family=binomial(link="probit"))
m2 <- mediate(b.ord, d.bin, sims=50, treat="treat", mediator="job_disc")

summary(m1)
#> 
#> Causal Mediation Analysis 
#> 
#> Quasi-Bayesian Confidence Intervals
#> 
#>                Estimate 95% CI Lower 95% CI Upper p-value  
#> ACME            -0.0182      -0.0395         0.00    0.12  
#> ADE             -0.0476      -0.1124         0.01    0.12  
#> Total Effect    -0.0658      -0.1371         0.00    0.08 .
#> Prop. Mediated   0.2842      -1.4159         0.85    0.20  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Sample Size Used: 899 
#> 
#> 
#> Simulations: 50
broom::tidy(m1)
#> # A tibble: 4 x 4
#>   term   estimate std.error p.value
#>   <chr>     <dbl>     <dbl>   <dbl>
#> 1 acme_0  -0.0182    0.0127    0.12
#> 2 acme_1  -0.0182    0.0127    0.12
#> 3 ade_0   -0.0476    0.0333    0.12
#> 4 ade_1   -0.0476    0.0333    0.12

summary(m2)
#> 
#> Causal Mediation Analysis 
#> 
#> Quasi-Bayesian Confidence Intervals
#> 
#>                          Estimate 95% CI Lower 95% CI Upper p-value  
#> ACME (control)            0.00303     -0.00571         0.01    0.52  
#> ACME (treated)            0.00327     -0.00595         0.01    0.48  
#> ADE (control)             0.04497     -0.00335         0.10    0.12  
#> ADE (treated)             0.04521     -0.00337         0.10    0.12  
#> Total Effect              0.04824      0.00113         0.10    0.04 *
#> Prop. Mediated (control)  0.06379     -0.29252         1.51    0.56  
#> Prop. Mediated (treated)  0.07179     -0.29844         1.50    0.52  
#> ACME (average)            0.00315     -0.00583         0.01    0.48  
#> ADE (average)             0.04509     -0.00336         0.10    0.12  
#> Prop. Mediated (average)  0.06779     -0.29548         1.50    0.52  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Sample Size Used: 899 
#> 
#> 
#> Simulations: 50
broom::tidy(m2)
#> # A tibble: 4 x 4
#>   term   estimate std.error p.value
#>   <chr>     <dbl>     <dbl>   <dbl>
#> 1 acme_0  0.00303   0.00442    0.52
#> 2 acme_1  0.00327   0.00471    0.48
#> 3 ade_0   0.0450    0.0272     0.12
#> 4 ade_1   0.0452    0.0273     0.12

Created on 2020-11-12 by the reprex package (v0.3.0)

strengejacke commented 3 years ago

@IndrajeetPatil Maybe you can add the package names, from most of the remaining, non-checked functions, I don't know to which package these belong.

IndrajeetPatil commented 3 years ago

Done!

strengejacke commented 3 years ago

I don't know how to get this model: rstanarm::nlstanreg

IndrajeetPatil commented 3 years ago

Sorry, my bad: The object is nlstanreg, but the function is rstanarm::stan_nlmer

e.g.

# setup
set.seed(123)
library(rstanarm)
data("Orange", package = "datasets")
Orange$circumference <- Orange$circumference / 100
Orange$age <- Orange$age / 100

# model
fit <-
  rstanarm::stan_nlmer(
    formula = circumference ~ SSlogis(age, Asym, xmid, scal) ~ Asym | Tree,
    data = Orange,
    # for speed only
    chains = 1,
    iter = 1000,
    refresh = 0
  )
strengejacke commented 3 years ago
library(parameters)
library(mediation)
library(MASS)

data(jobs)
b <- lm(job_seek ~ treat + econ_hard + sex + age, data=jobs)
c <- lm(depress2 ~ treat + job_seek + econ_hard + sex + age, data=jobs)
m1 <- mediate(b, c, sims=50, treat="treat", mediator="job_seek")

b2 <- lm(job_seek ~ educ + sex, data=jobs)
c2 <- lm(depress2 ~ educ + job_seek + sex, data=jobs)
m2 <- mediate(b2, c2, treat="educ", mediator="job_seek", sims=50, control.value = "gradwk", treat.value = "somcol")

jobs$job_disc <- as.factor(jobs$job_disc)
b.ord <- polr(job_disc ~ treat + econ_hard + sex + age, data=jobs,
              method="probit", Hess=TRUE)
d.bin <- glm(work1 ~ treat + job_disc + econ_hard + sex + age, data=jobs,
             family=binomial(link="probit"))
m3 <- mediate(b.ord, d.bin, sims=50, treat="treat", mediator="job_disc")

summary(m1)
#> 
#> Causal Mediation Analysis 
#> 
#> Quasi-Bayesian Confidence Intervals
#> 
#>                Estimate 95% CI Lower 95% CI Upper p-value  
#> ACME            -0.0176      -0.0411         0.00    0.08 .
#> ADE             -0.0362      -0.1153         0.03    0.40  
#> Total Effect    -0.0538      -0.1364         0.03    0.20  
#> Prop. Mediated   0.2352      -2.3922         1.62    0.20  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Sample Size Used: 899 
#> 
#> 
#> Simulations: 50
model_parameters(m1)
#> Parameter              | Estimate |        95% CI |     p
#> ---------------------------------------------------------
#> Indirect Effect (ACME) |    -0.02 | [-0.04, 0.00] | 0.080
#> Direct Effect (ACME)   |    -0.04 | [-0.12, 0.03] | 0.400
#> Total Effect           |    -0.05 | [-0.14, 0.03] | 0.200
#> Prop. Mediated         |     0.24 | [-2.39, 1.62] | 0.200
model_parameters(m1, ci = .8)
#> Parameter              | Estimate |         80% CI |     p
#> ----------------------------------------------------------
#> Indirect Effect (ACME) |    -0.02 | [-0.03,  0.00] | 0.080
#> Direct Effect (ACME)   |    -0.04 | [-0.09,  0.03] | 0.400
#> Total Effect           |    -0.05 | [-0.10, -0.01] | 0.200
#> Prop. Mediated         |     0.24 | [-0.01,  0.80] | 0.200

summary(m2)
#> 
#> Causal Mediation Analysis 
#> 
#> Quasi-Bayesian Confidence Intervals
#> 
#>                Estimate 95% CI Lower 95% CI Upper p-value
#> ACME             0.0235      -0.0107         0.05    0.16
#> ADE             -0.0871      -0.2107         0.05    0.28
#> Total Effect    -0.0636      -0.1930         0.08    0.48
#> Prop. Mediated  -0.1103      -1.0450         2.60    0.64
#> 
#> Sample Size Used: 899 
#> 
#> 
#> Simulations: 50
model_parameters(m2)
#> Parameter              | Estimate |        95% CI |     p
#> ---------------------------------------------------------
#> Indirect Effect (ACME) |     0.02 | [-0.01, 0.05] | 0.160
#> Direct Effect (ACME)   |    -0.09 | [-0.21, 0.05] | 0.280
#> Total Effect           |    -0.06 | [-0.19, 0.08] | 0.480
#> Prop. Mediated         |    -0.11 | [-1.04, 2.60] | 0.640

summary(m3)
#> 
#> Causal Mediation Analysis 
#> 
#> Quasi-Bayesian Confidence Intervals
#> 
#>                          Estimate 95% CI Lower 95% CI Upper p-value
#> ACME (control)            0.00306     -0.00442         0.01    0.60
#> ACME (treated)            0.00329     -0.00483         0.01    0.56
#> ADE (control)             0.04579     -0.00575         0.10    0.24
#> ADE (treated)             0.04603     -0.00576         0.10    0.24
#> Total Effect              0.04909     -0.00805         0.11    0.20
#> Prop. Mediated (control)  0.03743     -0.48702         0.65    0.56
#> Prop. Mediated (treated)  0.04036     -0.47461         0.65    0.52
#> ACME (average)            0.00318     -0.00462         0.01    0.56
#> ADE (average)             0.04591     -0.00576         0.10    0.24
#> Prop. Mediated (average)  0.03890     -0.48081         0.65    0.52
#> 
#> Sample Size Used: 899 
#> 
#> 
#> Simulations: 50
model_parameters(m3)
#> # control 
#> 
#> Parameter              | Estimate |        95% CI |     p
#> ---------------------------------------------------------
#> Indirect Effect (ACME) | 1.27e-03 | [-0.01, 0.01] | 0.720
#> Direct Effect (ADE)    |     0.04 | [-0.02, 0.11] | 0.160
#> Prop. Mediated         |     0.03 | [-0.23, 0.68] | 0.680
#> 
#> # treated 
#> 
#> Parameter              | Estimate |        95% CI |     p
#> ---------------------------------------------------------
#> Indirect Effect (ACME) | 1.43e-03 | [-0.01, 0.01] | 0.680
#> Direct Effect (ADE)    |     0.04 | [-0.02, 0.11] | 0.160
#> Prop. Mediated         |     0.04 | [-0.23, 0.68] | 0.640
#> 
#> # average 
#> 
#> Parameter              | Estimate |        95% CI |     p
#> ---------------------------------------------------------
#> Indirect Effect (ACME) | 1.35e-03 | [-0.01, 0.01] | 0.680
#> Direct Effect (ADE)    |     0.04 | [-0.02, 0.11] | 0.160
#> Prop. Mediated         |     0.03 | [-0.23, 0.68] | 0.640
#> 
#> # Total Effect 
#> 
#> Parameter    | Estimate |        95% CI |     p
#> -----------------------------------------------
#> Total Effect |     0.04 | [-0.02, 0.11] | 0.120

Created on 2020-11-15 by the reprex package (v0.3.0)

vincentarelbundock commented 3 years ago

The fixest package offers fast estimation of models with fixed effects. Here's the broom tidiers:

https://github.com/tidymodels/broom/blob/master/R/fixest-tidiers.R

If I find some time, I'll attempt a PR. This would be a good exercise to get to know the internals of the parameters package.

strengejacke commented 3 years ago

fixest is already supported:

library(parameters)
library(fixest)
#> 
#> Attaching package: 'fixest'
#> The following objects are masked from 'package:parameters':
#> 
#>     demean, dof
data(trade)
m1 <- femlm(Euros ~ log(dist_km) | Origin + Destination + Product, data = trade)
m2 <- femlm(log1p(Euros) ~ log(dist_km) | Origin + Destination + Product, data = trade, family = "gaussian")
m3 <- feglm(Euros ~ log(dist_km) | Origin + Destination + Product, data = trade)

model_parameters(m1)
#> Parameter     | Log-Mean |   SE |         95% CI |      z |      p
#> ------------------------------------------------------------------
#> dist_km [log] |    -1.53 | 0.12 | [-1.75, -1.30] | -13.21 | < .001
model_parameters(m2)
#> Parameter     | Coefficient |   SE |         95% CI |      z |      p
#> ---------------------------------------------------------------------
#> dist_km [log] |       -2.17 | 0.15 | [-2.47, -1.87] | -14.07 | < .001
model_parameters(m3)
#> Parameter     | Log-Mean |   SE |         95% CI |      z |      p
#> ------------------------------------------------------------------
#> dist_km [log] |    -1.53 | 0.12 | [-1.75, -1.30] | -13.21 | < .001

Created on 2020-12-09 by the reprex package (v0.3.0)

I think the list of supported models is almost identical to the models supported by insight: https://easystats.github.io/insight/#list-of-supported-models-by-class

The above list is just lists models that are supported by broom, but not yet by parameters. The list of models supported by parameters is much longer (see link to insight).

strengejacke commented 3 years ago

This would be a good exercise to get to know the internals of the parameters package.

The complexity has grown quite some bit the last months ;-) However, most important is having following functions:

  1. standard_error()
  2. ci()
  3. p_value()
  4. degrees_of_freedom()

and in insight: get_statistic() and get_parameters().

It might be that the existing (default) methods already work with new models. model_parameters() mostly calls the internal .extract_parameters_generic(), which relies on the above mentioned functions.

vincentarelbundock commented 3 years ago

Sorry, I should do my homework before filing an issue!

The design info is very useful. I will look into those functions.

The reason I thought fixest wasn't supported is that one of my tryCatch calls broke on the example described here: https://github.com/easystats/parameters/issues/359

IndrajeetPatil commented 3 years ago

If I find some time, I'll attempt a PR. This would be a good exercise to get to know the internals of the parameters package.

Although fixest is supported, in case you are still looking for an excuse to peek into the internals, you can definitely make a PR to support any of the remaining regression models in the list above, or some other models that you use but are currently supported neither by broom nor by parameters 😊

strengejacke commented 3 years ago

What is summary.lm?

IndrajeetPatil commented 3 years ago
m = lm(mpg ~ wt * disp, mtcars)
s = summary(m)

## tidy

tidy(m, conf.int = TRUE)
#> # A tibble: 4 x 7
#>   term        estimate std.error statistic  p.value conf.low conf.high
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 (Intercept)  44.1      3.12        14.1  2.96e-14 37.7       50.5   
#> 2 wt           -6.50     1.31        -4.95 3.22e- 5 -9.19      -3.81  
#> 3 disp         -0.0564   0.0132      -4.26 2.10e- 4 -0.0835    -0.0292
#> 4 wt:disp       0.0117   0.00326      3.60 1.23e- 3  0.00504    0.0184

tidy(s, conf.int = TRUE)
#> # A tibble: 4 x 7
#>   term        estimate std.error statistic  p.value conf.low conf.high
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 (Intercept)  44.1      3.12        14.1  2.96e-14 37.7       50.5   
#> 2 wt           -6.50     1.31        -4.95 3.22e- 5 -9.19      -3.81  
#> 3 disp         -0.0564   0.0132      -4.26 2.10e- 4 -0.0835    -0.0292
#> 4 wt:disp       0.0117   0.00326      3.60 1.23e- 3  0.00504    0.0184

## glance 

glance(m)
#> # A tibble: 1 x 12
#>   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
#>       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
#> 1     0.850         0.834  2.45      52.9 1.16e-11     3  -72.0  154.  161.
#> # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

glance(s)
#> # A tibble: 1 x 8
#>   r.squared adj.r.squared sigma statistic  p.value    df df.residual  nobs
#>       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>       <int> <dbl>
#> 1     0.850         0.834  2.45      52.9 1.16e-11     3          28    32
vincentarelbundock commented 3 years ago

To add a bit of context, it is often useful to store summary.lm instead of the full lm object, because the latter also includes the design matrix, so it can be very large.

strengejacke commented 3 years ago
library(broom)
library(parameters)

m <- lm(mpg ~ wt * disp, mtcars)
s <- summary(m)

tidy(m, conf.int = TRUE)
#> # A tibble: 4 x 7
#>   term        estimate std.error statistic  p.value conf.low conf.high
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 (Intercept)  44.1      3.12        14.1  2.96e-14 37.7       50.5   
#> 2 wt           -6.50     1.31        -4.95 3.22e- 5 -9.19      -3.81  
#> 3 disp         -0.0564   0.0132      -4.26 2.10e- 4 -0.0835    -0.0292
#> 4 wt:disp       0.0117   0.00326      3.60 1.23e- 3  0.00504    0.0184

tidy(s, conf.int = TRUE)
#> # A tibble: 4 x 7
#>   term        estimate std.error statistic  p.value conf.low conf.high
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 (Intercept)  44.1      3.12        14.1  2.96e-14 37.7       50.5   
#> 2 wt           -6.50     1.31        -4.95 3.22e- 5 -9.19      -3.81  
#> 3 disp         -0.0564   0.0132      -4.26 2.10e- 4 -0.0835    -0.0292
#> 4 wt:disp       0.0117   0.00326      3.60 1.23e- 3  0.00504    0.0184

model_parameters(m)
#> Parameter   | Coefficient |       SE |         95% CI | t(28) |      p
#> ----------------------------------------------------------------------
#> (Intercept) |       44.08 |     3.12 | [37.68, 50.48] | 14.11 | < .001
#> wt          |       -6.50 |     1.31 | [-9.19, -3.81] | -4.95 | < .001
#> disp        |       -0.06 |     0.01 | [-0.08, -0.03] | -4.26 | < .001
#> wt * disp   |        0.01 | 3.26e-03 | [ 0.01,  0.02] |  3.60 | 0.001

model_parameters(s)
#> Parameter   | Coefficient |       SE |         95% CI | t(28) |      p
#> ----------------------------------------------------------------------
#> (Intercept) |       44.08 |     3.12 | [37.68, 50.48] | 14.11 | < .001
#> wt          |       -6.50 |     1.31 | [-9.19, -3.81] | -4.95 | < .001
#> disp        |       -0.06 |     0.01 | [-0.08, -0.03] | -4.26 | < .001
#> wt * disp   |        0.01 | 3.26e-03 | [ 0.01,  0.02] |  3.60 | 0.001

Created on 2020-12-30 by the reprex package (v0.3.0)

strengejacke commented 3 years ago

I think orcutt::cochrane.orcutt() already works out-of-the-box.

strengejacke commented 3 years ago

almost done. r2jags will probably not be supported by easystats.

IndrajeetPatil commented 3 years ago

I will try to work on it. I have it installed on my laptop.

strengejacke commented 3 years ago

@IndrajeetPatil I'm not sure if there are any exceptions, maybe you can try to find some failing edge cases? :-)

library(epiR)
#> Loading required package: survival
#> Package epiR 2.0.19 is loaded
#> Type help(epi.about) for summary information
#> Type browseVignettes(package = 'epiR') to learn how to use epiR for applied epidemiological analyses
#> 
library(parameters)

dat <- matrix(c(13,2163,5,3349), nrow = 2, byrow = TRUE)
rownames(dat) <- c("DF+", "DF-")
colnames(dat) <- c("FUS+", "FUS-")

m <- epi.2by2(dat = as.table(dat), method = "cross.sectional", 
              conf.level = 0.95, units = 100, outcome = "as.columns")

model_parameters(m)
#> Parameter                               | Coefficient |             CI
#> ----------------------------------------------------------------------
#> Prevalence Ratio                        |        4.01 | [ 1.43, 11.23]
#> Odds Ratio                              |        4.03 | [ 1.43, 11.31]
#> Attributable Risk                       |        0.45 | [ 0.10,  0.80]
#> Attributable Risk in Population         |        0.18 | [-0.02,  0.38]
#> Attributable Fraction in Exposed (%)    |       75.05 | [30.11, 91.09]
#> Attributable Fraction in Population (%) |       54.20 | [ 3.61, 78.24]
#> 
#> Test that Odds Ratio = 1: Chi2(1) = 8.18, p = 0.004

as.data.frame(model_parameters(m))
#>   Parameter Coefficient      CI_low    CI_high
#> 1        PR   4.0075368  1.43073388 11.2252538
#> 2        OR   4.0256126  1.43312843 11.3078188
#> 3     ARisk   0.4483507  0.09922771  0.7974738
#> 4    PARisk   0.1764216 -0.02254092  0.3753840
#> 5    AFRisk  75.0470162 30.10579998 91.0915155
#> 6   PAFRisk  54.2006228  3.61380175 78.2377250

Created on 2021-03-06 by the reprex package (v1.0.0)

strengejacke commented 3 years ago

closing this, as I will skip the remaining two.