easystats / parameters

:bar_chart: Computation and processing of models' parameters
https://easystats.github.io/parameters/
GNU General Public License v3.0
439 stars 36 forks source link

support standardized Bayesian models #233

Closed mattansb closed 4 years ago

mattansb commented 4 years ago

parameters doesn't allow standardization of model parameters:

library(rstanarm)
mtcars$am_f <- factor(mtcars$am)
mtcars$cyl_f <- factor(mtcars$cyl)

stn_model <- stan_glm(mpg ~ am_f * cyl_f, data = mtcars,
                      refresh = 0)

parameters::model_parameters(stn_model, standardize = "basic")
#> Possible multicollinearity between cyl_f8 and am_f1 (r = 0.75), am_f1:cyl_f6 and cyl_f6 (r = 0.73). This might lead to inappropriate results. See 'Details' in '?rope'.
#> Parameter            | Median |          89% CI |     pd | % in ROPE |  Rhat |  ESS |               Prior
#> ---------------------------------------------------------------------------------------------------------
#> (Intercept)          |  22.88 | [ 20.04, 25.68] |   100% |        0% | 1.003 | 1449 | Normal (0 +- 60.27)
#> am_f [1]             |   5.13 | [  1.69,  8.47] | 98.80% |     1.60% | 1.002 | 1459 | Normal (0 +- 15.07)
#> cyl_f [6]            |  -3.79 | [ -7.58, -0.21] | 94.53% |     5.30% | 1.001 | 1648 | Normal (0 +- 15.07)
#> cyl_f [8]            |  -7.80 | [-10.90, -4.60] |   100% |     0.05% | 1.003 | 1540 | Normal (0 +- 15.07)
#> am_f [1] * cyl_f [6] |  -3.66 | [ -8.96,  1.21] | 87.45% |     8.33% | 1.001 | 1645 | Normal (0 +- 15.07)
#> am_f [1] * cyl_f [8] |  -4.72 | [ -9.32,  0.82] | 92.65% |     4.35% | 1.001 | 1703 | Normal (0 +- 15.07)

Even though effectsize does support this:

effectsize::standardize_parameters(stn_model, method = "basic")
     Parameter Std_Median
#> 1  (Intercept)  0.0000000
#> 2        am_f1  0.4320193
#> 5       cyl_f6 -0.2559893
#> 6       cyl_f8 -0.6519192
#> 3 am_f1:cyl_f6 -0.1870789
#> 4 am_f1:cyl_f8 -0.1974837

It should be possible to pass the results from the internal function effectsize:::.standardize_parameters(), but that doesn't work with all methods for some reason:


std_b1 <- effectsize:::.standardize_parameters(stn_model, method = "basic")
head(std_b1)
#>   (Intercept)       am_f1      cyl_f6     cyl_f8 am_f1:cyl_f6 am_f1:cyl_f8
#> 1           0  0.06274766 -0.52260366 -0.8099216   0.13767192  -0.17975818
#> 2           0  0.26277721 -0.73884262 -1.0541512   0.07593968  -0.05946451
#> 3           0 -0.01442010 -0.13098767 -0.6929162  -0.02356814   0.10666277
#> 4           0  0.21429937 -0.37536793 -0.8302571  -0.14130228  -0.44173134
#> 5           0  0.25481013 -0.02790428 -0.6466512  -0.05369372  -0.05450609
#> 6           0  0.49905297 -0.39758062 -0.6129612  -0.29708664  -0.18732498
parameters::model_parameters(std_b1)
#> Error in data.frame(Parameter = x$Parameter, CI = i * 100, CI_low = x$Std_Coefficient - : arguments imply differing number of rows: 0, 1

std_b2 <- effectsize:::.standardize_parameters(stn_model, method = "refit")
head(std_b2)
#>    (Intercept)     am_f1     cyl_f6     cyl_f8 am_f1:cyl_f6 am_f1:cyl_f8
#> 1  0.387171965 0.9705108 -1.0867589 -0.8372346  0.406569430  -1.13068165
#> 2  0.412490036 0.8834610 -0.7106527 -1.1707998 -0.005951924  -1.28105842
#> 3  0.620268082 0.7506352 -1.1140309 -1.3303843 -0.252197644  -0.43294461
#> 4  0.420361236 1.0881383 -0.6277558 -1.2892625 -0.571962159  -0.90239693
#> 5 -0.005995096 1.4198906 -0.1378329 -0.9602515 -1.287611052  -1.14003855
#> 6  0.961504531 0.3057004 -1.3192741 -1.8021965  0.459667680  -0.02861027
parameters::model_parameters(std_b2)
#> Parameter    | Median |         89% CI |     pd | % in ROPE
#> -----------------------------------------------------------
#> (Intercept)  |   0.48 | [ 0.03,  0.96] | 95.47% |     7.25%
#> am_f1        |   0.83 | [ 0.26,  1.33] | 98.90% |     1.35%
#> cyl_f6       |  -0.65 | [-1.26, -0.03] | 95.23% |     4.85%
#> cyl_f8       |  -1.32 | [-1.83, -0.77] | 99.98% |        0%
#> am_f1:cyl_f6 |  -0.56 | [-1.38,  0.24] | 87.12% |     7.98%
#> am_f1:cyl_f8 |  -0.75 | [-1.55,  0.11] | 92.27% |     4.60%

Created on 2020-04-04 by the reprex package (v0.3.0)

mattansb commented 4 years ago

It seems that this is because of the fact that for method = "basic" the (Intercept) is constant at 0...

strengejacke commented 4 years ago

The problem is that all information like HDI, % in Rope etc. are based on the posterior distribution, while method = "basic" returns classical "standardized" parameters without any "new" and "standardized" posterior distribution that you can use to compute HDI etc. I guess that was probably the reason why standardization was not included for Bayesian models?

DominiqueMakowski commented 4 years ago

I didn't get it, what is the issue? Why can't we merge parameters with the table produced by standardize_parameters? Also, note that we have effectsize::standardize_posteriors() to produce std posteriors.

Effect sizes for Bayesian models are a tricky topic anyway and I'll return to it soon hopefully right after I'm done being a pythonista 😁

mattansb commented 4 years ago

... while method = "basic" returns classical "standardized" parameters without any "new" and "standardized" posterior distribution ...

It does return standardized posteriors (besides for (Intercept), which is fixed to 0).

std_b1 <- effectsize:::.standardize_parameters(stn_model, method = "basic")
head(std_b1)
#>   (Intercept)       am_f1      cyl_f6     cyl_f8 am_f1:cyl_f6 am_f1:cyl_f8
#> 1           0  0.06274766 -0.52260366 -0.8099216   0.13767192  -0.17975818
#> 2           0  0.26277721 -0.73884262 -1.0541512   0.07593968  -0.05946451
#> 3           0 -0.01442010 -0.13098767 -0.6929162  -0.02356814   0.10666277
#> 4           0  0.21429937 -0.37536793 -0.8302571  -0.14130228  -0.44173134
#> 5           0  0.25481013 -0.02790428 -0.6466512  -0.05369372  -0.05450609
#> 6           0  0.49905297 -0.39758062 -0.6129612  -0.29708664  -0.18732498
strengejacke commented 4 years ago

Why can't we merge parameters with the table produced by standardize_parameters

Because the HDI won't be related to the std. parameters, so we need some more information...

std_b1 <- effectsize:::.standardize_parameters(stn_model, method = "basic")

We can't call internal functions (:::) from other packages.

effectsize::standardize_posteriors

That could work...

strengejacke commented 4 years ago

hm. doesn't seem to work:

library(rstanarm)
mtcars$am_f <- factor(mtcars$am)
mtcars$cyl_f <- factor(mtcars$cyl)

stn_model <- stan_glm(mpg ~ am_f * cyl_f, data = mtcars,
                      refresh = 0)

bayestestR::describe_posterior(effectsize::standardize_posteriors(stn_model, method = "basic"))
#> Error in data.frame(Parameter = x$Parameter, CI = i * 100, CI_low = x$Std_Coefficient - : arguments imply differing number of rows: 0, 1

Created on 2020-04-06 by the reprex package (v0.3.0)

mattansb commented 4 years ago

This is because for method != "refit" the intercept is 0. (I think? @DominiqueMakowski , can you confirm the cases when this happens?)

So you'd have to remove the intercept first.

But this give another error...

library(rstanarm)

mtcars$am_f <- factor(mtcars$am)
mtcars$cyl_f <- factor(mtcars$cyl)

stn_model <- stan_glm(mpg ~ am_f * cyl_f, data = mtcars,
                      refresh = 0)

std_post <- effectsize::standardize_posteriors(stn_model, method = "basic")
std_post_no_intercept <- std_post[, colnames(std_post) != "(Intercept)"]
bayestestR:::describe_posterior(std_post_no_intercept)
#> 
#> Could not extract standard errors of standardized coefficients.
#> Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 1, 0

Which has something to do with the class?

class(std_post_no_intercept)
#> [1] "effectsize_std_params" "data.frame"
class(std_post_no_intercept) <- "data.frame"
bayestestR:::describe_posterior(std_post_no_intercept)
#>      Parameter     Median CI     CI_low      CI_high      pd ROPE_CI ROPE_low
#> 1        am_f1  0.4200303 89  0.1322492  0.679286387 0.99250      89     -0.1
#> 4       cyl_f6 -0.2619940 89 -0.5162015  0.007898688 0.94225      89     -0.1
#> 5       cyl_f8 -0.6568089 89 -0.9410599 -0.403611944 0.99975      89     -0.1
#> 2 am_f1:cyl_f6 -0.1818165 89 -0.4076509  0.080154380 0.87725      89     -0.1
#> 3 am_f1:cyl_f8 -0.1890411 89 -0.3904742  0.006291445 0.93750      89     -0.1
#>   ROPE_high ROPE_Percentage
#> 1       0.1       0.0000000
#> 4       0.1       0.1190677
#> 5       0.1       0.0000000
#> 2       0.1       0.2853131
#> 3       0.1       0.2019096

Created on 2020-04-06 by the reprex package (v0.3.0)

Not sure why that error is there...

strengejacke commented 4 years ago

we should probably add a class attribute to effectsize::standardize_posterior(), and then add a describe_posterior.xxx() method to bayestestR, than we can make it work in parameters.

mattansb commented 4 years ago

Would be nice to still have the Rhat, ESS, Prior (etc) columns - taken from the non standardized model_parameters?

strengejacke commented 4 years ago

Yes.

strengejacke commented 4 years ago
library(rstanarm)
#> Loading required package: Rcpp
#> rstanarm (Version 2.19.3, packaged: 2020-02-11 05:16:41 UTC)
#> - Do not expect the default priors to remain the same in future rstanarm versions.
#> Thus, R scripts should specify priors explicitly, even if they are just the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores())
#> - bayesplot theme set to bayesplot::theme_default()
#>    * Does _not_ affect other ggplot2 plots
#>    * See ?bayesplot_theme_set for details on theme setting
mtcars$am_f <- factor(mtcars$am)
mtcars$cyl_f <- factor(mtcars$cyl)

m <- stan_glm(mpg ~ am_f * cyl_f, data = mtcars, refresh = 0)

effectsize::standardize_parameters(m, method = "basic")
#>       Parameter Std_Median
#> 1   (Intercept)         NA
#> 11        am_f1  0.4221908
#> 4        cyl_f6 -0.2656329
#> 5        cyl_f8 -0.6559976
#> 2  am_f1:cyl_f6 -0.1768122
#> 3  am_f1:cyl_f8 -0.1912619

parameters::model_parameters(m)
#> Possible multicollinearity between cyl_f8 and am_f1 (r = 0.76), am_f1:cyl_f6 and cyl_f6 (r = 0.74). This might lead to inappropriate results. See 'Details' in '?rope'.
#> Parameter            | Median |          89% CI |     pd | % in ROPE |  Rhat |  ESS |               Prior
#> ---------------------------------------------------------------------------------------------------------
#> (Intercept)          |  22.95 | [ 19.91, 25.71] |   100% |        0% | 0.999 | 1544 | Normal (0 +- 60.27)
#> am_f [1]             |   5.10 | [  1.44,  8.16] | 98.90% |     1.40% | 1.000 | 1462 | Normal (0 +- 15.07)
#> cyl_f [6]            |  -3.81 | [ -7.42,  0.20] | 94.12% |     5.97% | 0.999 | 1669 | Normal (0 +- 15.07)
#> cyl_f [8]            |  -7.84 | [-11.01, -4.49] |   100% |     0.02% | 1.000 | 1652 | Normal (0 +- 15.07)
#> am_f [1] * cyl_f [6] |  -3.60 | [ -8.61,  1.65] | 87.58% |     7.47% | 0.999 | 1803 | Normal (0 +- 15.07)
#> am_f [1] * cyl_f [8] |  -4.69 | [ -9.80,  0.24] | 92.65% |     5.05% | 1.001 | 1806 | Normal (0 +- 15.07)

parameters::model_parameters(m, standardize = "basic")
#> Possible multicollinearity between cyl_f8 and am_f1 (r = 0.76), am_f1:cyl_f6 and cyl_f6 (r = 0.74). This might lead to inappropriate results. See 'Details' in '?rope'.
#> Parameter            | Median |         89% CI |     pd | % in ROPE |  Rhat |  ESS |               Prior
#> --------------------------------------------------------------------------------------------------------
#> (Intercept)          |        |                |        |           | 0.999 | 1544 | Normal (0 +- 60.27)
#> am_f [1]             |   0.42 | [ 0.12,  0.68] | 98.90% |     2.97% | 1.000 | 1462 | Normal (0 +- 15.07)
#> cyl_f [6]            |  -0.27 | [-0.52,  0.01] | 94.12% |    15.00% | 0.999 | 1669 | Normal (0 +- 15.07)
#> cyl_f [8]            |  -0.66 | [-0.92, -0.38] |   100% |     0.07% | 1.000 | 1652 | Normal (0 +- 15.07)
#> am_f [1] * cyl_f [6] |  -0.18 | [-0.42,  0.08] | 87.58% |    26.07% | 0.999 | 1803 | Normal (0 +- 15.07)
#> am_f [1] * cyl_f [8] |  -0.19 | [-0.40,  0.01] | 92.65% |    22.00% | 1.001 | 1806 | Normal (0 +- 15.07)

Created on 2020-04-06 by the reprex package (v0.3.0)

mattansb commented 4 years ago

Fixed Bayes factors:

library(rstanarm)
#> Loading required package: Rcpp
#> rstanarm (Version 2.19.3, packaged: 2020-02-11 05:16:41 UTC)
#> - Do not expect the default priors to remain the same in future rstanarm versions.
#> Thus, R scripts should specify priors explicitly, even if they are just the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores())
#> - bayesplot theme set to bayesplot::theme_default()
#>    * Does _not_ affect other ggplot2 plots
#>    * See ?bayesplot_theme_set for details on theme setting

mtcars$am_f <- factor(mtcars$am)
mtcars$cyl_f <- factor(mtcars$cyl)

m <- stan_glm(mpg ~ am_f * cyl_f, data = mtcars, refresh = 0)

set.seed(1)
parameters::model_parameters(m, standardize = "basic", test = "bf")
#> Computation of Bayes factors: sampling priors, please wait...
#> Loading required namespace: logspline
#> Parameter            | Median |         89% CI |     BF |  Rhat |  ESS |               Prior
#> --------------------------------------------------------------------------------------------
#> (Intercept)          |        |                |  > 999 | 1.004 | 1592 | Normal (0 +- 60.27)
#> am_f [1]             |   0.41 | [ 0.14,  0.68] |   2.28 | 1.003 | 1627 | Normal (0 +- 15.07)
#> cyl_f [6]            |  -0.27 | [-0.53, -0.01] |   0.68 | 1.002 | 1638 | Normal (0 +- 15.07)
#> cyl_f [8]            |  -0.66 | [-0.94, -0.41] | 170.20 | 1.004 | 1615 | Normal (0 +- 15.07)
#> am_f [1] * cyl_f [6] |  -0.17 | [-0.40,  0.09] |   0.39 | 1.003 | 1674 | Normal (0 +- 15.07)
#> am_f [1] * cyl_f [8] |  -0.19 | [-0.38,  0.02] |   0.60 | 1.003 | 2163 | Normal (0 +- 15.07)

set.seed(1)
parameters::model_parameters(m, standardize = "refit", test = "bf")
#> Computation of Bayes factors: sampling priors, please wait...
#> Parameter            | Median |         89% CI |     BF |  Rhat |  ESS |               Prior
#> --------------------------------------------------------------------------------------------
#> (Intercept)          |   0.47 | [ 0.00,  0.95] |  > 999 | 1.004 | 1592 | Normal (0 +- 60.27)
#> am_f [1]             |   0.84 | [ 0.30,  1.40] |   2.28 | 1.003 | 1627 | Normal (0 +- 15.07)
#> cyl_f [6]            |  -0.63 | [-1.26, -0.03] |   0.68 | 1.002 | 1638 | Normal (0 +- 15.07)
#> cyl_f [8]            |  -1.30 | [-1.81, -0.76] | 170.20 | 1.004 | 1615 | Normal (0 +- 15.07)
#> am_f [1] * cyl_f [6] |  -0.60 | [-1.44,  0.21] |   0.39 | 1.003 | 1674 | Normal (0 +- 15.07)
#> am_f [1] * cyl_f [8] |  -0.77 | [-1.60,  0.05] |   0.60 | 1.003 | 2163 | Normal (0 +- 15.07)

Created on 2020-04-06 by the reprex package (v0.3.0)