easystats / bayestestR

:ghost: Utilities for analyzing Bayesian models and posterior distributions
https://easystats.github.io/bayestestR/
GNU General Public License v3.0
577 stars 55 forks source link

describe_posterior() and friends for sigma in a linear model #338

Closed buyske closed 3 years ago

buyske commented 4 years ago

If I create a Bayesian linear regression, say model <- stan_glm(mpg ~ wt + gear, data = mtcars, chains = 2, iter = 200, refresh = 0) and then use the describe_posterior(),hdi(), orpoint_estimate() functions, I get results for the intercept and slopes but not for what the rstanarm package refers to as Family Specific Parameters or Auxiliary parameter(s).

Pardon my ignorance, but is there a conceptual/pedagogical reason to exclude it? If I want to get such results, is there a way to do so in the bayestestR framework? I'm stumbling over a straightforward way for my students to get summaries for the posteriors of all the parameters.

DominiqueMakowski commented 4 years ago

You can obtain info about priors through bayestestR::describe_prior(model). As for the other parameters, I'm not sure what you are referring to? You mean sigma? @mattansb

buyske commented 4 years ago

Sorry, I edited that out by mistake. I did indeed mean that I don't see how to extract info on sigma

mattansb commented 4 years ago

Hmmm it seems like there really isn't a way to get distributional parameters, Bayesian or otherwise:


fm_nbin <- glm(cyl ~ ., data = mtcars, family = quasipoisson())

# Does not return Dispersion parameter
insight::get_parameters(fm_nbin)
#>      Parameter      Estimate
#> 1  (Intercept)  3.1253652940
#> 2          mpg -0.0058390159
#> 3         disp  0.0006115320
#> 4           hp  0.0002487659
#> 5         drat -0.0981988171
#> 6           wt -0.0366283971
#> 7         qsec -0.0406991818
#> 8           vs -0.1092920789
#> 9           am -0.1126823833
#> 10        gear -0.0517634494
#> 11        carb  0.0280156537

library(rstanarm)

model <- stan_glm(mpg ~ wt + gear, data = mtcars, chains = 2, iter = 200, refresh = 0)

bayestestR::describe_posterior(model, effects = "all")
#> # Description of Posterior Distributions
#> 
#> Parameter   | Median |           89% CI |    pd |        89% ROPE | % in ROPE |  Rhat |     ESS
#> -----------------------------------------------------------------------------------------------
#> (Intercept) | 38.939 | [31.813, 47.377] | 1.000 | [-0.603, 0.603] |     0.000 | 1.003 | 135.348
#> wt          | -5.525 | [-6.613, -4.195] | 1.000 | [-0.603, 0.603] |     0.000 | 1.009 | 147.431
#> gear        | -0.337 | [-1.780,  0.862] | 0.615 | [-0.603, 0.603] |    56.425 | 1.005 | 138.662

head(as.data.frame(model))
#>   (Intercept)        wt       gear    sigma
#> 1    35.58415 -4.989593  0.1020943 2.764029
#> 2    41.35508 -5.898783 -0.5769509 3.371416
#> 3    37.57634 -5.641074  0.2134571 3.584751
#> 4    40.03907 -5.653430 -0.4445428 3.173185
#> 5    42.48010 -5.770290 -0.7982663 3.146593
#> 6    40.45137 -5.561130 -0.4110539 3.681150

library(brms)
model2 <- brm(mpg ~ wt + gear, 
              data = mtcars, chains = 2, iter = 200, refresh = 0)

bayestestR::describe_posterior(model2, effects = "all")
#> # Description of Posterior Distributions
#> 
#> Parameter | Median |           89% CI |    pd |        89% ROPE | % in ROPE |  Rhat |    ESS
#> --------------------------------------------------------------------------------------------
#> Intercept | 39.343 | [29.715, 45.615] | 1.000 | [-0.603, 0.603] |     0.000 | 0.995 | 51.841
#> wt        | -5.430 | [-6.591, -4.456] | 1.000 | [-0.603, 0.603] |     0.000 | 0.997 | 81.453
#> gear      | -0.401 | [-1.500,  1.510] | 0.670 | [-0.603, 0.603] |    41.341 | 0.997 | 54.043

head(as.data.frame(model2))
#> b_Intercept      b_wt     b_gear    sigma      lp__
#> 1    39.35955 -5.695667 -0.3030342 3.344781 -84.26330
#> 2    38.78282 -5.786911 -0.3171797 2.763572 -86.60660
#> 3    45.08408 -6.293783 -1.1585615 2.628596 -85.79932
#> 4    42.41537 -5.287585 -1.4121143 2.870607 -85.45501
#> 5    33.24211 -5.355504  1.0726134 3.539263 -85.77091
#> 6    35.14802 -5.192731  0.3241216 2.746833 -84.69216

Unless they are themselves modelled:

model3 <- brm(bf(mpg ~ wt + gear,
                 sigma ~ wt), 
              data = mtcars, chains = 2, iter = 200, refresh = 0)
bayestestR::describe_posterior(model3, effects = "all")
#> # Description of Posterior Distributions
#> 
#> Parameter       | Median |           89% CI |    pd |        89% ROPE | % in ROPE |  Rhat |    ESS
#> --------------------------------------------------------------------------------------------------
#> Intercept       | 39.048 | [29.817, 49.240] | 1.000 | [-0.603, 0.603] |     0.000 | 1.040 | 33.259
#> sigma_Intercept |  1.348 | [ 0.369,  2.357] | 0.970 | [-0.603, 0.603] |     5.587 | 1.002 | 90.926
#> wt              | -5.322 | [-7.068, -3.974] | 1.000 | [-0.603, 0.603] |     0.000 | 1.016 | 41.657
#> gear            | -0.378 | [-2.229,  0.936] | 0.660 | [-0.603, 0.603] |    46.927 | 1.040 | 48.607
#> sigma_wt        | -0.060 | [-0.343,  0.217] | 0.590 | [-0.603, 0.603] |   100.000 | 1.006 | 80.160

I think this is an insight related question - is there a way to get all of the model's parameters?

DominiqueMakowski commented 4 years ago

@strengejacke maybe we should add an access function / an argument in insight (or in bayestestr)? Not sure if that'd be helpful

strengejacke commented 4 years ago

Sigma is shown for mixed models, but not for non-mixed models. It should be easy to implement, we must however check carefully if it will break a lot of code or not. Because mostly, we use effects = "all" and component = "all" already, and this would include sigma, so get_parameters() would now return that parameter by default.

strengejacke commented 4 years ago

Well, it should not break code, but most likely tests.

strengejacke commented 4 years ago

If you update insight from GitHub, it should work:

library(bayestestR)
library(brms)
library(rstanarm)

m1 <- brm(bf(mpg ~ wt + gear), data = mtcars, chains = 2, iter = 200, refresh = 0)
describe_posterior(m1, effects = "all", component = "all")
#> # Description of Posterior Distributions
#> 
#> # Fixed Effects (Conditional Model)
#> 
#> Parameter | Median |           89% CI |      pd |        89% ROPE | % in ROPE |  Rhat |    ESS
#> ----------------------------------------------------------------------------------------------
#> Intercept | 39.428 | [29.534, 46.205] | 100.00% | [-0.603, 0.603] |     0.000 | 1.020 | 51.254
#> wt        | -5.570 | [-6.492, -4.412] | 100.00% | [-0.603, 0.603] |     0.000 | 1.021 | 78.338
#> gear      | -0.420 | [-1.997,  1.263] |  62.00% | [-0.603, 0.603] |    44.693 | 1.016 | 51.352
#> 
#> # Sigma (fixed effects)
#> 
#> Parameter | Median |           89% CI |      pd |        89% ROPE | % in ROPE |  Rhat |     ESS
#> -----------------------------------------------------------------------------------------------
#> sigma     |  3.158 | [ 2.617,  3.760] | 100.00% | [-0.603, 0.603] |         0 | 0.998 | 188.422

m2 <- stan_glm(mpg ~ wt + gear, data = mtcars, chains = 2, iter = 200, refresh = 0)
describe_posterior(m2, effects = "all", component = "all")
#> # Description of Posterior Distributions
#> 
#> # Fixed Effects (Conditional Model)
#> 
#> Parameter   | Median |           89% CI |      pd |        89% ROPE | % in ROPE |  Rhat |     ESS
#> -------------------------------------------------------------------------------------------------
#> (Intercept) | 38.507 | [30.979, 45.695] | 100.00% | [-0.603, 0.603] |     0.000 | 0.998 | 160.357
#> wt          | -5.410 | [-6.499, -4.410] | 100.00% | [-0.603, 0.603] |     0.000 | 1.007 | 149.427
#> gear        | -0.295 | [-1.954,  0.880] |  60.00% | [-0.603, 0.603] |    52.514 | 0.992 | 227.403
#> 
#> # Sigma (fixed effects)
#> 
#> Parameter | Median |           89% CI |      pd |        89% ROPE | % in ROPE |  Rhat |    ESS
#> ----------------------------------------------------------------------------------------------
#> sigma     |  3.131 | [ 2.595,  3.759] | 100.00% | [-0.603, 0.603] |         0 | 1.060 | 53.531

Created on 2020-09-23 by the reprex package (v0.3.0)

strengejacke commented 4 years ago

Or you use model_parameters(), which slightly differs in the default print:

library(parameters)
library(brms)
library(rstanarm)

m1 <- brm(bf(mpg ~ wt + gear), data = mtcars, chains = 2, iter = 200, refresh = 0)
model_parameters(m1)
#> # Fixed effects 
#> 
#> Parameter   | Median |         89% CI |     pd | % in ROPE |  Rhat |    ESS
#> ---------------------------------------------------------------------------
#> (Intercept) |  40.70 | [34.14, 48.17] |   100% |        0% | 1.011 |  86.18
#> wt          |  -5.67 | [-6.70, -4.51] |   100% |        0% | 1.008 | 128.36
#> gear        |  -0.63 | [-1.89,  0.61] | 76.00% |    41.00% | 0.999 |  98.06
#> 
#> # Fixed effects sigma
#> 
#> Parameter | Median |       89% CI |   pd | % in ROPE |  Rhat |    ESS
#> ---------------------------------------------------------------------
#> sigma     |   3.18 | [2.57, 3.84] | 100% |        0% | 1.006 | 173.31

m2 <- stan_glm(mpg ~ wt + gear, data = mtcars, chains = 2, iter = 200, refresh = 0)
model_parameters(m2)
#> # Fixed effects 
#> 
#> Parameter   | Median |         89% CI |     pd | % in ROPE |  Rhat |    ESS |                   Prior
#> -----------------------------------------------------------------------------------------------------
#> (Intercept) |  38.37 | [31.00, 49.27] |   100% |        0% | 1.000 | 214.54 | Normal (20.09 +- 15.07)
#> wt          |  -5.48 | [-6.84, -4.31] |   100% |        0% | 1.000 | 212.01 |  Normal (0.00 +- 15.40)
#> gear        |  -0.34 | [-1.90,  1.35] | 63.50% |    45.00% | 0.997 | 232.49 |  Normal (0.00 +- 20.42)
#> 
#> # Fixed effects sigma
#> 
#> Parameter | Median |       89% CI |   pd | % in ROPE |  Rhat |    ESS |     Prior
#> ---------------------------------------------------------------------------------
#> sigma     |   3.21 | [2.51, 3.86] | 100% |        0% | 1.005 | 100.93 | NA ( +- )

Created on 2020-09-23 by the reprex package (v0.3.0)

DominiqueMakowski commented 4 years ago

I'm not sure it's necessary to display it by default in model_parameters() (I think that most people would not expect it in the default parameters)

mattansb commented 4 years ago

@strengejacke Does this work with other distributional parameters? Dispersion, family-specific parameters, etc?

strengejacke commented 4 years ago

it works for classical regression (https://easystats.github.io/parameters/articles/model_parameters.html#mixed-models-with-dispersion-model), but I'm not sure if with Bayesian as well. I would need some running examples, to know how these parameters are named in brms and rstanarm.

strengejacke commented 4 years ago

I'm not sure it's necessary to display it by default in model_parameters() (I think that most people would not expect it in the default parameters)

Why not? This is nothing particular new and always worked for Bayesian mixed models, now it works for non-mixed as well. I'm currently reading https://avehtari.github.io/ROS-Examples/, and this convinced me including information on residual standard deviation as default for model_parameters():

library(parameters)
model <- lm(mpg ~ wt + cyl, data = mtcars)

model_parameters(model)
#> Parameter   | Coefficient |   SE |         95% CI |     t | df |      p
#> -----------------------------------------------------------------------
#> (Intercept) |       39.69 | 1.71 | [36.18, 43.19] | 23.14 | 29 | < .001
#> wt          |       -3.19 | 0.76 | [-4.74, -1.64] | -4.22 | 29 | < .001
#> cyl         |       -1.51 | 0.41 | [-2.36, -0.66] | -3.64 | 29 | 0.001 
#> 
#> Residual standard deviation: 2.57

Created on 2020-09-23 by the reprex package (v0.3.0)

But I'm open for other suggestions.

strengejacke commented 4 years ago

https://github.com/easystats/parameters/issues/305

strengejacke commented 4 years ago

Ok, I reverted the change (https://github.com/easystats/insight/commit/1bfd224ec0701f2f2299184e6621039be2339e30), so sigma for non-mixed models is not included by default. So I reopen this issue and leave the implementation to @DominiqueMakowski.

strengejacke commented 3 years ago

@DominiqueMakowski I added a component argument to stanreg methods, which defaults to "location", which is equal to the former default in insight. component = "all" will also include auxiliary parameters. I think this should resolve this issue without breaking code or changing default behaviour.

strengejacke commented 3 years ago

Can someone check if this issue is resolved?

mattansb commented 3 years ago

Yup, seems to work.

model <- rstanarm::stan_glm(mpg ~ wt + gear, data = mtcars, chains = 2, iter = 200, refresh = 0)
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> http://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> http://mc-stan.org/misc/warnings.html#tail-ess

library(bayestestR)
#> Note: The default CI width (currently `ci=0.89`) might change in future versions (see https://github.com/easystats/bayestestR/discussions/250). To prevent any issues, please set it explicitly when using bayestestR functions, via the 'ci' argument.

describe_posterior(model, component = "all", test = NULL)
#> Summary of Posterior Distribution
#> 
#> Parameter   | Median |         89% CI |  Rhat |    ESS
#> ------------------------------------------------------
#> (Intercept) |  39.08 | [31.02, 49.26] | 1.007 | 187.00
#> wt          |  -5.58 | [-6.98, -4.54] | 1.013 | 215.00
#> gear        |  -0.32 | [-1.90,  1.28] | 1.001 | 249.00
#> 
#> # Fixed effects sigma
#> 
#> Parameter | Median |         89% CI
#> -----------------------------------
#> sigma     |   3.16 | [ 2.47,  4.03]

hdi(model, component = "all")
#> Highest Density Interval
#> 
#> Parameter   |        89% HDI
#> ----------------------------
#> (Intercept) | [31.02, 49.26]
#> wt          | [-6.98, -4.54]
#> gear        | [-1.90,  1.28]
#> 
#> # Fixed effects sigma
#> 
#> Parameter |        89% HDI
#> --------------------------
#> sigma     | [ 2.47,  4.03]

point_estimate(model, component = "all")
#> Point Estimate
#> 
#> Parameter   | Median |  Mean |   MAP | Effects |   Component
#> ------------------------------------------------------------
#> (Intercept) |  39.08 | 39.30 | 39.15 |   fixed | conditional
#> wt          |  -5.58 | -5.55 | -5.72 |   fixed | conditional
#> gear        |  -0.32 | -0.33 | -0.22 |   fixed | conditional
#> sigma       |   3.16 |  3.24 |  3.14 |   fixed |       sigma

Created on 2021-03-21 by the reprex package (v1.0.0)