lmer and arm::sim instead of stan_glmer on large data?

soloaus24 commented 5 years ago

Hello!

At my company I spend a lot of time analyzing experiments and the datasets I'm working with are rather large. I typically build multilevel models with lmer. Building these models with the brm function or the stan_glmer function isn't practical because it would take forever for them to build. As a workaround, I try to get a posterior distribution of coefficient estimates by using the sim function from the arm package on the model object. I then take the fixed effects data frame from that object and pass it through to the hdi and rope functions from the bayestestR package. What are the major differences between using stan_glmer and the method I've outlined above? Is it OK for me to report on things like HDI and ROPE for coefficient estimates using my method or is that statistically incorrect? Any help would be greatly appreciated! Below is an example of my workflow.

fit <- lmer(y ~ x + (1 | group), data = df)
sim_fit <- sim(fit, 10000)
posteriors <- sim_fit@fixef
hdi(posteriors)
rope(posteriors)

mattansb commented 5 years ago

This seems reasonable, as this exact same procedure is used by the BayesFactor package to validate the "true" Bayesian posteriors (see here). It is unclear from the documentation if the priors are assumed to be flat (in which case the posterior is sampled from the ML function), or something else (I suspect the former is true) - but this will only have an effect on posterior based Bayes factors, while all other methods (ci, rope, etc...) care not for those pesky priors 😊

mattansb commented 5 years ago

Also, you might find rstanarm::stan_biglm useful, for big data-sets.

strengejacke commented 5 years ago

Looking at Gelman & Hill (2007). pages 140 following, making inference based on simulations can be called an "informal Bayesian approach".

Technically, you are summarizing a vector. This is what is also done for "simple" bootstrapping etc., and then you take the mean/median as "point estimate", and some quantiles for the confidence intervals. So, yes, you can simply compute the hdi or eti from your simulated draws.

I haven't checked for other models than simple linear models, but at least simulate() returns values on the response scale, hence you have no proper distribution for binomial or count models (and thus, hdi won't work). If arm::sim() returns the values on the link-scale, you can, however, use hdi etc.

I'm implementing a metod for the sim.merMod class, so you can use hdi() directly with sim_fit from your example.

strengejacke commented 5 years ago

And you can also use ROPE - it's similar to saying, for a "simple" frequentist model w/o simulations, "we have a significant effect, but it is such small that it is practically not relevant". You do something similar with ROPE.

strengejacke commented 5 years ago

Here are examples from the current implementation. Objects from arm::sim() can be directly called within hdi(), ci() and eti():

library(arm)
library(lme4)
library(bayestestR)

m <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
sim_fit <- arm::sim(m, 1000)

hdi(sim_fit)
#>     Parameter CI     CI_low   CI_high
#> 1 (Intercept) 89 240.332765 261.07408
#> 2        Days 89   8.307967  13.15562

hdi(sim_fit, effects = "all")
#>          Parameter CI      CI_low     CI_high  Group
#> 1      (Intercept) 89 240.3327645 261.0740824  fixed
#> 2             Days 89   8.3079667  13.1556201  fixed
#> 3  (Intercept).308 89 -16.0073828  24.8423555 random
#> 4         Days.308 89   4.3755654  13.0849383 random
#> 5  (Intercept).309 89 -61.8609206 -20.8313242 random
#> 6         Days.309 89 -13.1609398  -4.6711144 random
#> 7  (Intercept).310 89 -56.8298998 -17.5504370 random
#> 8         Days.310 89  -9.7004658  -0.9473180 random
#> 9  (Intercept).330 89   4.6399315  45.1686082 random
#> 10        Days.330 89  -9.6072097  -0.9570290 random
#> 11 (Intercept).331 89   4.4185954  42.5649342 random
#> 12        Days.331 89  -7.1452027   1.3021877 random
#> 13 (Intercept).332 89  -8.0982316  31.2101980 random
#> 14        Days.332 89  -4.0594400   4.4935157 random
#> 15 (Intercept).333 89  -5.0627446  35.5392311 random
#> 16        Days.333 89  -4.8338501   3.7091377 random
#> 17 (Intercept).334 89 -25.9108186  13.8798611 random
#> 18        Days.334 89  -3.3262606   5.0170427 random
#> 19 (Intercept).335 89 -22.5379495  19.2082650 random
#> 20        Days.335 89 -15.3973349  -6.4065836 random
#> 21 (Intercept).337 89  12.6562494  55.1728956 random
#> 22        Days.337 89   4.5316538  13.6732566 random
#> 23 (Intercept).349 89 -44.7094372  -3.1723557 random
#> 24        Days.349 89  -2.7934368   6.3273793 random
#> 25 (Intercept).350 89 -31.8035689  10.0281715 random
#> 26        Days.350 89   2.2081889  10.7003413 random
#> 27 (Intercept).351 89 -16.3572131  25.8744674 random
#> 28        Days.351 89  -7.7999446   0.9852168 random
#> 29 (Intercept).352 89   0.3227598  41.7302096 random
#> 30        Days.352 89  -0.9254458   8.4896326 random
#> 31 (Intercept).369 89 -15.4252308  25.1249666 random
#> 32        Days.369 89  -3.8442958   4.7227639 random
#> 33 (Intercept).370 89 -47.7736334  -5.0288098 random
#> 34        Days.370 89   0.7492633   9.4866305 random
#> 35 (Intercept).371 89 -19.1119936  22.8159279 random
#> 36        Days.371 89  -5.5520323   3.1366488 random
#> 37 (Intercept).372 89  -8.0621136  33.6017260 random
#> 38        Days.372 89  -3.4948385   5.2878925 random

eti(sim_fit)
#>     Parameter CI     CI_low  CI_high
#> 1 (Intercept) 89 241.151504 262.0106
#> 2        Days 89   7.962468  12.9025

ci(sim_fit, effects = "random", parameters = "^Days")
#>    Parameter CI      CI_low    CI_high
#> 2   Days.308 89   4.8214107 13.7000214
#> 4   Days.309 89 -13.0035203 -4.5120176
#> 6   Days.310 89  -9.9596965 -1.0915085
#> 8   Days.330 89  -9.2507534 -0.3174804
#> 10  Days.331 89  -7.3513576  1.1702467
#> 12  Days.332 89  -4.4429156  4.2187887
#> 14  Days.333 89  -4.6865175  3.8929752
#> 16  Days.334 89  -3.4735085  4.9383328
#> 18  Days.335 89 -15.3919325 -6.3964247
#> 20  Days.337 89   3.5976629 12.9306430
#> 22  Days.349 89  -3.1541218  5.9806673
#> 24  Days.350 89   2.2605994 10.7451412
#> 26  Days.351 89  -7.5749799  1.3499830
#> 28  Days.352 89  -1.2548782  8.3217727
#> 30  Days.369 89  -3.7034767  5.0224508
#> 32  Days.370 89   0.1093224  9.0175893
#> 34  Days.371 89  -5.4278820  3.3033422
#> 36  Days.372 89  -3.2618426  5.6145144

^{Created on 2019-08-27 by the reprex package (v0.3.0)}

strengejacke commented 5 years ago

We can also extend the functions to simulatons from "simple" models, where arm::sim() returns an object of class sim.

And we could also extend the support to functions like point_estimate() etc.

[x] ROPE for sim.merMod
[x] point_estimate for sim.merMod
[x] point_estimate for sim
[x] hdi for sim
[x] ci for sim
[x] eti for sim

strengejacke commented 5 years ago

Ok, once you update both bayestestR and see from GitHub, you can also plot:

library(arm)
library(lme4)
library(bayestestR)

m <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
sim_fit <- arm::sim(m, 1000)

x <- ci(sim_fit, ci = c(.5, .8, .95), effects = "random", parameters = "^Days")
plot(x)

^{Created on 2019-08-27 by the reprex package (v0.3.0)}

strengejacke commented 5 years ago

Ok, I have implemented a get_parameters()´ andfind_parameters()method forsimandsim.merMod´ objects in the insight package. This makes it easier to implement support for more objects in bayestestR etc.

So, now you need to update insight, bayestestR and see from GitHub, and you can use following functions with objects from arm::sim() (both mixed and simple models);

hdi() (with plot())
ci() (with plot())
eti() (with plot())
rope()
point_estimate() (with plot())

The plot-method for rope() from sim/sim.merMod objects is slightly trickier according to the data handling, but this might follow...

Actually, we could make sim/sim.merMod objects also work for pd, p_map, bayesfactor etc. I'l look at it.

strengejacke commented 5 years ago

bayesfactors do work now as well, however, I'm not sure about the prior-specification. By default, normal priors N(0,2) are used:

library(arm)
library(lme4)
library(bayestestR)

m <- lmer(mpg ~ wt + gear + (1 | cyl), mtcars)
sim_fit <- arm::sim(m, 1000)

plot(bayesfactor(sim_fit))
#> Warning in bayesfactor_parameters.sim.merMod(..., prior = prior, direction
#> = direction, : Prior not specified! Please specify priors (with column
#> order matching 'posterior') to get meaningful results. Using normally
#> distributed priors with location=0 and scale=2 as default.

^{Created on 2019-08-27 by the reprex package (v0.3.0)}

@mattansb any idea for sensible defaults?

strengejacke commented 5 years ago

Ok, I think the most important functions are covered for arm:sim-objects. That means you can now also use describe_posterior() to get comprehensive summaries.

plot() methods work for all functions (like hid, ci, pd, point_estimate, ...) except for rope() and equivalence_test(). This becomes easier once we will have a better function to retrieve data in the see package...

Here an example of the current implementation:

library(arm)
library(lme4)
library(bayestestR)
data(mtcars)

m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
x <- arm::sim(m, 1000)

describe_posterior(x, test = "all")
#> Warning in bayesfactor_parameters.sim(x, prior = bf_prior, ...): Prior not
#> specified! Please specify priors (with column order matching 'posterior')
#> to get meaningful results. Using normally distributed priors with
#> location=0 and scale=2 as default.
#>     Parameter       Median CI      CI_low     CI_high      p_MAP    pd
#> 1 (Intercept) 43.603826962 89 35.32796965 50.48548749 0.00000000 1.000
#> 2          wt -3.818821010 89 -5.44566695 -2.07529838 0.00000000 1.000
#> 3         cyl -1.744702403 89 -2.72878740 -0.67135432 0.03530381 0.996
#> 4        gear -0.490838016 89 -1.91306977  0.60695506 0.77366017 0.734
#> 5        disp  0.006572141 89 -0.01075689  0.02754414 0.96379712 0.700
#>   ROPE_CI ROPE_low ROPE_high ROPE_Percentage ROPE_Equivalence           BF
#> 1      89     -0.1       0.1      0.00000000         Rejected 9.747696e+07
#> 2      89     -0.1       0.1      0.00000000         Rejected 1.131505e+02
#> 3      89     -0.1       0.1      0.00000000         Rejected 8.860870e+00
#> 4      89     -0.1       0.1      0.08417508        Undecided 5.035772e-01
#> 5      89     -0.1       0.1      1.00000000         Accepted 6.321714e-03

m <- lmer(mpg ~ wt + gear + (1 | cyl), mtcars)
sim_fit <- arm::sim(m, 1000)

describe_posterior(sim_fit, test = "all")
#> Warning in bayesfactor_parameters.sim.merMod(x, prior = bf_prior, ...):
#> Prior not specified! Please specify priors (with column order matching
#> 'posterior') to get meaningful results. Using normally distributed priors
#> with location=0 and scale=2 as default.
#>     Parameter     Median CI    CI_low    CI_high     p_MAP    pd ROPE_CI
#> 1 (Intercept) 33.8506438 89 24.912640 40.7477347 0.0000000 1.000      89
#> 2          wt -3.7075638 89 -4.867696 -2.2566382 0.0000000 1.000      89
#> 3        gear -0.4131907 89 -1.600556  0.8184972 0.8257983 0.733      89
#>   ROPE_low ROPE_high ROPE_Percentage ROPE_Equivalence           BF
#> 1     -0.1       0.1      0.00000000         Rejected 2.125676e+05
#> 2     -0.1       0.1      0.00000000         Rejected 1.275944e+03
#> 3     -0.1       0.1      0.09652076        Undecided 4.160897e-01

^{Created on 2019-08-27 by the reprex package (v0.3.0)}

mattansb commented 5 years ago

@strengejacke it seems the flat priors are used, so by definition BFs will always be 1/Inf = 0 (see brms::hypothesis when used with flat priors). I would remove the bayesfactor_parameters support for sim posteriors - users could still use the BIC approximation with bayesfactor_models. (As we've discussed elsewhere, compering two distributions that don't have a true prior>data>posterior relationship will by definition result in meaningless BFs, so we can't just tack on so distribution in place of the true prior... 🤐)

strengejacke commented 5 years ago

users could still use the BIC approximation with bayesfactor_models

ok, but this won't work with sim-objects, right? Would you say that weakly informative priors in the way rstanarm calculates priors would make sense?

mattansb commented 5 years ago

ok, but this won't work with sim-objects, right?

No... But the original model can be used. Alternatively - you can have:

rope.merMod <- function(x, ..., n.sims = NULL){
  if (!is.null(n.sims)) {
    x <- arm::sim(x, n.sims = n.sims)
    rope.sim(x, ...)
  }
  ...
}

(or something along those lines) And then in all cases the model can be used as is.

Would you say that weakly informative priors in the way rstanarm calculates priors would make sense?

Nope 😊 If the priors are flat, then the posteriors are based on those flat priors... So any different priors would need different posteriors... That's just the price one pays for using flat priors!

mattansb commented 5 years ago

In other words - if we set some / any default "priors", we will be providing misleading erroneous results...

strengejacke commented 5 years ago

Ok, so I will remove BF-support for sim/sim.merMod objects... I was just curious because we had data frame methods as well...

mattansb commented 5 years ago

I made those for use with Bayesian models non yet supported by us (🤫), but they too do not have default priors - they just always return a 1 - the most uninformative BF possible haha

library(bayestestR)
x <- data.frame(A = rnorm(1000))
bayesfactor_parameters(x)
#> Warning in bayesfactor_parameters.data.frame(x): Prior not specified!
#> Please specify priors (with column order matching 'posterior') to get
#> meaningful results.
#> Loading required namespace: logspline
#> # Bayes Factor (Savage-Dickey density ratio)
#> 
#>  Parameter Bayes Factor
#>          A            1
#> 
#> * Evidence Against The Null: [0]

^{Created on 2019-08-27 by the reprex package (v0.3.0)}

strengejacke commented 5 years ago

Ok, my argument would be that rstanarm, if no priors provided, "creates" weakly informative priors based on the data. Thus, I would say we could do the same for arm::sim objects, since we have the data, and the "posterior" from arm::sim is some kind of posterior including priors, because they used a multivariate normal distribution to get the simulated posterior... ;-)

mattansb commented 5 years ago

the "posterior" from arm::sim is some kind of posterior including priors, because they used a multivariate normal distribution to get the simulated posterior...

Okay, just saw that too in Gelamn & Hall's book (pp 142-143). In that case, the priors aren't even flat, as there are no priors, because they aren't really using posteriors! They use more of a sampling-distribution-type-procedure (similar to bootstrapping, which though similar in practice, is not the same). One might then argue that there isn't anything "Bayesian" about this procedure - as data is treated as random, with a fixed parameter - so it wouldn't really make sense to even think of a prior, as this whole procedure is frequentist in it's core :(

DominiqueMakowski commented 5 years ago

Awesome work and very interesting discussion 👏🙌

Based on what you said, I'd tend to think that it still makes sense to add the BF method, even tho for extremists Bayesians it doesn't really make sense (but what does for them 😁). I have a prior that it could still be practically useful for people ;) As often, I support returning a thoroughly documented result.

In fact, it would be interesting to write a blogpost discussing a "Bayesian approach to frequentist algorithms", such as bootstrapping/simulation.

mattansb commented 5 years ago

I don't think I'm being an extremists Bayesian by saying that a Bayes factor against the null with no prior on the alternative will be definition always support the null... This is exactly one of the (so called) limitations of the Bayes factor - one needs to specify a prior, otherwise - what is being tested? (and since the Bayes factor is the change from the prior to the posterior, the posterior has to be the updated prior - else one is just comparing two unrelated distributions - which may be a valid density ratio test, but it is not a Bayes factor...) But if you guys really want a Bf method that will reflect this 👆, I will add it.

DominiqueMakowski commented 5 years ago

What about a result + a warning?

strengejacke commented 5 years ago

But if you guys really want a Bf method that will reflect this 👆, I will add it.

I think the method is already there, I moved it to the WIP folder:

https://github.com/easystats/bayestestR/blob/master/WIP/bayesfactor_sim.R

mattansb commented 5 years ago

Result and warning is good (: Will try and get to this tomorrow

Welcome back!

-- Mattan S. Ben-Shachar, PhD student Department of Psychology & Zlotowski Center for Neuroscience Ben-Gurion University of the Negev The Developmental ERP Lab

On Sat, Sep 7, 2019, 18:40 Daniel notifications@github.com wrote:

But if you guys really want a Bf method that will reflect this 👆, I will add it.

I think the method is already there, I moved it to the WIP folder:

https://github.com/easystats/bayestestR/blob/master/WIP/bayesfactor_sim.R

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/easystats/bayestestR/issues/219?email_source=notifications&email_token=AINRP6BTPCWXT5KDEUSTDLLQIPDOVA5CNFSM4IPUDZ22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6E3MLY#issuecomment-529118767, or mute the thread https://github.com/notifications/unsubscribe-auth/AINRP6CIOEISIIEJJ6EEGHDQIPDOVANCNFSM4IPUDZ2Q .

mattansb commented 5 years ago

library(arm)
library(lme4)
library(bayestestR)

m <- lmer(mpg ~ wt + gear + (1 | cyl), mtcars)
sim_fit <- arm::sim(m, 1000)

Bfs <- bayesfactor_parameters(sim_fit)
#> Warning: Simulated posteriors cannot be compared to non-existant priors.
#> Bayes factors are by definition equal to zero.
#> Loading required namespace: logspline
Bfs
#> # Bayes Factor (Savage-Dickey density ratio)
#> 
#>    Parameter Bayes Factor
#>  (Intercept)     0.00e+00
#>           wt     0.00e+00
#>         gear     0.00e+00
#> 
#> * Evidence Against The Null: [0]
plot(Bfs)

^{Created on 2019-09-08 by the reprex package (v0.3.0)}

strengejacke commented 5 years ago

@mattansb I'm not sure if this makes sense, since it's not very instructive. What about my suggestion using weakly informative priors? Else, I would indeed say to not add a BF method, we can't pull any information out of it.

mattansb commented 5 years ago

I completely agree that, as is, this makes no sense and no Bayesian information can be gained from such a statistic... As for any weakly informed priors - though technically possible, would be wrong (an in, they would produce a statistic, but it would be wrong to call it a Bayes factor), and so I feel it would be misleading to even allow... The logic is: Bayes factor = posterior / prior. And posterior = prior + data. So for a BF we need that the only difference between the posterior and the prior be the data. Since the posterior here is simulated and not based on any priors, you can't really asses the updating effect of the data (the BF). As I mentioned above, in brms, a build-in savage-dickey Bayes factor function always result in BF = NA when used with flat priors - which is pretty similar to what I'm suggesting here.

I'm okay with leaving it like this, or dropping this method. Your call 😎

strengejacke commented 5 years ago

Yes, I understand your point. I would suggest directly throwing a warning, w/o printing any "result", or what would you both say?

Although the simulation of the "posterior" is based on the coefficients (location) and variance-covariance matrix (scale), so assuming weakly informative priors with location/scale based on the data doesn't seem to be completely nonsense (maybe only a bit nonsense) ;-)

mattansb commented 5 years ago

I'm not saying the samples make no sense, just that the are not real posteriors - do not reflect updated priors, and thus the "updating" cannot be quantified.

throwing a warning, w/o printing any "result"

So, an error? 😜 I think we're converging on "remove the method"? If anyone ever asks why, we can point them to this lengthy conversation 😂

strengejacke commented 5 years ago

Still, when you fit a model in rstanarm, the functions automatically calculate "weakly informative priors", which are centered around location 0 with scale based on the predictor's variance (exactly what I did with the original data from model that was processed in the simulations). So we literally started from "no priors" as well, it's just that rstanarm created weakly informative priors.

My problem here is, that what I have done here feels at least "technically" similar, because we started from weakly informative priors and got the simulated draws ("posterior").

mattansb commented 5 years ago

...because we started from weakly informative priors and got the simulated draws ("posterior").

This is not true - the posteriors are unrelated to the priors - you don't start here with the priors and end with the posteriors, but instead you make prior and posterior draws in an unrelated manner. I.e., the posteriors are not the results of the priors, and that makes all the difference!

So to reiterate, the problem isn't with the priors you suggest, or with the posteriors made by sim, the problem is that they are unrelated.

What rstanarm does:

If not priors, make some weak priors.
Make posteriors by updating the priors with the data.
- A density ratio can be interpreted as a Bayes factor reflecting the updating process.

What you're suggesting:

Make some weak priors.
Make posteriors (by simulating the results from freq models) that are unrelated to the priors.
- A density ratio can not be interpreted as a Bayes factor, as the posteriors are not "updated" versions of the priors. If there wasn't any updating, the degree of updating cannot be quantified.

This isn't just some technical note, or me being unnecessarily stubborn, this is exactly the essence of the Bayes factor - this is what it measures. If someone wants to get BFs from freq models for comparing models, we have a different functions for that 😎

strengejacke commented 5 years ago

Make posteriors (by simulating the results from freq models) that are unrelated to the priors.

True, it's just that "weakly informative priors" based on the data frame or simulating from results from freq model are in so far related, as freq models assume "null" hypothesis, i.e. something centered around zero.

But I think, and agree with you, if it is statistically not sound, we should not support such things. So ...

I think we're converging on "remove the method"

mattansb commented 5 years ago

Then I shall remove the methods 😃 Can this issue be closed? 🤔

strengejacke commented 5 years ago

The initial issue was to have hdi() etc. for sim-objects, which has been implemented. So yes, we can close this now, I guess.

DominiqueMakowski commented 5 years ago

Do not remove it entirely but throw informative error explaining why it doesn't make sense "Bayes factors are based on the shift from a prior to a posterior. As simulated draws are unrelated to any priors, computing Bayes factors does not make sense"

easystats / bayestestR

lmer and arm::sim instead of stan_glmer on large data? #219