easystats / bayestestR

:ghost: Utilities for analyzing Bayesian models and posterior distributions
https://easystats.github.io/bayestestR/
GNU General Public License v3.0

Model selection and parameter estimation #361

Closed · nahorp closed this issue 3 years ago

nahorp commented 3 years ago

Hello,

I didn't know where to post this, but you all have always been kind enough to guide me in the right direction, so I thought I'd post here.

I come from a field (cognitive neuroscience) where we don't do a lot of mixed-effects models, let alone Bayesian ones. I find Bayesian mixed models quite intuitive and am doing my analyses mostly with brms and bayestestR. I am currently struggling with model selection and, more specifically, with the innumerable models you can have the moment you have 4-5 experimental factors. Going through model selection with that many main and interaction effects, using whichever information criterion (I am using LOO), takes a lot of time.

I guess what I am asking is: can I fit the maximal model, estimate my parameters, and then base my inferences on the parameters (using the HDI + ROPE rule, for example)? Given that my predictor variables are experimental manipulations I am interested in, I feel I have a rationale to do this (i.e., run a maximal model and base inferences on the parameters). Do I need to go through the process of model selection?

Any links/papers would be much appreciated. Thank you :)
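
For illustration, a minimal sketch of the approach described above with brms and bayestestR; the formula, data, and variable names are placeholders, not from the original post:

```r
# Hypothetical "maximal" model: all experimental factors as population-level
# effects, plus a by-participant random intercept (placeholder formula/data).
library(brms)
library(bayestestR)

fit <- brm(
  outcome ~ factor_a * factor_b * factor_c + (1 | participant),
  data   = my_data,   # placeholder data frame
  family = gaussian()
)

# Inference on the parameters via the HDI + ROPE decision rule:
# report 95% HDIs and the share of the posterior inside the ROPE.
describe_posterior(fit, ci = 0.95, ci_method = "hdi", test = "rope")
```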

strengejacke commented 3 years ago

Any reason for closing this?

nahorp commented 3 years ago

Not particularly, no - I still want links/papers (or anything that would help), but I'm just not sure this is the right forum for it (StackExchange/CrossValidated came to mind).

DominiqueMakowski commented 3 years ago

If I understand correctly, useful functions could be:

But in general, model selection is first and foremost a theoretical step, and only then a statistical one. In other words, you should keep the models that make sense regarding your hypothesis/theory, and statistical model selection can then be used to provide evidence for your decisions and/or to narrow down some of the choices.
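
As a hedged guess at the kind of functions meant here, bayestestR's model-comparison helpers would fit; a sketch assuming m1 and m2 are competing brms fits (for Bayes factors, brms models need to be fitted with save_pars = save_pars(all = TRUE)):

```r
library(bayestestR)

# Bayes factors for each candidate model against the first (denominator) model
bayesfactor_models(m1, m2, denominator = 1)

# Posterior draws averaged across models, weighted by posterior model
# probabilities - an alternative to committing to a single "best" model
weighted_posteriors(m1, m2)
```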

strengejacke commented 3 years ago

Do I need to go through the process of model selection?

As far as I understand, you vary different experimental factors, so your approach is less theoretically driven and more an "explorative" approach to model selection, because you cannot derive from theory/hypotheses which factors are most appropriate? If so, model selection is probably a good strategy. If my guess about your design is wrong, then I would always prefer building a ("maximal") model based on theoretical considerations.

In case you have a "maximal" model, considerations regarding multicollinearity, collider bias, direct/indirect effects (mediation), etc. may become relevant, which then affect both how you specify your model (and which predictors to drop) and how you interpret the results.

nahorp commented 3 years ago

Hmmm, the way I view it is that I vary only those particular experimental factors because I have a theoretical interest in them, so I feel quite justified in the "maximal" model. Is there an easystats package or blog post I could be pointed to in order to explore those additional considerations @strengejacke mentions? Thank you!

IndrajeetPatil commented 3 years ago

This is a good read for learning about maximal (mixed-effects) models: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3881361/

As for the other issues Daniel has mentioned, a few of them you can handle via easystats (e.g., https://easystats.github.io/blog/posts/performance_check_collinearity/, https://easystats.github.io/bayestestR/articles/mediation.html), but I'm not sure about the others (e.g., collider bias).
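
A minimal sketch of the two checks linked above; fit and fit_mv are placeholder names for a fitted regression model and a multivariate brms mediation model (with mediator and outcome submodels):

```r
library(performance)
library(bayestestR)

check_collinearity(fit)  # variance inflation factors for correlated predictors

mediation(fit_mv)        # direct, indirect, and total effects
```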

nahorp commented 3 years ago

Sweet, thanks for that @IndrajeetPatil!

strengejacke commented 3 years ago

See also ?lme4::isSingular or the details section here: https://easystats.github.io/performance/reference/check_singularity.html - there you can see that there are also suggestions completely opposite to fitting the maximal model.
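
For reference, a small runnable sketch of both checks, using lme4's bundled sleepstudy data:

```r
library(lme4)
library(performance)

# Frequentist mixed model; an overly complex random-effects structure
# can produce a singular fit
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

isSingular(m)         # TRUE if the random-effects covariance is singular
check_singularity(m)  # performance's equivalent check
```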

Personally, I rarely use model selection techniques based on AIC or similar, but rather choose the model based on theoretical reasoning. I may fit a handful of competing models to see which fit without problems (this is usually not necessary for Bayesian models, because you have no convergence or singularity issues), and then compare and test those models, but I don't fit a maximal model or include all predictors and use some sort of forward or backward selection.
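
A hedged sketch of that workflow with LOO in brms; m1, m2, and m3 are hypothetical names for a handful of theoretically motivated competing fits:

```r
library(brms)

# Compute and store the LOO criterion for each candidate model
m1 <- add_criterion(m1, "loo")
m2 <- add_criterion(m2, "loo")
m3 <- add_criterion(m3, "loo")

# Compare expected out-of-sample predictive performance across candidates
loo_compare(m1, m2, m3, criterion = "loo")
```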

IndrajeetPatil commented 3 years ago

Closing for now. Let us know if you have any further questions.