DominiqueMakowski opened 3 years ago
Do you think this function should ultimately be assigned to its own package? A package like `prediction`, `predicted`, etc. (not sure which names are available on CRAN). I would also recommend having an alias for this function called `model_predictions`.
This would mean the ecosystem would have a consistent function naming schema and also mirror the broom functions:

`broom::tidy` <-> `parameters::model_parameters`
`broom::glance` <-> `performance::model_performance`
`broom::augment` <-> `prediction::model_prediction`
What do you think? cc @strengejacke, @mattansb
Well, we have the modelbased package. I think @DominiqueMakowski just wanted to put the workhorse into insight, and then the "visible" user function will be in modelbased.
> the "visible" user function will be in modelbased
I see, gotcha! What do you think about calling this function `model_prediction`?
Yes, correct, the idea was to have the heavy lifting done at the insight level, mostly because it is useful elsewhere (for performance indices, for instance). Then, modelbased will be a user-facing package focusing mostly on visualization of models (including visualization of links, contrasts, marginal means, etc.). I plan to add nice vignettes to modelbased discussing a model-based approach to stats, how to make inferences and derive indices from models, etc.
The functions are currently named `estimate_link` and `estimate_response`, but given the new API of `get_predicted` I might change them to something like `estimate_relationship()` and `estimate_response()`, or something like that.
But we would still use both emmeans and predict, right?
Yes, yes, for `estimate_means` and `estimate_contrasts`, as the internals of those are definitely beyond us.
Not as an option for `estimate_response`? See https://strengejacke.github.io/ggeffects/articles/technical_differencepredictemmeans.html
Mmh, we'll see, but I feel like for modelbased we could really merge your and Indra's experience with model visualization to make something really neat and powerful.
Support for other models; here I list some special cases:
"clmm" is only supported by emmeans and has no `predict()` method.
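For reference, the emmeans route for a `clmm` model could look like this (a sketch, assuming the ordinal and emmeans packages are installed; the wine example is from ordinal's documentation, not from this thread):

```r
# Sketch: clmm models have no predict() method for new data, but emmeans
# can return expected category probabilities via mode = "prob".
library(ordinal)   # provides clmm() and the wine data
library(emmeans)

m <- clmm(rating ~ temp + contact + (1 | judge), data = wine)

# Predicted probability of each rating category, by temperature
emmeans(m, ~ temp, mode = "prob")
```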
I've added prediction intervals for Binomial and Poisson models:
```r
library(insight)
library(magrittr)

glm(am ~ cyl + mpg + hp, data = mtcars,
    family = binomial()) %>%
  get_predicted(predict = "prediction") %>%
  as.data.frame() %>% head() %>% zapsmall()
#>                   Predicted CI_low CI_high
#> Mazda RX4         0.1762632      0       1
#> Mazda RX4 Wag     0.1762632      0       1
#> Datsun 710        0.6499998      0       1
#> Hornet 4 Drive    0.2489592      0       1
#> Hornet Sportabout 0.2268359      0       1
#> Valiant           0.0064952      0       0

glm(gear ~ ., data = mtcars,
    family = poisson()) %>%
  get_predicted(predict = "prediction") %>%
  as.data.frame() %>% head() %>% zapsmall()
#>                   Predicted CI_low CI_high
#> Mazda RX4          4.266709      1       9
#> Mazda RX4 Wag      4.192287      1       9
#> Datsun 710         4.031290      1       8
#> Hornet 4 Drive     3.043687      0       7
#> Hornet Sportabout  2.967043      0       7
#> Valiant            2.890863      0       7
```
Created on 2021-03-10 by the reprex package (v1.0.0)
The Gaussian PIs still work as they used to :)
As I noted somewhere prior, each family has its own PI method, so adding each of these analytical solutions is somewhat tedious... It might be easier to simulate population values and get the 95% ETI from there instead.
Also, these columns should really be `PI_low` and `PI_high`...
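The simulation-based fallback could be sketched roughly like this (a hypothetical helper, not insight's implementation; the name `simulate_pi` and the `nsim` default are made up for illustration):

```r
# Sketch: generic prediction intervals by simulating responses from the
# fitted model and taking the equal-tailed interval (ETI) per observation.
simulate_pi <- function(model, ci = 0.95, nsim = 1000) {
  # One column per simulated response vector, one row per observation
  sims <- stats::simulate(model, nsim = nsim)
  alpha <- (1 - ci) / 2
  # Equal-tailed quantiles across simulations, per observation
  t(apply(as.matrix(sims), 1, stats::quantile, probs = c(alpha, 1 - alpha)))
}

m <- glm(am ~ cyl + mpg + hp, data = mtcars, family = binomial())
head(simulate_pi(m))  # rows match observations; columns are lower/upper bounds
```

This avoids deriving a closed-form interval per family, at the cost of simulation noise.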
@mattansb I hope you found the code clear and logical :)
One thing to make sure of: following your comments, currently the PIs for Bayesian models are obtained via their bespoke function, instead of manually computing them from the draws. But does their function do something else than that? Or is it unnecessarily re-sampling draws from the posterior? (Note that the iterations were obtained via `posterior_predict`.)
> instead of manually computing them from the draws. But does their function do something else than that?
Nope - that's exactly what it does:
thanks for addressing my source-code-checking laziness 😁 - I'll drop that exception then
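For instance, with rstanarm the built-in interval is just the equal-tailed quantiles of the `posterior_predict()` draws, so computing them manually from the same draws should agree (a sketch, assuming rstanarm is installed; the model and probability level are illustrative):

```r
library(rstanarm)

m <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0)

# Draws from the posterior predictive distribution (iterations x observations)
draws <- posterior_predict(m)

# Manual: equal-tailed 90% interval per observation
manual <- t(apply(draws, 2, quantile, probs = c(0.05, 0.95)))

# Built-in: predictive_interval() applied to the same draws
builtin <- predictive_interval(draws, prob = 0.90)

# Compare the two (they should match up to dimnames)
all.equal(unname(manual), unname(builtin))
```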
@strengejacke @mattansb is it possible to get PIs for GAMs? Can we use the same method as for lm/glms?
The code I wrote should also work with GLM GAMs. Is there any reason the same won't apply for Gaussian models?
@DominiqueMakowski Currently, some methods like `get_predicted.glmmTMB()` directly check the `predict` argument, while others like `get_predicted.lmerMod()` call `.get_predicted_args()` to check the input. This leads to the situation that some methods have `predict = c("expectation", "link", "prediction", "response", "relation")` in their usage, while others just have `predict = "expectation"`. Can this be harmonized?
Prediction intervals for glmmTMB: also, why are the confidence intervals so different with respect to lme4? (Could be related to the issue above.)
Note that by default, glmmTMB uses Wald CIs on fixed effects/log variance components, whereas lme4 uses profile likelihood CIs on fixed effects/variance components
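This difference is visible directly from each package's `confint()` defaults (a sketch, assuming lme4 and glmmTMB are installed; the sleepstudy model is just an example, not the model from this thread):

```r
library(lme4)      # profile likelihood CIs by default
library(glmmTMB)   # Wald CIs by default

m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
m2 <- glmmTMB(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Making each package's default method explicit:
ci_lme4    <- confint(m1, method = "profile")  # lme4 default
ci_glmmTMB <- confint(m2, method = "wald")     # glmmTMB default
```

Profile likelihood intervals can differ noticeably from Wald intervals, especially for variance components.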
Here's a list of all features / issues etc., to have a global view and avoid opening multiple issues.
- `data` argument, when holding constant non-focal co-variates (https://github.com/easystats/insight/issues/316)
- `predict = "observation"`, transformed output to match the shape of observations (i.e., zeros and ones for logistic regression)
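The observation-scale idea for a logistic model could be sketched like this (illustrative only, using `rbinom()` on the predicted probabilities; this is not insight's actual implementation):

```r
# Sketch: transform predicted probabilities into observation-scale values
# (zeros and ones) by drawing from the implied Bernoulli distribution.
m <- glm(am ~ mpg, data = mtcars, family = binomial())
p <- predict(m, type = "response")   # probabilities in (0, 1)

set.seed(123)                        # for a reproducible draw
obs_scale <- rbinom(length(p), size = 1, prob = p)  # zeros and ones
table(obs_scale)
```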