easystats / insight

:crystal_ball: Easy access to model information for various model objects
https://easystats.github.io/insight/
GNU General Public License v3.0
391 stars 39 forks source link

get_predicted(): features and caveats #315

Open DominiqueMakowski opened 3 years ago

DominiqueMakowski commented 3 years ago

Here's a list of all features / issues etc., to have a global view and avoid opening multiple issues.

IndrajeetPatil commented 3 years ago

Do you think this function should ultimately be assigned to its own package?

A package like prediction, predicted, etc. (not sure which names are available on CRAN).

I would also recommend having an alias for this function called model_predictions.

This would mean the ecosystem would have a consistent function naming schemas and also mirror functions for broom functions:

broom::tidy     <->  parameters::model_parameters
broom::glance   <->  performance::model_performance
broom::augment  <->  prediction::model_prediction

What do you think? cc @strengejacke, @mattansb

strengejacke commented 3 years ago

Well, we have the modelbased package. I think @DominiqueMakowski just wanted to put the workhorse into insight, and then the "visible" user function will be in modelbased.

IndrajeetPatil commented 3 years ago

the "visible" user function will be in modelbased

I see, gotcha!

What do you think about calling this function model_prediction?

DominiqueMakowski commented 3 years ago

Yes correct, the idea was to have the heavy lifting done at the insight level, mostly cos it is useful elsewhere (performance indices for instance or such). Then, modelbased will be a user-facing package focusing mostly on visualization of models (including visualization of links, contrasts, marginal means etc). I plan to add nice vignettes to modelbased dicsussing a model-based approach to stats, how to make inference and derive indices from models etc.

the function are currently named estimate_link and estimate_response but given the new API of get_predicted I might change to something like estimate_relationship() and estimate_response() or something like this

strengejacke commented 3 years ago

But we would still use both emmeans and predict, right?

DominiqueMakowski commented 3 years ago

yes yes for estimate_means and estimate_contrasts as the internals of that are definitly beyond us

strengejacke commented 3 years ago

Not as option for estimate_response? See https://strengejacke.github.io/ggeffects/articles/technical_differencepredictemmeans.html

DominiqueMakowski commented 3 years ago

mmh we'll see, but I feel like for modelbased we could really merge your and indra's experience with model visualization to make something really neat and powerful.

strengejacke commented 3 years ago

Support for other models, here I list some special cases:

strengejacke commented 3 years ago

"clmm" is only supported by emmeans, has not predict.

mattansb commented 3 years ago

I've added prediction intervals for Binomial and Poisson models:

library(insight)
library(magrittr)

glm(am ~ cyl + mpg + hp, data = mtcars,
    family = binomial()) %>% 
  get_predicted(predict = "prediction") %>% 
  as.data.frame() %>% head() %>% zapsmall()
#>                   Predicted CI_low CI_high
#> Mazda RX4         0.1762632      0       1
#> Mazda RX4 Wag     0.1762632      0       1
#> Datsun 710        0.6499998      0       1
#> Hornet 4 Drive    0.2489592      0       1
#> Hornet Sportabout 0.2268359      0       1
#> Valiant           0.0064952      0       0

glm(gear ~ ., data = mtcars,
    family = poisson()) %>% 
  get_predicted(predict = "prediction") %>% 
  as.data.frame() %>% head() %>% zapsmall()
#>                   Predicted CI_low CI_high
#> Mazda RX4          4.266709      1       9
#> Mazda RX4 Wag      4.192287      1       9
#> Datsun 710         4.031290      1       8
#> Hornet 4 Drive     3.043687      0       7
#> Hornet Sportabout  2.967043      0       7
#> Valiant            2.890863      0       7

Created on 2021-03-10 by the reprex package (v1.0.0)

The Gaussian PIs still work as they use to (:

As I noted somewhere prior, each family has its own PI method, so adding each these analytical solutions is somewhat tedious... It might be easier to simulate population values and get the 95% ETI from there instead.


Also, these columns should really be PI_low and PI_high...

DominiqueMakowski commented 3 years ago

@mattansb I hope you found the code clear and logical :)

One thing to make sure of:

Following your comments, currently the PIs for Bayesian models are obtained via their bespoke function:

https://github.com/easystats/insight/blob/da33355faf3868616736499f793377e88e2d8d30/R/get_predicted_ci.R#L115

instead of manually computing them from the draws. But does their function does something else than that? Or is it unnecessarily re-sampling draws from the posterior? (note that the iterations were obtained via posterior_predict).

mattansb commented 3 years ago

instead of manually computing them from the draws. But does their function does something else than that?

Nope - that's exactly what it does:

https://github.com/stan-dev/rstanarm/blob/dee0a2d45bf42b2df791072041151b753edd6af9/R/predictive_interval.R#L82-L91

DominiqueMakowski commented 3 years ago

thanks for addressing my source-code-checking laziness 😁 - I'll drop that exception then

mattansb commented 3 years ago

@strengejacke @mattansb is it possible to get PIs for GAMs? Can we use the same method that for lm/glms?

The code I wrote should also work with GLM GAMs. Is there any reason the same won't apply for gaussian models?

strengejacke commented 3 years ago

@DominiqueMakowski Currently, some methods like get_predicted.glmmTMB() directly check the predict argument, while others like get_predicted.lmerMod() call .get_predicted_args() to check the input. This leads to the situation that we have methods that have predict = c("expectation", "link", "prediction", "response", "relation") in their usage, while others just have predict = "expectation". Can this be harmonized?

strengejacke commented 3 years ago

See https://github.com/easystats/insight/commit/4aa8e44bb22980857a4ec24e47a984b5367da4e2, else https://github.com/easystats/insight/blob/4aa8e44bb22980857a4ec24e47a984b5367da4e2/tests/testthat/test-get_predicted.R#L150 fails

bwiernik commented 3 years ago

Prediction intervals for glmmTMB: Also, why are confidence intervals so different in respect to lme4 (could be related to issue above)

Note that by default, glmmTMB uses Wald CIs on fixed effects/log variance components, whereas lme4 uses profile likelihood CIs on fixed effects/variance components