Open bwiernik opened 2 years ago
This design is somewhat at odds with our traditionally opinionated API. Rather than offering different plot_types, I'd just pick the one version we think is best and stick with it.
Agreed.
If you're interested in implementing DHARMa's approach, you could do something like this:
library(glmmTMB)
library(performance)
#' Check uniformity of GL(M)M's residuals
#'
#' `check_uniformity()` checks generalized linear (mixed) models for uniformity
#' of randomized quantile residuals, which can be used to identify typical model
#' misspecification problems, such as over/underdispersion, zero-inflation, and
#' residual spatial and temporal autocorrelation.
#'
#' @param object Fitted model.
#'
#' @details
#'
#' See `vignette("DHARMa")`
#'
#' @references
#'
#' - Hartig, F., & Lohse, L. (2022). DHARMa: Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models (Version 0.4.5). Retrieved from https://CRAN.R-project.org/package=DHARMa
#' - Dunn, P. K., & Smyth, G. K. (1996). Randomized Quantile Residuals. Journal of Computational and Graphical Statistics, 5(3), 236. https://doi.org/10.2307/1390802
#'
#' @return ggplot.
check_uniformity <- function(object) {
  # Simulated residuals; see vignette("DHARMa")
  simulated_residuals <- DHARMa::simulateResiduals(object)
  dp <- list(min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
  ggplot2::ggplot(
    tibble::tibble(scaled_residuals = residuals(simulated_residuals)),
    ggplot2::aes(sample = scaled_residuals)
  ) +
    qqplotr::stat_qq_band(distribution = "unif", dparams = list(min = 0, max = 1), alpha = .2) +
    qqplotr::stat_qq_line(distribution = "unif", dparams = dp, size = .8, colour = "#3aaf85") +
    qqplotr::stat_qq_point(distribution = "unif", dparams = dp, size = .5, alpha = .8, colour = "#1b6ca8") +
    ggplot2::labs(
      title = "Uniformity of Residuals",
      subtitle = "Dots should fall along the line",
      x = "Standard Uniform Distribution Quantiles",
      y = "Sample Quantiles"
    ) +
    see::theme_lucid()
}
data("Salamanders")
m <- glmmTMB(
  count ~ mined + spp + (1 | site),
  family = poisson,
  data = Salamanders
)
check_uniformity(m)
Created on 2022-06-17 by the reprex package (v2.0.1)
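For readers unfamiliar with the randomized quantile residuals cited above (Dunn & Smyth, 1996), here is a minimal base-R sketch of the idea for a Poisson GLM. It uses the analytic Poisson CDF rather than DHARMa's simulation-based approach, and the data, seed, and variable names are made up for illustration only. For a count response, each residual is drawn uniformly between F(y - 1) and F(y), so a correctly specified model should yield approximately standard-uniform values.

```r
# Illustrative only: simulated data, not from this thread.
set.seed(123)
n <- 500
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.3 * x))

fit <- glm(y ~ x, family = poisson())
mu <- fitted(fit)

# Randomized quantile residuals on the uniform scale:
# draw u ~ Uniform(F(y - 1), F(y)), where F is the fitted Poisson CDF.
lower <- ppois(y - 1, lambda = mu)
upper <- ppois(y, lambda = mu)
u <- runif(n, min = lower, max = upper)

# Under a correctly specified model, u should look standard uniform.
ks_result <- ks.test(u, "punif")
ks_result
```

The QQ plot in `check_uniformity()` above is a graphical version of the same check: the points deviate from the line when the residuals are not uniform.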
@mccarthy-m-g's suggestion looks rather easy to implement.
@strengejacke What would a PR for this involve? I could get a draft started if this is the solution you want to go for.
Let's call the function check_residuals()
@mccarthy-m-g You can add the function to a new checkresiduals.R file here in the performance package. Take a look at one of the other check functions like check_normality.R for an example of the documentation syntax and structure.
Then open a PR here and we can merge it in. After that, we can move over to the see package repo and add the plotting function there.
Hi all, I just opened a new issue to discuss the implementation for check_residuals() (#595). There are a few things that should be resolved before getting a PR started.
This is the current state of development. We see a mismatch between the tests based on simulated residuals and the generated plots for the following families/models:
library(performance)
library(glmmTMB)
library(readr)
docvisit <- read_table2("C:/Users/Daniel/Downloads/docvisit.txt")
mp <- glmmTMB(
  doctorco ~ sex + illness + income + hscore,
  data = docvisit,
  family = poisson()
)
out <- check_overdispersion(mp)
out
#> # Overdispersion test
#>
#> dispersion ratio = 1.808
#> Pearson's Chi-Squared = 9375.539
#> p-value = < 0.001
#> Overdispersion detected.
plot(out)
#> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
#> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
mnb <- glmmTMB(
  doctorco ~ sex + illness + income + hscore,
  data = docvisit,
  family = nbinom2()
)
out <- check_overdispersion(mnb)
out
#> # Overdispersion test
#>
#> dispersion ratio = 1.005
#> p-value = 0.816
#> No overdispersion detected.
plot(out)
#> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
#> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
mzip <- glmmTMB(
  doctorco ~ sex + illness + income + hscore,
  ziformula = ~ age,
  data = docvisit,
  family = poisson()
)
out <- check_overdispersion(mzip)
out
#> # Overdispersion test
#>
#> dispersion ratio = 1.417
#> p-value = < 0.001
#> Overdispersion detected.
plot(out)
#> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
#> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
mzinb <- glmmTMB(
  doctorco ~ sex + illness + income + hscore,
  ziformula = ~ age,
  data = docvisit,
  family = nbinom2()
)
out <- check_overdispersion(mzinb)
out
#> # Overdispersion test
#>
#> dispersion ratio = 1.031
#> p-value = 0.64
#> No overdispersion detected.
plot(out)
#> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
#> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
mzinbd <- glmmTMB(
  doctorco ~ sex + illness + income + hscore + age,
  ziformula = ~ sex + illness + income + hscore + age,
  dispformula = ~ sex + illness + income + hscore + age,
  data = docvisit,
  family = nbinom2()
)
out <- check_overdispersion(mzinbd)
out
#> # Overdispersion test
#>
#> dispersion ratio = 1.133
#> p-value = 0.104
#> No overdispersion detected.
plot(out)
#> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
#> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
Created on 2024-03-17 with reprex v2.1.0
Looks like "nbinom2()" is currently inaccurate. The code we use is here: https://github.com/easystats/performance/blob/35b5e19988386b584d91116be542baca1e98f33f/R/check_model_diagnostics.R#L370
(also pinging @bbolker and cross referencing to #654)
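As a point of reference for the test output above, here is a rough base-R sketch of a Pearson-based dispersion ratio: the sum of squared Pearson residuals divided by the residual degrees of freedom. It uses a plain `glm()` on simulated data (all names and values here are made up), and it is not necessarily the exact code path that check_overdispersion() takes for glmmTMB models.

```r
# Illustrative only: overdispersed counts simulated from a negative binomial,
# then (mis)fitted with a Poisson GLM.
set.seed(42)
n <- 1000
x <- rnorm(n)
y <- rnbinom(n, mu = exp(0.2 + 0.4 * x), size = 1)

fit <- glm(y ~ x, family = poisson())

# Pearson chi-squared statistic and residual degrees of freedom
chisq <- sum(residuals(fit, type = "pearson")^2)
rdf <- df.residual(fit)

# A ratio well above 1 suggests overdispersion relative to the Poisson.
ratio <- chisq / rdf
p_value <- pchisq(chisq, df = rdf, lower.tail = FALSE)

c(dispersion_ratio = ratio, pearson_chisq = chisq, p_value = p_value)
```

A test of this kind and a plot built from simulated residuals summarize the fit in different ways, which is one possible reason the two can disagree.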
The current selection of plots returned by check_model() for GLMs isn't ideal in a few ways.
1. They are missing a linearity check (fitted vs residuals). For binomial models, this should be a call to binned_residuals(). For other families, the standard check is fine.
2. For binomial models, the constant variance plot should be omitted.
4. For non-bernoulli models, we should include a plot for checking overdispersion.
For the latter few points, the DHARMa package provides an easy-to-interpret approach for checking distributional assumptions from qq plots and problems with fitted vs residual plots using quantile residuals. We might consider soft-importing DHARMa or re-implementing those approaches.
https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html
https://github.com/florianhartig/DHARMa/issues/33
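To illustrate the binned residuals idea mentioned in the first point, here is a minimal base-R sketch for a binomial model. It is not the implementation of binned_residuals(); the data, the number of bins (square root of n), and the 2-standard-error bounds are assumptions chosen for illustration. Observations are ordered by fitted probability, split into bins of roughly equal size, and the mean raw residual per bin is compared against approximate error bounds.

```r
# Illustrative only: simulated Bernoulli data.
set.seed(1)
n <- 400
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + x))

fit <- glm(y ~ x, family = binomial())
p_hat <- fitted(fit)
res <- y - p_hat  # raw (response) residuals

# Assign observations to bins of roughly equal size by rank of fitted value.
n_bins <- floor(sqrt(n))
bins <- cut(rank(p_hat, ties.method = "first"), breaks = n_bins, labels = FALSE)

binned <- data.frame(
  fitted = as.vector(tapply(p_hat, bins, mean)),
  resid = as.vector(tapply(res, bins, mean)),
  se = as.vector(tapply(res, bins, function(r) sd(r) / sqrt(length(r))))
)

# Flag bins whose average residual falls within about 2 standard errors of 0.
binned$inside <- abs(binned$resid) <= 2 * binned$se
head(binned)
```

Plotting `resid` against `fitted` with the `2 * se` bounds gives the fitted vs residuals check in a form that is readable for binary outcomes.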