draw.gam errors with a valid HGAM and `grouped_by = TRUE` & `parametric = TRUE`

gavinsimpson / gratia

ggplot-based graphics and useful functions for GAMs fitted using the mgcv package

Other

206 stars 28 forks source link

pkgs <- c("mgcv", "lme4", "ggplot2", "readr", "dplyr", "forcats", "tidyr", "gratia") # load data vapply(pkgs, library, logical(1), character.only = TRUE, logical.return = TRUE, quietly = TRUE) rats_url <- "https://bit.ly/rat-hormone" rats <- read_table(rats_url, col_types = "dddddddddddd-") # ignore the warning - it"s due to trailing white space at the ends of each # row in the file rats <- rats %>% mutate(treatment = fct_recode(factor(group, levels = c(1, 2, 3)), Low = "1", High = "2", Control = "3"), treatment = fct_relevel(treatment, c("Control", "Low", "High")), subject = factor(subject))

m3_hgam <- gam(response ~ treatment + s(time, by = treatment, k = K) + s(subject, bs = "re"), data = rats, method = "REML") draw(m3_hgam, residuals = TRUE, rug = FALSE, grouped_by = TRUE, parametric = TRUE)

> draw(m3_hgam, residuals = TRUE, rug = FALSE, grouped_by = TRUE, parametric = TRUE) Error in `map()`: ℹ In index: 1. Caused by error in `bind_cols()`: ! Can't recycle `level` (size 350) to match `partial` (size 252). Run `rlang::last_error()` to see where the error occurred.

The issue here is that the data contains NAs and hence the way parametric_effects() works to reconstruct the fitting data (along the lines of how termplot() does it) produces original data that contains the NA values. However, because the default in mgcv is to use na.action = na.omit, we have the observed incompatibility between the sizes of the variables being combined. If we fit the model with na.action = na.exclude, the issue goes away.

Options to consider to address this:

could run na.omit or complete.cases on the reconstructed data before predict()?

For factor parametric effects this is fine, but what about continuous terms? Are the reconstructed data the full thing passed to the data argument? In which case we only want to run those on the variables used in fitting the model.
could run distinct() on the reconstructed data and pass that to predict()?
could provide a better error message to suggest the source of the problem and suggest refitting with na.action = na.exclude?

gavinsimpson / gratia

draw.gam errors with a valid HGAM and `grouped_by = TRUE` & `parametric = TRUE` #219