gavinsimpson / gratia

ggplot-based graphics and useful functions for GAMs fitted using the mgcv package
https://gavinsimpson.github.io/gratia/
Other
206 stars 28 forks source link

draw.gam errors with a valid HGAM and `grouped_by = TRUE` & `parametric = TRUE` #219

Closed gavinsimpson closed 1 year ago

gavinsimpson commented 1 year ago

From the rate hormone example in my GAM course, m3_hgam fails:

pkgs <- c("mgcv", "lme4", "ggplot2", "readr", "dplyr", "forcats", "tidyr",
          "gratia")

# load data
vapply(pkgs, library, logical(1), character.only = TRUE, logical.return = TRUE,
       quietly = TRUE)

rats_url <- "https://bit.ly/rat-hormone"
rats <- read_table(rats_url, col_types = "dddddddddddd-")
# ignore the warning - it"s due to trailing white space at the ends of each
#   row in the file

rats <- rats %>%
    mutate(treatment = fct_recode(factor(group, levels = c(1, 2, 3)),
                                  Low = "1",
                                  High = "2",
                                  Control = "3"),
           treatment = fct_relevel(treatment, c("Control", "Low", "High")),
           subject = factor(subject))

draw() throws an error with:

m3_hgam <- gam(response ~ treatment +
                 s(time, by = treatment, k = K) +
                 s(subject, bs = "re"),
               data = rats, method = "REML")
draw(m3_hgam, residuals = TRUE, rug = FALSE, grouped_by = TRUE, parametric = TRUE)

The error is

> draw(m3_hgam, residuals = TRUE, rug = FALSE, grouped_by = TRUE, parametric = TRUE)
Error in `map()`:
ℹ In index: 1.
Caused by error in `bind_cols()`:
! Can't recycle `level` (size 350) to match `partial` (size 252).
Run `rlang::last_error()` to see where the error occurred.
gavinsimpson commented 1 year ago

The issue here is that the data contains NAs and hence the way parametric_effects() works to reconstruct the fitting data (along the lines of how termplot() does it) produces original data that contains the NA values. However, because the default in mgcv is to use na.action = na.omit, we have the observed incompatibility between the sizes of the variables being combined. If we fit the model with na.action = na.exclude, the issue goes away.

Options to consider to address this:

  1. could run na.omit or complete.cases on the reconstructed data before predict()?

    For factor parametric effects this is fine, but what about continuous terms? Are the reconstructed data the full thing passed to the data argument? In which case we only want to run those on the variables used in fitting the model.

  2. could run distinct() on the reconstructed data and pass that to predict()?

  3. could provide a better error message to suggest the source of the problem and suggest refitting with na.action = na.exclude?