gavinsimpson / gratia

ggplot-based graphics and useful functions for GAMs fitted using the mgcv package
https://gavinsimpson.github.io/gratia/
Other
206 stars 28 forks source link

`draw()` fails if `parametric = TRUE` when a `bam()` has a `subset` specified and `discrete = TRUE` #310

Open StefanoMezzini opened 3 months ago

StefanoMezzini commented 3 months ago

Hi Gavin, thank you for actively maintaining {gratia}! draw() fails with an error when I I try to plot a model with parametric = TRUE and I fit the model with bam(..., method = 'fREML', discrete = TRUE) and I subset the data within bam(). A reprex is below.

library('mgcv')    # for GAMs
library('gratia')  # for ggplot-based model plots

d <- data.frame(x = rnorm(100),
                f1 = factor(sample(1:2, size = 100, replace = TRUE)),
                f2 = factor(sample(paste0('T_', 1:4), size = 100, replace = TRUE)),
                y = rnorm(100))

# fits ok with full dataset
m <- bam(
  y ~
    f1 +
    s(x, by = f1, k = 10, bs = 'tp') +
    s(x, f2, k = 10, bs = 'fs', xt = list(bs = 'cr')),
  data = d,
  method = 'fREML',
  discrete = TRUE)

draw(m)

# fails with specified subset and discrete = TRUE
m <- bam(
  y ~
    f1 +
    s(x, by = f1, k = 10, bs = 'tp') +
    s(x, f2, k = 10, bs = 'fs', xt = list(bs = 'cr')),
  data = d,
  method = 'fREML',
  subset = f2 != 'T_1',
  discrete = TRUE)

draw(m, parametric = TRUE)

# succeeds if discrete = FALSE
m <- bam(
  y ~
    f1 +
    s(x, by = f1, k = 10, bs = 'tp') +
    s(x, f2, k = 10, bs = 'fs', xt = list(bs = 'cr')),
  data = d,
  method = 'fREML',
  subset = f2 != 'T_1')

draw(m, parametric = TRUE)

# succeeds if parametric = FALSE
m <- bam(
  y ~
    f1 +
    s(x, by = f1, k = 10, bs = 'tp') +
    s(x, f2, k = 10, bs = 'fs', xt = list(bs = 'cr')),
  data = d,
  method = 'fREML',
  subset = f2 != 'T_1')

draw(m)
gavinsimpson commented 3 months ago

Thanks for the report @StefanoMezzini!

The error comes from parametric_effects() in this line:

data <- model.frame(object$pred.formula, data = data)

where I'm not applying the subset from the original model fit.

gavinsimpson commented 3 months ago

Fixing this should involve the standard non-standard evaluation ideas of modifying the original call (stored in m$call) to be a call to model.frame() instead of bam(), replacing the used formula with m$pred.formula and the data, subset, drop.unused.levels, and na.action arguments (I think that would be sufficient here).