easystats / insight

Easy access to model information for various model objects
https://easystats.github.io/insight/
GNU General Public License v3.0

`find_predictors()`: false positives with some user inputs in GAM splines #851

Closed: vincentarelbundock closed this issue 4 months ago

vincentarelbundock commented 4 months ago

I fit a GAM with a spline term defined by s() and a k argument. When k is a literal number, find_predictors() works fine. When k is a variable that stores a number, find_predictors() erroneously returns that variable's name as a predictor.

I anticipate that this will be pretty tricky to disambiguate using regular expressions. For both models below, the correct answer is z and x. We do not want to get w, which is what the second model returns.

Is there a relatively clean way to fix this?

library(mgcv)
set.seed(123)
n = 500
xn = rep(c(1, 2, 3), n)
levels = sort(unique(xn))
labels = c("low", "med", "high")
x = factor(xn, levels = levels, labels = labels)
z = sample(c(1, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 7, 7), size = length(x), replace = TRUE)
y.raw = xn * z
e = rnorm(length(x), sd = sd(y.raw))
y = y.raw + e
data = data.frame(x, y, z)

w = 3
m1 = try(gam(y ~ s(z, by = x, k = 3) + x, data = data), silent = TRUE)
m2 = try(gam(y ~ s(z, by = x, k = w) + x, data = data), silent = TRUE)

insight::find_predictors(m1)
# $conditional
# [1] "z" "x"

insight::find_predictors(m2)
# $conditional
# [1] "z" "x" "w"

Initially reported by @urisohn here: https://github.com/vincentarelbundock/marginaleffects/issues/1031
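
The ambiguity can be reproduced with base R alone, without fitting a model. This is only an illustration of why a name passed to k looks like any other variable in the formula; it makes no claim about how find_predictors() is implemented internally. The names f_num and f_var are made up for this sketch.

```r
# Formulas are unevaluated, so s() does not need to exist here.
f_num <- y ~ s(z, by = x, k = 3) + x  # k given as a literal number
f_var <- y ~ s(z, by = x, k = w) + x  # k given as a variable name

# all.vars() lists every symbol that is not a function name.
# It cannot tell a covariate from a tuning argument.
all.vars(f_num)
# [1] "y" "z" "x"

all.vars(f_var)
# [1] "y" "z" "x" "w"
```

Any approach that collects names from the formula text has the same problem: w is syntactically a variable, and only the argument name k marks it as a tuning parameter.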

strengejacke commented 4 months ago

Thanks! The regex in #852 should work for this example; however, there may be more exceptions or edge cases for s().
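
For comparison with a regex, here is a minimal sketch of an alternative that walks the parsed formula. It is not what #852 does and not part of insight: the helper find_smooth_predictors() and the vector smooth_funs are hypothetical names. It assumes that inside s(), te(), ti() and t2() only the unnamed arguments and by refer to data columns, which holds because the covariates are passed through `...` and the tuning arguments (k, bs, sp and so on) must be named.

```r
# Hypothetical helper, not part of insight: collect predictor names from the
# right-hand side of a formula, skipping tuning arguments of mgcv smooths.
smooth_funs <- c("s", "te", "ti", "t2")

find_smooth_predictors <- function(expr) {
  if (is.name(expr)) {
    return(as.character(expr))
  }
  if (!is.call(expr)) {
    return(character(0)) # numeric constants, strings, NULL, ...
  }
  args <- as.list(expr)[-1] # drop the function name itself
  if (is.name(expr[[1]]) && as.character(expr[[1]]) %in% smooth_funs) {
    arg_names <- names(args)
    if (is.null(arg_names)) arg_names <- rep("", length(args))
    # Keep unnamed arguments (the covariates) and `by`; drop k, bs, sp, ...
    args <- args[arg_names %in% c("", "by")]
  }
  found <- unlist(lapply(args, find_smooth_predictors), use.names = FALSE)
  as.character(unique(found))
}

f1 <- y ~ s(z, by = x, k = 3) + x
f2 <- y ~ s(z, by = x, k = w) + x

# f[[3]] is the right-hand side of a two-sided formula.
find_smooth_predictors(f1[[3]])
# [1] "z" "x"

find_smooth_predictors(f2[[3]])
# [1] "z" "x"
```

This sketch has its own edge cases: a namespaced call such as mgcv::s() is not recognised as a smooth, and expressions with empty arguments such as x[, 1] are not handled.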

vincentarelbundock commented 4 months ago

Thanks a lot, I appreciate it!