gavinsimpson / gratia

ggplot-based graphics and useful functions for GAMs fitted using the mgcv package
https://gavinsimpson.github.io/gratia/
Other
206 stars 28 forks source link

Error in `vec_rbind` in `difference_smooths()` when comparing smooths in GAM model with a factor-by interaction term #315

Open 3rd3 opened 2 months ago

3rd3 commented 2 months ago

Description:

I encountered an issue while trying to run a significance test between two fitted surfaces using difference_smooths(). The test involves a GAM fit using a simple factor-by interaction smooth (Z ~ s(X, Y, bs = "tp", by = group)) with two groups. The error suggests a size mismatch in the vec_rbind() function from the vctrs package, likely during the call to dplyr::bind_rows().

Additional question: Are there other ways of running such a significance test? I would also like to try qgam.

Code to Reproduce:

library(qgam)
library(gratia)

set.seed(123)  # For reproducibility
grid_X <- seq(-3, 3, length.out = 50)
grid_Y <- seq(-3, 3, length.out = 50)
data_grid <- expand.grid(X = grid_X, Y = grid_Y)
data_grid$Z <- with(data_grid, X^2 + Y^2 + rnorm(length(X), sd = 0.5))
data_grid$group <- sample(c(0, 1), size = nrow(data_grid), replace = TRUE)
data_grid$group <- as.factor(data_grid$group)
fit <- gam(Z ~ te(X, Y, by = group, bs = "tp"), data = data_grid)
sm_dif <- difference_smooths(fit, select = c("te(X,Y):group0", "te(X,Y):group1"), partial_match = FALSE)

Error Message:

Error in `vec_rbind()`:
! `value` (size 20000) doesn't match `x` (size 10000).
ℹ In file slice-assign.c at line 313.
ℹ Install the winch package to get additional debugging info the next time you get this error.
ℹ This is an internal error that was detected in the vctrs package.
  Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex (<https://tidyverse.org/help/>) and the full backtrace.
Backtrace:
     ▆
  1. ├─base::source(...)
  2. │ ├─base::withVisible(eval(ei, envir))
  3. │ └─base::eval(ei, envir)
  4. │   └─base::eval(ei, envir)
  5. ├─gratia::difference_smooths(...) at c:\Users\me\Code\R-test-plot.R:25:1
  6. ├─gratia:::difference_smooths.gam(...)
  7. │ └─dplyr::bind_rows(out)
  8. │   └─vctrs::vec_rbind(!!!dots, .names_to = .id, .error_call = current_env())
  9. └─rlang:::stop_internal_c_lib(...)
 10.   └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame)
Warning messages:
1: In grepl(mgcv_by_smooth_labels(select, by_var, f1), cnames, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
2: In grepl(mgcv_by_smooth_labels(select, by_var, f2), cnames, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used

Additionally, the following warning messages appear:

Warning messages:
1: In grepl(mgcv_by_smooth_labels(select, by_var, f1), cnames, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used
2: In grepl(mgcv_by_smooth_labels(select, by_var, f2), cnames, fixed = TRUE) :
  argument 'pattern' has length > 1 and only the first element will be used

Session Info:

sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8    

time zone: America/Detroit
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] gratia_0.9.2  plotly_4.10.4 ggplot2_3.5.1 rgl_1.3.1     qgam_1.3.4
[6] mgcv_1.9-1    nlme_3.1-164  MASS_7.3-60.2

loaded via a namespace (and not attached):
 [1] tidyr_1.3.1       utf8_1.2.4        generics_0.1.3    stringi_1.8.4
 [5] lattice_0.22-6    digest_0.6.37     magrittr_2.0.3    grid_4.4.1       
 [9] iterators_1.0.14  fastmap_1.2.0     foreach_1.5.2     doParallel_1.0.17
[13] plyr_1.8.9        jsonlite_1.8.8    Matrix_1.7-0      promises_1.3.0
[17] httr_1.4.7        ggokabeito_0.1.0  purrr_1.0.2       fansi_1.0.6
[21] crosstalk_1.2.1   viridisLite_0.4.2 scales_1.3.0      codetools_0.2-20
[25] lazyeval_0.2.2    cli_3.6.3         shiny_1.9.1       rlang_1.1.4      
[29] munsell_0.5.1     splines_4.4.1     yaml_2.3.10       base64enc_0.1-3
[33] withr_3.0.1       tools_4.4.1       parallel_4.4.1    dplyr_1.1.4
[37] colorspace_2.1-1  httpuv_1.6.15     vctrs_0.6.5       R6_2.5.1
[41] mime_0.12         lifecycle_1.0.4   stringr_1.5.1     htmlwidgets_1.6.4
[45] pkgconfig_2.0.3   pillar_1.9.0      later_1.3.2       gtable_0.3.5
[49] glue_1.7.0        data.table_1.16.0 Rcpp_1.0.13       xfun_0.47
[53] tibble_3.2.1      tidyselect_1.2.1  knitr_1.48        farver_2.1.2     
[57] xtable_1.8-4      patchwork_1.2.0   htmltools_0.5.8.1 compiler_4.4.1
[61] mvnfast_0.2.8

Thank you for your attention!

gavinsimpson commented 2 months ago

This is because you are being too specific about the smooths that you want to difference. The function only works for factor by smooths (at the moment), so there is no need to specify the the specific smooth:level combination. All you need is:

difference_smooths(fit, select = "te(X,Y)")

for your example.

I should try to catch this issue and do something more user-friendly than the current vctrs error.

gavinsimpson commented 2 months ago

As for qgam() models; while I haven't tried it, I don't recall anything specific that would stop difference_smooths() from working unless the model you fitted was for multiple $\tau$ values - qgam() models inherit from class "gam" so I would expect difference_smooths() to just work. Reality might be a bit different though, so if it doesn't work let me know and I'll make it work.

gavinsimpson commented 2 weeks ago

Right now, no; I haven't implemented simultaneous intervals for differences, but I could do this.