gavinsimpson / gratia

ggplot-based graphics and useful functions for GAMs fitted using the mgcv package
https://gavinsimpson.github.io/gratia/

draw() returning error #245

Closed sealavi closed 8 months ago

sealavi commented 11 months ago

Recently I have tried several models and consistently get the following error whenever attempting to use draw()

Error in Ops.data.frame(guide_loc, panel_loc) :

‘==’ only defined for equally-sized data frames

This is an example of a recent model I attempted to plot using draw()

model1 = bam(Hour ~ s(call_number, bs = "bs", k = 16, m = c(3, 2, 1)) +
               s(time_since_create, bs = "bs", k = 20, m = c(3, 2, 1)) +
               ti(call_number, time_since_create, bs = "bs", k = c(16, 20)) +
               s(employee_id, bs = "re") +
               s(employee_id, call_number, bs = "re") +
               s(employee_id, time_since_create, bs = "re"),
             data = newdat2, method = "fREML", nthreads = 2, discrete = TRUE)

I am running R version 4.3.2 on Windows 10 and am using gratia 0.8.1.46

Any assistance will be much appreciated.

gavinsimpson commented 11 months ago

Without a working reproducible example it's hard to say anything specific. Can you use the select argument to narrow this down to any particular smooth? To use this you'll need to know the mgcv labels for the smooths, which you can get with smooths(), so you could try

draw(model, select = smooths(model)[1], ...)

increasing the index [1] sequentially until you find the smooth causing the issue.
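The stepwise search can also be scripted. What follows is a minimal sketch, not something gratia provides: `find_failing_smooths()` and `render_quietly()` are hypothetical helper names, and the demo model reuses the `data_sim("eg1", ...)` example from gratia's documentation. Each plot is printed to a null PDF device because ggplot-based objects are only rendered when printed, so an error that occurs at render time would not be caught by wrapping `draw()` alone.

```r
# Render a plot object on a null device so that render-time errors are
# triggered without opening a window or writing a file.
render_quietly <- function(p) {
  grDevices::pdf(NULL)
  on.exit(grDevices::dev.off(), add = TRUE)
  print(p)
  invisible(NULL)
}

# Try each smooth label in turn. Returns NA where drawing worked and the
# error message where it failed. `draw_fun` must accept a `select` argument.
find_failing_smooths <- function(model, labels, draw_fun) {
  vapply(labels, function(lab) {
    tryCatch({
      render_quietly(draw_fun(model, select = lab))
      NA_character_
    }, error = function(e) conditionMessage(e))
  }, character(1))
}

# Demo, run only if both packages are installed
if (requireNamespace("gratia", quietly = TRUE) &&
    requireNamespace("mgcv", quietly = TRUE)) {
  library(mgcv)
  library(gratia)
  df1 <- data_sim("eg1", n = 400, dist = "normal", scale = 2, seed = 2)
  m1 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = df1, method = "REML")
  print(find_failing_smooths(m1, smooths(m1), draw))
}
```

If every entry of the result holds the same error message, the problem is unlikely to be specific to one smooth.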

gavinsimpson commented 11 months ago

That said (and although it doesn't matter from the point of view of something failing with an obscure error), I don't understand why you are using the s(employee_id, call_number, bs = "re") term to add a linear random effect of call_number when you are also treating the effect of call_number as smooth. If you want a random smooth, use the fs basis. Ditto for the other linear random effect that also occurs in a smooth.
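To illustrate the difference, here is a minimal sketch on made-up data (the variables `x`, `id`, and `y` are hypothetical and unrelated to the model above). With `bs = "re"`, a term such as `s(id, x, bs = "re")` gives each level of `id` its own random linear slope in `x`, whereas `s(x, id, bs = "fs")` gives each level its own smooth curve, with all curves sharing one smoothing parameter.

```r
library(mgcv)

set.seed(1)
n <- 400
dat <- data.frame(
  x  = runif(n),
  id = factor(sample(c("E1", "E2", "E3", "E4"), n, replace = TRUE))
)
# Each group follows a slightly different nonlinear curve
dat$y <- sin(2 * pi * dat$x * (1 + 0.1 * as.integer(dat$id))) +
  rnorm(n, sd = 0.3)

# Smooth of x plus random intercepts and random *linear* slopes per id
m_re <- gam(y ~ s(x) + s(id, bs = "re") + s(id, x, bs = "re"),
            data = dat, method = "REML")

# Random *smooths*: one curve per level of id
m_fs <- gam(y ~ s(x, id, bs = "fs", k = 5),
            data = dat, method = "REML")

print(AIC(m_re, m_fs))
```

The basis dimension `k = 5` is an arbitrary choice for this toy example and would need checking on real data.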

sealavi commented 11 months ago

Without a working reproducible example it's hard to say anything specific. Can you use the select argument to narrow this down to any particular smooth? To use this you'll need to know the mgcv labels for the smooths, which you can get with smooths(), so you could try

draw(model, select = smooths(model)[1], ...)

increasing the index [1] sequentially until you find the smooth causing the issue.

Unfortunately I'm unable to share actual data or a model object. When trying draw() in each individual smooth the error happens with all smooths.

I tried to simulate as realistic an example of the data as possible. Hopefully this will help

library(dplyr)

# Set the number of data points
n <- 5000

# Parameters for Gaussian Mixture Model for 'Hour'
# (named mix_prob, not pi, so the built-in constant pi is still available below)
mix_prob <- c(0.3396436, 0.6603564) # Proportions of each component
mu <- c(10.4975361, 15.0670634) # Means
sd <- c(1.1536938, 2.1066705) # Standard deviations

# Parameters for Gamma distributions
shape_time <- 6.650757e-01
rate_time <- 4.239308e-06
shape_call <- 2.8803516
rate_call <- 0.8740178

# Simulate 'Hour' (Gaussian Mixture Model)
set.seed(123) # For reproducibility
component <- sample(1:2, n, replace = TRUE, prob = mix_prob)
hours <- rnorm(n, mean = mu[component], sd = sd[component])
hours <- pmin(pmax(hours, 0), 23) # Ensure hours are within 0-23

# Simulate 'call_number' (Linear and Gamma distribution)
call_number <- round(rgamma(n, shape = shape_call, rate = rate_call) * (13 * hours))

# Simulate 'time_since_create' from a Gamma distribution
time_since_create_continuous <- rgamma(n, shape = shape_time, rate = rate_time)

# Discretize 'time_since_create' into bands
band_width <- 100000 # Set the band width to match the scale of your actual data
time_since_create_banded <- floor(time_since_create_continuous / band_width) * band_width

# Add sinusoidal variation within each band
time_since_create <- time_since_create_banded + (band_width * sin(pi * hours / 12))

# Simulate 'employee_id' (Random categorical assignment)
employee_ids <- sample(c('E1', 'E2', 'E3', 'E4'), n, replace = TRUE)

# Create DataFrame
data <- data.frame(
  Hour = hours,
  call_number = call_number,
  time_since_create = time_since_create,
  employee_id = employee_ids
)

head(data)

data$employee_id = as.factor(data$employee_id)

gavinsimpson commented 11 months ago

Thanks; so, just so I understand, does this simulated data set raise the error with the model specification in your original post?

sealavi commented 11 months ago

That said (and although it doesn't matter from the point of view of something failing with an obscure error), I don't understand why you are using the s(employee_id, call_number, bs = "re") term to add a linear random effect of call_number when you are also treating the effect of call_number as smooth. If you want a random smooth, use the fs basis. Ditto for the other linear random effect that also occurs in a smooth.

Thanks for the advice! I indeed want random smooths and didn't realize factor smooths are random smooths. Much appreciated!

sealavi commented 11 months ago

Thanks; so, just so I understand, does this simulated data set raise the error with the model specification in your original post?

Indeed it does, with bs = "re". With bs = "fs" I get an error

Error in seq.default(from = lower, to = upper, length.out = n) :

'length.out' must be a non-negative number

gavinsimpson commented 11 months ago

I can't reproduce this with the data provided and the GitHub version of {gratia} (which is only a trivial documentation fix different to the version you said you were using):

[Image: plot (8)]

This is with:

─ Session info ──────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       Ubuntu 20.04.6 LTS
 system   x86_64, linux-gnu
 ui       X11
 language en_GB:en
 collate  en_GB.UTF-8
 ctype    en_GB.UTF-8
 tz       Europe/Copenhagen
 date     2023-12-28
 pandoc   2.5 @ /usr/bin/pandoc

─ Packages ──────────────────────────────────────────────────────────────────────────────────────
 !  package      * version  date (UTC) lib source
    brio           1.1.3    2021-11-30 [1] RSPM (R 4.3.0)
    cachem         1.0.8    2023-05-01 [1] RSPM (R 4.3.0)
    callr          3.7.3    2022-11-02 [1] RSPM (R 4.3.0)
    cli            3.6.2    2023-12-11 [1] RSPM (R 4.3.2)
    colorspace     2.1-0    2023-01-23 [1] RSPM (R 4.3.0)
    commonmark     1.9.0    2023-03-17 [1] RSPM (R 4.3.0)
    crayon         1.5.2    2022-09-29 [1] RSPM (R 4.3.0)
    curl           5.1.0    2023-10-02 [1] RSPM (R 4.3.2)
    desc           1.4.2    2022-09-08 [1] RSPM (R 4.3.0)
    devtools     * 2.4.5    2022-10-11 [1] RSPM (R 4.3.1)
    digest         0.6.33   2023-07-07 [1] RSPM (R 4.3.1)
    dplyr        * 1.1.4    2023-11-17 [1] CRAN (R 4.3.2)
    ellipsis       0.3.2    2021-04-29 [1] RSPM (R 4.3.0)
    fansi          1.0.5    2023-10-08 [1] CRAN (R 4.3.1)
    farver         2.1.1    2022-07-06 [1] RSPM (R 4.3.0)
    fastmap        1.1.1    2023-02-24 [1] RSPM (R 4.3.0)
    fs             1.6.3    2023-07-20 [1] RSPM (R 4.3.1)
    generics       0.1.3    2022-07-05 [1] RSPM (R 4.3.0)
    ggokabeito     0.1.0    2021-10-18 [1] RSPM (R 4.3.0)
    ggplot2        3.4.4    2023-10-12 [1] RSPM (R 4.3.1)
    glue           1.6.2    2022-02-24 [1] RSPM (R 4.3.0)
 VP gratia       * 0.8.1.47 2023-11-20 [?] https://gavinsimpson.r-universe.dev (R 4.3.2) (on disk 0.8.1.46)
    gtable         0.3.4    2023-08-21 [1] RSPM (R 4.3.1)
    htmltools      0.5.7    2023-11-03 [1] RSPM (R 4.3.2)
    htmlwidgets    1.6.2    2023-03-17 [1] RSPM (R 4.3.0)
    httpgd         1.3.1    2023-01-30 [1] RSPM (R 4.3.0)
    httpuv         1.6.12   2023-10-23 [1] RSPM (R 4.3.2)
    httr           1.4.7    2023-08-15 [1] RSPM (R 4.3.1)
    isoband        0.2.7    2022-12-20 [1] RSPM (R 4.3.0)
    jsonlite       1.8.8    2023-12-04 [1] RSPM (R 4.3.2)
    knitr          1.45     2023-10-30 [1] RSPM (R 4.3.2)
    labeling       0.4.3    2023-08-29 [1] RSPM (R 4.3.1)
    later          1.3.1    2023-05-02 [1] RSPM (R 4.3.0)
    lattice        0.22-5   2023-10-24 [1] RSPM (R 4.3.2)
    lifecycle      1.0.4    2023-11-07 [1] RSPM (R 4.3.2)
    magrittr       2.0.3    2022-03-30 [1] RSPM (R 4.3.0)
    Matrix         1.6-3    2023-11-14 [1] RSPM (R 4.3.2)
    memoise        2.0.1    2021-11-26 [1] RSPM (R 4.3.0)
    mgcv         * 1.9-0    2023-07-11 [1] RSPM (R 4.3.1)
    mime           0.12     2021-09-28 [1] RSPM (R 4.3.0)
    miniUI         0.1.1.1  2018-05-18 [1] RSPM (R 4.3.0)
    munsell        0.5.0    2018-06-12 [1] RSPM (R 4.3.0)
    mvnfast        0.2.8    2023-02-23 [1] RSPM (R 4.3.0)
    nlme         * 3.1-163  2023-08-09 [1] RSPM (R 4.3.1)
    patchwork      1.1.3    2023-08-14 [1] RSPM (R 4.3.1)
    pillar         1.9.0    2023-03-22 [1] RSPM (R 4.3.0)
    pkgbuild       1.4.2    2023-06-26 [1] RSPM (R 4.3.1)
    pkgconfig      2.0.3    2019-09-22 [1] RSPM (R 4.3.0)
    pkgload        1.3.3    2023-09-22 [1] CRAN (R 4.3.1)
    prettyunits    1.2.0    2023-09-24 [1] CRAN (R 4.3.1)
    processx       3.8.2    2023-06-30 [1] RSPM (R 4.3.1)
    profvis        0.3.8    2023-05-02 [1] RSPM (R 4.3.0)
    promises       1.2.1    2023-08-10 [1] RSPM (R 4.3.1)
    ps             1.7.5    2023-04-18 [1] RSPM (R 4.3.0)
    purrr          1.0.2    2023-08-10 [1] RSPM (R 4.3.1)
    R6             2.5.1    2021-08-19 [1] RSPM (R 4.3.0)
    RColorBrewer   1.1-3    2022-04-03 [1] RSPM (R 4.3.0)
    Rcpp           1.0.11   2023-07-06 [1] RSPM (R 4.3.1)
    remotes        2.4.2.1  2023-07-18 [1] RSPM (R 4.3.1)
    rlang          1.1.2    2023-11-04 [1] RSPM (R 4.3.2)
    roxygen2       7.2.3    2022-12-08 [1] RSPM (R 4.3.0)
    rprojroot      2.0.4    2023-11-05 [1] RSPM (R 4.3.2)
    rstudioapi     0.15.0   2023-07-07 [1] RSPM (R 4.3.1)
    scales         1.3.0    2023-11-28 [1] RSPM (R 4.3.2)
    sessioninfo    1.2.2    2021-12-06 [1] RSPM (R 4.3.0)
    shiny          1.8.0    2023-11-17 [1] CRAN (R 4.3.2)
    stringi        1.8.3    2023-12-11 [1] RSPM (R 4.3.2)
    stringr        1.5.1    2023-11-14 [1] RSPM (R 4.3.2)
    systemfonts    1.0.5    2023-10-09 [1] RSPM (R 4.3.2)
    testthat     * 3.2.0    2023-10-06 [1] RSPM (R 4.3.1)
    tibble         3.2.1    2023-03-20 [1] CRAN (R 4.3.0)
    tidyr          1.3.0    2023-01-24 [1] RSPM (R 4.3.0)
    tidyselect     1.2.0    2022-10-10 [1] RSPM (R 4.3.0)
    urlchecker     1.0.1    2021-11-30 [1] RSPM (R 4.3.0)
    usethis      * 2.2.2    2023-07-06 [1] RSPM (R 4.3.1)
    utf8           1.2.4    2023-10-22 [1] RSPM (R 4.3.1)
    vctrs          0.6.5    2023-12-01 [1] RSPM (R 4.3.2)
    withr          2.5.2    2023-10-30 [1] RSPM (R 4.3.1)
    xfun           0.41     2023-11-01 [1] RSPM (R 4.3.2)
    xml2           1.3.6    2023-12-04 [1] RSPM (R 4.3.2)
    xtable         1.8-4    2019-04-21 [1] RSPM (R 4.3.0)

 [1] /home/au690221/R/x86_64-pc-linux-gnu-library/4.3
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library

 V ── Loaded and on-disk version mismatch.
 P ── Loaded and on-disk path mismatch.

─────────────────────────────────────────────────────────────────────────────────────────────────

Can you provide the output from sessionInfo() or sessioninfo::session_info()?

sealavi commented 11 months ago

Here is the output from sessionInfo()

> sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] gratia_0.8.1.46    mgcv_1.9-0         nlme_3.1-164       ggplot2_3.4.4.9000 lubridate_1.9.3

loaded via a namespace (and not attached):
 [1] Matrix_1.6-4          mvnfast_0.2.8         gtable_0.3.4          dplyr_1.1.4           compiler_4.3.2        tidyselect_1.2.0      Rcpp_1.0.11
 [8] stringr_1.5.1         parallel_4.3.2        tidyr_1.3.0           splines_4.3.2         scales_1.3.0          lattice_0.22-5        R6_2.5.1
[15] generics_0.1.3        patchwork_1.1.3       tibble_3.2.1          munsell_0.5.0         pillar_1.9.0          rlang_1.1.2           utf8_1.2.4
[22] stringi_1.8.3         timechange_0.2.0      cli_3.6.2             withr_2.5.2           magrittr_2.0.3        grid_4.3.2            rstudioapi_0.15.0
[29] lifecycle_1.0.4       ggokabeito_0.1.0.9000 vctrs_0.6.5           glue_1.6.2            fansi_1.0.6           colorspace_2.1-0      purrr_1.0.2
[36] tools_4.3.2           pkgconfig_2.0.3

gavinsimpson commented 11 months ago

Nothing jumps out at me immediately, but next steps would be to update any outdated packages, reinstall {gratia}, and run under R --vanilla to see if the issue persists. I'm not seeing errors like this under any of my tests (and I do have tests for ti() terms, etc., which you said were failing too on your end), nor on CRAN's systems, so I'm not sure how to proceed.

One thing I have noticed is that you are running a dev version of ggplot2, which I haven't (for obvious reasons) run this code against, and the error is coming from ggplot2 (something to do with guides and panels). Could you switch to the CRAN version and check whether the error goes away? I'll need to check this package with the dev version of ggplot2 at some point; it looks like they are prepping for a release with breaking changes in the guides, among other things, so I suspect that's where the error is coming from. I'll try to confirm this against the dev version of ggplot2.
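A quick way to spot development builds among installed packages is to look at the version number. The sketch below assumes the common convention (followed by ggplot2 and other tidyverse packages) that development versions carry a fourth component of 9000 or higher, as in `ggplot2_3.4.4.9000` in the session info above; `is_dev_version()` is a hypothetical helper, not part of any package.

```r
# TRUE if the version string has a fourth component of 9000 or more
is_dev_version <- function(version) {
  parts <- unlist(unclass(package_version(version)))
  length(parts) >= 4 && parts[[4]] >= 9000
}

print(is_dev_version("3.4.4.9000"))  # ggplot2 version from the session info
print(is_dev_version("3.4.4"))       # a CRAN release number
print(is_dev_version("0.8.1.46"))    # four components, but below 9000

# Check whichever of these packages are installed locally
pkgs <- c("ggplot2", "patchwork", "gratia")
installed <- nzchar(vapply(pkgs, function(p) system.file(package = p),
                           character(1)))
pkgs <- pkgs[installed]
print(vapply(pkgs,
             function(p) is_dev_version(as.character(packageVersion(p))),
             logical(1)))
```

Running `install.packages("ggplot2")` would then replace a development build with the current CRAN release.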

sealavi commented 10 months ago

Thanks for your help. Switching to the CRAN version did solve the problem.

gavinsimpson commented 10 months ago

Thanks for letting me know; I'll need to figure this out in time for the next ggplot2 release.

gavinsimpson commented 8 months ago

Seems like these issues got dealt with during the ggplot2 release candidate process. I've been running the dev version of ggplot for a few weeks and am not seeing any issues with this new version nor any new failures on CRAN.

barryrowlingson commented 6 months ago

I'm getting this same error from simply library(gratia) and then example(draw.gam) (and many other attempts to draw things, this one is simple and reproducible) now:

> library(gratia)
> example(draw.gam)

drw.gm> load_mgcv()

drw.gm> # simulate some data
drw.gm> df1 <- data_sim("eg1", n = 400, dist = "normal", scale = 2, seed = 2)

drw.gm> # fit GAM
drw.gm> m1 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = df1, method = "REML")

drw.gm> # plot all smooths
drw.gm> draw(m1)
Error in Ops.data.frame(guide_loc, panel_loc) : 
  ‘==’ only defined for equally-sized data frames
Calls: example ... plot_table.ggplot -> add_guides -> unlist -> Ops.data.frame
Execution halted

Here's my session info:

> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gratia_0.9.0

loaded via a namespace (and not attached):
 [1] patchwork_1.1.3  vctrs_0.6.3      nlme_3.1-162     cli_3.6.1       
 [5] rlang_1.1.1      stringi_1.7.12   purrr_1.0.2      generics_0.1.3  
 [9] glue_1.6.2       colorspace_2.1-0 scales_1.3.0     fansi_1.0.4     
[13] grid_4.3.0       munsell_0.5.0    tibble_3.2.1     ggokabeito_0.1.0
[17] lifecycle_1.0.3  stringr_1.5.0    compiler_4.3.0   mvnfast_0.2.8   
[21] dplyr_1.1.2      Rcpp_1.0.10      pkgconfig_2.0.3  tidyr_1.3.0     
[25] mgcv_1.9-1       lattice_0.21-8   R6_2.5.1         tidyselect_1.2.0
[29] utf8_1.2.3       pillar_1.9.0     splines_4.3.0    magrittr_2.0.3  
[33] Matrix_1.5-4.1   withr_2.5.0      tools_4.3.0      gtable_0.3.3    
[37] ggplot2_3.5.1   
>

gavinsimpson commented 6 months ago

@barryrowlingson Thanks; while I don't know the exact cause of the problem, it's not something in gratia. It always seems to go away once all packages are updated. I note you are running under R 4.3.x but I also check the package on ubuntu-latest with R 4.3.x on GH Actions and it passes all checks including example checks.

When others have observed this issue it has gone away when they updated packages.

This is hard to track down further because I can't reproduce it locally or on any system that I or CRAN have used to check the package. The error is coming from deep within ggplot, so I don't know if something is out of sync in terms of the tidyverse packages and deps ggplot2 uses. As it seems to go away with a package update (at least previous reports, not all here, have), it doesn't seem to be an error in gratia, but as I can't reproduce it I can't figure out if I need to add a requirement for a specific minimum version of a package to avoid the issue.
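One way to look for packages that are out of sync is to compare the versions listed in two `sessionInfo()` outputs. The helpers below are a hypothetical sketch in base R. The example values are copied from the reporter's session info above and from the maintainer's earlier session listing, which was taken several months before, so some differences are expected.

```r
# Split tokens such as "ggplot2_3.5.1" into a named vector of versions
parse_pkg_versions <- function(tokens) {
  parts <- strsplit(tokens, "_", fixed = TRUE)
  versions <- vapply(parts, function(p) p[length(p)], character(1))
  names(versions) <- vapply(
    parts,
    function(p) paste(p[-length(p)], collapse = "_"),
    character(1)
  )
  versions
}

# List packages present in both sessions whose versions differ
compare_versions <- function(a, b) {
  shared <- intersect(names(a), names(b))
  differs <- shared[a[shared] != b[shared]]
  data.frame(
    package = differs,
    a = unname(a[differs]),
    b = unname(b[differs]),
    stringsAsFactors = FALSE
  )
}

reporter <- parse_pkg_versions(
  c("patchwork_1.1.3", "ggplot2_3.5.1", "gtable_0.3.3", "scales_1.3.0")
)
maintainer <- c(patchwork = "1.1.3", ggplot2 = "3.4.4",
                gtable = "0.3.4", scales = "1.3.0")

print(compare_versions(reporter, maintainer))
```

A difference in this table does not by itself identify the cause; it only narrows down which packages to update first.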

barryrowlingson commented 6 months ago

It's related to the patchwork package - I can generate the error now without gratia:

> library(ggplot2)
Want to understand how all the pieces fit together? Read R for Data
Science: https://r4ds.hadley.nz/
> p1 <- ggplot(mtcars) + geom_point(aes(mpg, disp))
> p2 <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))
> library(patchwork)
> p1 + p2
Error in Ops.data.frame(guide_loc, panel_loc) : 
  ‘==’ only defined for equally-sized data frames

Upgrading patchwork to latest CRAN (v1.2.0) has fixed it.

Now I can try gratia! Thanks! (although I just finished doing graphs in a quarto report using visreg :) )

gavinsimpson commented 5 months ago

Thanks @barryrowlingson, I've added a requirement to gratia that it needs patchwork >= 1.2.0.
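For anyone who wants to check their own installation, a minimal sketch follows. `meets_minimum()` is a hypothetical helper, and the 1.2.0 threshold is the one mentioned above. In a package's DESCRIPTION file such a requirement is conventionally written as `patchwork (>= 1.2.0)` under `Imports`, though the exact change made in gratia is not shown in this thread.

```r
# Compare two version strings using R's own version ordering
meets_minimum <- function(installed, minimum) {
  package_version(installed) >= package_version(minimum)
}

print(meets_minimum("1.1.3", "1.2.0"))  # patchwork version in the failing session
print(meets_minimum("1.2.0", "1.2.0"))  # version reported to fix the error

# Check the locally installed patchwork, if there is one
if (nzchar(system.file(package = "patchwork"))) {
  current <- as.character(packageVersion("patchwork"))
  if (!meets_minimum(current, "1.2.0")) {
    message("patchwork ", current, " is older than 1.2.0; ",
            "consider install.packages(\"patchwork\")")
  }
}
```

Comparing `package_version` objects rather than raw strings matters because, for example, "1.10.0" sorts before "1.2.0" as text but is the newer version.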