easystats / performance

:muscle: Models' quality and performance metrics (R2, ICC, LOO, AIC, BF, ...)
https://easystats.github.io/performance/
GNU General Public License v3.0

Check_model in version 0.11.0 no longer produces qq plot residuals #708

Closed. scrryl closed this issue 3 months ago

scrryl commented 3 months ago

I just updated performance to the newest version and my code no longer works. check_model() produces 3 of the 4 plots, and the QQ plot is not one of them. Interestingly, I also cannot request the plot directly using the check = "qq" argument.

Any thoughts?

bwiernik commented 3 months ago

Can you give a reproducible example?

scrryl commented 3 months ago

@bwiernik

# Set seed for reproducibility
set.seed(250419)

# Generate random x values
x <- rnorm(n = 500, 
           mean = 5, 
           sd = 2)

# Generate y values y = 5x + e
y <- 5*x + rnorm(n = 500,
                 mean = 5,
                 sd = 2)

# Generate z as offset
z <- runif(500, min = 0, max = 6719)

mock_data <- data.frame(x, y, z) |>
  dplyr::mutate(y = round(y), z = round(z)) |> # both should be whole numbers since they're counts
  dplyr::filter(!x < 0, !y < 0) 

# Run model
model1 <- stats::glm(y ~ x + offset(log(z)),family = "quasipoisson", data = mock_data)

performance::check_model(model1)

bwiernik commented 3 months ago

That code produces a qq plot for me. What are you seeing?

image

scrryl commented 3 months ago

@bwiernik

I see this:

image

bwiernik commented 3 months ago

Is anyone able to reproduce this? @strengejacke @IndrajeetPatil @mattansb @DominiqueMakowski @rempsyc

bwiernik commented 3 months ago

@scrryl Are you getting any errors or warnings? What happens if you make the plot window/pane larger?

scrryl commented 3 months ago

nope! no errors or warnings

when I make pane larger:

Screenshot 2024-04-02 at 8 05 22 PM

bwiernik commented 3 months ago

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.4.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Matrix_1.6-1.1         gtable_0.3.4           jsonlite_1.8.8         dplyr_1.1.4            compiler_4.3.2        
 [6] tidyselect_1.2.1       Rcpp_1.0.12            VGAM_1.1-9             see_0.8.3.5            textshaping_0.3.6     
[11] systemfonts_1.0.5      splines_4.3.2          scales_1.3.0           marginaleffects_0.14.0 readxl_1.4.3          
[16] lattice_0.21-9         ggplot2_3.5.0          R6_2.5.1               labeling_0.4.3         patchwork_1.2.0       
[21] generics_0.1.3         ggrepel_0.9.3          tibble_3.2.1           insight_0.19.10        munsell_0.5.0         
[26] shadowtext_0.1.2       pillar_1.9.0           rlang_1.1.3            easystats_0.7.1.1      utf8_1.2.4            
[31] performance_0.11.0.3   cli_3.6.2              mgcv_1.9-0             withr_3.0.0            magrittr_2.0.3        
[36] grid_4.3.2             rstudioapi_0.15.0      nlme_3.1-163           lifecycle_1.0.4        vctrs_0.6.5           
[41] glue_1.7.0             data.table_1.14.10     farver_2.1.1           cellranger_1.1.0       sessioninfo_1.2.2     
[46] ragg_1.2.5             stats4_4.3.2           fansi_1.0.6            colorspace_2.1-0       purrr_1.0.2           
[51] tools_4.3.2            pkgconfig_2.0.3       

scrryl commented 3 months ago

R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 13.6.3

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gt_0.8.0           modelsummary_1.3.0 corrplot_0.92      Hmisc_4.7-0       
 [5] Formula_1.2-4      survival_3.2-13    lattice_0.20-45    scales_1.3.0      
 [9] lme4_1.1-29        Matrix_1.4-0       jtools_2.2.0       forcats_0.5.1     
[13] stringr_1.5.1      dplyr_1.1.2        purrr_1.0.1        readr_2.1.2       
[17] tidyr_1.3.0        tibble_3.2.1       ggplot2_3.4.4      tidyverse_1.3.2   

loaded via a namespace (and not attached):
  [1] readxl_1.4.0          backports_1.4.1       systemfonts_1.0.4     sp_1.4-7             
  [5] splines_4.1.3         crosstalk_1.2.0       listenv_0.8.0         leaflet_2.1.1        
  [9] digest_0.6.29         htmltools_0.5.2       fansi_1.0.3           DHARMa_0.4.6         
 [13] magrittr_2.0.3        checkmate_2.1.0       googlesheets4_1.0.0   cluster_2.1.2        
 [17] see_0.8.0             tzdb_0.3.0            globals_0.15.0        modelr_0.1.8         
 [21] svglite_2.1.0         jpeg_0.1-9            colorspace_2.1-0      ggrepel_0.9.1        
 [25] rvest_1.0.4           haven_2.5.0.9000      xfun_0.30             leafem_0.2.0         
 [29] crayon_1.5.1          jsonlite_1.8.0        glue_1.6.2            kableExtra_1.3.4.9000
 [33] gtable_0.3.0          gargle_1.2.0          webshot_0.5.5         car_3.1-0            
 [37] abind_1.4-5           DBI_1.1.2             rstatix_0.7.0         Rcpp_1.0.8.3         
 [41] performance_0.11.0    viridisLite_0.4.2     htmlTable_2.4.0       units_0.8-0          
 [45] foreign_0.8-82        proxy_0.4-26          stats4_4.1.3          datawizard_0.10.0    
 [49] htmlwidgets_1.5.4     httr_1.4.7            RColorBrewer_1.1-3    ellipsis_0.3.2       
 [53] pkgconfig_2.0.3       farver_2.1.1          nnet_7.3-17           dbplyr_2.2.0         
 [57] utf8_1.2.2            tidyselect_1.2.0      labeling_0.4.3        rlang_1.1.1          
 [61] munsell_0.5.0         cellranger_1.1.0      tools_4.1.3           cli_3.6.1            
 [65] generics_0.1.2        broom_0.8.0           evaluate_0.23         fastmap_1.1.1        
 [69] yaml_2.3.5            tables_0.9.10         knitr_1.39            fs_1.5.2             
 [73] pander_0.6.5          satellite_1.0.4       future_1.26.1         nlme_3.1-155         
 [77] xml2_1.3.3            compiler_4.1.3        rstudioapi_0.15.0     png_0.1-7            
 [81] e1071_1.7-9           ggsignif_0.6.4        reprex_2.0.1          stringi_1.7.6        
 [85] classInt_0.4-3        nloptr_2.0.1          ggsci_2.9             vctrs_0.6.2          
 [89] pillar_1.9.0          lifecycle_1.0.4       furrr_0.3.1           insight_0.19.10      
 [93] data.table_1.14.2     cowplot_1.1.1         raster_3.5-15         mapview_2.11.0       
 [97] patchwork_1.1.1       R6_2.5.1              latticeExtra_0.6-29   KernSmooth_2.23-20   
[101] gridExtra_2.3         parallelly_1.32.0     codetools_0.2-18      boot_1.3-28          
[105] MASS_7.3-55           gtools_3.9.2.2        assertthat_0.2.1      withr_2.5.0          
[109] broom.mixed_0.2.9.4   mgcv_1.8-39           parallel_4.1.3        hms_1.1.1            
[113] terra_1.5-21          grid_4.1.3            rpart_4.1.16          class_7.3-20         
[117] minqa_1.2.4           rmarkdown_2.14        carData_3.0-5         googledrive_2.0.0    
[121] ggpubr_0.4.0          sf_1.0-7              lubridate_1.8.0       base64enc_0.1-3

bwiernik commented 3 months ago

Just for clarity, can you run your code above from a fresh R session and post the session info? Please run the code from your post above (as I edited it so it would run).

mattansb commented 3 months ago

I'm also getting only 3 plots:

# Set seed for reproducibility
set.seed(250419)

# Generate random x values
x <- rnorm(n = 500, 
           mean = 5, 
           sd = 2)

# Generate y values y = 5x + e
y <- 5*x + rnorm(n = 500,
                 mean = 5,
                 sd = 2)

# Generate z as offset
z <- runif(500, min = 0, max = 6719)

mock_data <- data.frame(x, y, z) |>
  dplyr::mutate(y = round(y), z = round(z)) |> # both should be whole numbers since they're counts
  dplyr::filter(!x < 0, !y < 0) 

# Run model
model1 <- stats::glm(y ~ x + offset(log(z)),family = "quasipoisson", data = mock_data)

performance::check_model(model1)

Created on 2024-04-03 with reprex v2.1.0

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31 ucrt)
#>  os       Windows 11 x64 (build 22631)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Israel.utf8
#>  ctype    English_Israel.utf8
#>  tz       Asia/Jerusalem
#>  date     2024-04-03
#>  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  boot          1.3-28.1   2022-11-22 [2] CRAN (R 4.3.2)
#>  cli           3.6.2      2023-12-11 [1] CRAN (R 4.3.2)
#>  colorspace    2.1-0      2023-01-23 [1] CRAN (R 4.3.1)
#>  curl          5.2.1      2024-03-01 [1] CRAN (R 4.3.3)
#>  datawizard    0.10.0     2024-03-26 [1] CRAN (R 4.3.3)
#>  DHARMa        0.4.6      2022-09-08 [1] CRAN (R 4.3.2)
#>  digest        0.6.35     2024-03-11 [1] CRAN (R 4.3.3)
#>  dplyr         1.1.4      2023-11-17 [1] CRAN (R 4.3.2)
#>  evaluate      0.23       2023-11-01 [1] CRAN (R 4.3.2)
#>  fansi         1.0.6      2023-12-08 [1] CRAN (R 4.3.2)
#>  farver        2.1.1      2022-07-06 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1      2023-02-24 [1] CRAN (R 4.3.1)
#>  fs            1.6.3      2023-07-20 [1] CRAN (R 4.3.1)
#>  generics      0.1.3      2022-07-05 [1] CRAN (R 4.3.1)
#>  ggplot2       3.5.0      2024-02-23 [1] CRAN (R 4.3.1)
#>  ggrepel       0.9.5      2024-01-10 [1] CRAN (R 4.3.2)
#>  glue          1.7.0      2024-01-09 [1] CRAN (R 4.3.2)
#>  gtable        0.3.4      2023-08-21 [1] CRAN (R 4.3.1)
#>  highr         0.10       2022-12-22 [1] CRAN (R 4.3.1)
#>  htmltools     0.5.8      2024-03-25 [1] CRAN (R 4.3.3)
#>  insight       0.19.10    2024-03-22 [1] CRAN (R 4.3.3)
#>  knitr         1.45       2023-10-30 [1] CRAN (R 4.3.2)
#>  labeling      0.4.3      2023-08-29 [1] CRAN (R 4.3.1)
#>  lattice       0.21-9     2023-10-01 [2] CRAN (R 4.3.2)
#>  lifecycle     1.0.4      2023-11-07 [1] CRAN (R 4.3.2)
#>  lme4          1.1-35.1   2023-11-05 [1] CRAN (R 4.3.2)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.3.1)
#>  MASS          7.3-60.0.1 2024-01-13 [1] CRAN (R 4.3.2)
#>  Matrix        1.6-5      2024-01-11 [1] CRAN (R 4.3.2)
#>  mgcv          1.9-0      2023-07-11 [2] CRAN (R 4.3.2)
#>  minqa         1.2.6      2023-09-11 [1] CRAN (R 4.3.1)
#>  munsell       0.5.0      2018-06-12 [1] CRAN (R 4.3.1)
#>  nlme          3.1-163    2023-08-09 [2] CRAN (R 4.3.2)
#>  nloptr        2.0.3      2022-05-26 [1] CRAN (R 4.3.1)
#>  patchwork     1.2.0      2024-01-08 [1] CRAN (R 4.3.2)
#>  performance   0.11.0     2024-03-22 [1] CRAN (R 4.3.3)
#>  pillar        1.9.0      2023-03-22 [1] CRAN (R 4.3.1)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.3.1)
#>  purrr         1.0.2      2023-08-10 [1] CRAN (R 4.3.1)
#>  R.cache       0.16.0     2022-07-21 [1] CRAN (R 4.3.1)
#>  R.methodsS3   1.8.2      2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.26.0     2024-01-24 [1] CRAN (R 4.3.2)
#>  R.utils       2.12.3     2023-11-18 [1] CRAN (R 4.3.2)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.3.1)
#>  Rcpp          1.0.12     2024-01-09 [1] CRAN (R 4.3.2)
#>  reprex        2.1.0      2024-01-11 [1] CRAN (R 4.3.2)
#>  rlang         1.1.3      2024-01-10 [1] CRAN (R 4.3.2)
#>  rmarkdown     2.26       2024-03-05 [1] CRAN (R 4.3.3)
#>  rstudioapi    0.16.0     2024-03-24 [1] CRAN (R 4.3.3)
#>  scales        1.3.0      2023-11-28 [1] CRAN (R 4.3.2)
#>  see           0.8.2      2024-02-14 [1] CRAN (R 4.3.2)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.3.1)
#>  styler        1.10.2     2023-08-29 [1] CRAN (R 4.3.1)
#>  tibble        3.2.1      2023-03-20 [1] CRAN (R 4.3.1)
#>  tidyselect    1.2.1      2024-03-11 [1] CRAN (R 4.3.3)
#>  utf8          1.2.4      2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs         0.6.5      2023-12-01 [1] CRAN (R 4.3.2)
#>  withr         3.0.0      2024-01-16 [1] CRAN (R 4.3.2)
#>  xfun          0.43       2024-03-25 [1] CRAN (R 4.3.3)
#>  xml2          1.3.6      2023-12-04 [1] CRAN (R 4.3.2)
#>  yaml          2.3.8      2023-12-11 [1] CRAN (R 4.3.2)
#>
#>  [1] C:/Users/user/AppData/Local/R/win-library/4.3
#>  [2] C:/Program Files/R/R-4.3.2/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

mattansb commented 3 months ago

Could be related to this message? (But the plot still works)

(c_norm <- performance::check_normality(model1))
#> There's no formal statistical test for normality for generalized linear model.
#>   Instead, please use `simulate_residuals()` and `check_residuals()` to check for uniformity of residuals.

plot(c_norm)

image

bwiernik commented 3 months ago

Honestly that seems reasonable --- quasipoisson residuals shouldn't be normal. Do you know where the different behavior is coming from @strengejacke ?

strengejacke commented 3 months ago

quasipoisson is not supported by DHARMa, which is why it fails. Until we have fixed this, you have to set residual_type = "normal" explicitly:

set.seed(250419)

# Generate random x values
x <- rnorm(n = 500, 
           mean = 5, 
           sd = 2)

# Generate y values y = 5x + e
y <- 5*x + rnorm(n = 500,
                 mean = 5,
                 sd = 2)

# Generate z as offset
z <- runif(500, min = 0, max = 6719)

mock_data <- data.frame(x, y, z) |>
  dplyr::mutate(y = round(y), z = round(z)) |> # both should be whole numbers since they're counts
  dplyr::filter(!x < 0, !y < 0) 

# Run model
model1 <- stats::glm(y ~ x + offset(log(z)),family = "quasipoisson", data = mock_data)

performance::check_model(model1, residual_type = "normal")

Created on 2024-04-03 with reprex v2.1.0

strengejacke commented 3 months ago

This is the error:

set.seed(250419)

# Generate random x values
x <- rnorm(
  n = 500,
  mean = 5,
  sd = 2
)

# Generate y values y = 5x + e
y <- 5 * x + rnorm(
  n = 500,
  mean = 5,
  sd = 2
)

# Generate z as offset
z <- runif(500, min = 0, max = 6719)

mock_data <- data.frame(x, y, z) |>
  # both should be whole numbers since they're counts
  datawizard::data_modify(y = round(y), z = round(z)) |>
  datawizard::data_filter(!x < 0, !y < 0)

# Run model
model1 <- glm(y ~ x + offset(log(z)), family = "quasipoisson", data = mock_data)
DHARMa::simulateResiduals(model1)
#> Error in simulate.lm(object, nsim = nsim, ...): family 'quasipoisson' not implemented

Created on 2024-04-03 with reprex v2.1.0

However, at least in simulateResiduals(), there's a check for that family:

    if (is.null(integerResponse)) {
        if (family$family %in% c("binomial", "poisson", "quasibinomial", 
            "quasipoisson", "Negative Binom", "nbinom2", "nbinom1", 
            "genpois", "compois", "truncated_poisson", "truncated_nbinom2", 
            "truncated_nbinom1", "betabinomial", "Poisson", "Tpoisson", 
            "COMPoisson", "negbin", "Tnegbin") | grepl("Negative Binomial", 
            family$family)) 
            integerResponse = TRUE
        else integerResponse = FALSE
    }

So the package stops at a later point. @florianhartig is it correct that quasi-families are not yet supported, or is this not intended? I could open an issue at the DHARMa repo.

strengejacke commented 3 months ago

Ah, I see. It's stats::simulate() where the error comes from.
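The point about stats::simulate() can be checked without DHARMa or performance. The following is a minimal base-R sketch, assuming toy data that is not the data from this thread: the poisson family object carries a simulate function, the quasipoisson family object does not, and so simulate() only works for the former.

```r
# Toy count data (illustrative only, not the mock_data from the thread)
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- rpois(100, lambda = exp(0.5 + 0.3 * d$x))

fit_pois  <- glm(y ~ x, family = poisson, data = d)
fit_quasi <- glm(y ~ x, family = quasipoisson, data = d)

# The poisson family object includes a simulate function;
# the quasipoisson family object has no such element
has_sim_pois  <- is.function(fit_pois$family$simulate)
has_sim_quasi <- is.function(fit_quasi$family$simulate)

# simulate() therefore works for the Poisson fit ...
sim_pois <- simulate(fit_pois, nsim = 2)

# ... and stops for the quasi-Poisson fit; capture the message instead of failing
sim_err <- tryCatch(
  simulate(fit_quasi, nsim = 2),
  error = function(e) conditionMessage(e)
)
print(sim_err)
```

Any tool that builds its residuals on simulate() will hit the same stop for quasi-families, whatever checks it runs beforehand.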

scrryl commented 3 months ago

> quasipoisson is not supported by DHARMa, that's why it fails. You have to explicitly set residual_type = "normal", until we fixed this: [reprex quoted from above]

residual_type = "normal" was the fix. I'll keep a lookout for updates.

Thank you all so much!

bwiernik commented 3 months ago

Note that the residuals of a quasipoisson model are not expected to be normally distributed, so this plot isn't really meaningful.
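One diagnostic that does not assume normal residuals is the Pearson-based dispersion estimate. This was not proposed in the thread. It is a minimal base-R sketch on assumed toy data, showing that the dispersion reported by summary() for a quasipoisson fit is the sum of squared Pearson residuals divided by the residual degrees of freedom.

```r
# Toy count data (illustrative only, not the mock_data from the thread)
set.seed(1)
d <- data.frame(x = rnorm(200))
d$y <- rpois(200, lambda = exp(1 + 0.4 * d$x))

fit <- glm(y ~ x, family = quasipoisson, data = d)

# Pearson residuals: (observed - fitted) / sqrt(variance function at fitted)
pearson <- residuals(fit, type = "pearson")

# Dispersion estimate: Pearson chi-square over residual degrees of freedom
dispersion <- sum(pearson^2) / df.residual(fit)

# This matches the dispersion that summary() reports for the quasi-family
same_as_summary <- isTRUE(all.equal(dispersion, summary(fit)$dispersion))
print(c(dispersion = dispersion))
print(same_as_summary)
```

Plotting `pearson` against `fitted(fit)` is a common follow-up when looking for structure that the mean or variance function misses.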

strengejacke commented 2 months ago

> residual_type = "normal" was the fix. I'll keep a lookout for updates.

Make sure you have the latest easystats package installed:

install.packages("easystats")

then run:

easystats::install_latest()

and you should be fine.
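To confirm what is actually installed after updating, the versions can be queried from R. This is a small sketch: `installed_version()` is a hypothetical helper written for illustration, not part of easystats, and the thread does not state which release contains the fix.

```r
# Hypothetical helper: installed version of a package as a string,
# or NA if the package is not installed
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Packages that appear in the session info posted in this thread
versions <- vapply(
  c("performance", "see", "insight", "DHARMa"),
  installed_version,
  character(1)
)
print(versions)
```

Comparing this output before and after easystats::install_latest() shows whether the update took effect in the current library.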