easystats / performance

:muscle: Models' quality and performance metrics (R2, ICC, LOO, AIC, BF, ...)
https://easystats.github.io/performance/
GNU General Public License v3.0
1k stars 87 forks source link

Collinearity not available for models of class 'hurdle' #606

Closed MarcRieraDominguez closed 12 months ago

MarcRieraDominguez commented 1 year ago

Hi! The check_collinearity() function returns an error when applied to a model of class hurdle. The documentation of performance indicates that such models are supported. For now, a workaround could be to fit the hurdle model with glmmTMB, and get the variance inflation factors for that model. Congratulations on the great package!

library(glmmTMB)
library(performance)
#> Warning: package 'performance' was built under R version 4.2.3
library(pscl)
#> Warning: package 'pscl' was built under R version 4.2.3
#> Classes and Methods for R developed in the
#> Political Science Computational Laboratory
#> Department of Political Science
#> Stanford University
#> Simon Jackman
#> hurdle and zeroinfl functions by Achim Zeileis

data("bioChemists", package = "pscl")
summary(bioChemists)
#>       art            fem           mar           kid5             phd       
#>  Min.   : 0.000   Men  :494   Single :309   Min.   :0.0000   Min.   :0.755  
#>  1st Qu.: 0.000   Women:421   Married:606   1st Qu.:0.0000   1st Qu.:2.260  
#>  Median : 1.000                             Median :0.0000   Median :3.150  
#>  Mean   : 1.693                             Mean   :0.4951   Mean   :3.103  
#>  3rd Qu.: 2.000                             3rd Qu.:1.0000   3rd Qu.:3.920  
#>  Max.   :19.000                             Max.   :3.0000   Max.   :4.620  
#>       ment       
#>  Min.   : 0.000  
#>  1st Qu.: 3.000  
#>  Median : 6.000  
#>  Mean   : 8.767  
#>  3rd Qu.:12.000  
#>  Max.   :77.000

list.mod <- list(
  hurdle = pscl::hurdle(art ~ fem + mar, data = bioChemists, dist = "poisson", zero.dist = "binomial", link = "logit"),
  glmmtmb = glmmTMB::glmmTMB(art ~ fem + mar, data = bioChemists, family = truncated_poisson(), ziformula = ~ fem + mar)
  )

lapply(list.mod, class)
#> $hurdle
#> [1] "hurdle"
#> 
#> $glmmtmb
#> [1] "glmmTMB"

lapply(list.mod, summary)
#> $hurdle
#> 
#> Call:
#> pscl::hurdle(formula = art ~ fem + mar, data = bioChemists, dist = "poisson", 
#>     zero.dist = "binomial", link = "logit")
#> 
#> Pearson residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.1392 -1.0178 -0.3446  0.3905 10.2843 
#> 
#> Count model coefficients (truncated poisson with log link):
#>              Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.847598   0.064084  13.226  < 2e-16 ***
#> femWomen    -0.237351   0.064199  -3.697 0.000218 ***
#> marMarried   0.008846   0.066944   0.132 0.894867    
#> Zero hurdle model coefficients (binomial with logit link):
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.90794    0.15640   5.805 6.42e-09 ***
#> femWomen    -0.24195    0.14923  -1.621    0.105    
#> marMarried   0.07802    0.15636   0.499    0.618    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
#> 
#> Number of iterations in BFGS optimization: 9 
#> Log-likelihood: -1670 on 6 Df
#> 
#> $glmmtmb
#>  Family: truncated_poisson  ( log )
#> Formula:          art ~ fem + mar
#> Zero inflation:       ~fem + mar
#> Data: bioChemists
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>   3352.2   3381.1  -1670.1   3340.2      909 
#> 
#> 
#> Conditional model:
#>              Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.847585   0.064084  13.226  < 2e-16 ***
#> femWomen    -0.237339   0.064199  -3.697 0.000218 ***
#> marMarried   0.008862   0.066944   0.132 0.894684    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Zero-inflation model:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -0.90794    0.15640  -5.805 6.42e-09 ***
#> femWomen     0.24195    0.14923   1.621    0.105    
#> marMarried  -0.07802    0.15636  -0.499    0.618    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

performance::check_collinearity(list.mod$hurdle, component = "all")
#> Error in x$terms %||% attr(x, "terms") %||% stop("no terms component nor attribute"): no terms component nor attribute

performance::check_collinearity(list.mod$glmmtmb)
#> # Check for Multicollinearity
#> 
#> * conditional component:
#> 
#> Low Correlation
#> 
#>  Term  VIF   VIF 95% CI Increased SE Tolerance Tolerance 95% CI
#>   fem 1.06 [1.02, 1.20]         1.03      0.95     [0.83, 0.98]
#>   mar 1.06 [1.02, 1.20]         1.03      0.95     [0.83, 0.98]
#> 
#> * zero inflated component:
#> 
#> Low Correlation
#> 
#>  Term  VIF   VIF 95% CI Increased SE Tolerance Tolerance 95% CI
#>   fem 1.07 [1.02, 1.20]         1.03      0.94     [0.83, 0.98]
#>   mar 1.07 [1.02, 1.20]         1.03      0.94     [0.83, 0.98]

sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.utf8  LC_CTYPE=Spanish_Spain.utf8   
#> [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C                  
#> [5] LC_TIME=Spanish_Spain.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] pscl_1.5.5.1       performance_0.10.4 glmmTMB_1.1.3     
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.10           compiler_4.2.0        nloptr_2.0.1         
#>  [4] highr_0.9             TMB_1.8.1             tools_4.2.0          
#>  [7] boot_1.3-28           digest_0.6.29         lme4_1.1-29          
#> [10] evaluate_0.15         lifecycle_1.0.3       nlme_3.1-157         
#> [13] lattice_0.20-45       rlang_1.1.1           reprex_2.0.2         
#> [16] Matrix_1.4-1          cli_3.6.1             rstudioapi_0.13      
#> [19] yaml_2.3.5            mvtnorm_1.1-3         xfun_0.40            
#> [22] fastmap_1.1.0         coda_0.19-4           withr_2.5.0          
#> [25] stringr_1.5.0         knitr_1.39            fs_1.5.2             
#> [28] vctrs_0.6.3           grid_4.2.0            glue_1.6.2           
#> [31] survival_3.3-1        rmarkdown_2.14        multcomp_1.4-19      
#> [34] TH.data_1.1-1         minqa_1.2.4           magrittr_2.0.3       
#> [37] codetools_0.2-18      htmltools_0.5.6       emmeans_1.8.5-9000004
#> [40] splines_4.2.0         MASS_7.3-57           datawizard_0.7.0     
#> [43] insight_0.19.1.3      xtable_1.8-4          numDeriv_2016.8-1.1  
#> [46] sandwich_3.0-1        stringi_1.7.6         estimability_1.4.1   
#> [49] zoo_1.8-10

Created on 2023-09-07 with reprex v2.0.2

strengejacke commented 12 months ago

Thanks, should be fixed.

MarcRieraDominguez commented 12 months ago

Thanks to you for the awesome package!