easystats / performance

:muscle: Models' quality and performance metrics (R2, ICC, LOO, AIC, BF, ...)
https://easystats.github.io/performance/
GNU General Public License v3.0
1.03k stars 94 forks source link

check_model() and other functions not working with only post-"hurdle" glmmTMB model #781

Open MaBKo opened 3 days ago

MaBKo commented 3 days ago

(First time opening an issue on github, please bear with me) Foreword: I'm a trained entomologist, with statistical and R-knowledge gathered through "learning by doing", so if this is just user error, I want to apologise in advance.

Stumbled upon this error, using a glmmTMB model with only the truncated "post hurdle" part (lacking a better descriptor): Error: check_model() returned following error: 'data' must be of a vector type, was 'NULL'

As far as I understand it, the usage of the truncated_nbinom2 family argument leads to this error, if the zi-component is deactivated.

Creating the model which produces the error:

library(glmmTMB)
library(easystats)
#> Warning: package 'easystats' was built under R version 4.4.2
#> # Attaching packages: easystats 0.7.3.2
#> ✔ bayestestR  0.15.0.2    ✔ correlation 0.8.6    
#> ✔ datawizard  0.13.0.13   ✔ effectsize  0.8.9    
#> ✔ insight     1.0.0       ✔ modelbased  0.8.9    
#> ✔ performance 0.12.4.8    ✔ parameters  0.23.0.11
#> ✔ report      0.5.9       ✔ see         0.9.0.9
library(tidyverse)
library(reprex)
#> Warning: package 'reprex' was built under R version 4.4.2

mod_trunc_error<-glmmTMB(count~spp+mined+(1|site)+(1|spp), 
                         data=Salamanders%>%filter(count>0), family="truncated_nbinom2", 
                         ziformula = ~0, dispformula = ~1)

check_collinearity(mod_trunc_error)
#> Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : 'data' must be of a vector type, was 'NULL'
check_model(mod_trunc_error, panel=F)
#> Error: `check_model()` returned following error: 'data' must be of a vector
#>   type, was 'NULL'
#>   
#> If the error message does not help identifying your problem, another
#>   reason why `check_model()` failed might be that models of class
#>   `glmmTMB` are not yet supported.

Created on 2024-11-26 with reprex v2.1.1

Creating a whole hurdle model working fine (plotting deactivated; works as intended):

library(glmmTMB)
library(easystats)
#> Warning: package 'easystats' was built under R version 4.4.2
#> # Attaching packages: easystats 0.7.3.2
#> ✔ bayestestR  0.15.0.2    ✔ correlation 0.8.6    
#> ✔ datawizard  0.13.0.13   ✔ effectsize  0.8.9    
#> ✔ insight     1.0.0       ✔ modelbased  0.8.9    
#> ✔ performance 0.12.4.8    ✔ parameters  0.23.0.11
#> ✔ report      0.5.9       ✔ see         0.9.0.9
library(tidyverse)
library(reprex)
#> Warning: package 'reprex' was built under R version 4.4.2

mod_whole<-glmmTMB(count~spp+mined+(1|site)+(1|spp), 
                   data=Salamanders, family="truncated_nbinom2", 
                   ziformula = ~1, dispformula = ~1)

check_collinearity(mod_whole)
#> # Check for Multicollinearity
#> 
#> * conditional component:
#> 
#> Low Correlation
#> 
#>   Term  VIF   VIF 95% CI Increased SE Tolerance Tolerance 95% CI
#>    spp 1.03 [1.00, 1.37]         1.02      0.97     [0.73, 1.00]
#>  mined 1.03 [1.00, 1.37]         1.02      0.97     [0.73, 1.00]
check_model(mod_whole, panel=F)
#> `check_outliers()` does not yet support models of class `glmmTMB`.

Created on 2024-11-26 with reprex v2.1.1

Creating a post-hurdle model but with "normal" negbin2-family instead of the truncated version working fine (plotting deactivated; works as intended):

library(glmmTMB)
library(easystats)
#> Warning: package 'easystats' was built under R version 4.4.2
#> # Attaching packages: easystats 0.7.3.2
#> ✔ bayestestR  0.15.0.2    ✔ correlation 0.8.6    
#> ✔ datawizard  0.13.0.13   ✔ effectsize  0.8.9    
#> ✔ insight     1.0.0       ✔ modelbased  0.8.9    
#> ✔ performance 0.12.4.8    ✔ parameters  0.23.0.11
#> ✔ report      0.5.9       ✔ see         0.9.0.9
library(tidyverse)
library(reprex)
#> Warning: package 'reprex' was built under R version 4.4.2

mod_trunc_working<-glmmTMB(count~spp+mined+(1|site)+(1|spp), 
                           data=Salamanders%>%filter(count>0), family="nbinom2", 
                           ziformula = ~0, dispformula = ~1)

check_collinearity(mod_trunc_working)
#> # Check for Multicollinearity
#> 
#> Low Correlation
#> 
#>   Term  VIF   VIF 95% CI Increased SE Tolerance Tolerance 95% CI
#>    spp 1.02 [1.00, 4.45]         1.01      0.98     [0.22, 1.00]
#>  mined 1.02 [1.00, 4.45]         1.01      0.98     [0.22, 1.00]
check_model(mod_trunc_working, panel=F)
#> `check_outliers()` does not yet support models of class `glmmTMB`.

Created on 2024-11-26 with reprex v2.1.1

Session info

Created on 2024-11-26 with reprex v2.1.1

``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.4.0 (2024-04-24 ucrt) #> os Windows 10 x64 (build 19045) #> system x86_64, mingw32 #> ui RTerm #> language (EN) #> collate English_United Kingdom.utf8 #> ctype English_United Kingdom.utf8 #> tz Europe/Berlin #> date 2024-11-26 #> pandoc 3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> cli 3.6.2 2023-12-11 [1] CRAN (R 4.4.0) #> digest 0.6.35 2024-03-11 [1] CRAN (R 4.4.0) #> evaluate 0.23 2023-11-01 [1] CRAN (R 4.4.0) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.4.0) #> fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0) #> glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.2) #> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0) #> knitr 1.46 2024-04-06 [1] CRAN (R 4.4.0) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0) #> reprex 2.1.1 2024-07-06 [1] CRAN (R 4.4.2) #> rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.2) #> rmarkdown 2.26 2024-03-05 [1] CRAN (R 4.4.0) #> rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.0) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0) #> withr 3.0.2 2024-10-28 [1] CRAN (R 4.4.2) #> xfun 0.43 2024-03-25 [1] CRAN (R 4.4.0) #> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.2.3) #> #> #> ────────────────────────────────────────────────────────────────────────────── ```
strengejacke commented 3 days ago

Thanks for reporting, and the reproducible example! Will look into this.

strengejacke commented 3 days ago

Seems to work now, but not sure if the output is sensible?

data(Salamanders, package = "glmmTMB")
mod_trunc_error <- glmmTMB::glmmTMB(
  count ~ spp + mined + (1 | site),
  data = Salamanders[Salamanders$count > 0, , drop = FALSE],
  family = glmmTMB::truncated_nbinom2(),
  ziformula = ~ 0,
  dispformula = ~ 1
)
performance::check_collinearity(mod_trunc_error)
#> # Check for Multicollinearity
#> 
#> * conditional component:
#> 
#> Low Correlation
#> 
#>   Term  VIF   VIF 95% CI Increased SE Tolerance Tolerance 95% CI
#>    spp 1.03 [1.00, 2.32]         1.02      0.97     [0.43, 1.00]
#>  mined 1.03 [1.00, 2.32]         1.02      0.97     [0.43, 1.00]

Created on 2024-11-26 with reprex v2.1.1

MaBKo commented 3 days ago

Important:

In my initial post, I forgot to filter the dataset for zeros in the mod_whole part:

It should be

mod_whole<-glmmTMB(count~spp+mined+(1|site)+(1|spp), 
                   data=Salamanders%>%filter(count>0), family="truncated_nbinom2", 
                   ziformula = ~1, dispformula = ~1)

Instead of

mod_whole<-glmmTMB(count~spp+mined+(1|site)+(1|spp), 
                   data=Salamanders, family="truncated_nbinom2", 
                   ziformula = ~1, dispformula = ~1)

Whats the best practice here? Leaving it in this comment or editing the original post?

MaBKo commented 3 days ago

To @strengejacke's solution:

As stated before I'm not a person with solid statistical knowledge, but I guess that did the trick.

My understanding is that the vif results of the post-hurdle only model should be the same as the conditional component of the whole hurdle model, which seems to be the case, comparing the output of the corrected mod_whole with the ouput from @strengejacke's model

library(glmmTMB)
library(easystats)
library(tidyverse)
library(reprex)
#> Warning: package 'reprex' was built under R version 4.4.2

mod_whole<-glmmTMB(count~spp+mined+(1|site)+(1|spp), 
                   data=Salamanders%>%filter(count>0), family="truncated_nbinom2", 
                   ziformula = ~1, dispformula = ~1)

check_collinearity(mod_whole)
#> # Check for Multicollinearity
#> 
#> * conditional component:
#> 
#> Low Correlation
#> 
#>   Term  VIF   VIF 95% CI Increased SE Tolerance Tolerance 95% CI
#>    spp 1.03 [1.00, 2.30]         1.02      0.97     [0.43, 1.00]
#>  mined 1.03 [1.00, 2.30]         1.02      0.97     [0.43, 1.00]
check_model(mod_whole, panel=F)
#> `check_outliers()` does not yet support models of class `glmmTMB`.

Created on 2024-11-27 with reprex v2.1.1

But maybe someone with a deeper understanding of the glmmTMB package or general statistical knowledge could weigh in.

P.S: I also want to say a big thank you to all contributors of this awesome package; the package and its documentation helped me a lot.

strengejacke commented 3 days ago

Whats the best practice here? Leaving it in this comment or editing the original post?

Doesn't matter, I have a reproducible example and added a test, so no need to edit the posts.

My understanding is that the vif results of the post-hurdle only model should be the same as the conditional component of the whole hurdle model, which seems to be the case, comparing the output of the corrected mod_whole with the ouput from @strengejacke's model

Great! I'll merge the PR once all tests pass.