Closed cenabcd closed 4 months ago
I posted an answer here: https://github.com/glmmTMB/glmmTMB/issues/169 because I'd like to hear advices from the glmmTMB-authors.
I'm not sure if there's a bug (I don't think so), but rather that due to the fact that the model might be too complex for the sparse data you get an R2 larger than 1. Because the distribution-specific variance is negative. For now, I added a warning so users are aware that the model may be mis-specified or there are too few data point.
Dear @strengejacke,
I thank you for this amazing package. However, I seem to have a similar problem to @cenabcd.
I am building a set of beta regression mixed models using glmmTMB::glmmTMB()
mostly following this tutorial. My problem is that when I try to compute pseudo-R2 using performance::r2_nakagawa()
, I get meaningless values (negative or higher than one) associated with these warnings:
"1: mu of 3.2 is too close to zero, estimate of random effect variances may be unreliable.
2: Model's distribution-specific variance is negative. Results are not reliable. "
I do not think my models are misspecified (at least not in the same way as @cenabcd) because my most complicated models only have 3 fixed terms for 85 observations and a single random effect factor with numerous levels. Could you help me solve this issue or explain me what I am doing wrong? Thanks in advance!
Note: I have no true interest in my random effect, it is simply a precaution against autocorrelation.
Here is a minimal working example:
library(RCurl)
library(dplyr)
library(glmmTMB)
library(performance)
x <- RCurl::getURL("https://raw.githubusercontent.com/mrelnoob/jk.dusz.tarping/master/mydata/erad.csv")
read.csv(text = x) %>%
mutate(manager_id = as.factor(manager_id),
pb_fixation = as.factor(pb_fixation),
fully_tarped = as.factor(fully_tarped),
uprootexcav = as.factor(uprootexcav)) %>% # There are other factors in the dataset, but it doesn't matter here
mutate(efficiency = efficiency/10) %>% # For my Y to be [0,1]
mutate(efficiency = ifelse(efficiency == 1, 0.999, efficiency)) -> eff # For my Y to be (0,1)
summary(eff)
# Some examples:
mod0 <- glmmTMB(efficiency~(1|manager_id), data = eff,
family = beta_family(link = "logit"), REML = FALSE)
r2_nakagawa(model = mod0) # Both R2 are negative (but perhaps it doesn't make sense to compute R2 for a null model?)
mod1 <- glmmTMB(efficiency~fully_tarped + (1|manager_id), data = eff,
family = beta_family(link = "logit"), REML = FALSE)
r2_nakagawa(model = mod1) # Both R2 are larger than one!
mod2 <- glmmTMB(efficiency~fully_tarped + obstacles + distance + (1|manager_id), data = eff,
family = beta_family(link = "logit"), REML = FALSE)
r2_nakagawa(model = mod2) # Both R2 are larger than one!
And here are some session info:
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] performance_0.7.1 glmmTMB_1.0.2.1 dplyr_1.0.5 RCurl_1.98-1.3
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 pillar_1.6.0 compiler_4.0.3 nloptr_1.2.2.2 TMB_1.7.20 bitops_1.0-6
[7] tools_4.0.3 boot_1.3-25 lme4_1.1-26 statmod_1.4.35 lifecycle_1.0.0 tibble_3.1.0
[13] nlme_3.1-149 lattice_0.20-41 pkgconfig_2.0.3 rlang_0.4.10 Matrix_1.2-18 DBI_1.1.0
[19] mvtnorm_1.1-1 coda_0.19-4 bayestestR_0.9.0 generics_0.1.0 vctrs_0.3.7 grid_4.0.3
[25] tidyselect_1.1.0 glue_1.4.2 R6_2.5.0 fansi_0.4.2 survival_3.2-7 multcomp_1.4-14
[31] TH.data_1.0-10 minqa_1.2.4 purrr_0.3.4 blob_1.2.1 magrittr_2.0.1 codetools_0.2-16
[37] ellipsis_0.3.1 emmeans_1.6.0 splines_4.0.3 MASS_7.3-53 insight_0.13.2 assertthat_0.2.1
[43] xtable_1.8-4 sandwich_3.0-0 utf8_1.2.1 estimability_1.3 crayon_1.4.1 zoo_1.8-8
I don't think this is an issue due to possible misspecification of your model. Rather, this might be that the implementation for beta/beta-binomial distributions might not be reliable yet. I'll look into it. It's most likely due to the wrong computation of the random effects variances, that's why I moved this issue to the insight-package (get_variance()
).
Should be fixed in #883
Sy1 <- c(0.13236514,0.22750129,0.17552144,0.03125000,0.21581064,0.31045606,0.32317058,
0.13786484,0.20363450,0.04667243,0.11466004,0.08553105,0.06910625,0.38306258,
0.96875000,0.39076146)
Sy2 <- c(0.8379146,0.8839276,0.7977936,0.9687500,0.7088845,0.9071632,0.7015753,0.8983716,
0.3673358,0.3851011,0.9039983,0.7838958,0.1089279,0.5363377,0.0312500,0.3579182)
Sx1 <- c(0.3721186,0.9342388,1.1827041,0.9062453,1.4028007,-0.2134339,-0.3026237, -0.4055091,
0.9619032,1.0028019,-1.2453095,-1.4471977,-1.2453095,-1.4471977,-0.4175607,-0.1386708)
Sx2 <-c(0.3467040,-0.6168100,-1.0331197,0.2627958,-1.0300119,0.3591348,1.2634787,
2.3511776,-0.9740731,-0.1505297,0.2255033,0.6885522,0.2255033,0.6201826,
-1.2195823,-1.2289054)
Sx3 <- c(0.895777191,0.865860951,-1.321776649,0.500739605,1.126822462,0.004963455,
1.287781382,1.361627154,-1.286865147,-0.182838527,-0.339620312,-1.474239471,
-0.339620312,-1.474239471,0.155638668,0.219989023)
site <- c(rep(1,5),rep(2,3),rep(3,4),rep(4,4))
df <- data.frame(site,Sy1,Sy2,Sx1,Sx2,Sx3)
library(glmmTMB)
library(performance)
glmm1 <- glmmTMB(Sy1 ~ Sx1 + Sx2 + Sx3 + (1|site) ,family=beta_family(),data=df)
r2(glmm1, tolerance = 1e-10)
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.319
#> Marginal R2: 0.319
glmm2 <- glmmTMB(Sy2 ~ Sx1 + Sx2 + Sx3 + (1|site) ,family=beta_family(),data=df)
r2(glmm2, tolerance = 1e-10)
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.835
#> Marginal R2: 0.192
Created on 2024-06-12 with reprex v2.1.0
@mrelnoob Your issues are fixed, too:
library(RCurl)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(glmmTMB)
library(performance)
x <- RCurl::getURL("https://raw.githubusercontent.com/mrelnoob/jk.dusz.tarping/master/mydata/erad.csv")
read.csv(text = x) %>%
mutate(manager_id = as.factor(manager_id),
pb_fixation = as.factor(pb_fixation),
fully_tarped = as.factor(fully_tarped),
uprootexcav = as.factor(uprootexcav)) %>% # There are other factors in the dataset, but it doesn't matter here
mutate(efficiency = efficiency/10) %>% # For my Y to be [0,1]
mutate(efficiency = ifelse(efficiency == 1, 0.999, efficiency)) -> eff # For my Y to be (0,1)
summary(eff)
#> manager_id xp_id efficiency eff_eradication
#> 1 :10 Length:85 Min. :0.1333 Min. :0.0000
#> 20 : 2 Class :character 1st Qu.:0.6000 1st Qu.:0.0000
#> 2 : 1 Mode :character Median :0.8000 Median :0.0000
#> 3 : 1 Mean :0.7354 Mean :0.2706
#> 4 : 1 3rd Qu.:0.9000 3rd Qu.:1.0000
#> 6 : 1 Max. :0.9990 Max. :1.0000
#> (Other):69
#> high_eff latitude longitude elevation
#> Min. :0.0000 Min. :42.45 Min. :-121.951 Min. : 3.0
#> 1st Qu.:0.0000 1st Qu.:45.52 1st Qu.: -1.076 1st Qu.: 67.0
#> Median :0.0000 Median :46.79 Median : 2.705 Median : 179.0
#> Mean :0.3529 Mean :46.99 Mean : -10.329 Mean : 227.5
#> 3rd Qu.:1.0000 3rd Qu.:48.46 3rd Qu.: 4.833 3rd Qu.: 271.0
#> Max. :1.0000 Max. :52.08 Max. : 14.083 Max. :1090.0
#>
#> goals followups slope coarse_env
#> Length:85 Min. :-2.182358 Min. : 0.000 Min. :-2.5132
#> Class :character 1st Qu.:-0.586979 1st Qu.: 1.000 1st Qu.:-1.0378
#> Mode :character Median :-0.116194 Median : 4.000 Median :-0.2464
#> Mean :-0.007942 Mean : 4.176 Mean :-0.0158
#> 3rd Qu.: 0.834632 3rd Qu.: 7.000 3rd Qu.: 0.9688
#> Max. : 2.908625 Max. :10.000 Max. : 3.2978
#>
#> obstacles flood geomem maxveg uprootexcav
#> Min. : 0.000 Min. : 0.000 Min. :0.0 Min. :0.0000 0:42
#> 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:0.0 1st Qu.:0.0000 1:43
#> Median : 2.000 Median : 2.000 Median :1.0 Median :0.0000
#> Mean : 3.157 Mean : 3.412 Mean :0.6 Mean :0.3412
#> 3rd Qu.: 6.000 3rd Qu.: 7.000 3rd Qu.:1.0 3rd Qu.:1.0000
#> Max. :10.000 Max. :10.000 Max. :1.0 Max. :1.0000
#>
#> stand_surface fully_tarped distance tarping_duration
#> Min. : 3.0 0:17 Min. :0.000 Min. : 2.000
#> 1st Qu.: 28.0 1:68 1st Qu.:0.500 1st Qu.: 3.000
#> Median : 60.0 Median :1.000 Median : 4.000
#> Mean : 236.6 Mean :1.254 Mean : 4.724
#> 3rd Qu.: 232.0 3rd Qu.:2.000 3rd Qu.: 5.000
#> Max. :2000.0 Max. :5.000 Max. :12.000
#>
#> stripsoverlap_ok tarpfix_multimethod tarpfix_pierced sedicover_height
#> Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. : 0.000
#> 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 0.000
#> Median :1.0000 Median :1.0000 Median :0.0000 Median : 0.000
#> Mean :0.6118 Mean :0.7176 Mean :0.3059 Mean : 7.247
#> 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:10.000
#> Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :55.000
#>
#> trench_depth plantation repairs add_control
#> Min. : 0.00 Min. :0.0000 Min. :0.0000 Min. :0.0000
#> 1st Qu.: 0.00 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
#> Median : 15.00 Median :0.0000 Median :0.0000 Median :1.0000
#> Mean : 23.45 Mean :0.1882 Mean :0.4118 Mean :0.7412
#> 3rd Qu.: 40.00 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
#> Max. :120.00 Max. :1.0000 Max. :1.0000 Max. :1.0000
#>
#> pb_fixation pb_durability
#> 0:67 Min. :0.0
#> 1:18 1st Qu.:0.0
#> Median :0.0
#> Mean :0.2
#> 3rd Qu.:0.0
#> Max. :1.0
#>
# Some examples:
mod0 <- glmmTMB(efficiency~(1|manager_id), data = eff,
family = beta_family(link = "logit"), REML = FALSE)
r2_nakagawa(model = mod0) # Both R2 are negative (but perhaps it doesn't make sense to compute R2 for a null model?)
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.279
#> Marginal R2: 0.000
mod1 <- glmmTMB(efficiency~fully_tarped + (1|manager_id), data = eff,
family = beta_family(link = "logit"), REML = FALSE)
r2_nakagawa(model = mod1) # Both R2 are larger than one!
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.515
#> Marginal R2: 0.472
mod2 <- glmmTMB(efficiency~fully_tarped + obstacles + distance + (1|manager_id), data = eff,
family = beta_family(link = "logit"), REML = FALSE)
r2_nakagawa(model = mod2) # Both R2 are larger than one!
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.744
#> Marginal R2: 0.531
Created on 2024-06-12 with reprex v2.1.0
Hello contributor and all,
when I use r2() to calcualte r2 with glmmTMB based on beta distribution , I cannot get the right answer. I will give my example here,
So what should I need to do? what do I need to pay attention to? Any hints would be greatly appreciated. Thank you already.