performance:r2() calculates the R2 in glmmTMB with beta distribution

cenabcd commented 4 years ago

Hello contributor and all,

when I use r2() to calcualte r2 with glmmTMB based on beta distribution , I cannot get the right answer. I will give my example here,

Sy1 <- c(0.13236514,0.22750129,0.17552144,0.03125000,0.21581064,0.31045606,0.32317058,
          0.13786484,0.20363450,0.04667243,0.11466004,0.08553105,0.06910625,0.38306258,
          0.96875000,0.39076146)
Sy2 <- c(0.8379146,0.8839276,0.7977936,0.9687500,0.7088845,0.9071632,0.7015753,0.8983716,
         0.3673358,0.3851011,0.9039983,0.7838958,0.1089279,0.5363377,0.0312500,0.3579182)

Sx1 <- c(0.3721186,0.9342388,1.1827041,0.9062453,1.4028007,-0.2134339,-0.3026237, -0.4055091,
         0.9619032,1.0028019,-1.2453095,-1.4471977,-1.2453095,-1.4471977,-0.4175607,-0.1386708)

Sx2 <-c(0.3467040,-0.6168100,-1.0331197,0.2627958,-1.0300119,0.3591348,1.2634787,
         2.3511776,-0.9740731,-0.1505297,0.2255033,0.6885522,0.2255033,0.6201826,
        -1.2195823,-1.2289054)
Sx3 <- c(0.895777191,0.865860951,-1.321776649,0.500739605,1.126822462,0.004963455,
         1.287781382,1.361627154,-1.286865147,-0.182838527,-0.339620312,-1.474239471,
         -0.339620312,-1.474239471,0.155638668,0.219989023)
site <- c(rep(1,5),rep(2,3),rep(3,4),rep(4,4))
df <- data.frame(site,Sy1,Sy2,Sx1,Sx2,Sx3)

library(glmmTMB)
library(performance)
glmm1 <- glmmTMB(Sy1 ~ Sx1 + Sx2 + Sx3  + (1|site) ,family=beta_family(),data=df)
r2(glmm1)
#conditional R2: NA
#marginal R2:0.727

glmm2 <- glmmTMB(Sy2 ~ Sx1 + Sx2 + Sx3  + (1|site) ,family=beta_family(),data=df)
r2(glmm2)
#conditional R2: 1.019
#marginal R2:0.234

So what should I need to do? what do I need to pay attention to? Any hints would be greatly appreciated. Thank you already.

strengejacke commented 4 years ago

I posted an answer here: https://github.com/glmmTMB/glmmTMB/issues/169 because I'd like to hear advices from the glmmTMB-authors.

I'm not sure if there's a bug (I don't think so), but rather that due to the fact that the model might be too complex for the sparse data you get an R2 larger than 1. Because the distribution-specific variance is negative. For now, I added a warning so users are aware that the model may be mis-specified or there are too few data point.

mrelnoob commented 3 years ago

Dear @strengejacke,

I thank you for this amazing package. However, I seem to have a similar problem to @cenabcd. I am building a set of beta regression mixed models using glmmTMB::glmmTMB() mostly following this tutorial. My problem is that when I try to compute pseudo-R2 using performance::r2_nakagawa(), I get meaningless values (negative or higher than one) associated with these warnings: "1: mu of 3.2 is too close to zero, estimate of random effect variances may be unreliable. 2: Model's distribution-specific variance is negative. Results are not reliable. "

I do not think my models are misspecified (at least not in the same way as @cenabcd) because my most complicated models only have 3 fixed terms for 85 observations and a single random effect factor with numerous levels. Could you help me solve this issue or explain me what I am doing wrong? Thanks in advance!

Note: I have no true interest in my random effect, it is simply a precaution against autocorrelation.

Here is a minimal working example:

library(RCurl)
library(dplyr)
library(glmmTMB)
library(performance)
x <- RCurl::getURL("https://raw.githubusercontent.com/mrelnoob/jk.dusz.tarping/master/mydata/erad.csv")
read.csv(text = x) %>%
  mutate(manager_id = as.factor(manager_id),
         pb_fixation = as.factor(pb_fixation),
         fully_tarped = as.factor(fully_tarped),
         uprootexcav = as.factor(uprootexcav)) %>% # There are other factors in the dataset, but it doesn't matter here
  mutate(efficiency = efficiency/10) %>% # For my Y to be [0,1]
  mutate(efficiency = ifelse(efficiency == 1, 0.999, efficiency)) -> eff # For my Y to be (0,1)
summary(eff)

# Some examples:
mod0 <- glmmTMB(efficiency~(1|manager_id), data = eff,
                family = beta_family(link = "logit"), REML = FALSE)
r2_nakagawa(model = mod0) # Both R2 are negative (but perhaps it doesn't make sense to compute R2 for a null model?)

mod1 <- glmmTMB(efficiency~fully_tarped + (1|manager_id), data = eff,
                                  family = beta_family(link = "logit"), REML = FALSE)
r2_nakagawa(model = mod1) # Both R2 are larger than one!

mod2 <- glmmTMB(efficiency~fully_tarped + obstacles + distance + (1|manager_id), data = eff,
                family = beta_family(link = "logit"), REML = FALSE)
r2_nakagawa(model = mod2) # Both R2 are larger than one!

And here are some session info:

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] performance_0.7.1 glmmTMB_1.0.2.1   dplyr_1.0.5       RCurl_1.98-1.3   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6       pillar_1.6.0     compiler_4.0.3   nloptr_1.2.2.2   TMB_1.7.20       bitops_1.0-6    
 [7] tools_4.0.3      boot_1.3-25      lme4_1.1-26      statmod_1.4.35   lifecycle_1.0.0  tibble_3.1.0    
[13] nlme_3.1-149     lattice_0.20-41  pkgconfig_2.0.3  rlang_0.4.10     Matrix_1.2-18    DBI_1.1.0       
[19] mvtnorm_1.1-1    coda_0.19-4      bayestestR_0.9.0 generics_0.1.0   vctrs_0.3.7      grid_4.0.3      
[25] tidyselect_1.1.0 glue_1.4.2       R6_2.5.0         fansi_0.4.2      survival_3.2-7   multcomp_1.4-14 
[31] TH.data_1.0-10   minqa_1.2.4      purrr_0.3.4      blob_1.2.1       magrittr_2.0.1   codetools_0.2-16
[37] ellipsis_0.3.1   emmeans_1.6.0    splines_4.0.3    MASS_7.3-53      insight_0.13.2   assertthat_0.2.1
[43] xtable_1.8-4     sandwich_3.0-0   utf8_1.2.1       estimability_1.3 crayon_1.4.1     zoo_1.8-8

strengejacke commented 3 years ago

I don't think this is an issue due to possible misspecification of your model. Rather, this might be that the implementation for beta/beta-binomial distributions might not be reliable yet. I'll look into it. It's most likely due to the wrong computation of the random effects variances, that's why I moved this issue to the insight-package (get_variance()).

strengejacke commented 4 months ago

Should be fixed in #883

Sy1 <- c(0.13236514,0.22750129,0.17552144,0.03125000,0.21581064,0.31045606,0.32317058,
          0.13786484,0.20363450,0.04667243,0.11466004,0.08553105,0.06910625,0.38306258,
          0.96875000,0.39076146)
Sy2 <- c(0.8379146,0.8839276,0.7977936,0.9687500,0.7088845,0.9071632,0.7015753,0.8983716,
         0.3673358,0.3851011,0.9039983,0.7838958,0.1089279,0.5363377,0.0312500,0.3579182)

Sx1 <- c(0.3721186,0.9342388,1.1827041,0.9062453,1.4028007,-0.2134339,-0.3026237, -0.4055091,
         0.9619032,1.0028019,-1.2453095,-1.4471977,-1.2453095,-1.4471977,-0.4175607,-0.1386708)

Sx2 <-c(0.3467040,-0.6168100,-1.0331197,0.2627958,-1.0300119,0.3591348,1.2634787,
         2.3511776,-0.9740731,-0.1505297,0.2255033,0.6885522,0.2255033,0.6201826,
        -1.2195823,-1.2289054)
Sx3 <- c(0.895777191,0.865860951,-1.321776649,0.500739605,1.126822462,0.004963455,
         1.287781382,1.361627154,-1.286865147,-0.182838527,-0.339620312,-1.474239471,
         -0.339620312,-1.474239471,0.155638668,0.219989023)
site <- c(rep(1,5),rep(2,3),rep(3,4),rep(4,4))
df <- data.frame(site,Sy1,Sy2,Sx1,Sx2,Sx3)

library(glmmTMB)
library(performance)
glmm1 <- glmmTMB(Sy1 ~ Sx1 + Sx2 + Sx3  + (1|site) ,family=beta_family(),data=df)
r2(glmm1, tolerance = 1e-10)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.319
#>      Marginal R2: 0.319


glmm2 <- glmmTMB(Sy2 ~ Sx1 + Sx2 + Sx3  + (1|site) ,family=beta_family(),data=df)
r2(glmm2, tolerance = 1e-10)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.835
#>      Marginal R2: 0.192

^{Created on 2024-06-12 with reprex v2.1.0}

strengejacke commented 4 months ago

@mrelnoob Your issues are fixed, too:

library(RCurl)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

library(glmmTMB)
library(performance)
x <- RCurl::getURL("https://raw.githubusercontent.com/mrelnoob/jk.dusz.tarping/master/mydata/erad.csv")
read.csv(text = x) %>%
  mutate(manager_id = as.factor(manager_id),
         pb_fixation = as.factor(pb_fixation),
         fully_tarped = as.factor(fully_tarped),
         uprootexcav = as.factor(uprootexcav)) %>% # There are other factors in the dataset, but it doesn't matter here
  mutate(efficiency = efficiency/10) %>% # For my Y to be [0,1]
  mutate(efficiency = ifelse(efficiency == 1, 0.999, efficiency)) -> eff # For my Y to be (0,1)
summary(eff)
#>    manager_id    xp_id             efficiency     eff_eradication 
#>  1      :10   Length:85          Min.   :0.1333   Min.   :0.0000  
#>  20     : 2   Class :character   1st Qu.:0.6000   1st Qu.:0.0000  
#>  2      : 1   Mode  :character   Median :0.8000   Median :0.0000  
#>  3      : 1                      Mean   :0.7354   Mean   :0.2706  
#>  4      : 1                      3rd Qu.:0.9000   3rd Qu.:1.0000  
#>  6      : 1                      Max.   :0.9990   Max.   :1.0000  
#>  (Other):69                                                       
#>     high_eff         latitude       longitude          elevation     
#>  Min.   :0.0000   Min.   :42.45   Min.   :-121.951   Min.   :   3.0  
#>  1st Qu.:0.0000   1st Qu.:45.52   1st Qu.:  -1.076   1st Qu.:  67.0  
#>  Median :0.0000   Median :46.79   Median :   2.705   Median : 179.0  
#>  Mean   :0.3529   Mean   :46.99   Mean   : -10.329   Mean   : 227.5  
#>  3rd Qu.:1.0000   3rd Qu.:48.46   3rd Qu.:   4.833   3rd Qu.: 271.0  
#>  Max.   :1.0000   Max.   :52.08   Max.   :  14.083   Max.   :1090.0  
#>                                                                      
#>     goals             followups             slope          coarse_env     
#>  Length:85          Min.   :-2.182358   Min.   : 0.000   Min.   :-2.5132  
#>  Class :character   1st Qu.:-0.586979   1st Qu.: 1.000   1st Qu.:-1.0378  
#>  Mode  :character   Median :-0.116194   Median : 4.000   Median :-0.2464  
#>                     Mean   :-0.007942   Mean   : 4.176   Mean   :-0.0158  
#>                     3rd Qu.: 0.834632   3rd Qu.: 7.000   3rd Qu.: 0.9688  
#>                     Max.   : 2.908625   Max.   :10.000   Max.   : 3.2978  
#>                                                                           
#>    obstacles          flood            geomem        maxveg       uprootexcav
#>  Min.   : 0.000   Min.   : 0.000   Min.   :0.0   Min.   :0.0000   0:42       
#>  1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.:0.0   1st Qu.:0.0000   1:43       
#>  Median : 2.000   Median : 2.000   Median :1.0   Median :0.0000              
#>  Mean   : 3.157   Mean   : 3.412   Mean   :0.6   Mean   :0.3412              
#>  3rd Qu.: 6.000   3rd Qu.: 7.000   3rd Qu.:1.0   3rd Qu.:1.0000              
#>  Max.   :10.000   Max.   :10.000   Max.   :1.0   Max.   :1.0000              
#>                                                                              
#>  stand_surface    fully_tarped    distance     tarping_duration
#>  Min.   :   3.0   0:17         Min.   :0.000   Min.   : 2.000  
#>  1st Qu.:  28.0   1:68         1st Qu.:0.500   1st Qu.: 3.000  
#>  Median :  60.0                Median :1.000   Median : 4.000  
#>  Mean   : 236.6                Mean   :1.254   Mean   : 4.724  
#>  3rd Qu.: 232.0                3rd Qu.:2.000   3rd Qu.: 5.000  
#>  Max.   :2000.0                Max.   :5.000   Max.   :12.000  
#>                                                                
#>  stripsoverlap_ok tarpfix_multimethod tarpfix_pierced  sedicover_height
#>  Min.   :0.0000   Min.   :0.0000      Min.   :0.0000   Min.   : 0.000  
#>  1st Qu.:0.0000   1st Qu.:0.0000      1st Qu.:0.0000   1st Qu.: 0.000  
#>  Median :1.0000   Median :1.0000      Median :0.0000   Median : 0.000  
#>  Mean   :0.6118   Mean   :0.7176      Mean   :0.3059   Mean   : 7.247  
#>  3rd Qu.:1.0000   3rd Qu.:1.0000      3rd Qu.:1.0000   3rd Qu.:10.000  
#>  Max.   :1.0000   Max.   :1.0000      Max.   :1.0000   Max.   :55.000  
#>                                                                        
#>   trench_depth      plantation        repairs        add_control    
#>  Min.   :  0.00   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
#>  1st Qu.:  0.00   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
#>  Median : 15.00   Median :0.0000   Median :0.0000   Median :1.0000  
#>  Mean   : 23.45   Mean   :0.1882   Mean   :0.4118   Mean   :0.7412  
#>  3rd Qu.: 40.00   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
#>  Max.   :120.00   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
#>                                                                     
#>  pb_fixation pb_durability
#>  0:67        Min.   :0.0  
#>  1:18        1st Qu.:0.0  
#>              Median :0.0  
#>              Mean   :0.2  
#>              3rd Qu.:0.0  
#>              Max.   :1.0  
#>


# Some examples:
mod0 <- glmmTMB(efficiency~(1|manager_id), data = eff,
                family = beta_family(link = "logit"), REML = FALSE)
r2_nakagawa(model = mod0) # Both R2 are negative (but perhaps it doesn't make sense to compute R2 for a null model?)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.279
#>      Marginal R2: 0.000


mod1 <- glmmTMB(efficiency~fully_tarped + (1|manager_id), data = eff,
                                  family = beta_family(link = "logit"), REML = FALSE)
r2_nakagawa(model = mod1) # Both R2 are larger than one!
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.515
#>      Marginal R2: 0.472


mod2 <- glmmTMB(efficiency~fully_tarped + obstacles + distance + (1|manager_id), data = eff,
                family = beta_family(link = "logit"), REML = FALSE)
r2_nakagawa(model = mod2) # Both R2 are larger than one!
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.744
#>      Marginal R2: 0.531

^{Created on 2024-06-12 with reprex v2.1.0}

easystats / insight

performance:r2() calculates the R2 in glmmTMB with beta distribution #349