easystats / performance

:muscle: Models' quality and performance metrics (R2, ICC, LOO, AIC, BF, ...)
https://easystats.github.io/performance/
GNU General Public License v3.0

performance::r2_nakagawa() and r.squaredGLMM() give different values for Gaussian glmmTMB models without random effects #652

Closed julian-wittische closed 9 months ago

julian-wittische commented 9 months ago

Hello everyone,

I love your package. I am in the process of reporting model fit information for several Gaussian models without random effects. While double-checking R2 values with different functions, I noticed large discrepancies between the values from r.squaredGLMM() and performance::r2_nakagawa() (see below). Is there a known theoretical reason behind this (different calculations), or is one of them erroneous or inappropriate for our purpose?

[Image: Capture]

I found the same issue using another dataset. Here is a reproducible example:

library(performance)
library(glmmTMB)
library(MuMIn) # provides r.squaredGLMM()
data(Owls)

m_RAND <- glmmTMB(NegPerChick ~ BroodSize + ArrivalTime + (1|Nest), data=Owls)
m_NORAND <- glmmTMB(NegPerChick ~ BroodSize + ArrivalTime, data=Owls)

r2(m_RAND)
r.squaredGLMM(m_RAND)

r2(m_NORAND)
r.squaredGLMM(m_NORAND)

[Image: Capture2]

PS: we are writing a scientific publication about mustelid parasites and we want to do the right thing

strengejacke commented 9 months ago

r2_nakagawa() is indeed not useful when you don't have random effects. See the following example, which also compares against an ordinary lm().

library(performance)
library(glmmTMB)
library(MuMIn)
data(Owls)

m_NORAND <- glmmTMB(NegPerChick ~ BroodSize + ArrivalTime, data = Owls)

r2(m_NORAND)
#> Random effect variances not available. Returned R2 does not account for random effects.
#> # R2 for Mixed Models
#> 
#>   Conditional R2: NA
#>      Marginal R2: 0.127
r.squaredGLMM(m_NORAND)
#> Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
#> Warning in r.squaredGLMM.glmmTMB(m_NORAND): the effects of zero-inflation and
#> dispersion model are ignored
#>             R2m        R2c
#> [1,] 0.05606123 0.05606123

m_easy <- lm(NegPerChick ~ BroodSize + ArrivalTime, data = Owls)
r.squaredGLMM(m_easy)
#>             R2m        R2c
#> [1,] 0.05579612 0.05579612
r2(m_easy)
#> # R2 for Linear Regression
#>        R2: 0.056
#>   adj. R2: 0.053

Created on 2023-11-21 with reprex v2.0.2

Conclusion: For non-mixed models, don't use r2_nakagawa(). I think we can, however, automatically fall back to a manual R2 calculation, so that r2_nakagawa() also works for your example:

r <- residuals(m_NORAND)
f <- fitted(m_NORAND)
rss <- sum(r^2)
mss <- sum((f - mean(f))^2)
mss / (mss + rss)
#> [1] 0.05597288
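
As a sanity check on the manual calculation above, here is a minimal sketch, assuming an ordinary lm() fit with an intercept on the built-in mtcars data. The helper name manual_r2 is hypothetical and is not a function in performance. For such a model, mss / (mss + rss) should agree with the R2 that summary() reports:

```r
# Hypothetical helper (not part of performance): R2 from fitted values and residuals
manual_r2 <- function(model) {
  r <- residuals(model)          # response residuals
  f <- fitted(model)             # fitted values
  rss <- sum(r^2)                # residual sum of squares
  mss <- sum((f - mean(f))^2)    # model sum of squares
  mss / (mss + rss)
}

# Assumption: a plain linear model with an intercept, using only base R
m <- lm(mpg ~ wt + hp, data = mtcars)

# Compare against the R2 computed by summary.lm()
all.equal(manual_r2(m), summary(m)$r.squared)
```

Whether the same agreement holds for a glmmTMB fit depends on which residual type residuals() returns there, so this only checks the formula itself.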

bbolker commented 9 months ago

@julian-wittische, I'd highly recommend that in future you post code/output examples as text (in a code block, which you can delimit with triple-backticks; you can also click the "<>" icon in the graphical options in the compose window) rather than as an image. It makes life easier for readers in many ways. (Good question though.)

strengejacke commented 9 months ago

This is what is (quickly) implemented in PR #653 for now:

library(performance)
library(glmmTMB)
library(MuMIn)
data(Owls)

m_NORAND <- glmmTMB(NegPerChick ~ BroodSize + ArrivalTime, data = Owls)
r2(m_NORAND)
#> # R2 for Linear Regression
#>   R2: 0.056
r.squaredGLMM(m_NORAND)
#> Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
#> Warning in r.squaredGLMM.glmmTMB(m_NORAND): the effects of zero-inflation and
#> dispersion model are ignored
#>             R2m        R2c
#> [1,] 0.05606123 0.05606123

This needs some testing, though, and I have to check which type of residuals is most appropriate here.

julian-wittische commented 9 months ago

Thank you very much for the clarification and the fix.