DominiqueMakowski opened this issue 5 years ago
I think the issue here is more difficult. The ICC mentioned in the paper by Koo & Li refers to ANOVAs; it is calculated in a very different way and can't be interpreted in the same way (although the "ANOVA" ICC and the mixed model ICC point in similar directions).
It's not optimal to have indices with identical names when they refer to different things. For mixed models, an ICC above .15 or .2 is "high", while an ICC below .05 is "low" - although these are very rough rules of thumb.
For mixed models, the ICC is not related to reliability. It just tells you how much of a model's explained variance can be attributed to the random effects (i.e. the grouping structure). As far as I know, only a few software packages can calculate the ICC for mixed models, and the approach from Nakagawa is only available in R (and via scripts in Stata, I think).
If the ICC is low, the outcome varies little across the defined groups (random effects); if the ICC is high, the outcome varies considerably depending on the grouping factor. With a low ICC, a simple model would return estimates similar to those of a mixed model; with a high ICC, the estimates would differ more. But there's no general rule for how large the ICC needs to be. In fact, you can justify a mixed model on theoretical considerations alone, even if the ICC is low - you never lose anything by using a mixed model instead of a simple model. I personally use the ICC to see how much the outcome varies between groups, and maybe to investigate this variation further if it's of interest. Here is a paper of mine where we used the ICC to analyse variation of our outcome between countries.
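[Editor's note: a minimal sketch of how this between-group variation can be inspected in R, using lme4's sleepstudy data purely as an example - `performance::icc()` is the function discussed later in this thread:]

```r
# Sketch: inspecting between-group variation via the mixed-model ICC
# (sleepstudy is used purely as an example dataset)
library(lme4)
library(performance)
data("sleepstudy")

m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
icc(m)  # proportion of outcome variance attributable to the Subject grouping
```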
I see. Within this context, would you recommend reporting the ICC for mixed models? If so, what would be a good generic, yet informative sentence?
I would report the ICC for mixed models. Maybe something like
"X% of the variance can explained by the model's grouping structure (random effects)."
If the ICC is below 5%, I would probably add:
"The ICC is rather low, indicating not much variation in the outcome between the groups."
And then the ICC value. However, a CI for the ICC is a bit more difficult; we would need something like bootstrapping...
I guess the time-consuming part is performance::null_model(), where the model is updated to compute the null model. This may take a few seconds, and bootstrapping multiplies that by a large factor. Currently I have no good idea how to speed up this process...
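[Editor's note: for concreteness, here is a rough sketch of what such a bootstrapped CI could look like via lme4::bootMer (a parametric bootstrap) - one possible approach, not what the package actually implements. The `nsim` value is arbitrary:]

```r
# Sketch: parametric-bootstrap CI for the adjusted ICC. Each of the
# nsim iterations refits the model, which is what makes this slow.
library(lme4)
data("sleepstudy")

m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
boot_icc <- bootMer(m, function(fit) performance::icc(fit)$ICC_adjusted, nsim = 200)
quantile(boot_icc$t, probs = c(0.025, 0.975))  # percentile 95% CI
```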
"X% of the variance can be explained by the model's grouping structure (random effects)."
Although consistent with its statistical definition, this formulation provides information that could be seen as overlapping with the two R2s (marginal and conditional). Would you recommend reporting both the R2s and the ICC, or only one or the other? If both, how should we emphasize the difference between the two?
On a loosely related note, have you ever heard of, or seen the use of, a ratio between the variance explained by the random effects and the total variance explained by the model, which could be used to understand the relative weight of the random effects in the model (independently of its absolute explanatory power)? Would it make sense?
> ratio between the variance explained by the random effects and the total variance explained by the model
That's actually the definition of the ICC for mixed models.
The ICC is calculated by dividing the between-group variance (the random intercept variance) by the total variance (i.e. the sum of the between-group variance and the within-group (residual) variance).
The difference between the ICC and R2 is that the ICC relates to the variance of the random effects, while R2 relates to the "overall" variance of the model (i.e. including the fixed effects variances). They're indeed very similar, but they still relate to different parts of the model.
I would report both the ICC and the marginal and conditional R2. The wording would differ only slightly: the ICC refers to the grouping structure of the model (random effects), i.e. how much of the total model variance (R2) is explained by the random effects (ICC).
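[Editor's note: to make the verbal definition above concrete, a minimal sketch computing the ICC "by hand" from the variance components of a random intercept model, with sleepstudy as example data:]

```r
# Sketch: the ICC computed directly from lme4's variance components
library(lme4)
data("sleepstudy")

m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
vc <- as.data.frame(VarCorr(m))

tau2   <- vc$vcov[vc$grp == "Subject"]   # between-group (random intercept) variance
sigma2 <- vc$vcov[vc$grp == "Residual"]  # within-group (residual) variance
tau2 / (tau2 + sigma2)                   # the ICC as defined above
```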
Aaaaa I missed that part! It is much clearer now. I got confused somehow by the meaning of total variance.
Anyway, I will try to add a basis for reports for frequentist mixed models; it will be easier to see how to add the ICC then.
> I got confused somehow by the meaning of total variance.
Especially when reading the paper by Nakagawa et al. (2017), where the computation is extended to models other than Gaussian. There we also have the distributional variance etc., so it's indeed a bit confusing. And I'm no expert in reading formulas, as I'm not a statistician but mostly self-taught. ;-)
[Moved from #15] Here's the current short and long report for lmer:
```r
> report(circus::download_model("lmerMod_1"))

We fitted a linear mixed model to predict wt with cyl. The model included gear as random effects. The
model's total explanatory power is substantial (conditional R2 = 0.63) and the part related to the fixed
effects alone (marginal R2) is of 0.55. The model's intercept is at 0.65.

Within this model:
  - cyl is significant (beta = 0.40, 95% CI [-0.39, 1.20], p < .001) and medium (std. beta = 0.74).
```
```r
> report(circus::download_model("lmerMod_1")) %>% to_fulltext()

We fitted a linear mixed model to predict wt with cyl (formula = wt ~ cyl). The model included gear
as random effects (formula = ~1 | gear). Confidence intervals (CIs) and p values were computed using
Wald approximation. Effect sizes were labelled following Cohen's (1988) recommendations. The model's
total explanatory power is substantial (conditional R2 = 0.63) and the part related to the fixed effects
alone (marginal R2) is of 0.55. The model's intercept, corresponding to cyl = 0, is at 0.65 (t = 1.29, 95% CI
[-0.63, 1.93], p > .1).

Within this model:
  - cyl is positive, significant (beta = 0.40, SE = 0.08, t = 5.29, 95% CI [-0.39, 1.20], p < .001) and
    medium (std. beta = 0.74, std. SE = 0.14, std. 95% CI [, ]).
```
How could we phrase the ICC report?
"X% of the variance can explained by the model's grouping structure (random effects)."
It seems to me a bit close to, or overlapping with, the R2s - but maybe that's because I am quite new to ICCs.
Well, that actually fits well into the report at that point, and would complement the missing "information on variances" part, I think.
Looking at it again, I think there is still something I don't get. I have trouble pinning down the difference between the ICC interpreted this way and the R2...
Let's take this model for example:
```r
> model_performance(circus::lmerMod_1)
       AIC      BIC R2_conditional R2_marginal ICC_adjusted ICC_conditional
1 71.59892 77.46187      0.6275509   0.5502792    0.1718216      0.07727173
```
From what I understand from this:
In that case, I would deduce that the random effects explain 62 - 55 = 7% of the variance of the outcome, which is itself 7/62 = 11% of the total variance explained by the model. However, the suggested wording based on the ICC would state that "the proportion of the variance explained by the grouping structure" (the random effects?) is 17% (adjusted ICC), which does not seem consistent with the previous values. Moreover, since the conditional ICC is supposed to encompass both fixed and random effects, how can it be lower than the adjusted ICC?
We should look at the formulas closely. The major difference is that the (adjusted) ICC does not have the fixed effects variance in the denominator: of the "unexplained" variance, i.e. the variance not related to the fixed part, x% is explained by the random effects. R2 does have the fixed effects variance in the denominator and refers to the total model variance.
Nakagawa et al. report the conditional ICC only once in their paper. I'm not sure when it makes sense, but usually we want to know the adjusted ICC.
That's how I understand the difference.
But if the adjusted ICC can be inferred straightforwardly from the two R2s, isn't it a redundant index? And in that case, isn't the ratio between the variance explained by the random effects (conditional ICC = R2 cond - R2 marg = 0.077 in the above example) and the variance explained by the model (R2 cond = 0.62) - here, 12% - a more interesting relative index of the importance of the grouping structure to the model, rather than to the outcome?
The adjusted ICC cannot be directly inferred from the r-squared values.
Originally, the ICC was only calculated for null models, i.e. without fixed effects, and only for random intercept models. Thus there were only the residual and the random effects variances, and the ICC was the random effects variance divided by the sum of the residual and random effects variances.
Later, the ICC was extended to models with fixed effects and with more complex random effects structures; that's why it's now more similar to an r-squared value, unlike the "original" ICC.
The ICC and R2 are related, but still can't be derived from each other. The ICC relates to the random effects and to the between-subject or between-group variation. That's probably the key distinction.
Quote from Nakagawa et al. (2017):

> it is not surprising that the coefficient of determination R2 is a commonly reported statistic, because it represents the proportion of variance explained by a linear model. The intra-class correlation coefficient (ICC) is a related statistic that quantifies the proportion of variance explained by a grouping (random) factor in multilevel/hierarchical data.
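[Editor's note: a sketch of the "original" null-model ICC described above, for concreteness - an intercept-only model with only a random intercept, with sleepstudy as example data:]

```r
# Sketch: the historical ICC, computed from a null (intercept-only) model
library(lme4)
data("sleepstudy")

m0 <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy)
vc <- as.data.frame(VarCorr(m0))

# between-group variance / (between-group + residual variance)
vc$vcov[vc$grp == "Subject"] / sum(vc$vcov)
```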
@strengejacke I am truly sorry to annoy you with this particular index, but I am trying to get it right (and I reckon that other users might have the same questions), and reading Nakagawa over and over doesn't seem to help.
I understand that the ICC and the R2s are not computed in the same way and are not statistically related. However, in all of the following models, the conditional ICC can be derived from the two R2s:
```r
perf <- performance::model_performance(circus::lmerMod_1)
testthat::expect_equal(perf$R2_conditional - perf$R2_marginal, perf$ICC_conditional)

perf <- performance::model_performance(circus::merMod_1)
testthat::expect_equal(perf$R2_conditional - perf$R2_marginal, perf$ICC_conditional)

perf <- performance::model_performance(circus::merMod_2)
testthat::expect_equal(perf$R2_conditional - perf$R2_marginal, perf$ICC_conditional)
```
Is it because there are no random slopes in these models (only random intercepts)?
Anyway, about the ICC report, for the following model:
```r
perf <- performance::model_performance(lme4::lmer(Sepal.Length ~ Sepal.Width + (1 | Species), data = iris))
perf
#>        AIC      BIC R2_conditional R2_marginal ICC_adjusted ICC_conditional
#> 1 202.6361 214.6787      0.8560323  0.09061311     0.841687       0.7654192
```
If I follow you, you would suggest a report along those lines:
"The total model explained a proportion of variance (R2 conditional) of .86. Within this model, the fixed factors alone (R2 marginal), and the grouping random factor (ICC adjusted), explained a proportion explained a proportion of variance of respectively .09 and .84"
Is that correct?
Ok, let's try to summarize a bit. Maybe your example is a bit misleading, because you get the ICC results by adding/subtracting R2 results. But I think this is just a coincidence.
```r
# var.fixef: fixed effects variance
# var.ranef: random effects (between-group) variance
# var.resid: residual (within-group) variance
r2_marginal     <- vars$var.fixef / (vars$var.fixef + vars$var.ranef + vars$var.resid)
r2_conditional  <- (vars$var.fixef + vars$var.ranef) / (vars$var.fixef + vars$var.ranef + vars$var.resid)
icc_adjusted    <- vars$var.ranef / (vars$var.ranef + vars$var.resid)
icc_conditional <- vars$var.ranef / (vars$var.fixef + vars$var.ranef + vars$var.resid)
```
The denominator for R2 and the conditional ICC is the same, but the numerators of R2 and ICC differ. That's why you usually should not be able to derive the ICC values directly from R2, and vice versa.
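[Editor's note: a runnable version of these formulas, assuming the variance components can be obtained via insight::get_variance() - whose element names (var.fixed, var.random, var.residual) differ slightly from the vars$... names above:]

```r
# Sketch: the four indices reproduced from the model's variance components
library(lme4)
library(insight)
data("sleepstudy")

m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
v <- get_variance(m)

total <- v$var.fixed + v$var.random + v$var.residual
c(
  r2_marginal     = v$var.fixed / total,
  r2_conditional  = (v$var.fixed + v$var.random) / total,
  icc_adjusted    = v$var.random / (v$var.random + v$var.residual),
  icc_conditional = v$var.random / total
)
```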
Here's an example with a simple random intercept model and the "old" ICC function, as well as the new approach from Nakagawa:
```r
library(lme4)
data("sleepstudy")

m1 <- lme4::lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

sjstats::icc(m1)
#> ICC (Subject): 0.5893

sjstats::icc(m1, adjusted = TRUE)
#> Adjusted ICC: 0.5893
#> Conditional ICC: 0.4244

performance::icc(m1)
#> $ICC_adjusted
#> [1] 0.5893089
#>
#> $ICC_conditional
#> [1] 0.4243698

performance::r2_nakagawa(m1)
#> $R2_conditional
#> [1] 0.7042555
#>
#> $R2_marginal
#> [1] 0.2798856
```
As you can see, the adjusted ICC and the "normal" ICC (i.e. the random effects variance divided by the random effects plus residual variance) are identical (that's to be expected).
I think the difficulty is that the ICC refers to the ratio of the between-group variance to the sum of the between-group and within-group variances, and thus only accounts for the random parts of the model. The residual variance we are talking about here is not comparable to the "classical" residual variance from a simple linear model, because the residual variance in mixed models (also) refers to the random effects. This is why it is difficult to define an r-squared measure for mixed models. I think Nakagawa started from the ICC and added the fixed effects variances to the formula to approximate something that can be considered an "R2"...
For binomial models, the "classical" and Nakagawa ICC are also identical:
```r
library(lme4)

gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)

performance::icc(gm1)
#> $ICC_adjusted
#> [1] 0.1113561
#>
#> $ICC_conditional
#> [1] 0.1018504

sjstats::icc(gm1)
#> ICC (herd): 0.1114

sjstats::icc(gm1, adjusted = TRUE)
#> Adjusted ICC: 0.1114
#> Conditional ICC: 0.1019
```
The adjusted ICC differs from the "classical" ICC for models with nested random effects and/or random slopes, and for models other than Gaussian or binomial - but this is just a side note.
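[Editor's note: a quick sketch for anyone wanting to see that divergence, adding a random slope to the earlier sleepstudy model; no output is shown, as the exact values depend on the fit:]

```r
# Sketch: with a random slope, adjusted and "classical" ICC are expected to diverge
library(lme4)
data("sleepstudy")

m_slope <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
performance::icc(m_slope)
```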
@strengejacke Thanks to your work on ICC, we could start thinking about its reporting. I think this information has its place, when applicable, in the full reports (fulltext and fulltable). Below are some hints about how to actually report it.
Interpretation

From Koo & Li (2016): [quoted excerpt not preserved in this extract]

From statisticshowto: [quoted excerpt not preserved in this extract]

Reporting

From Koo & Li (2016): [quoted excerpt not preserved in this extract]

Questions: [not preserved in this extract]

Implementation

An example of a more generic (applicable not only to reliability studies) possible report sentence: