easystats / report

:scroll: :tada: Automated reporting of objects in R
https://easystats.github.io/report/

ICC report #14

Open DominiqueMakowski opened 5 years ago

DominiqueMakowski commented 5 years ago

@strengejacke Thanks to your work on ICC, we could start thinking about its reporting. I think this information has its place, when applicable, in the full reports (fulltext and fulltable). Below are some hints about how to actually report it.

Interpretation

From Koo & Li (2016):

Values less than 0.5 are indicative of poor reliability, values between 0.5 and 0.75 indicate moderate reliability, values between 0.75 and 0.9 indicate good reliability, and values greater than 0.90 indicate excellent reliability.

From statisticshowto:

  • A high Intraclass Correlation Coefficient (ICC) close to 1 indicates high similarity between values from the same group.
  • A low ICC close to zero means that values from the same group are not similar.
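
If report were to label ICC values automatically, Koo & Li's cut-offs map naturally onto a small interpretation helper. A minimal sketch; interpret_icc() is a hypothetical name, not an existing easystats function:

interpret_icc <- function(icc) {
  # Koo & Li's (2016) cut-offs; boundary values fall into the higher category
  cut(icc,
      breaks = c(-Inf, 0.5, 0.75, 0.9, Inf),
      labels = c("poor", "moderate", "good", "excellent"),
      right = FALSE)
}

as.character(interpret_icc(c(0.3, 0.6, 0.8, 0.95)))
#> [1] "poor"      "moderate"  "good"      "excellent"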

Reporting

From Koo & Li (2016):

There is currently a lack of standard for reporting ICC in the clinical research community. Given that different forms of ICC involve distinct assumptions in their calculation and will lead to different interpretations, it is imperative for researchers to report detailed information about their ICC estimates. We suggest that the best practice of reporting ICC should include the following items: software information, “Model,” “Type,” and “Definition” selections. In addition, both ICC estimates and their 95% confidence intervals should be reported. For instance, the ICC information could be reported as such:

"ICC estimates and their 95% confident intervals were calculated using SPSS statistical package version 23 (SPSS Inc, Chicago, IL) based on a mean-rating (k = 3), absolute-agreement, 2-way mixed-effects model."

Questions:

Implementation

An example of a more generic possible report sentence (applicable not only to reliability studies):

strengejacke commented 5 years ago

I think the issue here is more difficult. The ICC that is mentioned in the paper by Koo & Li refers to ANOVAs; it is calculated in a very different way and can't be interpreted in the same way (although the "ANOVA" ICC and the mixed model ICC point in similar directions).

It's not optimal to have indices with identical names when they refer to different things. For mixed models, an ICC above .15 or .2 is "high", while an ICC below .05 is "low" - although these are very rough rules of thumb.

For mixed models, the ICC is not related to reliability. It just tells you how much of the explained variance of a model can be attributed to the random effects (i.e. the grouping structure). As far as I know, only a few software packages can calculate the ICC for mixed models, and the approach from Nakagawa is only available in R (and via scripts in Stata, I think).

If the ICC is low, the outcome of a model has almost no variation across the defined groups (random effects), while with a high ICC the outcome varies considerably depending on the grouping factor. For a low ICC, a simple model would return estimates similar to a mixed model; for a high ICC, the estimates would differ more. But there's no general rule for what ICC you need. In fact, you can justify a mixed model by theoretical considerations alone, even if the ICC is low. You never lose anything by using a mixed model instead of a simple model. I personally use the ICC to see the variation of the outcome across groups, and maybe investigate this variation further if it's of interest. Here is a paper of mine where we used the ICC to analyse variation of our outcome between countries.
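
To make the "similar estimates" point concrete, here is a small sketch (my example, using lme4's sleepstudy data, which has a fairly high ICC of about .59): the standard errors of a simple and a mixed model differ noticeably when the grouping structure matters (the point estimates coincide here because the design is balanced).

library(lme4)
data("sleepstudy", package = "lme4")

# Slope estimate and standard error, simple vs. mixed model; with a high
# ICC the two models treat the Subject-level variation very differently.
summary(lm(Reaction ~ Days, data = sleepstudy))$coefficients["Days", ]
summary(lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy))$coefficients["Days", ]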

DominiqueMakowski commented 5 years ago

I see. Within this context, would you recommend reporting the ICC for mixed models? If so, what would be a good generic, yet informative sentence?

strengejacke commented 5 years ago

I would report the ICC for mixed models. Maybe something like

"X% of the variance can explained by the model's grouping structure (random effects)."

If the ICC is below 5%, I would probably add:

"The ICC is rather low, indicating not much variation in the outcome between the groups."

And then the ICC value. However, a CI for the ICC is a bit more difficult; we would need something like bootstrapping...
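
For illustration, a sketch of how that sentence could be assembled from performance::icc() (assuming, as in the outputs later in this thread, that it returns a list with an ICC_adjusted element):

library(lme4)
library(performance)

model <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Fill the suggested wording with the adjusted ICC
icc_value <- icc(model)$ICC_adjusted
sprintf("%.0f%% of the variance can be explained by the model's grouping structure (random effects).",
        100 * icc_value)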

strengejacke commented 5 years ago

I guess the time-consuming part is performance::null_model(), where the model is updated to calculate the null model. This may take several seconds, and bootstrapping multiplies this by a large factor. Currently I have no good idea how to speed up this process...
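
For what it's worth, a rough sketch of a bootstrapped CI via lme4::bootMer(), which is slow for exactly this reason, since every replicate refits the model:

library(lme4)
library(performance)

model <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Percentile CI for the adjusted ICC; nsim multiplies the fitting time
boot_icc <- bootMer(model, function(m) icc(m)$ICC_adjusted, nsim = 200)
quantile(boot_icc$t, probs = c(0.025, 0.975), na.rm = TRUE)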

DominiqueMakowski commented 5 years ago

"X% of the variance can be explained by the model's grouping structure (random effects)."

Although consistent with its statistical definition, this formulation provides information that could be seen as overlapping with the two R2s (marginal and conditional). Would you recommend reporting both the R2s and the ICC, or only one or the other? If both, how should we emphasize the difference between the two?

On a loosely related note, have you ever heard of or seen the use of a ratio between the variance explained by the random effects and the total variance explained by the model, which would be used to understand the relative weight of the random effects in the model (independently of its absolute explanatory power)? Would it make sense?

strengejacke commented 5 years ago

ratio between the variance explained by the random effects and the total variance explained by the model

That's actually the definition of the ICC for mixed models.

The ICC is calculated by dividing the between-group variance (random intercept variance) by the total variance (i.e. the sum of the between-group variance and the within-group (residual) variance).
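
In code, for a simple random-intercept model, that definition amounts to the following (a sketch based on lme4's variance components; the resulting value matches the sleepstudy example further down this thread):

library(lme4)
model <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Adjusted ICC "by hand": between-group variance over between-group plus
# within-group (residual) variance
vc        <- as.data.frame(VarCorr(model))
var_ranef <- vc$vcov[vc$grp == "Subject"]   # between-group (random intercept) variance
var_resid <- vc$vcov[vc$grp == "Residual"]  # within-group (residual) variance
var_ranef / (var_ranef + var_resid)
#> [1] 0.5893089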

The difference between the ICC and R2 is that the ICC relates to the variance of the random effects, while R2 relates to the "overall" variance of the model (i.e. including the fixed effects variances). They're indeed very similar, but they still relate to different parts of the model.

I would report both ICC and marginal and conditional R2. The wording would differ only slightly: the ICC refers to the grouping structure of the model (random effects), i.e. how much of the total model variance (R2) is explained by the random effects (ICC).

DominiqueMakowski commented 5 years ago

Aaaaa I missed that part! It is much clearer now. I got confused somehow by the meaning of total variance.

Anyway, I will try to add a basis for reports for frequentist mixed models; it will be easier to see how to add the ICC then.

strengejacke commented 5 years ago

I got confused somehow by the meaning of total variance.

Especially when reading the paper by Nakagawa et al. (2017), where the computation is extended to models other than Gaussian. There we also have the distributional variance etc., so it's indeed a bit confusing. And I'm no expert in reading formulas, as I'm no statistician but mostly self-taught. ;-)

DominiqueMakowski commented 5 years ago

[Moved from #15] Here's the current short and long report for lmer:

> report(circus::download_model("lmerMod_1"))

We fitted a linear mixed model to predict wt with cyl. The model included gear as random effects. The 
model's total explanatory power is substantial (conditional R2 = 0.63) and the part related to the fixed 
effects alone (marginal R2) is of 0.55. The model's intercept is at 0.65. 

Within this model: 
  - cyl is significant (beta = 0.40, 95% CI [-0.39, 1.20], p < .001) and medium (std. beta = 0.74).

> report(circus::download_model("lmerMod_1")) %>% to_fulltext()

We fitted a linear mixed model to predict wt with cyl (formula = wt ~ cyl). The model included gear 
as random effects (formula = ~1 | gear). Confidence intervals (CIs) and p values were computed using 
Wald approximation. Effect sizes were labelled following Cohen's (1988) recommendations. The model's 
total explanatory power is substantial (conditional R2 = 0.63) and the part related to the fixed effects 
alone (marginal R2) is of 0.55. The model's intercept, corresponding to cyl = 0, is at 0.65 (t = 1.29, 95% CI 
[-0.63, 1.93], p > .1). 

Within this model: 
  - cyl is positive, significant (beta = 0.40, SE = 0.08, t = 5.29, 95% CI [-0.39, 1.20], p < .001) and 
medium (std. beta = 0.74, std. SE = 0.14, std. 95% CI [, ]).

How could we phrase the ICC report?

"X% of the variance can explained by the model's grouping structure (random effects)."

It seems to me a bit close to/overlapping with the R2s, but maybe that's because I am quite new to ICCs.

strengejacke commented 5 years ago

Well, that phrase actually fits well into the report, and would complement the missing "information on variances" part, I think.

DominiqueMakowski commented 5 years ago

Looking at it again, I think there is still something I don't get. I have trouble clarifying the difference between the ICC interpreted this way and the R2...

Let's take this model for example:

> model_performance(circus::lmerMod_1)
       AIC      BIC R2_conditional R2_marginal ICC_adjusted ICC_conditional
1 71.59892 77.46187      0.6275509   0.5502792    0.1718216      0.07727173

From what I understand from this:

In that case, I would deduce that the random effects explain 62 - 55 = 7% of the variance of the outcome, which is itself 7/62 = 11% of the total variance explained by the model. However, the suggested values based on the ICC would say that "the proportion of the variance explained by the grouping structure" (the random effects?) is 17% (adjusted ICC), which does not appear coherent with the previous values. Moreover, as the conditional ICC is supposed to encompass both fixed and random effects, how can it be lower than the adjusted ICC?

strengejacke commented 5 years ago

We should look at the formulas closely. The major difference is that the ICC does not have the fixed effects variance in the denominator. Of the "unexplained" variance, i.e. the variance not related to the fixed part, x% is explained by the random effects. R2 has the fixed effects variance in the denominator and refers to the total model variance.

Nakagawa et al. report the conditional ICC only once in their paper. I'm not sure when it makes sense, but usually we want to know the adjusted ICC.

That's how I understand the difference.

DominiqueMakowski commented 5 years ago

But if the adjusted ICC can be inferred straightforwardly from the two R2s, isn't it a redundant index? And in that case, isn't the ratio between the variance explained by the random effects (conditional ICC = R2 cond - R2 marg = 0.077 in the above example) and the variance explained by the model (R2 cond = 0.62), in this case 12%, a more interesting relative index of the importance of the grouping structure to the model, rather than to the outcome?

strengejacke commented 5 years ago

The adjusted ICC cannot be directly inferred from the r-squared values.

Originally, the ICC was only calculated for null models, i.e. without fixed effects, and only for random intercept models. Thus there was only the residual and the random effects variation, and the ICC was the random effects variation divided by the sum of the residual and random effects variation.

Later, the ICC was extended to models with fixed effects and with more complex random effects structures; that's why it's now more similar to an r-squared value, unlike the "original" ICC.

ICC and R2 are related, but still can't be derived from each other. The ICC relates to the random effects and to the between-subject or between-group variation. That's probably the main distinction.

strengejacke commented 5 years ago

Quote from Nakagawa et al. (2017):

it is not surprising that the coefficient of determination R2 is a commonly reported statistic, because it represents the proportion of variance explained by a linear model. The intra-class correlation coefficient (ICC) is a related statistic that quantifies the proportion of variance explained by a grouping (random) factor in multilevel/hierarchical data.

DominiqueMakowski commented 5 years ago

@strengejacke I am truly sorry to annoy you with this particular index, but I am trying to get it right (and I reckon that other users might have the same questions), and reading Nakagawa over and over doesn't seem to help.

I understand that the ICC and the R2s are not computed in the same way and are not statistically equivalent. However, in all of the following models, the conditional ICC can be derived from the two R2s:

# In each of these models, R2_conditional - R2_marginal equals ICC_conditional:
perf <- performance::model_performance(circus::lmerMod_1)
testthat::expect_equal(perf$R2_conditional - perf$R2_marginal, perf$ICC_conditional)

perf <- performance::model_performance(circus::merMod_1)
testthat::expect_equal(perf$R2_conditional - perf$R2_marginal, perf$ICC_conditional)

perf <- performance::model_performance(circus::merMod_2)
testthat::expect_equal(perf$R2_conditional - perf$R2_marginal, perf$ICC_conditional)

Is it because there are no random slopes in these models (only random intercepts)?

Anyway, about the ICC report, for the following model:

perf <- performance::model_performance(lme4::lmer(Sepal.Length ~ Sepal.Width + (1|Species), data=iris))
perf
       AIC      BIC R2_conditional R2_marginal ICC_adjusted ICC_conditional
1 202.6361 214.6787      0.8560323  0.09061311     0.841687       0.7654192

If I follow you, you would suggest a report along those lines:

"The total model explained a proportion of variance (R2 conditional) of .86. Within this model, the fixed factors alone (R2 marginal), and the grouping random factor (ICC adjusted), explained a proportion explained a proportion of variance of respectively .09 and .84"

is that correct?

strengejacke commented 5 years ago

Ok, let's try to summarize a bit. Maybe your example is a bit misleading, because you get the ICC results by adding/subtracting R2 results. But I think this is just coincidence.

# vars holds the model's variance components: the fixed effects variance
# (var.fixef), the random effects variance (var.ranef) and the residual
# variance (var.resid).
r2_marginal     <- vars$var.fixef / (vars$var.fixef + vars$var.ranef + vars$var.resid)
r2_conditional  <- (vars$var.fixef + vars$var.ranef) / (vars$var.fixef + vars$var.ranef + vars$var.resid)
icc_adjusted    <- vars$var.ranef / (vars$var.ranef + vars$var.resid)
icc_conditional <- vars$var.ranef / (vars$var.fixef + vars$var.ranef + vars$var.resid)
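
A runnable version of these formulas, with vars filled in for a simple random-intercept model (my sketch; var.fixef is approximated here as the variance of the fixed-effects predictions, so the conditional values may deviate slightly from performance's Nakagawa-based output):

library(lme4)
m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

vc   <- as.data.frame(VarCorr(m1))
vars <- list(
  var.fixef = var(predict(m1, re.form = NA)),  # variance of the fixed-effects predictions
  var.ranef = vc$vcov[vc$grp == "Subject"],    # random intercept variance
  var.resid = vc$vcov[vc$grp == "Residual"]    # residual variance
)

with(vars, var.ranef / (var.ranef + var.resid))              # adjusted ICC
with(vars, var.ranef / (var.fixef + var.ranef + var.resid))  # conditional ICC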

The denominators of R2 and the conditional ICC are the same, but the numerators of R2 and the ICC differ. That's why you usually should not be able to directly derive ICC values from R2, and vice versa.

Here's an example with a simple random intercept model, using the "old" ICC function as well as the new approach from Nakagawa:

library(lme4)
data("sleepstudy")
m1 <- lme4::lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

sjstats::icc(m1)
#>   ICC (Subject): 0.5893

sjstats::icc(m1, adjusted = TRUE)
#>    Adjusted ICC: 0.5893
#> Conditional ICC: 0.4244

performance::icc(m1)
#> $ICC_adjusted
#> [1] 0.5893089
#> 
#> $ICC_conditional
#> [1] 0.4243698

performance::r2_nakagawa(m1)
#> $R2_conditional
#> [1] 0.7042555
#> 
#> $R2_marginal
#> [1] 0.2798856

As you can see, the adjusted ICC and the "normal" ICC (i.e. random effects variance divided by random effects plus residual variance) are identical (that's to be expected).

I think the difficulty is that the ICC refers to the ratio of the between-group variance to the sum of the between-group and within-group variance, and thus only accounts for the random parts of the model. The residual variance we are talking about here is not comparable to the "classical" residual variance from a simple linear model, because the residual variance in mixed models (also) refers to the random effects. This is why it is difficult to have an r-squared measure for mixed models. I think Nakagawa started from the ICC and added the fixed effects variances to the formula, to approximate something that can be considered an "R2"...

For binomial models, the "classical" and Nakagawa ICC are also identical:

library(lme4)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              data = cbpp, family = binomial)

performance::icc(gm1)
#> $ICC_adjusted
#> [1] 0.1113561
#> 
#> $ICC_conditional
#> [1] 0.1018504

sjstats::icc(gm1)
#>   ICC (herd): 0.1114

sjstats::icc(gm1, adjusted = TRUE)
#>    Adjusted ICC: 0.1114
#> Conditional ICC: 0.1019

The adjusted ICC differs from the "classical" ICC for models with nested random effects and/or random slopes, and for models other than Gaussian or binomial, but this is just a side note.
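
For completeness, a sketch to see that divergence, e.g. with a random slope (output omitted, as it depends on the random-effects structure):

library(lme4)

# With a random slope for Days, the adjusted (Nakagawa) ICC and the
# "classical" ICC are no longer guaranteed to coincide
m2 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
performance::icc(m2)
sjstats::icc(m2, adjusted = TRUE)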