bcjaeger / r2glmm

An R package for computation of model R squared and semi-partial R squared (with confidence limits) in linear and generalized linear mixed models

cmp_R2 for crossed random effect models? #9

Open bbolker opened 4 years ago

bbolker commented 4 years ago

cmp_R2 has the signature cmp_R2(c=C, x=X, SigHat=SigHat, beta=beta, obsperclust = obsperclust, nclusts = nclusts, method = 'sgv'). I'm trying to apply it to a crossed random effects model fitted with gamm4 (I'm extracting the $mer component of the fitted model, so it's a merMod object).

I understand how to get the contrasts matrix; I'm getting SigHat with the utility function below (extracted from the body of r2beta.lmerMod). I'm a little confused by obsperclust and nclusts, which only seem to be unambiguously defined for a model with a single grouping variable.

I know I can go look at the papers to try to figure it out (and I will), but was hoping for a hint/some guidance ...


## Marginal covariance of the response for a merMod fit:
## SigHat = Z G Z' + diag(sigma^2 / weights)
extract.merMod.cov <- function(model) {
    Z = lme4::getME(model, "Z")          # random-effects design matrix
    s2e = lme4::getME(model, "sigma")^2  # residual variance
    lam = lme4::getME(model, "Lambda")   # relative covariance factor
    lamt = lme4::getME(model, "Lambdat")
    G = s2e * (lam %*% lamt)             # random-effects covariance matrix
    SigHat = Z %*% (G %*% Matrix::t(Z))
    Matrix::diag(SigHat) = Matrix::diag(SigHat) + s2e/model@resp$weights
    return(SigHat)
}
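
For what it's worth, here is how I am assembling the remaining inputs from the merMod object. This is a sketch only: it assumes a single grouping factor, and the exact form expected for obsperclust/nclusts is exactly what I'm unsure about.

X      <- lme4::getME(model, "X")        # fixed-effects design matrix
beta   <- lme4::fixef(model)             # fixed-effect estimates
SigHat <- extract.merMod.cov(model)      # marginal covariance, via the function above

grp         <- lme4::getME(model, "flist")[[1]]  # the (single) grouping factor
nclusts     <- nlevels(grp)                      # number of clusters
obsperclust <- as.numeric(table(grp))            # observations per cluster (guessing at a vector here)

With crossed random effects there are several candidate grouping factors, which is where I get stuck.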
bcjaeger commented 4 years ago

Thanks, that is a good point. It is true that obsperclust and nclusts are ambiguous if there is more than one grouping variable. For method = 'sgv', cmp_R2 will estimate denominator degrees of freedom using nclusts and obsperclust. Do you have any trouble with crossed random effects using method = 'nsj' or method = 'kr'?
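
In case it helps, the comparison I have in mind is just calling r2beta() with each method on the same fit. A sketch, with a placeholder formula and data:

library(lme4)
library(r2glmm)

## Placeholder model: y, x1, x2, g1, g2, and dat are hypothetical names;
## the two (1 | ...) terms give crossed random effects.
fit <- lmer(y ~ x1 + x2 + (1 | g1) + (1 | g2), data = dat)

r2beta(fit, method = "nsj", partial = TRUE)
r2beta(fit, method = "kr",  partial = TRUE)
r2beta(fit, method = "sgv", partial = TRUE)  # the one that needs obsperclust/nclusts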

bbolker commented 3 years ago

Hi, I'm back. I've spent a lot more time on this stuff recently and have some questions and a fairly large number of comments. You may notice that I forked the repo (here), primarily to work on the gamm4 stuff (which is pretty ugly). If you have the time, interest, and energy, I might ask if we can have an extended conversation in the venue of your choice (issues list, e-mail, Zoom?) about some of this stuff.

Many of my questions are here (markdown file on forked repo).

The bottom line: I am working with moderately complex mixed models (three different [partially crossed] random effects with 1-4 terms each, about 10 fixed effects, ~700 observations, fitted with lmer and gamm4). For a while I was using method = 'sgv' (because I didn't realize that method = 'kr' was an option) and getting reasonably sensible answers. I switched to 'kr' and am getting somewhat similar results, but with some characteristics that are curious (to me).

Here are the results for lme4 and gamm4 models of the same basic structure (the gamm4 models also include a spatial autocorrelation term; I tried both because the gamm4 r2 method is an ugly hack that I didn't entirely trust: it computes the K-R values one term at a time by brute-force fitting the reduced model and comparing).
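
Concretely, the brute-force step for one term looks roughly like this (a sketch with placeholder model and variable names, not the actual gamm4 code; the F-to-R^2 conversion is the one from Jaeger et al. 2017):

library(lme4)
library(pbkrtest)

## Placeholder full model (y, x1, x2, g1, g2, dat are hypothetical names).
full    <- lmer(y ~ x1 + x2 + (1 | g1) + (1 | g2), data = dat)
reduced <- update(full, . ~ . - x1)   # drop the term of interest

kr   <- KRmodcomp(full, reduced)      # Kenward-Roger F test for x1
q    <- getKR(kr, "ndf")              # numerator df
ddf  <- getKR(kr, "ddf")              # K-R denominator df
Fval <- getKR(kr, "Fstat")            # Wald F statistic

## Semi-partial R^2 for x1, converting the Wald F as in Jaeger et al. (2017)
r2_x1 <- q * Fval / (q * Fval + ddf)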

[figure: rsq_comp, comparing model and semi-partial R^2 values for the lme4 and gamm4 fits]

In particular, for the 'kr' values, the R^2 values have extremely wide CIs, and for mammals [purple/squares], the model R^2 is actually less than the largest partial R^2. I know the 'sgv' values are in some sense wrong (because they are picking the number of clusters somewhat arbitrarily from the last-ordered random effect grouping variable), so I feel like 'kr' would be preferable if it weren't somewhat odd.

I had some thoughts (follow the link to the questions if you're interested) about a "better" implementation of the SGV approach, but (1) I've already tried lots of factorial combinations of different things and wanted to post this without making it longer and more complicated, and (2) it was my impression that the K-R approach would generally be best, if applicable ... ? (it seems to be recommended by Jaeger et al. 2017 ...)

I understand completely how hard it is to derive sensible R^2 values for complex mixed models, and would be willing to write this off as "well, R^2s are weird sometimes", but would be comforted if you had any thoughts/wisdom to impart ...

bcjaeger commented 3 years ago

This was very clear and easy to get caught up on. Thank you! I have also encountered situations where method = 'kr' gives semi-partial values that are above the model R2, and I cannot give a satisfying answer to explain that occurrence. It seems that Kenward-Roger denominator degrees of freedom may be responsible for this, but it isn't clear to me why the denominator degrees of freedom become very small in some circumstances (it does seem to be highly dependent on the model covariance structure though).
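
If you want to check whether this is what is happening in your models, one quick diagnostic (a sketch; fit is a placeholder for your fitted lmer model) is to pull the Kenward-Roger denominator df for a single coefficient directly from pbkrtest and see how small it gets:

library(lme4)
library(pbkrtest)

## `fit` is a placeholder for your fitted lmer model.
L <- numeric(length(fixef(fit)))
L[2] <- 1                  # contrast picking out one fixed-effect coefficient
get_Lb_ddf(fit, L)         # Kenward-Roger denominator df for that contrast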

Regarding your repo with questions, I realized that I never went back and updated the references in r2beta() (sorry!). I published a paper in 2018 that goes into some detail on r2beta() with method = 'sgv' (https://www.tandfonline.com/doi/abs/10.1080/02664763.2018.1466869?journalCode=cjas20). I will open an issue shortly to remind myself to include this reference in an update. If the journal doesn't give you access, just let me know.

Regarding a longer conversation, that sounds like a great idea to me. Can we start with some e-mails and then schedule a Zoom call? You can reach me at bcjaeger@uab.edu.