florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Interpretation of DHARMa output - GLMER (binomial) #375

Closed paulinepalma closed 1 year ago

paulinepalma commented 1 year ago

Hello,

I am trying to make sense of a DHARMa output I obtained after running a mixed-effects logistic regression (glmer, from lme4). I have read the vignette associated with the package, but I am a little confused by the results I am getting.

For context, I am trying to predict accuracy (0 or 1) on an experimental psychology task, as a function of 3 predictors, which are all categorical (2 levels, coded as -.5 and .5). Pred1 is between-subject, Pred2 and Pred3 are within-subject.

Here is the model structure:

```r
model1 = glmer(accuracy ~ Pred1*Pred2*Pred3 + (1 + Pred2*Pred3 || participant) + (1 | item),
               family = binomial, data = df,
               control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e4)))
summary(model1)
```

Using DHARMa, I then plotted the simulated residuals:

```r
simulationOutput <- simulateResiduals(fittedModel = model1, plot = F)
plot(simulationOutput, asFactor = T)
```

[Figure: standard DHARMa diagnostic plots (QQ plot with tests, residuals vs. predicted values)]

The QQ plot looks ok to me, but I see that the tests come out significant. The boxplots for heteroskedasticity also do not seem very suspect to me, yet there seem to be some issues. Note that I am running the model on almost 50,000 data points, and I understand that having many data points may yield significant results on these tests. For instance, although the dispersion test is significant, the parameter is only 0.82 (obtained by running testDispersion(simulationOutput)).
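For reference, this is roughly how the dispersion check above can be run; the only assumptions here are that the simulationOutput object created above is used and that DHARMa's test functions return a standard htest-style object whose statistic can be read via `$statistic`:

```r
# Dispersion test on the simulated residuals created above
disp <- testDispersion(simulationOutput, plot = FALSE)
disp            # prints the test statistic and p-value
disp$statistic  # the dispersion ratio itself (here ~0.82, i.e. mild underdispersion)
```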

Given that I am fitting a glmer with a binomial family, I also reran the outlier test with the "bootstrap" option, which yielded the following output, suggesting no outliers:

```
	DHARMa bootstrapped outlier test

data:  simulationOutput
outliers at both margin(s) = 0, observations = 49361, p-value = 1
alternative hypothesis: two.sided
percent confidence interval:
 0 0
sample estimates:
outlier frequency (expected: 0 )
                               0
```
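For completeness, a minimal sketch of the call that produces output of this form, assuming the same simulationOutput object as above:

```r
# Outlier test using the bootstrap option described above
testOutliers(simulationOutput, type = "bootstrap")
```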

Following some advice on StackOverflow, I then tried plotting the residuals against each predictor separately.
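A minimal sketch of how such per-predictor residual plots can be made with DHARMa's plotResiduals; the assumption here is that Pred1, Pred2 and Pred3 are columns of the data frame df used to fit the model:

```r
# Plot scaled residuals against each (categorical) predictor in turn
plotResiduals(simulationOutput, form = df$Pred1)
plotResiduals(simulationOutput, form = df$Pred2)
plotResiduals(simulationOutput, form = df$Pred3)
```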

[Figure: residuals vs. Pred1]

[Figure: residuals vs. Pred2]

[Figure: residuals vs. Pred3]

Again, the boxplots look ok to me, but some of the tests performed seem to flag deviations.

I am now wondering whether the complexity of the random-effects structure could be partially responsible for these results.

Any insight would be welcome, thank you in advance!

florianhartig commented 1 year ago

Hello,

sorry for the late reply. Regarding your question, I'm quoting from the DHARMa vignette:

Once a residual effect is statistically significant, look at the magnitude to decide if there is a problem. Finally, it is crucial to note that significance is NOT a measure of the strength of the residual pattern; it is a measure of the signal/noise ratio, i.e. whether you are sure there is a pattern at all. Significance in hypothesis tests depends on at least 2 ingredients: the strength of the signal and the number of data points. If you have a lot of data points, residual diagnostics will nearly inevitably become significant, because having a perfectly fitting model is very unlikely. That, however, doesn't necessarily mean that you need to change your model. The p-values confirm that there is a deviation from your null hypothesis. It is, however, at your discretion to decide whether this deviation is worth worrying about. For example, if you see a dispersion parameter of 1.01, I would not worry, even if the dispersion test is significant. A significant value of 5, however, is clearly a reason to move to a model that accounts for overdispersion.

So, regarding your plots: the tests are significant, but I don't see a strong deviation, so I would just ignore it.
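To make the sample-size point concrete, here is a small self-contained illustration (not DHARMa-specific, and the numbers are made up): with roughly 50,000 observations, even a practically negligible deviation from uniformity is flagged as highly significant by a KS-type test.

```r
# Hypothetical illustration of significance vs. effect size at large n:
# values that are almost, but not exactly, uniform on [0, 1]
set.seed(1)
n <- 50000
x <- rbeta(n, shape1 = 1.05, shape2 = 1)  # barely distinguishable from uniform
ks.test(x, "punif")  # tiny p-value, although the deviation is practically irrelevant
```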

florianhartig commented 1 year ago

OK, I will consider this closed for the moment; in case you have further questions, feel free to re-open the issue!