florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Interpreting DHARMa diagnostics for a binomial GLMM #396

Open JackGuo15 opened 9 months ago

JackGuo15 commented 9 months ago

Hi Florian,

I hope this message finds you well. I have read the very useful vignette you provided about the DHARMa package. However, there are still a few points I'm not certain about, and I was hoping I could ask you directly for advice.

I built a binomial generalized linear mixed-effects model to analyze accuracy in a speech-transcription-in-noise task performed by non-native speakers of English. The model is listed below:

data_120wD_glmer_8 <- glmer(
  cbind(Score, n - Score) ~ Prime + scaled_Residence_Covariate_UK + scaled_Trial_Number +
    Prime:scaled_Residence_Covariate_UK + Prime:scaled_Trial_Number +
    (scaled_Trial_Number | ID) + (scaled_Residence_Covariate_UK | Sentence),
  data = data_120_withDemographics,
  family = binomial(),
  control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 20e5))
)

In this experiment, each participant transcribed 30 English sentences, and each sentence has 4 keywords. Transcription accuracy is defined as the proportion of correctly transcribed keywords (in the model above, "Score" is the number of correctly transcribed keywords for each sentence, and "n" is the total number of keywords per sentence, which is always 4). The primary predictor is "Prime", a categorical variable representing the type of visual stimulus a participant was exposed to during the transcription task. The other two predictors, "scaled_Residence_Covariate_UK" and "scaled_Trial_Number", are covariates representing a participant's length of residence in the UK and the trial number, respectively. The main purpose of the experiment is to investigate whether and how different types of visual stimuli (i.e., Prime) affect a participant's transcription accuracy.

I ran diagnostics on the model using the DHARMa package:

simulationOutput <- simulateResiduals(fittedModel = data_120wD_glmer_8, n = 1000, seed = 123)
plot(simulationOutput)

And here are the results:

[Figure: DHARMa residual diagnostic plots]

There was a highly significant KS test, indicating problems regarding goodness-of-fit. The dispersion test is also approaching significance. I ran a test for zero-inflation, but it came back fine (p = 0.678); please see the output below:

[Figure: zero-inflation test output]
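For reference, a minimal sketch of that check with DHARMa's testZeroInflation(), assuming the simulation object created above:

# Simulation-based comparison of observed vs. expected number of zeros
testZeroInflation(simulationOutput)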

What confused me was that, in the residual vs. predicted plot, I cannot eyeball any clear patterns. I just wonder what the red solid and dotted lines in the middle of the plot mean. Do they flag a quantile deviation?

I also made the residual vs. predictor plots for each of my predictors, as advised, and it seems that it is the covariates that are "causing problems":

[Figure: residuals plotted against each predictor]
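For reference, a minimal sketch of these per-predictor plots with plotResiduals(), assuming the variable names from the model call above:

# Residuals against each predictor separately (see ?plotResiduals)
plotResiduals(simulationOutput, form = data_120_withDemographics$Prime)
plotResiduals(simulationOutput, form = data_120_withDemographics$scaled_Residence_Covariate_UK)
plotResiduals(simulationOutput, form = data_120_withDemographics$scaled_Trial_Number)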

I just wonder whether these problems are severe concerns (i.e., whether this model should not be used). I have 120 participants, each providing 30 sentence transcriptions, so 3600 observations. I'm not sure if this sample size is large enough that "even the slightest deviation will become significant". The QQ plot looks fine, but the highly significant KS test and the red lines really concern me.

Thank you so much for your time. I look forward to your responses.

Best wishes, Ruohan

florianhartig commented 9 months ago

Hello Ruohan,

The red lines are plotted instead of the quantile residuals for > 1000 data points, for speed; see ?plotResiduals.
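If needed, the quantile-regression-based display can be forced explicitly regardless of data size; a minimal sketch (the defaults are documented in ?plotResiduals):

# Force the quantile regression even for a large dataset
plotResiduals(simulationOutput, quantreg = TRUE)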

Your deviations, although significant, all appear to be very minor. I quote from the vignette:

  1. Once a residual effect is statistically significant, look at the magnitude to decide whether there is a problem: it is crucial to note that significance is NOT a measure of the strength of the residual pattern; it is a measure of the signal-to-noise ratio, i.e. whether you are sure there is a pattern at all. Significance in hypothesis tests depends on at least 2 ingredients: the strength of the signal and the number of data points. If you have a lot of data points, residual diagnostics will nearly inevitably become significant, because having a perfectly fitting model is very unlikely. That, however, doesn't necessarily mean that you need to change your model. The p-values confirm that there is a deviation from your null hypothesis. It is, however, at your discretion to decide whether this deviation is worth worrying about. For example, if you see a dispersion parameter of 1.01, I would not worry, even if the dispersion test is significant. A significant value of 5, however, is clearly a reason to move to a model that accounts for overdispersion.
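To put a number on the dispersion magnitude mentioned in the quote, a minimal sketch using testDispersion(), which prints the fitted dispersion statistic:

# Simulation-based dispersion test; the printed 'dispersion' value is the magnitude to judge
testDispersion(simulationOutput)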

In the plots you showed, I see no concerning patterns. That being said, see also the vignette section about binomial data: https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html#binomial-data

Best, F

JackGuo15 commented 9 months ago

Hi Florian,

Good afternoon! Thank you so much for your timely responses. This is incredibly helpful!

I've been reading your vignette more closely, especially the part about binomial data, and have conducted some follow-up diagnostics on the model after grouping the data by the random effects, etc. It seems that the model is fine.
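For completeness, a minimal sketch of what such a grouped re-check could look like with recalculateResiduals(), assuming grouping by participant (ID) as in the model above:

# Aggregate residuals by the random-effect grouping (participant) and re-plot
simulationOutput_byID <- recalculateResiduals(simulationOutput, group = data_120_withDemographics$ID)
plot(simulationOutput_byID)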

Thank you again for your advice! I hope you have a nice holiday and happy new year!

Best wishes, Ruohan