florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Residual pattern for bounded data / continuous proportions #221

Open mariam637 opened 3 years ago

mariam637 commented 3 years ago

I have conducted an experiment and am unsure how to approach the data analysis. My sample size is limited (n = 13). The experiment measured how long a subject looked during the 10 seconds following each of the two playbacks they were presented with. Each subject experienced four trials, two per condition. To account for the repeated measures, I have been using linear mixed models from the lme4 package, with trial number nested within subject ID as a random effect. I have also removed any trials in which subjects did not respond to the stimuli.

After going through the model selection process and plotting the final model, I see a problem with the homogeneity of variance at the lower bound (see Figure 1). This is caused by the bounded nature of my data (looking time is limited to 0–10 seconds), and specifically by the number of 0 values.

I have had the DHARMa package suggested to me and it did not identify any issues with the fit of my model to the data (see Figure 2). However, when I run the following code:

max.simResid <- simulateResiduals(fittedModel = m1, n = 1000)
plot(max.simResid)

I always get the warning message: "DHARMa: fittedModel not in class of supported models. Absolutely no guarantee that this will work!"

The DHARMa package lists lme4 as supported, so I am unsure why I am getting this message.

My primary question is: can anyone advise me on why my lme4 model may not be supported by the DHARMa package? And secondly: given the bounded nature of my data, is a linear mixed model appropriate?

Thank you very much for your time.

[Figure 1: diagnostic plot of the final lme4 model] [Figure 2: DHARMa residual plot]

florianhartig commented 3 years ago

Hi, your question 1 seems to be the same issue as #220, right?

About question 2: yes, the pattern seems to come from the bounds on your data. This is to be expected given the nature of the response, which actually matches the assumptions for continuous proportions rather than those of a Gaussian response. The pattern shows up less clearly in the DHARMa plot, presumably due to the randomization of the random effects (REs), but even without looking at the residuals, it's clear that your response isn't Gaussian.

I don't find the pattern particularly concerning, although one has to admit that it's certainly not random and could cause some problems at the lower values. If you want to address the issue, you could consider a GLS or move to regressions for continuous proportions, such as Beta regression. A while ago, I made an overview of these options (with no claim of completeness): https://github.com/florianhartig/Statistics/blob/master/Examples/ProportionalData.md
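For illustration, here is a minimal sketch of the Beta route, assuming a long-format data frame with one row per trial and hypothetical columns `subject`, `condition` and `duration` (seconds, 0-10). The data below are simulated stand-ins, not the data from this question. The choice of glmmTMB with `beta_family()` is one possible option among those for continuous proportions. Because the Beta distribution excludes exact 0s and 1s, the sketch rescales looking time to a proportion and then applies the "squeeze" transformation of Smithson & Verkuilen (2006). With many zeros, this transformation can noticeably influence the results. For simplicity, the random-effect structure is reduced to a subject intercept, so it would need to be adapted to the real design.

```r
# Sketch only: column names and simulated data are hypothetical assumptions.
library(glmmTMB)  # GLMMs with beta_family()
library(DHARMa)   # simulation-based residual diagnostics

# Simulated stand-in data: 13 subjects x 4 trials (2 per condition)
set.seed(1)
d <- expand.grid(rep = 1:2, condition = c("A", "B"), subject = factor(1:13))
subject_effect <- rnorm(13, sd = 1)
d$duration <- 4 + 1.5 * (d$condition == "B") +
  subject_effect[as.integer(d$subject)] + rnorm(nrow(d), sd = 3)
d$duration <- pmin(pmax(d$duration, 0), 10)  # clip to the 0-10 s window, creating 0s

# Rescale to a proportion, then squeeze exact 0s/1s into (0, 1)
# (Smithson & Verkuilen 2006), since the Beta density excludes the bounds
squeeze <- function(p, n = length(p)) (p * (n - 1) + 0.5) / n
d$prop <- squeeze(d$duration / 10)

# Beta GLMM (logit link by default) with a random intercept per subject
m_beta <- glmmTMB(prop ~ condition + (1 | subject), data = d,
                  family = beta_family())
summary(m_beta)

# Same residual check as in the question, now on the Beta model
res_beta <- simulateResiduals(fittedModel = m_beta, n = 1000)
plot(res_beta)
```

The coefficients of such a model are on the logit scale, so effects on looking time have to be back-transformed before interpretation.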

The issue with these more complicated models is that, due to the nonlinear link, they are harder to work with and tend to provoke errors of interpretation. Personally, I think the error you are making here with a linear mixed model is small enough to be negligible, but if a reviewer or statistician looked at the residuals in detail, you would probably be on the safer side with one of the suggested solutions.

mariam637 commented 3 years ago

Thank you very much for your prompt response. Yes, the issue is the same as in the previous question. I will look into what you have suggested.