Closed tqdo closed 1 year ago
The key to this is to understand that each single residual is essentially a p-value, and thus uniformly distributed under H0.
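A minimal base-R sketch of the probability integral transform behind this statement: if Y has continuous CDF F, then F(Y) is uniform on [0,1]. DHARMa approximates F(y_i) via the empirical CDF of model simulations; the distribution below is arbitrary, chosen just for illustration.

```r
# PIT sketch: transforming a variable by its own CDF yields U(0,1)
set.seed(1)
y <- rgamma(10000, shape = 2, rate = 3)   # any continuous distribution
u <- pgamma(y, shape = 2, rate = 3)       # transform by its own CDF
hist(u)                                   # roughly flat on [0,1]
```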
If we expect a uniform distribution for EACH residual, we also expect that any subset of residuals (e.g. those grouped by a particular predictor value) is uniform as well.
This is what is essentially tested in the standard DHARMa plots: the left plot shows the joint distribution of all residuals, and the right plot shows residuals ordered against a predictor. If we group residuals by a particular value of the predictor, they should still be uniform (which is what the second statement you cite refers to).
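The grouping idea can be sketched in base R without DHARMa: fit the correct model, build simulation-based (randomized PIT) residuals by hand, and check them within predictor bins. Each bin should still look roughly uniform under H0. The coefficients and bin choice here are arbitrary, for illustration only.

```r
set.seed(123)
n <- 1000; nsim <- 250
x <- rnorm(n)
p <- plogis(1 + 2 * x)                                # true success probabilities
y <- rbinom(n, 1, p)
fit <- glm(y ~ x, family = "binomial")                # correctly specified model
phat <- fitted(fit)
sims <- matrix(rbinom(n * nsim, 1, phat), nrow = n)   # nsim replicate datasets
res <- sapply(seq_len(n), function(i) {               # randomized PIT residual:
  below <- mean(sims[i, ] < y[i])                     # fraction of sims below y[i],
  equal <- mean(sims[i, ] == y[i])                    # plus jitter over ties
  below + runif(1, 0, equal)                          # (the outcome is discrete)
})
# residuals grouped by predictor quartile: each group roughly uniform
bins <- cut(x, quantile(x, 0:4 / 4), include.lowest = TRUE)
tapply(res, bins, function(g) ks.test(g, "punif")$p.value)
```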
thanks
Another related question if you don't mind:
My understanding from your answer is that if the model is not fitted correctly, the residuals will not follow a uniform distribution. I did an experiment in which I intentionally omitted a feature that was used to generate y during training. What I observed was: the residuals are very non-uniform when plotted against that missing feature, but appear almost uniform when plotted against a random, unrelated feature. This intuitively makes sense to me (the plot suggests that the missing feature can help us explain the response while the unrelated feature has no value), but I don't understand why the residuals would appear uniform for that unrelated random feature?
Code in R and plots
```r
set.seed(666)
library(DHARMa)

x1 = rnorm(1000)
x2 = rnorm(1000)
z = 1 + 2*x1 + 3*x2
pr = 1/(1 + exp(-z))
y = rbinom(1000, 1, pr)
df = data.frame(y = y, x1 = x1, x2 = x2)

# x2 is deliberately omitted from the model
fittedModel = glm(y ~ x1, data = df, family = "binomial")
simulationOutput <- simulateResiduals(fittedModel = fittedModel, plot = F)

plotResiduals(simulationOutput, x2)           # strong pattern: missing predictor
plotResiduals(simulationOutput, runif(1000))  # roughly uniform: unrelated variable
```
What I state is an implication of H0, so H0 => i.i.d. uniform residuals. From that, it does not follow that !H0 => non-uniform residuals. Uniform residuals are therefore not a guarantee that the model is correct, but if you see non-uniformity, you know that something is wrong. This is the reason why there are so many different plots / tests.
All this is, however, the same for all residual checks - in an OLS, you can also have a perfect QQ plot and then you see a pattern in residual ~ predictor.
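A base-R sketch of that OLS situation (simulated data, arbitrary coefficients): the true mean is quadratic, the fitted model is linear, yet the marginal distribution of the residuals can look close to normal while the residual ~ predictor plot reveals the problem.

```r
set.seed(7)
x <- runif(500, -1, 1)
y <- x^2 + rnorm(500)      # true mean is quadratic in x
fit <- lm(y ~ x)           # misspecified: straight line only
r <- resid(fit)
qqnorm(r); qqline(r)       # marginal distribution can look close to normal
plot(x, r)                 # but residual ~ predictor reveals the curvature
cor(r, x^2)                # positive: residuals track the omitted term
```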
So, what you are doing with the residual checks is to perform a number of sanity checks on your model, but that doesn't guarantee that it is correct.
See also the section on interpreting residuals in the vignette https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html#interpreting-residuals-and-recognizing-misspecification-problems
I am reading the package's vignette (this section), which explains how to interpret the residuals. Two things we look for in the residuals if the model is correctly specified are:

1. a uniform (flat) overall distribution of the residuals, and
2. uniformity of the residuals when plotted against any predictor.
I understand the 1st point, but I can't wrap my head around why the 2nd point is true. I'd really appreciate any help.