florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Does the direction of the deviations have any meanings #372

Closed tqdo closed 1 year ago

tqdo commented 1 year ago

Code:

set.seed(666)
library(DHARMa)

# Simulate two predictors and a binary response that depends on both
x1 = rbinom(1000, 1, 0.2)
x2 = rnorm(1000)
z = 1 + 2 * x1 + 3 * x2
pr = 1 / (1 + exp(-z))
y = rbinom(1000, 1, pr)
df = data.frame(y = y, x1 = x1, x2 = x2)

# Fit a model that omits x1, then plot its residuals against x1
fittedModel = glm(y ~ x2, data = df, family = "binomial")
simulationOutput <- simulateResiduals(fittedModel = fittedModel, plot = FALSE)

plotResiduals(simulationOutput, x1)

Result:

[Screenshot 2023-03-01 at 8:24:56 PM: output of the plotResiduals call]

The boxplot on the right clearly deviates from the uniform distribution, and the deviation is upward. Does the direction of the deviation have any meaning? Would a downward deviation be interpreted differently? Thanks a lot.

florianhartig commented 1 year ago

Hello @tqdo,

Yes, the scale on the y-axis tells you whether the data are higher or lower than the model expects. Given that the data were simulated with an effect of 2 * x1, but x1 is not included in the fit, plotting against x1 predictably shows that the residuals are too low for x1=0 and too high for x1=1.
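To make the direction concrete, here is a minimal base-R sketch (not part of the original reply) that compares the mean observed outcome with the mean fitted probability within each x1 group, reusing the simulation from the question. The names `observed`, `predicted` and `gap` are introduced here for illustration only. This is a raw observed-minus-fitted comparison, not DHARMa's simulation-based scaled residuals, so treat it as an intuition aid rather than a reproduction of the plot.

```r
# Re-create the simulated data from the question
set.seed(666)
x1 <- rbinom(1000, 1, 0.2)
x2 <- rnorm(1000)
z  <- 1 + 2 * x1 + 3 * x2
y  <- rbinom(1000, 1, 1 / (1 + exp(-z)))
df <- data.frame(y = y, x1 = x1, x2 = x2)

# Misspecified model: x1 is left out on purpose
fittedModel <- glm(y ~ x2, data = df, family = "binomial")

# Mean observed outcome vs. mean fitted probability within each x1 group
observed  <- tapply(df$y, df$x1, mean)
predicted <- tapply(fitted(fittedModel), df$x1, mean)

# Negative gap: data lower than the model expects; positive gap: data higher
gap <- observed - predicted
print(round(cbind(observed, predicted, gap), 3))
```

The gap should come out negative for x1=0 and positive for x1=1, matching the direction described above. The same grouping could presumably be applied to the scaled residuals returned by `residuals(simulationOutput)`, whose expected mean under a correctly specified model is 0.5.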

The labelling in your plot is confusing; you will see the normal labels when running

plotResiduals(simulationOutput, form = x1, rank = FALSE)

[Image: plot produced by the plotResiduals call]

tqdo commented 1 year ago

Probably a silly question: given the positive coefficient (2) for x1, why does leaving x1 out of the model make the residuals too low for x1=0 and too high for x1=1?

If my classes are imbalanced, is the direction of the deviations related to the imbalance? Thanks a lot.