florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Overdispersion in a beta regression? #376

Closed andres-v-s closed 1 year ago

andres-v-s commented 1 year ago

I'm trying to run a beta regression to predict my dependent variable Consistency, which has values between 0 and 1. Here is the distribution of Consistency values in my dataset:

image

I originally tried linear mixed models with different transformations, but none of them seemed satisfactory (none of the Q-Q plots looked good), so I decided to try a beta regression, using the following command:

l_cons <- glmmTMB(as.formula(paste(var_predicted, ' ~ ', paste(effets_model, collapse = ' + '), '+ ', nom_random, sep = '')), data = d, family=beta_family())

(The model is quite massive, with a huge number of predictor variables and interactions, hence this construction rather than adding each predictor variable manually.)
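As a side note on the formula construction: base R's `reformulate()` builds the same kind of formula from a character vector without nested `paste()` calls. A minimal sketch follows. The values assigned to `var_predicted`, `effets_model` and `nom_random` are made-up placeholders, since the real ones are not shown in this issue.

```r
# Placeholder values; the actual predictors are not given in the issue.
var_predicted <- "Consistency"
effets_model  <- c("A", "B", "A:B")   # hypothetical fixed effects
nom_random    <- "(1 | Sujet)"        # assumed random intercept per subject

# reformulate() joins the term labels with "+" and attaches the response.
f <- reformulate(c(effets_model, nom_random), response = var_predicted)
print(f)
```

The resulting object can be passed directly as the first argument of the model call in place of the `as.formula(paste(...))` expression.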

Now, I wanted to test if overdispersion might be a problem. First of all, by calling summary(l_cons)

I get

Data: d

     AIC      BIC   logLik deviance df.resid
 -6861.3  -6465.1   3499.7  -6999.3     2235

Random effects:

Conditional model:
 Groups Name        Variance Std.Dev.
 Sujet  (Intercept) 0.04931  0.2221
Number of obs: 2304, groups:  Sujet, 72

Dispersion parameter for beta family (): 26.5
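For reading the last line of that summary: as far as I understand, glmmTMB's `beta_family()` uses the mean/precision parameterization, in which the variance is mu(1 - mu)/(1 + phi). Under that assumption the reported 26.5 is a precision (larger means less spread around the mean) and is not by itself a sign of overdispersion. A minimal base R sketch of what that value implies follows. The helper `beta_sd` is hypothetical and the means are arbitrary examples.

```r
# Standard deviation of a beta response with mean mu and precision phi,
# assuming Var(y) = mu * (1 - mu) / (1 + phi).
beta_sd <- function(mu, phi) sqrt(mu * (1 - mu) / (1 + phi))

phi_hat <- 26.5               # value reported by summary(l_cons)
mu_grid <- c(0.1, 0.5, 0.9)   # arbitrary example means
sd_grid <- beta_sd(mu_grid, phi_hat)
print(round(sd_grid, 3))
```

Whether the data are more variable than the fitted model allows is a separate question, which is what the simulation-based DHARMa tests address.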

Moreover by calling (from the DHARMa package): simulateResiduals(l_cons, plot = T)

I get the following output: image

as well as testDispersion(simulateResiduals(l_cons, plot = T))

with the following output:

image

Can somebody explain to me what this means? Does it mean that the distribution of my data is altogether unfit for a beta regression? Are there some transformations that should be done? If the conclusion from this information is that there is indeed overdispersion, what does that mean exactly?

florianhartig commented 1 year ago

Yes, it looks as if it's not following a beta either.

My intuition is that this is not so much a dispersion problem as a problem of missing predictors, i.e. there are strong effects in the data that the model does not account for. In that case, it will be very difficult to find a distribution that fits.

Either you just accept the misfit, or use a quantile regression (e.g. qgam).
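The missing-predictor explanation can be tried out on simulated data. The sketch below is only an illustration under stated assumptions, not the model from this issue. The data frame `sim`, the variables `x` and `z`, and all coefficient values are invented. It fits a beta regression once without and once with a strong predictor and compares the DHARMa residuals.

```r
library(glmmTMB)
library(DHARMa)

set.seed(42)
n   <- 500
x   <- rnorm(n)
z   <- rbinom(n, 1, 0.5)               # strong predictor, left out of m_missing
mu  <- plogis(-1 + 0.5 * x + 2 * z)    # true mean on the (0, 1) scale
phi <- 30                              # true precision
sim <- data.frame(y = rbeta(n, mu * phi, (1 - mu) * phi), x = x, z = z)

# Same family, with and without the strong predictor.
m_missing <- glmmTMB(y ~ x,     data = sim, family = beta_family())
m_full    <- glmmTMB(y ~ x + z, data = sim, family = beta_family())

# Simulate residuals once per model and reuse the result for plots and tests.
res_missing <- simulateResiduals(m_missing, plot = FALSE)
res_full    <- simulateResiduals(m_full, plot = FALSE)

plot(res_missing)                                  # misfit expected here
plot(res_full)                                     # should look much better
plotResiduals(res_missing, form = factor(sim$z))   # residuals vs omitted predictor
```

Plotting the residuals against candidate predictors with `plotResiduals()` is one way to look for structure that the model has not captured.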

florianhartig commented 1 year ago

OK, I will consider this closed for the moment. In case you have further questions, feel free to re-open the issue!