florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Comparison of overdispersion values in DHARMa with model selection results #217

Closed florianhartig closed 3 years ago

florianhartig commented 3 years ago

Question from a user

I am fitting some dose-response models to data from an experiment where we were trying to gauge the amount of time a mite could survive in cold storage (i.e. a fridge) as a phytosanitary measure. We had vials, each with 100s of mites, and stored them in the fridge from 0 - 14 days, and then counted the living (failure) and the dead (success). In addition, we replicated this treatment several times. The true motivation of this work is actually to show a workflow for how to fit these kinds of models and the potential pitfalls in them. A guide of sorts after Zuur et al. 2010.

For model fitting I am trying a few different candidate models to find the best-fitting one. These include models with binomial and beta-binomial errors, with and without random effects (replicates), fitted using the glmmTMB package. To determine which model is 'best' I was intending to choose based on AIC and parsimony, with the AIC calculated via the AICcmodavg package (the values it calculates agree with those glmmTMB produces). The delta values suggest the beta-binomial models with random effects are best.

| Model | K | AIC | Delta_AIC | AICWt | Cum.Wt | LL |
|---|---|---|---|---|---|---|
| betabinomial logit Random Effect | 9 | 1100.63 | 0.00 | 0.79 | 0.79 | -541.31 |
| betabinomial probit Random Effect | 9 | 1103.37 | 2.74 | 0.20 | 0.99 | -542.68 |
| betabinomial cloglog Random Effect | 9 | 1110.50 | 9.87 | 0.01 | 1.00 | -546.25 |
| betabinomial cloglog No Random Effect | 8 | 1199.75 | 99.12 | 0.00 | 1.00 | -591.87 |
| betabinomial probit No Random Effect | 8 | 1199.87 | 99.24 | 0.00 | 1.00 | -591.94 |
| betabinomial logit No Random Effect | 8 | 1200.16 | 99.53 | 0.00 | 1.00 | -592.08 |
| binomial cloglog Random Effect | 5 | 2169.39 | 1068.76 | 0.00 | 1.00 | -1079.69 |
| binomial probit Random Effect | 5 | 2242.73 | 1142.10 | 0.00 | 1.00 | -1116.36 |
| binomial logit Random Effect | 5 | 2243.70 | 1143.07 | 0.00 | 1.00 | -1116.85 |
| binomial cloglog No Random Effect | 4 | 3617.61 | 2516.98 | 0.00 | 1.00 | -1804.80 |
| binomial probit No Random Effect | 4 | 3640.74 | 2540.11 | 0.00 | 1.00 | -1816.37 |
| binomial logit No Random Effect | 4 | 3641.53 | 2540.90 | 0.00 | 1.00 | -1816.77 |
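As a sanity check on the table, the AIC, delta and weight columns can be recomputed from K and LL alone. The sketch below is not part of the original question; it uses the standard definitions AIC = 2K - 2LL and Akaike weights proportional to exp(-Delta/2), and the short model labels and the function name `aic_table` are made up for illustration. Because the log-likelihoods in the table are rounded to two decimals, the recomputed values can differ from the table in the last digit.

```python
import math

# (label, K, log-likelihood) copied from the table; labels are shortened
# ("RE" = random effect) and are illustrative only.
models = [
    ("betabinomial logit RE", 9, -541.31),
    ("betabinomial probit RE", 9, -542.68),
    ("betabinomial cloglog RE", 9, -546.25),
    ("betabinomial cloglog no RE", 8, -591.87),
    ("betabinomial probit no RE", 8, -591.94),
    ("betabinomial logit no RE", 8, -592.08),
    ("binomial cloglog RE", 5, -1079.69),
    ("binomial probit RE", 5, -1116.36),
    ("binomial logit RE", 5, -1116.85),
    ("binomial cloglog no RE", 4, -1804.80),
    ("binomial probit no RE", 4, -1816.37),
    ("binomial logit no RE", 4, -1816.77),
]


def aic_table(models):
    """Return (label, AIC, delta AIC, Akaike weight) for each model."""
    aics = [(name, 2 * k - 2 * ll) for name, k, ll in models]
    best = min(aic for _, aic in aics)
    # Relative likelihood of each model given the best one.
    rel = [math.exp(-0.5 * (aic - best)) for _, aic in aics]
    total = sum(rel)
    return [(name, aic, aic - best, r / total)
            for (name, aic), r in zip(aics, rel)]


rows = aic_table(models)
for name, aic, delta, weight in rows:
    print(f"{name:28s} AIC={aic:8.2f} delta={delta:8.2f} weight={weight:.2f}")
```

The weights make the ranking in the question concrete: essentially all of the weight falls on the three beta-binomial models with a random effect.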

In addition, I am checking whether each of the models exhibits overdispersion using DHARMa::testDispersion. Now here is the puzzle that I can't get my head around. The AIC values suggest that all of the beta-binomial models are significantly better fits, with those that have random effects being the best, as you can see from the substantial drop from the best-performing binomial model (2169) to the worst-performing beta-binomial model (1200), let alone the best (1100). However, when I run DHARMa::testDispersion() on these models, it suggests that only the binomial models without random effects exhibit overdispersion.

From what I have read elsewhere, comparing AIC between binomial and beta-binomial counterparts can be used as a proxy for detecting overdispersion. Is this correct? If so, why is testDispersion suggesting that only a couple of the binomial models are overdispersed? Is the drop in AIC not big enough?

In this situation, which model would you suggest using?

florianhartig commented 3 years ago

Hi,

this is expected behavior.

I think the misunderstanding here is the statistical definition of overdispersion. Overdispersion is not a thing that can be defined absolutely for a dataset, but it is defined with respect to the model that you fit. Overdispersion means that the data is more dispersed than expected under a particular model.
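A small simulation can illustrate the point that overdispersion is relative to the model. The sketch below is not DHARMa's algorithm and not from the original thread: it draws beta-binomial counts with made-up parameters (100 mites per vial, Beta(5, 5) survival probabilities) and compares the observed squared deviations with the variance expected under a binomial and under a beta-binomial model. For simplicity it uses the true parameters rather than fitted ones, and the function names are hypothetical.

```python
import random


def simulate_betabinomial(n_vials, n_mites, a, b, rng):
    """Draw one beta-binomial count per vial: p ~ Beta(a, b), y ~ Binomial(n_mites, p)."""
    counts = []
    for _ in range(n_vials):
        p = rng.betavariate(a, b)
        counts.append(sum(rng.random() < p for _ in range(n_mites)))
    return counts


def dispersion_ratio(counts, mean, variance):
    """Mean squared deviation divided by the variance the model expects.

    Values near 1 mean the model's dispersion matches the data;
    values well above 1 mean overdispersion relative to that model.
    """
    return sum((y - mean) ** 2 for y in counts) / (len(counts) * variance)


rng = random.Random(1)  # fixed seed so the run is reproducible
n_mites, a, b = 100, 5.0, 5.0  # illustrative values, not from the experiment
counts = simulate_betabinomial(500, n_mites, a, b, rng)

p = a / (a + b)
mean = n_mites * p
var_binomial = n_mites * p * (1 - p)
rho = 1.0 / (a + b + 1.0)  # intra-vial correlation of the beta-binomial
var_betabinomial = var_binomial * (1 + (n_mites - 1) * rho)

ratio_binomial = dispersion_ratio(counts, mean, var_binomial)
ratio_betabinomial = dispersion_ratio(counts, mean, var_betabinomial)
print(f"relative to binomial:      {ratio_binomial:.2f}")
print(f"relative to beta-binomial: {ratio_betabinomial:.2f}")
```

The same data are strongly overdispersed relative to the binomial expectation and roughly correctly dispersed relative to the beta-binomial one, which is the sense in which a dispersion test is always a statement about a particular fitted model.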

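Thus, what the DHARMa residuals tell you is that the binomial models without random effects are overdispersed relative to what those models expect, while the remaining models show no detectable overdispersion.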

The DHARMa results are therefore fully consistent with what the AIC tells you: the beta-binomial is preferable.

Possibly, the confusion comes from the fact that glmmTMB seems to define overdispersion with respect to a "base model", in this case the binomial, and then calls the dispersion parameter of the beta-binomial an "overdispersion parameter". However, note that if a beta-binomial has a dispersion parameter of 5, it means that it is more dispersed than the binomial, not that the fitted beta-binomial is overdispersed (the fitted model now actually has the correct dispersion).
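To make "more dispersed than the binomial" concrete, the sketch below computes how much larger the beta-binomial variance is than the binomial variance for a given dispersion parameter. It assumes the parameterization stated in glmmTMB's family documentation, V = mu(1 - mu) * n(phi + n) / (1 + phi); if your glmmTMB version documents a different form, the formula needs adjusting. The function name `variance_inflation` and the vial size of 100 are illustrative, not from the thread.

```python
def variance_inflation(phi, n):
    """Beta-binomial count variance divided by binomial count variance.

    Assumes V = mu * (1 - mu) * n * (phi + n) / (1 + phi) for the
    beta-binomial and n * mu * (1 - mu) for the binomial, so the
    ratio does not depend on mu.
    """
    return (phi + n) / (1.0 + phi)


n = 100  # illustrative number of mites per vial
for phi in (1, 5, 50, 5000):
    print(f"phi={phi:5d}  variance is {variance_inflation(phi, n):6.2f}x binomial")
```

Under this assumed parameterization the ratio is above 1 for every finite phi and approaches 1 as phi grows, so the parameter describes how far the fitted family sits from the binomial base model, not whether the fitted model itself is misspecified.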

This is not the first time that I have had questions about this with respect to glmmTMB output. Maybe I'll suggest to the glmmTMB developers that they reconsider the naming of the dispersion parameter, as it seems to confuse people.