I am trying to model tooth loss in human populations before modern surgery (i.e. extraction at the dentist :-) ). I have an empirical data set of individuals, with tooth loss recorded as a binary variable per tooth position (canine, molars, etc.). The most important predictor is the age of the individual.
For the first round of models I disregard the information on tooth position, so the data set is reduced, per individual, to the number of lost teeth vs. the number of tooth positions that can still be observed (which varies between individuals).
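To make the reduction concrete, here is a minimal sketch of how per-tooth records could be collapsed to one row per individual. The long-format table `teeth_long`, its values, and the convention that `lost` is `NA` for positions that cannot be observed are assumptions made up for illustration; only the column names `lost_teeth`, `tooth_positions`, `age` and `individual_no` follow the models in this post.

```r
# Hypothetical long-format data: one row per individual and tooth position.
# lost = 1 (tooth lost), 0 (tooth present), NA (position not observable).
teeth_long <- data.frame(
  individual_no  = rep(1:3, each = 4),
  age            = rep(c(25, 41, 67), each = 4),
  tooth_position = rep(c("incisor", "canine", "premolar", "molar"), times = 3),
  lost           = c(0, 0, 0, 0,
                     0, 1, NA, 0,
                     1, 1, 0, NA)
)

# Count lost teeth and observable positions per individual
lost_teeth      <- tapply(teeth_long$lost, teeth_long$individual_no, sum, na.rm = TRUE)
tooth_positions <- tapply(!is.na(teeth_long$lost), teeth_long$individual_no, sum)
age             <- tapply(teeth_long$age, teeth_long$individual_no, function(x) x[1])

# One row per individual, in the shape used by the models in this post
data_tooth_loss <- data.frame(
  individual_no   = as.integer(names(lost_teeth)),
  age             = as.numeric(age),
  lost_teeth      = as.integer(lost_teeth),
  tooth_positions = as.integer(tooth_positions)
)

print(data_tooth_loss)
```

Each individual then contributes `lost_teeth` successes out of `tooth_positions` trials, which is the form that `cbind(lost_teeth, tooth_positions - lost_teeth)` expects.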
A simple logistic regression (model 1) with a binomial error distribution shows strong overdispersion:
fit <- glm(formula = cbind(lost_teeth, tooth_positions - lost_teeth) ~ age, family = binomial, data = data_tooth_loss)
AIC(fit)
2123.05
DHARMa::simulateResiduals(fittedModel = fit, plot = TRUE)
If the individual is added as a predictor (model 2), the AIC improves substantially, but the residual plot now shows strong underdispersion. Strangely, the dispersion test is not significant.
fit <- glm(formula = cbind(lost_teeth, tooth_positions - lost_teeth) ~ age + individual_no, family = binomial, data = data_tooth_loss)
AIC(fit)
920.4828
simulationOutput <- DHARMa::simulateResiduals(fittedModel = fit, plot = TRUE)
DHARMa::testDispersion(simulationOutput)
DHARMa nonparametric dispersion test via sd of residuals fitted vs. simulated
data: simulationOutput
dispersion = 0.95687, p-value = 0.064
alternative hypothesis: two.sided
A beta-binomial model (model 3) takes care of the overdispersion. I fit it with the function betabin from the package aod. Although DHARMa has no built-in support for aod, the vignette enabled me to write a custom function that creates a DHARMa object. There is no overdispersion anymore, and the residual plot looks fine in my view.
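For readers who want to reproduce this step, the following is a rough sketch of the general idea, not my exact function. It runs on simulated stand-in data, and it assumes that `fitted()` on the aod fit returns the fitted probabilities and that the estimated intra-class correlation phi is stored in the `random.param` slot; please check both against the aod documentation. New responses are simulated from the fitted beta-binomial and passed to `DHARMa::createDHARMa()`.

```r
library(aod)
library(DHARMa)

set.seed(1)

# Simulated stand-in data (hypothetical, not the real tooth-loss data)
n_ind    <- 200
d        <- data.frame(
  age             = runif(n_ind, 20, 70),
  tooth_positions = sample(20:32, n_ind, replace = TRUE)
)
mu_true  <- plogis(-4 + 0.06 * d$age)
phi_true <- 0.2
p_true   <- rbeta(n_ind,
                  mu_true * (1 - phi_true) / phi_true,
                  (1 - mu_true) * (1 - phi_true) / phi_true)
d$lost_teeth <- rbinom(n_ind, d$tooth_positions, p_true)

# Beta-binomial fit with a single overdispersion parameter (random = ~ 1)
fit_bb <- betabin(cbind(lost_teeth, tooth_positions - lost_teeth) ~ age,
                  random = ~ 1, data = d)

# Assumed accessors: fitted probabilities and estimated phi
mu_hat  <- as.numeric(fitted(fit_bb))
phi_hat <- unname(fit_bb@random.param)

# Beta shape parameters implied by mean mu and intra-class correlation phi
shape1 <- mu_hat * (1 - phi_hat) / phi_hat
shape2 <- (1 - mu_hat) * (1 - phi_hat) / phi_hat

# Simulate new counts from the fitted model: rows = observations, cols = simulations
n_sim <- 250
sims  <- replicate(n_sim, {
  p <- rbeta(n_ind, shape1, shape2)
  rbinom(n_ind, d$tooth_positions, p)
})

# Build the DHARMa object from simulated and observed counts
res <- createDHARMa(
  simulatedResponse       = sims,
  observedResponse        = d$lost_teeth,
  fittedPredictedResponse = mu_hat * d$tooth_positions,
  integerResponse         = TRUE
)

plot(res)
testDispersion(res)
```

Observed and simulated responses are both on the count scale here; proportions would work as well, as long as the two use the same scale.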
I now face the problem that model 2 has the lowest AIC, whereas model 3 appears far more elegant and does not suffer from underdispersion. In the vignette and, for example, here you state that underdispersion is less of an issue than overdispersion, so I am not sure on what grounds I can reject model 2 in favour of model 3. Model 2 seems clearly overfitted judging from the figures, but is there a measure that shows this, especially as the dispersion test is not significant?
First of all: Many thanks for the great package!
Any advice is much appreciated!