florianhartig opened 1 year ago
p-values are not effect sizes, but I suppose you could conclude that if you were to estimate an unknown overdispersion parameter, m1 would likely have a larger value than m2, all other things equal.
The question, however, is: why are you interested in this parameter? If the goal is to compare models m1 and m2 (as you seem to suggest), you should not use this value. The reason is that, in general, goodness-of-fit statistics such as R2, overdispersion etc. are not suitable for comparing models, because more complex models generally fit better; i.e., the problem is that they do not correct for model complexity. Thus, DHARMa GOF tests primarily tell you whether a model is compatible with the data. Based on your p-values, that is true for both m1 and m2, but these tests should not be used to compare them!
For model selection, use tools like AIC or likelihood ratio tests. For mixed models, the problem is that the df are often not clear, so you have to be careful. DHARMa has a function for a simulated likelihood ratio test https://rdrr.io/cran/DHARMa/man/simulateLRT.html that circumvents this problem and also works for comparing models with different variance structures.
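For what it's worth, a minimal sketch of how the linked `simulateLRT()` function is typically called (the model formulas and the data frame `mydata` here are hypothetical, purely for illustration):

```r
library(DHARMa)
library(lme4)

# Hypothetical nested models fitted to the same data --
# m0 is the null model, m1 adds a fixed effect.
m0 <- glmer(count ~ treatment + (1 | site), family = poisson, data = mydata)
m1 <- glmer(count ~ treatment + env + (1 | site), family = poisson, data = mydata)

# Simulated LRT: simulates data under the null model m0, refits both
# models to each simulation, and compares the observed LR statistic
# against that simulated distribution -- sidestepping the unclear-df
# problem of asymptotic LRTs for mixed models.
out <- simulateLRT(m0, m1, n = 250)
out
```

Note that `n` simulations each require refitting both models, so this can be slow for expensive models; the default `n = 250` is usually a reasonable compromise.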
Your answer makes me realize my question was nonsensical.
I first had a look at goodness of fit (using AIC), but then, out of curiosity, started investigating the dispersion and uniformity tests in isolation. Along the way, I somehow started using these test values for model selection (beyond significant p-values indicating that a model is not compatible with the data), instead of relying on goodness-of-fit measures.
If I understand correctly, the first sentence of your reply suggests that although p-values are not measures of effect size, they would reflect effect size ALL OTHER THINGS BEING EQUAL?
Via email: