florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Questions about dispersion tests #370

Open florianhartig opened 1 year ago

florianhartig commented 1 year ago

Questions via email:

In the help for testDispersion you say:

"if refit = F, the function uses testGeneric to compare the variance of the observed raw residuals (i.e. var(observed - predicted), displayed as a red line) against the variance of the simulated residuals (i.e. var(observed - simulated), histogram)."

  1. Should the text in bold not be "var(simulated - predicted)"?

  2. If so, there is a simple adjustment one can use to allow for the fact that we would expect the observed raw residuals to be smaller than the simulated raw residuals, since the predicted values were obtained by fitting the model to the observed data. If the hat-values (h_i) are available, one can divide the observed raw residuals by \sqrt{1-h_i}, as we do when standardising Pearson residuals (see the sketch after this list). If the hat-values are not available, one can often approximate them quite well by p/n, where p is the number of parameters and n is the sample size.

  3. You could also use this idea with the estimate of overdispersion based on the Pearson residuals. Thus you would compare the observed estimate of overdispersion with the simulated version of this estimate, but with the latter involving dividing the sum of squared Pearson residuals by n rather than n-p. This adjustment would allow such a comparison without refitting the model, and would therefore be a quick approximation to the option available using "refit=T".

  4. For mixed models one would obviously need to decide how to calculate p, as you mention for type = "PearsonChisq" in your DHARMa vignette.
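A minimal R sketch of the adjustment proposed in point 2, assuming a simple Poisson GLM as a stand-in for the fitted model; all object names are illustrative and this is not part of DHARMa:

```r
# Hypothetical illustration of the leverage adjustment from point 2;
# a Poisson GLM stands in for "some fitted model".
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rpois(n, exp(0.5 + 0.3 * x))
fit <- glm(y ~ x, family = poisson)

pred   <- fitted(fit)
rawRes <- y - pred                      # observed raw residuals

# Exact adjustment: divide by sqrt(1 - h_i), as when standardising residuals
h      <- hatvalues(fit)
adjRes <- rawRes / sqrt(1 - h)

# Approximation when hat-values are unavailable: h_i ~ p / n
p            <- length(coef(fit))
adjResApprox <- rawRes / sqrt(1 - p / n)

c(raw = var(rawRes), adjusted = var(adjRes), approx = var(adjResApprox))
```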

florianhartig commented 1 year ago

Hello, thanks for your questions, which I reply to below:

  1. Sorry, yes, this was a typo. Corrected.

  2. This is an interesting idea, and I will have to think about it. I would file it under the general problem of what makes a good test statistic when working on the simulated data. I have just advised an MSc thesis on this topic, and the student has done extensive simulations. From those, it seems that the problem of this test is not so much a lack of calibration, or a bias of the test statistic (which you suggest), but power; it may well be that the bias from the mechanism you describe (which makes sense to me, there should be a small bias) is simply not visible due to a lack of power.

  3. If I understand correctly what you want, this would be the same test as before, just summing over squared Pearson residuals instead of taking the variance (see the sketch below)? I have considered this; the problem is that it is not general, because there are models that don't report Pearson residuals. I could approximate the variance by simulation, but this could fail if I have a response where all simulated values are, for example, zero.

  4. We have also run extensive simulations on this problem. The issue here is the unknown df of the chi2 distribution, which biases a "naive" Pearson chi2 test, as suggested in many books and on the web, towards underdispersion.
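For reference, a rough R sketch (again only an illustration, not DHARMa's implementation) of the refit-free comparison discussed in point 3: the observed sum of squared Pearson residuals is divided by n - p, while the same statistic computed on simulations from the fitted model is divided by n, as proposed in the original question. The Poisson GLM and all object names are hypothetical:

```r
# Refit-free dispersion comparison based on Pearson-type residuals (sketch only)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rpois(n, exp(0.5 + 0.3 * x))
fit <- glm(y ~ x, family = poisson)

pred <- fitted(fit)
v    <- pred                  # Poisson variance function: V(mu) = mu
p    <- length(coef(fit))

# Observed statistic: sum of squared Pearson residuals / (n - p)
obsStat <- sum((y - pred)^2 / v) / (n - p)

# Simulated statistics: same sum, but divided by n (no refit, so no df loss)
nSim <- 250
simStat <- replicate(nSim, {
  ySim <- rpois(n, pred)
  sum((ySim - pred)^2 / v) / n
})

# Two-sided simulation p-value for over-/underdispersion
pVal <- min(1, 2 * min(mean(simStat >= obsStat), mean(simStat <= obsStat)))
c(dispersion = obsStat, pValue = pVal)
```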