florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Understanding uniformity of DHARMa residuals #363

Open tqdo opened 1 year ago

florianhartig commented 1 year ago

It seems this was closed and then extended in #365, which asks specifically about uniformity in res ~ pred plots. In any case, as I received a very similar question via email, I will hijack this issue and re-open it. These are the questions from the email:

  • Is there any difference between the DHARMa RQR and Dunn and Smyth's RQR?

  • Regarding Dunn and Smyth residuals, it seems that for a well-fitted model, the residual vs. predicted/adjusted values plot will present a random distribution of the residuals, centered on zero, with constant variance (since they follow a Normal distribution). With DHARMa residuals, since they vary between 0 and 1, the residual plot should be centered around a y-value of 0.5, right? Also, here the term "random distribution" should be replaced by "uniform distribution", right?

  • The DHARMa residuals are standardized between 0 and 1. Does this result from the range of the cumulative distribution function, or is there some step that I'm missing in the procedure for generating the residuals? Perhaps this is a difference from Dunn and Smyth residuals?

  • The QQ plot of the residuals is built against an expected distribution. The uniform distribution, is that correct? Could you explain to me why the residuals should approximate this distribution and not the Normal distribution? I think I'm missing the steps and the logic/rationale behind the uniform distribution, probably due to a lack of statistical insight...

florianhartig commented 1 year ago

Is there any difference between the DHARMa RQR and Dunn and Smyth's RQR?

The idea is the same. In detail, DHARMa offers several options for the calculation: you can set refit = TRUE/FALSE, and you can change the quantile calculation (which includes the randomisation) via the method argument.

I tend to think that the current default procedure with refit = FALSE and method = "PIT" is identical to what Dunn and Smyth propose in their paper, but I should probably re-check this in detail.
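To make the randomisation step concrete, here is a minimal sketch of a randomized quantile residual for a count response. It is written in plain Python rather than with DHARMa, and everything in it is an assumption made for illustration: the Poisson model, the value of `lam`, and the helper names `poisson_cdf`, `sample_poisson` and `randomized_quantile_residual`. For a discrete observation y, the residual is drawn uniformly between F(y - 1) and F(y), where F is the model's cumulative distribution function.

```python
import math
import random

def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Poisson(lam); 0.0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)      # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i        # P(Y = i) computed from P(Y = i - 1)
        total += term
    return min(total, 1.0)

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value (Knuth's method, fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def randomized_quantile_residual(y, lam, rng):
    """Uniform draw between F(y - 1) and F(y): the randomisation step."""
    lower = poisson_cdf(y - 1, lam)
    upper = poisson_cdf(y, lam)
    return lower + rng.random() * (upper - lower)

rng = random.Random(1)
lam = 3.0  # hypothetical "fitted" mean; here it equals the true mean
observed = [sample_poisson(lam, rng) for _ in range(5000)]
residuals = [randomized_quantile_residual(y, lam, rng) for y in observed]

mean_res = sum(residuals) / len(residuals)
print(f"min={min(residuals):.3f} max={max(residuals):.3f} mean={mean_res:.3f}")
```

Because the data are simulated from the same model that is used to compute the residuals, the values should spread evenly over the interval from 0 to 1.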


Regarding Dunn and Smyth residuals, it seems that for a well-fitted model, the residual vs. predicted/adjusted values plot will present a random distribution of the residuals, centered on zero, with constant variance (since they follow a Normal distribution). With DHARMa residuals, since they vary between 0 and 1, the residual plot should be centered around a y-value of 0.5, right? Also, here the term "random distribution" should be replaced by "uniform distribution", right?

For me, random is just the umbrella term; random uniform and random normal are special cases of it.

About the distribution: once you have the ecdf value, you can transform it to any distribution. If you want normal instead of uniform residuals, see the examples in https://rdrr.io/cran/DHARMa/man/residuals.DHARMa.html

I chose uniformity for the plots / tests for the reasons discussed here: https://github.com/florianhartig/DHARMa/issues/39

The DHARMa residuals are standardized between 0 and 1. Does this result from the range of the cumulative distribution function, or is there some step that I'm missing in the procedure for generating the residuals? Perhaps this is a difference from Dunn and Smyth residuals?

No, this is just a property of the ecdf. Dunn and Smyth residuals work the same way; they just transform the uniform values to normal afterwards.
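The following sketch illustrates why an ecdf-based residual always lies between 0 and 1. It is not DHARMa's code. It assumes a hypothetical Poisson model with mean `lam` standing in for the fitted model, and the helper names `sample_poisson` and `ecdf_residual` are made up for this example. Each residual is the fraction of simulated values below the observation, with a random share of the ties added.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value (Knuth's method, fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def ecdf_residual(observed, simulated, rng):
    """Position of `observed` within the empirical CDF of `simulated`.

    Ties are broken by a uniform draw between the fraction of simulations
    strictly below and the fraction at or below the observation.
    """
    n = len(simulated)
    below = sum(s < observed for s in simulated) / n
    at_or_below = sum(s <= observed for s in simulated) / n
    return below + rng.random() * (at_or_below - below)

rng = random.Random(42)
lam = 4.0                  # hypothetical fitted mean, equal to the true mean
n_obs, n_sim = 1000, 250   # number of observations / simulations per observation

observed = [sample_poisson(lam, rng) for _ in range(n_obs)]
residuals = [
    ecdf_residual(y, [sample_poisson(lam, rng) for _ in range(n_sim)], rng)
    for y in observed
]

mean_res = sum(residuals) / n_obs
print(f"min={min(residuals):.3f} max={max(residuals):.3f} mean={mean_res:.3f}")
```

Both fractions are proportions of the simulated values, so the result cannot leave the interval from 0 to 1, whatever the model is.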

The QQ plot of the residuals is built against an expected distribution. The uniform distribution, is that correct? Could you explain to me why the residuals should approximate this distribution and not the Normal distribution? I think I'm missing the steps and the logic/rationale behind the uniform distribution, probably due to a lack of statistical insight...

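OK, this all goes back to the same misunderstanding. A quantile residual is the position of the observation within the distribution expected under the model, i.e. a value in the range 0 - 1. For a correctly specified model, quantile residuals are therefore inherently uniform, and this also holds for Dunn and Smyth residuals before their final transformation. Once the quantile is calculated, you can transform it to other distributions.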

Dunn and Smyth decided to transform to normal afterwards, because they assumed people are more used to a normal distribution. If you transform to normal, a deviation from the model assumptions will show up as a deviation from normality, so you should test whether the residuals are normal.
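As a small illustration of that transformation step, the sketch below pushes uniform values through the inverse of the standard normal CDF and back again. It uses only Python's standard library. The uniform values are plain random draws standing in for residuals of a well-specified model, and the clamp `eps` is an assumption added so that exact 0 or 1 values cannot break the inverse CDF.

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(7)
std_normal = NormalDist(0.0, 1.0)
eps = 1e-12  # keep values strictly inside (0, 1) for the inverse CDF

# Stand-in for quantile residuals of a correctly specified model.
uniform_res = [min(max(rng.random(), eps), 1.0 - eps) for _ in range(5000)]

# Uniform -> normal: apply the inverse standard normal CDF.
normal_res = [std_normal.inv_cdf(u) for u in uniform_res]

# Normal -> uniform: applying the CDF recovers the original values.
recovered = [std_normal.cdf(z) for z in normal_res]

print(f"uniform scale: mean={mean(uniform_res):.3f}")
print(f"normal scale:  mean={mean(normal_res):.3f} sd={stdev(normal_res):.3f}")
```

The transformation is monotone and invertible, so both scales carry the same information. Only the reference distribution for plots and tests changes.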

In DHARMa, I keep the residuals in their original uniform distribution (again, see the reasons here: https://github.com/florianhartig/DHARMa/issues/39). Thus, for DHARMa residuals, a deviation from H0 shows up as a deviation from uniformity.
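To show what a deviation from uniformity can look like in numbers, the sketch below computes a Kolmogorov-Smirnov distance to the uniform distribution for residuals from a correct model and from a misspecified one. It is a hand-rolled illustration, not DHARMa's own uniformity test. The Poisson setup, the two mean values, and the helper names are assumptions chosen for the example.

```python
import math
import random

def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Poisson(lam); 0.0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value (Knuth's method, fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def quantile_residuals(ys, lam, rng):
    """Randomized quantile residuals of `ys` under a Poisson(lam) model."""
    out = []
    for y in ys:
        lower, upper = poisson_cdf(y - 1, lam), poisson_cdf(y, lam)
        out.append(lower + rng.random() * (upper - lower))
    return out

def ks_distance_to_uniform(values):
    """Largest gap between the empirical CDF and the Uniform(0, 1) CDF."""
    xs = sorted(values)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)

rng = random.Random(3)
true_lam = 3.0
ys = [sample_poisson(true_lam, rng) for _ in range(2000)]

ks_correct = ks_distance_to_uniform(quantile_residuals(ys, true_lam, rng))
ks_wrong = ks_distance_to_uniform(quantile_residuals(ys, 5.0, rng))  # mean too high

print(f"correct model: D={ks_correct:.3f}   misspecified model: D={ks_wrong:.3f}")
```

With the correct mean, the distance stays small. With the overestimated mean, the residuals pile up near 0 and the distance becomes large.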

Again, if you prefer normal residuals, use https://rdrr.io/cran/DHARMa/man/residuals.DHARMa.html - note, though, that you then have to write your own tests / plots, or use plots designed for other normal residuals.