florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Dealing with non-uniform residuals in y-direction when plotted against predictor(s) #401

Open akhileshtayade opened 8 months ago

akhileshtayade commented 8 months ago

Hi Florian,

I have been trying to understand how to handle residuals that are non-uniform in the y-direction when plotted against a predictor, in the context of a low sample size.

I have fitted a Beta GLMM using glmmTMB to data consisting of 120 observations from 60 groups, with each group contributing two observations. The model formula can be described as follows:

fit.beta <- glmmTMB(response ~ X + Z + W + (1 | group), family = beta_family(link = "logit"), ...)

I then simulated fit.beta unconditionally for 1000 iterations (the simulation output is referred to below as simulationOutput) and plotted the residuals against the fitted values as well as against each predictor. The residuals vs. predictor plots showed no evidence of non-uniformity in the y-direction (figures not shown), except for the predictor X:

Figure 1: (From left to right) The first two plots are the result of plot(simulationOutput). The third is generated by plotting residuals against rank-transformed X, and the fourth by plotting residuals directly against X.
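For reference, a minimal sketch of how such output can be produced with DHARMa; the data frame name dat is an assumption, not from the original post:

library(DHARMa)

# n = 1000 simulations; with glmmTMB, simulate() draws new random effects,
# so the simulation is unconditional as described above
simulationOutput <- simulateResiduals(fit.beta, n = 1000)

plot(simulationOutput)  # QQ plot and residuals vs. predicted (first two panels)

plotResiduals(simulationOutput, form = dat$X)                # against rank-transformed X (default)
plotResiduals(simulationOutput, form = dat$X, rank = FALSE)  # against raw X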

From the last plot, I inferred two things:

  1. The predictor X has a right-skewed distribution (which also became evident from the histogram of X).
  2. Adding some non-linear term of X might improve the model fit.

Therefore, I fitted another model, fit.beta.sqr, to the data and simulated it unconditionally for 1000 iterations (the simulation output is referred to below as simulationOutput.sqr):

fit.beta.sqr <- update(fit.beta, . ~ . + I(X^0.5))  # add a square-root term of X

Figure 2: (From left to right) Plots generated using simulationOutput.sqr. The second and the fourth plot display residuals against untransformed X and X^0.5, respectively.
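Again as a hedged sketch (assuming the same hypothetical dat):

simulationOutput.sqr <- simulateResiduals(fit.beta.sqr, n = 1000)

plotResiduals(simulationOutput.sqr, form = dat$X,     rank = FALSE)  # vs. untransformed X
plotResiduals(simulationOutput.sqr, form = dat$X^0.5, rank = FALSE)  # vs. the new square-root term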

Adding the non-linear term (square root of X) seemed to resolve the issue at first glance; however, when I re-checked the residuals against the remaining predictors, the residuals against Z appeared non-uniform in the y-direction:

Figure 3: (From left to right) The first two plots are generated using simulationOutput and the remaining two using simulationOutput.sqr. The second and the fourth plot display residuals against untransformed Z from simulationOutput and simulationOutput.sqr, respectively.
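These Z plots can be reproduced analogously (again assuming dat):

plotResiduals(simulationOutput,     form = dat$Z, rank = FALSE)  # from the original model
plotResiduals(simulationOutput.sqr, form = dat$Z, rank = FALSE)  # from the updated model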

The model fit.beta.sqr also seemed to have been correctly specified:

Figure 4: Plot generated using plot(simulationOutput.sqr).

And neither simulation output shows any pattern when the residuals are grouped by group:

Figure 5: Grouped residuals from simulationOutput (left) and simulationOutput.sqr (right). Number of groups = 60.
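A sketch of the grouped check, using DHARMa's recalculateResiduals (dat is again a hypothetical data frame name):

# Aggregate residuals per group and re-check for patterns at the group level
simulationOutput.grouped <- recalculateResiduals(simulationOutput, group = dat$group)
plot(simulationOutput.grouped)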

Finally, both X and Z have right-skewed distributions:

Figure 6: Histograms of X and Z. Since only W changed across the two observations of each group, the histograms used the repeated values of X and Z from the data, i.e. $X = \{ x^i_1, x^i_2 : i \in \{1, \dots, 60\} \}$.

Given this information, could you please make any suggestions as to what model improvements I should consider?

bbolker commented 8 months ago

The distribution of your predictor variables isn't (or shouldn't be) relevant.

As overkill, you could consider a model with response ~ poly(X, Z, W, degree = 2) + (1 | group); this will estimate the full second-order polynomial model with respect to all predictors (linear terms of X, Z, W, also quadratic terms in X, Z, W and all interactions).
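A sketch of that model, assuming the data frame is called dat:

# Full second-order polynomial surface in X, Z, W:
# linear + quadratic terms and all pairwise interactions
fit.beta.poly <- glmmTMB(
  response ~ poly(X, Z, W, degree = 2) + (1 | group),
  family = beta_family(link = "logit"),
  data = dat  # hypothetical name for the original data
)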

Adding a square-root term to fix the model diagnostics is not crazy, but seems like it would make the model very hard to interpret.

Since W varies within group, you could try using (1+W|group) rather than (1|group).
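For example (a sketch; update() keeps the rest of the model unchanged):

# Random intercept and random slope for W within group
fit.beta.rs <- update(fit.beta, . ~ . - (1 | group) + (1 + W | group))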

florianhartig commented 8 months ago

You can definitely do what Ben suggests, but personally, I would be satisfied with your second model. It seems a clear improvement over the first model, and while there is still a bit of misfit there (mainly in the dispersion / distribution wrt Z), this seems to me rather minor and I don't imagine that it would have strong effects on any of the relevant model outputs. See the comments in the vignette, section https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html#general-remarks-on-interperting-residual-patterns-and-tests

A residual pattern does not indicate that the model is unusable: Conversely, while a significant pattern in the residuals indicates with good reliability that the observed data did likely not originate from the fitted model, this doesn't necessarily imply that the inferential results from this wrong model are not usable. There are many situations in statistics where it is common practice to work with "wrong models". For example, many statistical models use shrinkage estimators, which purposefully bias parameter estimates towards certain values. Random effects are a special case of this. If DHARMa residuals are calculated for these estimators, they will often show a slight pattern in the residuals even if the model is correctly specified, and tests for this can get significant for large sample sizes. For this reason, DHARMa excludes RE estimates from the predictions when plotting res ~ pred. Another example is data that is missing at random (MAR). Because it is known that this phenomenon does not create a bias on the fixed-effect estimates, it is common practice to fit such data with standard mixed models. Nevertheless, DHARMa recognizes that the observed data looks different from what would be expected under the model assumptions, and flags the model as problematic (see here).

Once a residual pattern is statistically significant, look at the magnitude to decide if there is a problem: finally, it is crucial to note that significance is NOT a measure of the strength of the residual pattern; it is a measure of the signal-to-noise ratio, i.e. whether you are sure there is a pattern at all. Significance in hypothesis tests depends on at least 2 ingredients: the strength of the signal and the number of data points. If you have a lot of data points, residual diagnostics will nearly inevitably become significant, because having a perfectly fitting model is very unlikely. That, however, doesn't necessarily mean that you need to change your model. The p-values confirm that there is a deviation from your null hypothesis. It is, however, at your discretion to decide whether this deviation is worth worrying about. For example, if you see a dispersion parameter of 1.01, I would not worry, even if the dispersion test is significant. A significant value of 5, however, is clearly a reason to move to a model that accounts for overdispersion.
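The dispersion parameter referred to above is what DHARMa's dispersion test reports, e.g.:

# Compares the variance of the observed residuals against the simulated ones;
# values near 1 indicate no dispersion problem
testDispersion(simulationOutput.sqr)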

Important conclusion: DHARMa only flags a difference between the observed and expected data - the user has to decide whether this difference is actually a problem for the analysis!