florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Pearson residuals look different than DHARMa residuals for a glmmTMB NB with a large number of zeros #336

Closed: florianhartig closed this issue 2 years ago

florianhartig commented 2 years ago

From a DHARMa user via email:

I wonder whether the qualitative difference between these residual plots, using 1) Pearson residuals and 2) DHARMa, is expected. (81% of the data are zeros: 128 of 158.)

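```r
library(glmmTMB)

# Negative binomial (NB2) GLMM with offset Lfas and a random intercept per site
nb2o2 <- glmmTMB(visit_apis ~
                   b_c +
                   flareasilv * hvs.sum_300m +
                   flarea_manz * flareasilv +
                   flareasilv * org_cnv_spr +
                   hvs.sum_300m * org_cnv_spr +
                   offset(Lfas) + (1 | site),
                 family = "nbinom2", data = fullz2F)
```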

1) Pearson residuals vs. fitted values (bbolker script, Stack Overflow 2021):

```r
plot(fitted(nb2o2), residuals(nb2o2, type = "pearson"))
```

[Figure: Pearson residuals vs. fitted values]

2) DHARMa residuals:

```r
library(DHARMa)
simulationOutput <- simulateResiduals(fittedModel = nb2o2, plot = TRUE)
```

[Figure: DHARMa residual plots from simulateResiduals()]

The DHARMa plot looks pretty good, while the Pearson residual plot shows the points concentrated in the lower left. I believed this was due to excess zeros (considering the histogram below), but a zero-inflated model gives no improvement in AIC (Pearson residuals don't work for zi models), and the DHARMa test for zero-inflation also showed no problems with zeros (plot below).

[Figure: histogram of the data]

[Figure: DHARMa zero-inflation test]

florianhartig commented 2 years ago

It's expected that Pearson residuals can show weird shapes, in particular at low count incidence, where the Poisson and NB distributions become strongly asymmetric (see the beginning of the DHARMa vignette). This is a good example of why diagnosing misfit based on Pearson residuals is problematic. You should ignore the pattern in the Pearson residuals.
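To illustrate the asymmetry point, here is a minimal base-R sketch on simulated data (the means, theta and sample size are arbitrary assumptions, not the data from the question). It simulates from a correctly specified NB model with low means and compares Pearson residuals with randomized quantile (PIT) residuals computed analytically from the NB CDF. DHARMa obtains its residuals by simulation rather than from the analytic CDF, but the idea of placing each observation within its own predictive distribution is the same.

```r
# Minimal sketch (simulated data, assumed parameters): Pearson vs. randomized
# quantile (PIT) residuals for a correctly specified NB model with low means.
set.seed(1)
n     <- 2000
mu    <- exp(rnorm(n, mean = -1.5, sd = 0.7))  # assumed low predicted counts
theta <- 1.5                                   # assumed NB size parameter
y     <- rnbinom(n, size = theta, mu = mu)     # data from the true model

# Pearson residuals: (y - mu) / sqrt(Var(y)), with Var(y) = mu + mu^2 / theta
pearson <- (y - mu) / sqrt(mu + mu^2 / theta)

# Randomized PIT residuals: a uniform draw between F(y - 1) and F(y).
# pnbinom() returns 0 for y - 1 = -1, so zeros draw from (0, P(Y = 0)).
lower <- pnbinom(y - 1, size = theta, mu = mu)
upper <- pnbinom(y,     size = theta, mu = mu)
pit   <- runif(n, min = lower, max = upper)

# Every zero maps to -sqrt(mu) / sqrt(1 + mu / theta): a single curve in mu
zero_res <- pearson[y == 0]
summary(zero_res)

# Skewness: Pearson residuals are strongly right-skewed, PIT residuals are not
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
c(pearson = skew(pearson), pit = skew(pit))

op <- par(mfrow = c(1, 2))
plot(mu, pearson, log = "x", xlab = "fitted", ylab = "residual", main = "Pearson")
plot(mu, pit, log = "x", xlab = "fitted", ylab = "residual", main = "PIT")
abline(h = 0.5, lty = 2)
par(op)
```

Even though the model is exactly right here, the zeros (the large majority of the data) sit on one curve just below zero and the Pearson residuals are strongly right-skewed, while the PIT residuals are flat and uniform against the fitted values.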

The DHARMa residuals look OK. The increasing pattern is a bit weird, and I'm not sure why it occurs. Maybe plot the residuals against each predictor to explore this.
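A minimal sketch of the "residuals against predictors" check, assuming simulated data with a hypothetical predictor `x` whose true effect is quadratic, fitted by a model that omits the quadratic term. To keep it self-contained it uses `MASS::glm.nb` instead of glmmTMB, and analytic randomized quantile (PIT) residuals on the 0-1 scale that DHARMa also uses, so the reference line is 0.5.

```r
# Minimal sketch (simulated data, hypothetical predictor x): residuals vs. a
# predictor reveal a term the model misses. Uses MASS::glm.nb, not glmmTMB.
library(MASS)

set.seed(2)
n <- 500
x <- runif(n, -2, 2)                                           # hypothetical predictor
y <- rnbinom(n, size = 2, mu = exp(-1 + 0.5 * x + 0.6 * x^2))  # true effect is quadratic

fit   <- glm.nb(y ~ x)          # misspecified: x^2 term omitted
mu    <- fitted(fit)
theta <- fit$theta

# Randomized PIT residuals on the 0-1 scale
res <- runif(n, pnbinom(y - 1, size = theta, mu = mu),
                pnbinom(y,     size = theta, mu = mu))

# Residuals against the predictor: the missing x^2 tends to show up as a U shape
plot(x, res, xlab = "x", ylab = "scaled residual")
abline(h = 0.5, lty = 2)
lines(lowess(x, res), col = "red")

# Mean residual at the edges of x vs. in the middle
edge <- abs(x) > 1.5
c(edge = mean(res[edge]), middle = mean(res[!edge]))
```

With DHARMa and the model from the question, the corresponding check would be something like `plotResiduals(simulationOutput, form = fullz2F$flareasilv)`, repeated for each predictor.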

With a NB model, zero-inflation will not necessarily show up in the zero-inflation test (see the comments in the help / vignette), so you should additionally test against a model with a zero-inflation term, using an LRT or AIC/BIC. Given that you did this and there was no improvement, I don't think you have a problem with zero-inflation. The mere fact that you have a lot of zeros doesn't mean that you need a zero-inflated model; zeros can just as easily arise from low predicted incidence.
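To make the last point concrete, here is a base-R sketch assuming hypothetical low predicted means (not the fitted values of `nb2o2`). Under a plain NB model, P(Y = 0) = (theta / (theta + mu))^theta, and averaging this over observations gives the expected share of zeros without any zero-inflation component.

```r
# Minimal sketch (assumed means, not the fitted values of nb2o2): a plain NB
# model with low predicted counts produces mostly zeros on its own.
set.seed(3)
n     <- 158                               # sample size from the question
mu    <- exp(rnorm(n, mean = -2, sd = 1))  # hypothetical low predicted means
theta <- 0.8                               # hypothetical NB size parameter

# Expected share of zeros under the NB itself: P(Y = 0) = (theta / (theta + mu))^theta
p0 <- dnbinom(0, size = theta, mu = mu)
mean(p0)

# Share of zeros in one data set simulated from the same model
y <- rnbinom(n, size = theta, mu = mu)
mean(y == 0)
```

For the actual model, the comparison suggested above could be done in glmmTMB with something like `nb2o2_zi <- update(nb2o2, ziformula = ~1)`, followed by `anova(nb2o2, nb2o2_zi)` for an LRT or `AIC(nb2o2, nb2o2_zi)`.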