florianhartig / DHARMa

Diagnostics for HierArchical Regession Models
http://florianhartig.github.io/DHARMa/
208 stars 22 forks source link

Residual (correlations?) in one residual vs. predictor plot, but not the other? #361

Closed nl101 closed 1 year ago

nl101 commented 1 year ago

Hello,

I've constructed this model for my binary response "empty" (predicting if a caught fish will have an empty stomach=1, or not=0) using 3 centered at zero continuous variables (Standard length of fish (SL), salinity (sal), and temperature C (temp)), and 2 categorical (year (CYR) and Station/location (Station)). The diagnostic plots seem to indicate a good model, but one plot and error message are confusing to me. I don't seem to get this error message with any other predictor variables and it seems one residual vs. predictor plot shows "well behaved" patterns while another seems to show some correlations? Does anything else stick out?

CYR, N=13 Station, N=90

Stations/locations of captured fish are also grouped within "zones" (N=4) and some have more Stations than others. Since a zone is kind of an arbitrary grouping variable (based on bathymetry) I opted to model with the Station variable instead, but maybe it has too many levels?

mod <- glmer(empty ~ center_sl + center_sal + center_temp + (1 | CYR) + (center_sl | Station), 
             family=binomial(link = "logit"), data = c_neb, 
             control=glmerControl(optimizer = "bobyqa"))

Not sure what this step is for, but I'm following another example and don't get an error for any other variable - I'm not sure what's going on here. Is it too many levels (90)?:

> plotResiduals(simulationOutput, c_neb$Station, xlab = "Station", main=NULL)
Error in out$uniformity$p.value[i] <- out$uniformity$details[[i]]$p.value : 
  replacement has length zero

Other variables:

plotResiduals(simulationOutput, c_neb$CYR, xlab = "Year", main=NULL)
plotResiduals(simulationOutput, c_neb$sal, xlab = "sal", main=NULL)
plotResiduals(simulationOutput, c_neb$temp, xlab = "temp", main=NULL)

year sal temp

"Good patterns": QQplot

"Correlated patterns": plotResiduals(simulationOutput, fitted(mod), xlab = "fitted(mod)") Resid  vs predict

florianhartig commented 1 year ago

Hello,

what's your total N? Yes, could indeed be that with N=90 Stations there are not enough data points per group to perform the tests that are run in the function.

The result of plotResiduals(simulationOutput, fitted(mod)) at the end can be disregarded, see #43.

Your other plots show dome slight deviations which one could choose to address, but I think it would be fair to argue for staying with your current model for the sake of parsimony.

Best, F

nl101 commented 1 year ago

Ok, thank you! Are the main strategies for addressing patterns in these plots:

  1. qqplot showing deviations? => try a different distribution
  2. Residual vs. predicted deviations? => remove or add variables to model, maybe transform a variable currently in model

I just find myself guessing what's best without any real understanding of what I'm doing.

Total N=677, this is how the number of obs. breaks down by station/location:

> table(c_neb$Station)

101 105 106 107 111 112 117 118 119 122 123 124 130 133 134 135 137 143 144 145 146 147 156 157 
  1   1   2   7   2   2   1   1   1   6  30   2   7  10   4  13   2   2   5   6  32   9   9   1 
158 159 167 168 169 171 172 173 174 175 176  20  21  22 224 225 226 227 229  23 237 239  24 240 
  5  17   1  50  12   4   2   5   2   2   5   5   3  15   5   8   5   3   1   7  13  15   4  22 
241 253 254 255 256 257 265 266 267 268 269 270 271 278 279 280 281 282 283 284 290 291 292 294 
 16   8   8   2   2   1   9  21  12   2   1   3   0  19  20  11   2   3   0   1   1   7   3   3 
301 302 303 304 312 313 315 323  40  54 609 618 619 621 622  65  67  68  70  71  73 
  2   4   3   4   6   2   1   2   2   3   2   0   1   1   4   3  27  14  62   7   5
nl101 commented 1 year ago

Except for the "0" obs in some stations, does it seem like there are enough obs. per station to use as a random intercept?

florianhartig commented 1 year ago

Hello,

yes, maybe low counts or zeros are the reason for the error message. I would need your data to investigate.

The patterns that are highlighted here are not distributional, they are misfit and variance problems, so what you could do is

1) play around with the linear predictor 2) possibly move to a variable dispersion model, e.g. beta binomial

However, as I said, the patterns are so mild that I don't think the improvement is worth the cost. So, I would interpret these plots as saying that the residuals are OK, and I would do nothing.

F

nl101 commented 1 year ago

Got it, thank you!