florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Outcome of testSpatialAutocorrelation() varies when groups are calculated differently #297

Closed by florianhartig 3 years ago

florianhartig commented 3 years ago

Question from a DHARMa user via email:

I am using your DHARMa package to test for spatial autocorrelation in a GLMM with a Bernoulli (0/1) outcome and two random effects, fit with lme4. I have come across some strange behavior depending on how I group the residuals. Do you have any insight into what is going on, or into which method is correct? See the code below.

Thank you in advance!

f = surv ~ # tree survival, 1 or 0
  b.Tmin.cold.mo.s + # fixed effect
  (1|plotID) + (1|subplotID) # subplot is nested within plot; each plot has a unique LAT/LON.
  # Subplots are too small to have their own LAT/LON and are assumed to all be at the same location,
  # but are included as a random effect because of non-spatial differences.

 m <- glmer(data=g, formula=f, family=binomial)

  #### use DHARMa to simulate residuals and then test for spatial autocorrelation
  sims = simulateResiduals(m, re.form = NA)

  ### Since there are multiple trees per plot (one LAT/LON per plot), aggregate residuals to the plot scale.
  ### I have tried grouping the residuals in 2 ways: by unique LAT/LON, and by plotID.
  ### Both should give the same results, since there is only 1 plot per unique LAT/LON and 1 LAT/LON per plot.
  ### However, I get different results.

length(unique(g$plotID)) == length(unique(paste0(g$LON, g$LAT))) #TRUE

  ### based on help at ?testSpatialAutocorrelation

  ## grouping by plot ID value
  g$group<- g$plotID
  n <- recalculateResiduals(sims, group= g$group)
  head(n$scaledResiduals) #0.33317255884 0.87975731693 0.09780541524 0.97707141118 0.55704598031 0.12572975344
  x <- aggregate(g$LON, list(g$group), mean)$x
  head(x) #-121.598639 -121.829115 -121.865705 -121.946526 -121.941653 -121.877226
  y <- aggregate(g$LAT, list(g$group), mean)$x
  head(y) #44.028614 43.925862 43.878450 43.881224 43.657254 43.829008
  testSpatialAutocorrelation(n, x = x, y = y)$p.value ##0.5491952702

  ## grouping by LAT/LON
  g$group <- paste0(g$LON, g$LAT)
  n <- recalculateResiduals(sims, group= g$group)
  head(n$scaledResiduals) #0.02070558145 0.58241899684 0.45153058344 0.94244456277 0.94475363982 0.13718307545
  x <- aggregate(g$LON, list(g$group), mean)$x 
  head(x) #-119.275395 -120.549494 -120.554953 -120.622858 -120.627300 -120.654506
  y <- aggregate(g$LAT, list(g$group), mean)$x
  head(y)#44.069219 48.505825 48.583065 47.969537 48.535559 47.993852
  testSpatialAutocorrelation(n, x = x, y = y)$p.value ## 0.9551004096
florianhartig commented 3 years ago

I suspect that these differences are due to the randomisation. Note that DHARMa residuals for any integer-valued (discrete) response involve a randomisation procedure to smooth out the discrete nature of the data; the default is the PIT procedure, see the help. Thus, in principle, each DHARMa residual calculation would lead to slightly different residuals. Because this led repeatedly to confusion among DHARMa users, I fixed the random seed, so that effectively you get the same result with each run (you can switch this off, see the help).
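
To illustrate the idea, here is a simplified, self-contained sketch of PIT-style randomisation for discrete data (made-up Bernoulli data, not DHARMa's actual code): each residual is P(sim < obs) plus a uniform draw spread over P(sim == obs), so every residual consumes one random number, and the same data processed in a different order gets paired with different random numbers.

set.seed(1)
sims <- matrix(rbinom(6 * 250, size = 1, prob = 0.4), nrow = 6) # 6 observations x 250 simulations
obs  <- rbinom(6, size = 1, prob = 0.4)
pitResidual <- function(o, s) mean(s < o) + runif(1) * mean(s == o) # randomised PIT-style residual
set.seed(123)
r1 <- sapply(1:6, function(i) pitResidual(obs[i], sims[i, ]))
set.seed(123)
r2 <- rev(sapply(6:1, function(i) pitResidual(obs[i], sims[i, ])))
cbind(r1, r2) # same data, same seed, different processing order -> generally different residuals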

However, if you change the order in which the residuals are calculated, you sidestep this fix: the randomisation will be different because different random numbers are paired with each residual. I think this is what happens here, since the coordinates suggest that g$plotID has a different ordering than g$group.
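
A quick way to check this (a sketch, reusing the column names from your code): confirm that the two grouping variables define the same partition of the trees but enumerate the plots in a different sort order, and hence pair the groups with different random numbers.

coordGroup <- paste0(g$LON, g$LAT)
# same partition: every plotID maps to exactly one coordinate string
all(tapply(coordGroup, g$plotID, function(x) length(unique(x)) == 1))
# but the groups sort differently, so they are processed in a different order
identical(order(unique(g$plotID)), order(unique(coordGroup))) # expected FALSE here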

florianhartig commented 3 years ago

Reply from the user via email:

You are correct that the two variables have different ordering. When I sort plotID by lat/lon first, both versions give the same result.

This suggests that recalculateResiduals() has a random element (not simulateResiduals) -- is that correct?

It is troubling to me that different orderings give quite different p-values. Would you recommend using a higher number of simulations in the original simulateResiduals call? Would this reduce the differences I get based on ordering?

Update: I tried increasing the number of simulations n from the default (250) to 1,000 and 10,000, but I still get very different results depending on which variable I use to group the plots:

n = 1000: p = 0.9324149278 grouped by plotID; p = 0.03198392638 grouped by lat/lon

n = 10000: p = 0.5529508359 grouped by plotID; p = 0.9016349029 grouped by lat/lon

?simulateResiduals says that for n: "The smaller the number, the higher the stochastic error on the residuals. Also, for very small n, discretization artefacts can influence the tests. Default is 250, which is a relatively safe value. You can consider increasing to 1000 to stabilize the simulated values.".

Therefore, I thought I would see a reduction in the difference between the p-values. Let me know what you think.

Also, please let me know if you have advice on getting a result that is less sensitive to ordering.

florianhartig commented 3 years ago

Both recalculateResiduals and simulateResiduals have two random components:

  1. the simulation error, i.e. how many new datasets are simulated (can be changed with n in simulateResiduals)
  2. the randomisation of the observed data for discrete-valued distributions, for which, as I said, DHARMa implements two different options, with PIT residuals as the default

You can make the stochasticity of 1) arbitrarily small by increasing n in simulateResiduals. For continuous distributions, you will therefore get essentially unique quantile residuals.
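
For example (a sketch with made-up continuous data, not your model): for a Gaussian response, no randomisation of the observed data is needed, so the residuals depend only on the number of simulations and stabilise as n grows.

library(DHARMa)
set.seed(1)
x <- runif(100)
y <- rnorm(100, mean = 2 * x)
fit <- lm(y ~ x)
r250  <- simulateResiduals(fit, n = 250)$scaledResiduals
r5000 <- simulateResiduals(fit, n = 5000)$scaledResiduals
max(abs(r250 - r5000)) # small; the difference mainly reflects the simulation error at n = 250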

For discrete-valued distributions, there is additionally component 2), which is independent of n and depends essentially only on the size of the data: the larger the data, the less stochasticity you have.
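
As a rough illustration of why the data size matters (a standalone sketch with made-up Bernoulli data): when residuals are aggregated to group sums, the width of the randomisation interval for a group is the probability that a simulated sum exactly equals the observed sum, and this probability shrinks as the group gets larger.

set.seed(1)
intervalWidth <- function(k, nSim = 10000, p = 0.3) {
  simSums <- replicate(nSim, sum(rbinom(k, size = 1, prob = p))) # simulated group sums
  obsSum  <- sum(rbinom(k, size = 1, prob = p))                  # one observed group sum
  mean(simSums == obsSum)                                        # width of the randomisation interval
}
sapply(c(5, 25, 100, 500), intervalWidth) # widths shrink roughly like 1/sqrt(k)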

This is an inherent property of all randomised quantile residuals, Bayesian or frequentist, and not really a problem in my opinion: Type I error rates are still controlled as long as you don't "play around" with the randomisation. That is, you shouldn't try out different randomisations and pick the one with the lowest / highest p-value, but rather take the first one and just roll with it. Yes, there will be some error in individual cases, but overall this will deliver the right result. It's the same as running an experiment: if you ran it again, the p-values would also differ.
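
To see that the Type I error rate is not inflated by the randomisation, here is a sketch with simulated data (DHARMa's createData and a correctly specified Poisson GLM, not your model; seed = FALSE switches off the fixed seed so that every replicate uses a fresh randomisation):

library(DHARMa)
set.seed(42)
dat <- createData(sampleSize = 200, randomEffectVariance = 0, family = poisson())
fit <- glm(observedResponse ~ Environment1, family = poisson(), data = dat)
pvals <- replicate(200, {
  res <- simulateResiduals(fit, n = 250, seed = FALSE) # fresh simulations + randomisation each time
  testUniformity(res, plot = FALSE)$p.value
})
hist(pvals) # approximately uniform under a correctly specified model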

florianhartig commented 3 years ago

p.s.: see also https://github.com/florianhartig/DHARMa/issues/38