florianhartig / DHARMa

Diagnostics for HierArchical Regression Models
http://florianhartig.github.io/DHARMa/

Why does DHARMa simulation output depend on the order of the original data? #329

Closed sambweber closed 2 years ago

sambweber commented 2 years ago

Hi, I have a probably naive question, but one that has been puzzling me today. When running simulateResiduals I've noticed that the results vary (sometimes non-trivially) depending on the row ordering in the original dataframe. Consider this simple example:

library(mgcv) 
library(DHARMa)
set.seed(10)

# Simulate some data
dat <- gamSim(1,n=50,dist="normal",scale=5)

# Fit a basic GAM and examine goodness of fit using DHARMa
m <- gam(y~s(x1)+s(x2)+s(x3),data=dat)
summary(m)
plot(simulateResiduals(m))
# The output is consistent no matter how many times you run this because DHARMa fixes the random seed

# Randomly shuffle the rows in the original data frame and refit the model
dat = dat[sample(1:nrow(dat)),]
m <- gam(y~s(x1)+s(x2)+s(x3),data=dat)
summary(m)
plot(simulateResiduals(m))
# The fitted parameters of the GAM model are always identical (as you would expect) 
# but the diagnostics and residual plots change every time this block is run.

The differences aren't that great in the toy example above, but in some real-world datasets I have been looking at you can affect the diagnostics quite significantly by re-ordering the original data used to fit the model. I'm sure there is a rational explanation, but it seems odd as it allows you to obtain a 'good fit' simply by shuffling your data a few times! Any insights into why this happens and what to do about it?

Thanks!

florianhartig commented 2 years ago

Hi Sam,

if you fix the random seed, you always get the same stream of random numbers. But if you change the order in which those numbers are assigned to the individual data points, the residuals change as well. In effect, reordering the data acts like choosing a new random seed, because the same random numbers end up attached to different observations.
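A minimal illustration of this mechanism in plain R (this is not DHARMa's internal code, just the general point about fixed seeds and assignment order):

```r
# Five hypothetical observations
obs <- c("a", "b", "c", "d", "e")

set.seed(42)
u1 <- runif(length(obs))       # random draws, assigned in the original order
names(u1) <- obs

shuffled <- c("c", "a", "e", "b", "d")  # same observations, new row order
set.seed(42)
u2 <- runif(length(shuffled))  # identical stream of draws...
names(u2) <- shuffled          # ...but paired with different observations

u1["a"] == u2["a"]             # FALSE: "a" now receives a different draw
```

The random number stream is identical in both runs; only the pairing with observations changes, which is exactly what reordering the rows of the data does.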

Obviously, you shouldn't shuffle the order of your data until you get a good fit, in the same way as you shouldn't change the seed to obtain a good fit in DHARMa, but I don't see a way to avoid that a user can do this.

Ideally, I would find a way to avoid the randomness in the quantile residuals altogether, but I have not yet found one that doesn't lose other desirable properties of the residuals. See also #38

Best, Florian

sambweber commented 2 years ago

Thanks Florian. I wasn't sure how the randomization step was applied within DHARMa, but that makes perfect sense. It still leaves me wondering which iteration to 'trust', as the outcome depends on the order my observations happen to be left in after a bunch of processing steps. I guess there should be a way to pool the simulations over a few different row orders (or random seeds) to try and average out some of that stochasticity, although I have no idea how to implement that in practice :).

Best wishes, Sam

florianhartig commented 2 years ago

Hi Sam,

you can trust DHARMa in that it has controlled type I error rates (you will get a significant test only about 1 in 20 times for a correct model), and simulations show that the power to detect problems is also quite good. I would thus argue that the exact residual pattern is not so relevant; what matters is the error rate in conclusions about particular issues (e.g. is there a dispersion problem?), and there the effect of stochasticity is generally much smaller than in the visual pattern.

Moreover, note that there are two contributions to the stochasticity: the simulations themselves and the randomisation of your response. The simulation stochasticity can always be reduced as far as you like by increasing the number of simulations. In your case, you have a Gaussian distribution (no discrete response), so you could average the stochasticity away almost entirely simply by increasing the number of simulations. The randomisation only really kicks in for discrete distributions, in particular binomial or Poisson responses with a low number of counts.
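For example, re-using the toy GAM from the first post (a sketch; `n` is the number of simulations per observation in `simulateResiduals`, default 250):

```r
library(mgcv)
library(DHARMa)
set.seed(10)

# Same toy example as above: simulated Gaussian data and a basic GAM
dat <- gamSim(1, n = 50, dist = "normal", scale = 5)
m   <- gam(y ~ s(x1) + s(x2) + s(x3), data = dat)

# More simulations per observation -> less simulation noise in the
# quantile residuals (this quadruples the default of n = 250)
res <- simulateResiduals(m, n = 1000)
plot(res)
```

With a continuous response like this, the scaled residuals stabilise as `n` grows; for discrete responses the randomisation component remains even at large `n`.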

As said, I have been thinking about how to average away this last bit of stochasticity, but the issue with combining several simulations is that the null expectation for the residuals / tests is then unclear.

sambweber commented 2 years ago

Thanks Florian, that all makes sense. The motivation for my question was that I got stuck in one of those 1-in-20 cases: I seemed to have a model that fit nicely, then added a simple data formatting step to my pipeline and suddenly found I had a (marginally) significant quantile deviation. It took me a while to work out that it was a row-ordering issue and that with literally every other ordering of the data everything checked out. DHARMa is a neat package and has become my go-to for model diagnostics.