florianhartig / DHARMa

Diagnostics for HierArchical Regession Models
http://florianhartig.github.io/DHARMa/
201 stars 21 forks source link

Need to order time input when using recalculateResiduals and testTemporalAutocorrelation? #359

Open jpourtois opened 1 year ago

jpourtois commented 1 year ago

Hi @florianhartig

When using recalculateResiduals, the output gives the scaled residuals but not the corresponding groups (time in my case). This might cause confusion because I think the aggregate function in recalculateResiduals returns the groups sorted in ascending order, while if I just provide unique(time) to testTemporalAutocorrelation (like many people might do), the groups will be in the order in which they appear in the dataframe (not necessarily in ascending order). Can you confirm that the residuals from recalculateResiduals are in ascending order and would it be possible to provide the corresponding groups in the output?

Thank you!

florianhartig commented 1 year ago

Hello Julie,

OK, the autocorrelation test will just use the numeric value of time, so indeed, you have to take care that you provide time in the order that it matches to the aggregated residuals.

The aggregation in recalculateResiduals internally uses the aggregate function of R,

 aggregateByGroup <- function(x) aggregate(x[sel], by = list(group[sel]), 
            FUN = aggregateBy)[, 2]

so the residual should be in the order of the variable 'group' that you provide, not ascending. I just checked, this works

testData = createData(sampleSize = 40, family = gaussian(), 
                      randomEffectVariance = 0)

fittedModel <- lm(observedResponse ~ Environment1, data = testData)
res = simulateResiduals(fittedModel)

res1 = recalculateResiduals(res, group = rep(1:2, each = 20))
res2 = recalculateResiduals(res, group = rep(1:2, each = 20))
res3 = recalculateResiduals(res, group = rep(2:1, each = 20))

res1$scaledResiduals == res2$scaledResiduals
res2$scaledResiduals == res3$scaledResiduals

I had forgotten about this, but the group variable is actually provided in the output already, so you can do

res1$group

which is not aggregated, but

unique(res1$group)
unique(res3$group)

should give you the right result.

The unique solution doesn't seem super save to me, but I also can't construct a case where this doesn't work. However, it would probably be safer to provide the group value of the aggregated residuals on top in the outputs, so I'll leave this open until fixed. .