Closed melina-leite closed 1 month ago
Running
testData = createData(sampleSize = 100, family = gaussian(), fixedEffects = 1,
randomEffectVariance = 0, temporalAutocorrelation = 10)
fittedModel <- lm(observedResponse ~ Environment1, data = testData)
res = simulateResiduals(fittedModel)
testTemporalAutocorrelation(res, time = testData$time)
I don't see any differences when switching between
time = 1:sampleSize
time = sample.int(sampleSize)
in the simulate function
So, to me it seems the simulation with unordered time actually works fine, in which case we should switch to this as otherwise we'll indeed get a spurious correlation with the REs.
I think the problem happens when we have random effects together with the temporal autocorrelation. I did some tests to calculate the proportion of significative p.values for the temporal autocorrelation test for the 4 cases of simulation (100 simulations each):
Results for createData
(originally ordered):
temporal random.effect prop.sig
1 no no 4
2 yes no 100
3 no yes 66
4 yes yes 100
Results for `createdData2 (with the changes as presented above) temporal random.effect prop.sig 1 no no 9 2 yes no 100 3 no yes 4 4 yes yes 95
So the type I error for the original-ordered createData is higher when you don't have temporal autocorrelation but have some random effects, right?
I added the code of the simulations to the file 436.R
.
Yes sure, we have to have the time unordered, else it's correlated with the RE. So we want to set
time = sample.int(sampleSize)
But once that is set, I didn't see any problems, so I don't understand why we thought that is a problem yesterday.
Yesterday, I had a problem when I was doing the tests for the functions simulating data without temporal autocorrelation but still getting temporal autocorrelation. Then, the tests failed in the expectation.
I was not just time = sample.int(sampleSize)
that I changed (see first comment).
So, should I open a pull request with the changes in the function, and you review it? Or could I just commit to the main branch?
Not sure what would be the best in this case of changing a very very useful and used function.
Testing some data simulated by
createData
, I realized that even when we don't include temporal autocorrelation in the simulation, it can be present in the dataset:I suspected that it could be because the time column is ordered and the RE groups are also ordered. However, trying to shuffle the time column to have the time variable unclustered within group didn't solve the issue. I tried to change the
createData
function:And now, even with very strong temporal autocorrelation, the residuals don't show the pattern:
I'm puzzled. The code is in Code/DHARMaIssues/436.