florianhartig / DHARMa

Diagnostics for HierArchical Regession Models
http://florianhartig.github.io/DHARMa/
208 stars 22 forks source link

Subscript out of bounds in DHARMa_0.4.4 with binomial mgcv::bam #323

Closed florianhartig closed 2 years ago

florianhartig commented 2 years ago

Error message from a DHARMa user:

Thank you so much for your DHARMa package and extremely helpful documentation on using it. I have successfully used it with glmer models and I am truly grateful that you went to such lengths to not only write the code but also provide solid guidance on using it.

I am writing to you today because I am trying to use DHARMa_0.4.4 with a binomial bam model from mgcv 1.8-38 and I get stuck on the residual simulation step:

sr1iB <- simulateResiduals(fittedModel=bm1iB) Error in as.matrix(out)[, seq(1, (2 * nsim), by = 2)] : subscript out of bounds

The exact same error occurs on both a Windows 10 and a Linux setup (with an older R installation)

R version 4.1.1 (2021-08-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19042)

R version 4.0.3 (2020-10-10) Platform: aarch64-unknown-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux 8.5 (Ootpa)

I wonder if this has to do with mgcv/bam or with the fact that the model is huge (Rstudio reports "Large bam (56 elements, 4.6 GB). Perhaps R is running out of memory or something? The Linux setup is a 64-core cluster with 256GB RAM; I don't know if R can access all of that but the system reports plenty available after the model has been loaded: as.numeric(system("awk '/MemFree/ {print $2}' /proc/meminfo",intern=T)) [1] 194688960 The Windows setup reports 11419.33 using memory.size()

The model is also fairly complicated, with crossed and nested random effects (12 conditions--indirectly modeling a 3x2x2 interaction--with 6 of them within participants and 6 within items). This is the formula for the model as fitted:

bm1iB <- bam( IA_Target_Looks ~ 1 + condfreflu + s(Time,by=condfreflu,k=14) + s(Time,Subject,bs="fs",m=1) + s(Time,Subject,by=condfre,bs="fs",m=1) + s(Time,Item,bs="fs",m=1) + s(Time,Item,by=condflu,bs="fs",m=1), discrete=TRUE, nthreads=60, data=FinalDat, family=binomial )

And this is the fitted model summary:

Family: binomial Link function: logit

Formula: IA_Target_Looks ~ 1 + condfreflu + s(Time, by = condfreflu, k = 14) + s(Time, Subject, bs = "fs", m = 1) + s(Time, Subject, by = condfre, bs = "fs", m = 1) + s(Time, Item, bs = "fs", m = 1) + s(Time, Item, by = condflu, bs = "fs", m = 1)

Parametric coefficients: Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.2317 0.3211 10.063 < 2e-16 * condfreflunds.hiFreq.loFlu<2 -0.6503 0.4373 -1.487 0.13698
condfreflunds.loFreq.hiFlu>2 -0.6738 0.3796 -1.775 0.07589 .
condfreflunds.loFreq.loFlu<2 -1.1762 0.4286 -2.744 0.00607 * condfreflupds.hiFreq.hiFlu>2 -0.3116 0.3737 -0.834 0.40442
condfreflupds.hiFreq.loFlu<2 -0.6839 0.4177 -1.637 0.10160
condfreflupds.loFreq.hiFlu>2 -0.8892 0.3837 -2.318 0.02047

condfreflupds.loFreq.loFlu<2 -1.2416 0.4268 -2.909 0.00363
condfrefluwds.hiFreq.hiFlu>2 -2.3570 0.3388 -6.957 3.48e-12 condfrefluwds.hiFreq.loFlu<2 -2.3065 0.3808 -6.057 1.38e-09 condfrefluwds.loFreq.hiFlu>2 -2.2571 0.3599 -6.272 3.56e-10 condfrefluwds.loFreq.loFlu<2 -2.4811 0.3996 -6.209 5.33e-10

Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms: edf Ref.df Chi.sq p-value
s(Time):condfreflunds.hiFreq.hiFlu>2 11.316 11.590 251.64 <2e-16 s(Time):condfreflunds.hiFreq.loFlu<2 8.166 8.875 94.25 <2e-16 s(Time):condfreflunds.loFreq.hiFlu>2 10.153 10.582 116.09 <2e-16 s(Time):condfreflunds.loFreq.loFlu<2 10.595 10.967 188.36 <2e-16 s(Time):condfreflupds.hiFreq.hiFlu>2 8.374 9.123 134.53 <2e-16 s(Time):condfreflupds.hiFreq.loFlu<2 9.392 9.977 125.49 <2e-16 s(Time):condfreflupds.loFreq.hiFlu>2 9.762 10.269 156.68 <2e-16 s(Time):condfreflupds.loFreq.loFlu<2 9.731 10.244 113.29 <2e-16 s(Time):condfrefluwds.hiFreq.hiFlu>2 8.440 9.240 74.16 <2e-16 s(Time):condfrefluwds.hiFreq.loFlu<2 9.445 10.075 77.50 <2e-16 s(Time):condfrefluwds.loFreq.hiFlu>2 6.433 7.461 92.69 <2e-16 s(Time):condfrefluwds.loFreq.loFlu<2 9.988 10.510 88.61 <2e-16 s(Time,Subject) 198.494 547.000 569.97 <2e-16 s(Time,Subject):condfrends.hiFreq 493.039 548.000 6013.19 <2e-16 s(Time,Subject):condfrends.loFreq 490.252 548.000 4928.30 <2e-16 s(Time,Subject):condfrepds.hiFreq 480.557 549.000 4581.40 <2e-16 s(Time,Subject):condfrepds.loFreq 484.135 548.000 4040.75 <2e-16 s(Time,Subject):condfrewds.hiFreq 466.911 547.000 3401.13 <2e-16 s(Time,Subject):condfrewds.loFreq 462.404 547.000 3073.73 <2e-16 s(Time,Item) 252.816 1069.000 423.58 <2e-16 s(Time,Item):condflunds.hiFlu>2 955.988 1070.000 10480.38 <2e-16 s(Time,Item):condflunds.loFlu<2 944.724 1070.000 9313.62 <2e-16 s(Time,Item):condflupds.hiFlu>2 939.411 1069.000 10560.84 <2e-16 s(Time,Item):condflupds.loFlu<2 947.249 1070.000 9146.60 <2e-16 s(Time,Item):condfluwds.hiFlu>2 929.400 1069.000 6831.23 <2e-16 s(Time,Item):condfluwds.loFlu<2 942.952 1069.000 7716.66 <2e-16

Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) = 0.436 Deviance explained = 40.7% fREML = 2.8966e+06 Scale est. = 1 n = 336165

In case it makes a difference, the dependent variable is a 2-column representation of binomial counts (k,n-k):

str(FinalDat$IA_Target_Looks) int [1:336165, 1:2] 20 20 20 20 20 20 20 20 0 0 ...

I wonder if I am hitting upon a bug or an internal limit of the package, or if I should have done something different with the model. I would be grateful for any help on this. Please let me know if you need any additional information to determine the cause of the problem.

florianhartig commented 2 years ago

Hello,

I don't think this is a memory problem, but rather a DHARMa bug / limitation, possible duplicate with https://github.com/florianhartig/DHARMa/issues/12

If this is the problem, it should be solved with DHARMa 0.4.5 which I am still working on. To install the test version (which should go on CRAN in the next days) install the master branch according to the instructions here https://github.com/florianhartig/DHARMa

florianhartig commented 2 years ago

Confirmed, this is solved with 0.4.5