MichaelChirico / r-bugs

A ⚠️ read-only ⚠️ mirror of https://bugs.r-project.org/

[BUGZILLA #16703] possible bug in nlme package #6087

Closed MichaelChirico closed 4 years ago

MichaelChirico commented 4 years ago

Created attachment 2019 [details] RData file containing the data set of the example

I get a memory-allocation error (?) when running a certain model. This seems strange to me since the model I am trying to fit is quite simple.

CODE FOLLOWS

```r
rm(list = ls())
options(contrasts = c("contr.treatment", "contr.treatment"))
set.seed(123)

## you might indicate the directory where to find mydb.RData
library(nlme)
load(file = "mydb.RData")
myGls <- gls(model = occupancy ~ hourFact, data = mydb,
             correlation = corARMA(p = 1))

# Error: cannot allocate vector of size 20.8 Gb
# In addition: Warning messages:
#   1: In corFactor.corAR1(object) :
#   NAs introduced by coercion to integer range
# 2: In double(corD[["sumLenSq"]]) :
#   Reached total allocation of 4090Mb: see help(memory.size)
```


MichaelChirico commented 4 years ago

(In reply to Giorgio Alfredo Spedicato from comment #0)

Created attachment 2019 [details]
RData file containing the data set of the example

I get a memory-allocation error (?) when running a certain model. This seems strange to me since the model I am trying to fit is quite simple.

#CODE FOLLOWS

```r
rm(list = ls())
options(contrasts = c("contr.treatment", "contr.treatment"))
set.seed(123)
## you might indicate the directory where to find mydb.RData
library(nlme)
load(file = "mydb.RData")
myGls <- gls(model = occupancy ~ hourFact, data = mydb,
             correlation = corARMA(p = 1))

# Error: cannot allocate vector of size 20.8 Gb
# In addition: Warning messages:
#   1: In corFactor.corAR1(object) :
#   NAs introduced by coercion to integer range
# 2: In double(corD[["sumLenSq"]]) :
#   Reached total allocation of 4090Mb: see help(memory.size)
```

I can confirm the problem ... basically the R code wants to construct a numeric vector of length n^2, where n = nrow(mydb) ≈ 52000, so n^2 is larger than the largest representable integer ... and the current C interface of nlme cannot deal with such long vectors even if your computer had more RAM.
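The overflow is easy to see in base R arithmetic (a minimal sketch; the 52000 row count is taken from the description above, and the exact sizes reported will vary with the actual nrow(mydb)):

```r
n <- 52000                    # roughly nrow(mydb) in this report
n^2                           # 2704000000, held as a double
.Machine$integer.max          # 2147483647, the largest R integer
n^2 > .Machine$integer.max    # TRUE: an n^2-length allocation overflows integer
as.integer(n^2)               # NA, with an "NAs introduced by coercion" warning

## an n^2-long double vector would also need n^2 * 8 bytes,
## i.e. roughly 20 Gb, matching the allocation error above
n^2 * 8 / 2^30
```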

However, what you probably really want instead of

```r
myGls <- gls(occupancy ~ hourFact, data = mydb, correlation = corARMA(p = 1))
```

is

```r
myGls <- gls(occupancy ~ hourFact, data = mydb,
             correlation = corAR1(form = ~ 1 | hourFact))

## or (equivalently):
myGls2 <- gls(occupancy ~ hourFact, data = mydb,
              correlation = corARMA(form = ~ 1 | hourFact, p = 1))
```

These still take quite a bit of time and also produce a resulting object of 930 megabytes ... mainly because of a numeric vector of length 116344908 which is 99.99% zeros and accounts for 99.2% of the object size.
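The grouped corAR1() call can be exercised on a small synthetic stand-in for mydb (the data frame below is invented for illustration; only the column names follow the report). With a grouping factor, the correlation matrix is block-diagonal, so nlme only builds one block per group instead of a single n × n matrix:

```r
library(nlme)
set.seed(123)

## hypothetical stand-in for mydb: 24 hourly levels, 50 observations each
mydb <- data.frame(
  hourFact  = factor(rep(sprintf("h%02d", 1:24), each = 50)),
  occupancy = rnorm(24 * 50)
)

## AR(1) errors *within* each hourFact group: only 50 x 50 blocks are
## constructed, not one 1200 x 1200 matrix
fit <- gls(occupancy ~ hourFact, data = mydb,
           correlation = corAR1(form = ~ 1 | hourFact))
```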

The next version of nlme will give a fast error message in your case, when n^2 exceeds the maximal integer, but really I think there is a second "bug":

I'm not familiar with the code and would guess that this part of nlme should indeed be rewritten ... to not construct objects of essentially O(n^2) size ... In principle, using the Matrix package and its sparseVector() objects could help, but changing the code so as to use these on the C side is still quite a bit of work ... more than I want to afford for this in the foreseeable future.
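To illustrate the potential saving, a sparseVector stores only the non-zero entries and their indices (a sketch with made-up sizes, not the actual nlme internals):

```r
library(Matrix)

## a dense double vector of length 1e6 that is almost entirely zero
x <- numeric(1e6)
x[c(10L, 500L, 999999L)] <- c(1.5, -2, 7)

## coerce to a sparse representation: indices plus non-zero values only
sx <- as(x, "sparseVector")

object.size(x)    # ~8 MB for the dense representation
object.size(sx)   # a few hundred bytes
```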

Contributions, of course, would be welcome.

I'm closing this ... as the next version of nlme will give an immediate error for this, but I keep the underlying problem open as a "wishlist" item.

