Closed schuemie closed 10 years ago
Thanks for reporting! I will look into it coming week.
Best regards,
Edwin
On 3 Jul 2014 at 10:49, "Martijn Schuemie" notifications@github.com wrote:
When I run the example code provided for ffdfdply, it throws an error:
data(iris)
ffiris <- as.ffdf(iris)
youraggregatorFUN <- function(x){
  dup <- duplicated(x[c("Species", "Petal.Width")])
  o <- order(x$Petal.Width)
  lowest_pw <- x[rev(o),][!dup,]
  highest_pw <- x[o,][!dup,]
  lowest_pw$group <- factor("lowest", levels=c("lowest", "highest"))
  highest_pw$group <- factor("highest", levels=c("lowest", "highest"))
  rbind(lowest_pw, highest_pw)
}
result <- ffdfdply(
  x = ffiris,
  split = ffiris$Species,
  FUN = function(x) youraggregatorFUN(x),
  BATCHBYTES = 5000,
  trace=TRUE)
Output:
2014-07-03 04:40:55, calculating split sizes
2014-07-03 04:40:55, building up split locations
2014-07-03 04:40:55, working on split 1/2, extracting data in RAM of 2 split elements, totalling, 0 GB, while max specified data specified using BATCHBYTES is 0 GB
Error in ffindexorder(index, os$b) :
  cannot allocate memory block of size 67108864 Tb
In addition: Warning message:
In bbatch(length, BATCHBYTES/(recvalbytes + 2 * recindbytes)) :
  NAs introduced by coercion
I have the latest version of R, ff, and ffbase installed:
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ffbase_0.11.3 ff_2.2-13 bit_1.1-12
loaded via a namespace (and not attached):
[1] fastmatch_1.0-4 tools_3.1.0
A potential cause is the amount of RAM in my system: 163790MB, which is probably more than most people have. On another machine with only 32GB of RAM but otherwise the same configuration the problem does not occur.
— Reply to this email directly or view it on GitHub https://github.com/edwindj/ffbase/issues/37.
Some more testing seems to confirm the problem is the available memory. If I start R with
Rgui --max-mem-size=50M
the example code runs just fine. I also found a similar problem with the ffdfindexget function. Running this (without restricting memory)
myVec = ff(1:5)
another = ff(10:14)
littleFrame = ffdf(myVec, another)
posVec = ff(c(2, 4), vmode = 'integer')
ffdfindexget(littleFrame, posVec)
generated the following error:
Error in if (any(B < 1)) stop("B too small") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In bbatch(n, as.integer(BATCHBYTES/theobytes)) : NAs introduced by coercion
Again, the problem goes away when I restrict the memory through the command line.
I managed to trace the problem to the bbatch function in the bit package, which attempts to convert B to an integer:
B <- as.integer(B)
but on my machine B is too big to fit in an integer, because in the function ffindexordersize in ff:
ffindexordersize <- function (length, vmode, BATCHBYTES = getOption("ffmaxbytes"))
{
  recvalbytes <- .rambytes[vmode]
  recindbytes <- .rambytes["integer"]
  bbatch(length, BATCHBYTES/(recvalbytes + 2 * recindbytes))
}
B is set to BATCHBYTES/(recvalbytes + 2 * recindbytes), and BATCHBYTES defaults to getOption("ffmaxbytes"), which on my machine is 85,873,131,520.
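The arithmetic behind the failure can be checked directly in plain R. This is only a sketch: it assumes integer vmode, so that recvalbytes and recindbytes are both 4 bytes (which is what the factor of 12 in the workaround suggests), and the variable names are illustrative rather than taken from ff or bit. It also assumes the half-of-RAM relation seen on this machine holds on the 32 GB one.

```r
# Value of getOption("ffmaxbytes") reported above; it is exactly half of
# the 163790 MB of RAM mentioned in the original report
ffmaxbytes <- 85873131520
ffmaxbytes == 163790 * 1024^2 / 2    # TRUE

# Assumption: integer vmode, 4 bytes per value and 4 bytes per index entry
recvalbytes <- 4
recindbytes <- 4

# The batch size that is passed to bbatch() as B
B <- ffmaxbytes / (recvalbytes + 2 * recindbytes)
B > .Machine$integer.max             # TRUE: about 7.16e9 versus 2147483647

# as.integer() cannot represent B, so it yields NA (with a coercion warning)
B_int <- suppressWarnings(as.integer(B))
is.na(B_int)                         # TRUE

# any(NA < 1) is NA, which is why `if (any(B < 1))` then stops with
# "missing value where TRUE/FALSE needed"
is.na(any(B_int < 1))                # TRUE

# Assumption: with 32 GB of RAM and the same half-of-RAM default,
# B stays below the integer limit, so no NA is produced
B_32gb <- (32 * 1024^3 / 2) / 12
B_32gb < .Machine$integer.max        # TRUE
```

If these assumptions hold, any machine whose ffmaxbytes exceeds roughly 24 GiB would hit the same NA.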
I now run all my code by starting with
options(ffmaxbytes = min(getOption("ffmaxbytes"),.Machine$integer.max * 12))
and that makes the problem go away. It still would be nice to solve the problem in the package, but I guess the right place to fix it would be in the bbatch function, which will work just fine if B is converted to a numeric instead of an integer. However, that's the bit package, not yours.
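To illustrate why the cap in the workaround is enough, and what a guard inside the package could look like, here is a rough sketch. The helper clamp_batch_size is hypothetical and not part of bit or ff; it only shows the idea of limiting B before the integer conversion, again assuming 12 bytes per element.

```r
# Largest ffmaxbytes for which B still fits in an integer,
# assuming 12 bytes per element as in the workaround above
cap <- .Machine$integer.max * 12     # 25769803764 bytes, just under 24 GiB

# With the cap applied, the batch size converts without producing NA
B_capped <- min(85873131520, cap) / 12
as.integer(B_capped)                 # 2147483647

# Hypothetical helper (not in bit or ff): clamp before converting,
# so an oversized batch size never turns into NA
clamp_batch_size <- function(B) {
  as.integer(min(B, .Machine$integer.max))
}

clamp_batch_size(85873131520 / 12)   # 2147483647
clamp_batch_size(1000.7)             # 1000 (as.integer truncates)
```

Whether clamping or switching B to a numeric is the better fix would depend on how bbatch uses B afterwards, which is not shown here.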
Sorry for bothering you!