Closed: kindlychung closed this issue 9 years ago
The ffdf object takes almost twice as much RAM as the R data.frame. Why? Did I do something wrong? Here is an example. object.size is said to force the loading of all data on disk, so I also looked at the task manager:
# task manager: 48M at start
require(ffbase)
ffiris = as.ffdf(iris)
# 53M
save.ffdf(ffiris, dir = "~/Desktop/iris", overwrite = TRUE)
# add 400 constant columns to the ffdf
for (x in paste("a", 1:4e2, sep = "")) {
  ffiris[[x]] = ff(rep(314, nrow(iris)))
}
# 65M
save.ffdf(ffiris, dir = "~/Desktop/iris", overwrite = TRUE)
dim(ffiris)
# 66M
# add the same 400 columns to a plain data.frame
iris2 = iris
for (x in paste("a", 1:4e2, sep = "")) {
  iris2[, x] = rep(314, nrow(iris))
}
dim(iris2)
# 71M
# pull the whole ffdf back into RAM
iris3 = as.ram(ffiris)
dim(iris3)
# 77M
Your example is biased because the iris dataset is very small (150 records). When the number of rows increases, the in-memory data.frame takes more memory (see the code below). However, the memory consumption of the ffdf is still considerable; I'm not sure why this is the case (I will try to find out; note that I'm not the author of ff).
require(ffbase)
# iris is only 150 records, so the fixed ff overhead dominates
object.size(iris)
ffiris <- as.ffdf(iris)
object.size(ffiris)
# resample iris to 10,000 rows: the data.frame now grows faster
iris_big <- iris[sample(nrow(iris), 1e4, replace = TRUE), ]
object.size(iris_big)
ffiris_big <- as.ffdf(iris_big)
object.size(ffiris_big)