edwindj / ffbase

Basic (statistical) functionality for R package ff
github.com/edwindj/ffbase/wiki

Growing a ffdf data.frame on disk #44

Open kindlychung opened 9 years ago

kindlychung commented 9 years ago

Hi, Ed.

How do you start an empty ffdf data.frame on disk and then grow it row by row (or col by col)?

kindlychung commented 9 years ago

I posted a question on Stack Overflow: http://stackoverflow.com/questions/30834967/grow-a-ffdf-data-frame-on-disk-gradually

There is a problem with updating an ffdf data frame on disk.

edwindj commented 9 years ago

I tried to answer your question on stackoverflow!

kindlychung commented 9 years ago

Thanks!

library(ffbase)  # also attaches ff

ffiris <- as.ffdf(iris)
save.ffdf(ffiris, dir = "~/Desktop/iris")
filename(ffiris)  # the backing files now live in ~/Desktop/iris

ffiris <- transform(ffiris, new1 = 99)  # this creates a copy of the whole data.frame!
filename(ffiris)

ffiris$new2 <- ff(rep(99, nrow(iris)))  # this creates a new column, but not yet in the right directory
filename(ffiris)

save.ffdf(ffiris, dir = "~/Desktop/iris", overwrite = TRUE)  # this fixes that.

In the last line, does save.ffdf overwrite all the existing vectors, or just add a new one to the ~/Desktop/iris folder?

edwindj commented 9 years ago

In the last line, save.ffdf only adds the new one to the folder.

Well, to be precise: ffbase sets the filenames using ff::filename. For existing files this should not result in copying (except when the files are on a different disk/volume).
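
One way to check this on your own machine is to compare the folder contents before and after the second save. This is only a rough sketch: out_dir is a hypothetical scratch directory (the thread uses ~/Desktop/iris), and file timestamps can be coarse, so treat the "rewritten" column as a hint rather than proof.

```r
library(ffbase)  # also attaches ff

# Hypothetical scratch directory, used here instead of ~/Desktop/iris.
out_dir <- file.path(tempdir(), "iris_ffdf")

ffiris <- as.ffdf(iris)
save.ffdf(ffiris, dir = out_dir)

# Snapshot of the folder after the first save.
before <- list.files(out_dir)
ff_before <- file.path(out_dir, grep("\\.ff$", before, value = TRUE))
mtime_before <- file.mtime(ff_before)

# Add a column; its backing file is created outside out_dir at first.
ffiris$new2 <- ff(rep(99, nrow(iris)))
save.ffdf(ffiris, dir = out_dir, overwrite = TRUE)

# Snapshot after the second save.
after <- list.files(out_dir)
mtime_after <- file.mtime(ff_before)

# Files that only appeared with the second save.
setdiff(after, before)

# Whether the modification time of each pre-existing .ff file changed.
data.frame(file = basename(ff_before),
           rewritten = mtime_after != mtime_before)

# Where each column now lives on disk.
unlist(filename(ffiris))
```

If the explanation above holds, the new column should be the only new .ff file listed by setdiff().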

kindlychung commented 9 years ago

Cool. Thanks!