fstpackage / fst

Lightning Fast Serialization of Data Frames for R
http://www.fstpackage.org/fst/
GNU Affero General Public License v3.0

Will fst(s) be additive #24

Open 1beb opened 7 years ago

1beb commented 7 years ago

Is it possible to append to an fst without having to load it (completely)?

MarcusKlik commented 7 years ago

Thanks @1beb for the feature request. I'm planning to add an fst.rbind method in the next version of the fst package. This method will only need to read some metadata from the existing file, so appending will be very fast, as you requested. Note, however, that fst uses a columnar binary file format, which means that appended data will essentially be stored as a separate chunk inside the fst file. This will have only a marginal impact on performance when large chunks of data are appended, but overall performance will suffer when many small chunks are added sequentially.

A partial solution to this problem might be to define an fst.stream class (issue #15) that appends data to an existing file through an internal buffer. When the number of chunks is known in advance, you could also use an fst.lapply method (issue #18, also still to be developed) to create a large on-disk data set from many smaller inputs. This could also be done in parallel with an fst.parlapply method.
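To illustrate the buffering idea, here is a minimal sketch in base R. It is not fst code and does not reflect how fst.stream would be implemented: `new_buffered_writer`, `flush_rows`, and the returned helper functions are hypothetical names, and the "file" is modelled as an in-memory list of chunks rather than a real fst file. It only shows how collecting small inputs in a buffer reduces the number of chunks that end up stored.

```r
# Hypothetical illustration only: none of these names are part of fst.
# The "file" is modelled as an in-memory list of chunks (data frames).
new_buffered_writer <- function(flush_rows = 1000L) {
  chunks <- list()     # chunks already "written" to the file
  pending <- list()    # small inputs waiting in the buffer
  pending_rows <- 0L

  flush <- function() {
    if (pending_rows > 0L) {
      # Combine all buffered pieces into one larger chunk
      chunks[[length(chunks) + 1L]] <<- do.call(rbind, pending)
      pending <<- list()
      pending_rows <<- 0L
    }
    invisible(NULL)
  }

  add <- function(df) {
    pending[[length(pending) + 1L]] <<- df
    pending_rows <<- pending_rows + nrow(df)
    # Only "write" once enough rows have accumulated
    if (pending_rows >= flush_rows) flush()
    invisible(NULL)
  }

  list(
    add = add,
    flush = flush,
    chunk_count = function() length(chunks),
    row_count = function() sum(vapply(chunks, nrow, integer(1)))
  )
}

# Append 25 small inputs of 10 rows each through a 100-row buffer
writer <- new_buffered_writer(flush_rows = 100L)
for (i in 1:25) {
  writer$add(data.frame(id = ((i - 1L) * 10L + 1L):(i * 10L), value = i))
}
writer$flush()  # write whatever is left in the buffer
```

In this sketch the 25 small inputs end up in 3 stored chunks instead of 25, which is the effect an internal buffer is meant to have on a chunked file.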