Open 1beb opened 7 years ago
Thanks @1beb for the feature request. I'm planning on adding an fst.rbind method to the next version of the fst package. This method will only need to read some meta-data from the existing file, so appending will be very fast, as per your request. Note, however, that fst uses a columnar binary file format. This means that added data will basically be stored as a separate chunk inside the 'fst' file format. This will have a marginal impact on performance when large chunks of data are appended. However, when many small chunks are added sequentially, the overall performance will suffer. A partial solution to this problem might be to define an fst.stream class (issue #15), which can be used to append data to an existing file through an internal buffer. When the number of chunks is known, you can also use an fst.lapply method (also to be developed) to create a large on-disk data set from many smaller inputs (issue #18). This could also be done in parallel with an fst.parlapply method.
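To make the chunking trade-off above concrete, here is a minimal Python sketch. It is not the fst file format or any fst API: `ChunkedColumnStore` and `BufferedAppender` are hypothetical names, and the "file" is just an in-memory list of chunks. It only illustrates the idea that each append adds a separate chunk, and that a buffer can merge many small appends into fewer, larger chunks.

```python
from typing import Dict, List

Columns = Dict[str, List[float]]


class ChunkedColumnStore:
    """Hypothetical toy stand-in for a columnar file: a list of chunks."""

    def __init__(self) -> None:
        self.chunks: List[Columns] = []

    def append(self, columns: Columns) -> None:
        # Appending never rewrites existing chunks; it only adds a new one.
        lengths = {len(values) for values in columns.values()}
        if len(lengths) != 1:
            raise ValueError("all columns must have the same length")
        if self.chunks and set(columns) != set(self.chunks[0]):
            raise ValueError("column names must match the existing chunks")
        self.chunks.append({name: list(vals) for name, vals in columns.items()})

    def read_column(self, name: str) -> List[float]:
        # A reader visits every chunk, so the cost grows with the chunk count.
        out: List[float] = []
        for chunk in self.chunks:
            out.extend(chunk[name])
        return out


class BufferedAppender:
    """Hypothetical buffer that merges small appends into larger chunks."""

    def __init__(self, store: ChunkedColumnStore, buffer_rows: int) -> None:
        self.store = store
        self.buffer_rows = buffer_rows
        self.pending: Columns = {}

    def _pending_rows(self) -> int:
        return len(next(iter(self.pending.values()), []))

    def append(self, columns: Columns) -> None:
        for name, values in columns.items():
            self.pending.setdefault(name, []).extend(values)
        if self._pending_rows() >= self.buffer_rows:
            self.flush()

    def flush(self) -> None:
        # Write whatever is buffered as a single chunk; no-op when empty.
        if self._pending_rows() > 0:
            self.store.append(self.pending)
            self.pending = {}
```

Under these assumptions, appending 100 single-row inputs directly leaves 100 chunks for every later read to walk through, while routing the same inputs through a 50-row buffer leaves two chunks holding the same data.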
Is it possible to append to an fst file without having to load it (completely)?