Kevin-Jin / mmap

Forked from https://r-forge.r-project.org/scm/?group_id=648

Repurpose the package to also serve serialization needs #5

Open Kevin-Jin opened 7 years ago

Kevin-Jin commented 7 years ago

R lacks a binary data format that supports random-access reads of data.frame-like objects, unlike other packages such as Stata, so the size of the datasets R can work with is limited by the RAM of the user's computer. My goal is to flesh out this project so that very large datasets can be saved to and loaded from a persistent store without losing any information. At the very least, this means the storage.mode and the dim attribute should be saved to a header section that precedes the data section of the file. Additionally, as.mmap(x, ...) should call typeof(x) and class(x) and record that metadata in the file as well, so that the file can be loaded correctly without any other prior information.
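To make the header idea concrete, here is a minimal sketch in base R only. It does not use the mmap package, and the names `write_with_header` and `read_with_header` and the field order are my own assumptions, not a proposed file format. It also simplifies in two ways: it stores the explicit class attribute (`oldClass(x)`) rather than the full result of `class(x)`, and it drops other attributes such as dimnames.

```r
# Hypothetical helpers illustrating a "header + data" layout.
# Supports plain double, integer and logical vectors/arrays only.
write_with_header <- function(x, path) {
  stopifnot(typeof(x) %in% c("double", "integer", "logical"))
  dims <- dim(x)                 # NULL for a plain vector
  cls <- oldClass(x)             # NULL when the class is implicit
  payload <- x
  attributes(payload) <- NULL    # the data section holds raw values only

  con <- file(path, "wb")
  on.exit(close(con))
  # Header section: type, class, dim, element count
  writeBin(typeof(x), con)       # written as a null-terminated string
  writeBin(length(cls), con)
  if (length(cls) > 0L) writeBin(cls, con)
  writeBin(length(dims), con)
  if (length(dims) > 0L) writeBin(as.integer(dims), con)
  writeBin(length(payload), con)
  # Data section
  writeBin(payload, con)
  invisible(path)
}

read_with_header <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  type <- readBin(con, "character", n = 1L)
  ncls <- readBin(con, "integer", n = 1L)
  cls <- if (ncls > 0L) readBin(con, "character", n = ncls) else character(0)
  ndim <- readBin(con, "integer", n = 1L)
  dims <- if (ndim > 0L) readBin(con, "integer", n = ndim) else NULL
  n <- readBin(con, "integer", n = 1L)
  x <- readBin(con, type, n = n)  # "double", "integer", "logical" are valid `what` values
  if (!is.null(dims)) dim(x) <- dims
  if (length(cls) > 0L) oldClass(x) <- cls
  x
}

m <- matrix(1:6, nrow = 2)
path <- tempfile()
write_with_header(m, path)
identical(read_with_header(path), m)
```

A header intended for memory mapping would probably want fixed-size or padded fields so that the data section starts at a known offset; the variable-length strings here are only for brevity.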

The package should recognize the correct extractFUN and replaceFUN needed to serialize data.frames by default, and should provide a way for users to add custom serialization routines for data of other classes passed to as.mmap(). S3 generics may be sufficient, but at least some helper functions should be provided for the tedious steps.
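As a rough illustration of the S3 approach, the sketch below defines a generic that describes how an object would be laid out on disk. The generic name `serialization_spec` and the shape of its return value are assumptions for the sake of discussion; nothing like it exists in the package today, and it deliberately avoids calling any mmap functions.

```r
# Hypothetical generic: return the metadata a header would need for `x`.
serialization_spec <- function(x, ...) UseMethod("serialization_spec")

# Fallback for atomic vectors and arrays.
serialization_spec.default <- function(x, ...) {
  list(class = class(x), type = typeof(x), dim = dim(x))
}

# data.frames are described column by column, so each column can
# dispatch to its own method.
serialization_spec.data.frame <- function(x, ...) {
  list(
    class = class(x),
    nrow = nrow(x),
    columns = lapply(x, serialization_spec)
  )
}

df <- data.frame(id = 1:3, when = as.Date("2020-01-01") + 0:2)
spec <- serialization_spec(df)
str(spec$columns$when)
```

Under this design a user would support a new class by defining `serialization_spec.myclass`, and the package could then pick the matching extractFUN and replaceFUN from the returned metadata.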