Closed rabutler closed 7 years ago
The txt files are really just a convenient intermediate step to save time by not making you re-read the rdf files every time.
The feather package will write/read this data extremely fast.
Per the feather blog post:
> Feather is not designed for long-term data storage. At this time, we do not guarantee that the file format will be stable between versions. Instead, use Feather for quickly exchanging data between Python and R code, or for short-term storage of data frames as part of some analysis.
But that is fine, because we have the original data in rdf form, and the code can always be re-run to create the intermediate data.
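A minimal sketch of that "rebuild if missing" pattern, assuming the feather package is installed. `process_rdf()` and `load_or_build()` are hypothetical names used only for illustration; they are not functions from this repository, and `process_rdf()` just stands in for the slow step of reading and processing the rdf files.

```r
library(feather)

# Hypothetical stand-in for the slow step of reading and processing rdf files.
process_rdf <- function() {
  data.frame(Trace = 1:3, Value = c(10.5, 11.2, 9.8))
}

# Return the cached data if the feather file exists;
# otherwise rebuild it from the source data and write the cache.
load_or_build <- function(cache_path, build_fun) {
  if (file.exists(cache_path)) {
    return(as.data.frame(read_feather(cache_path)))
  }
  df <- build_fun()
  write_feather(df, cache_path)
  df
}

cache_path <- tempfile(fileext = ".feather")
first <- load_or_build(cache_path, process_rdf)   # builds and writes the cache
second <- load_or_build(cache_path, process_rdf)  # reads the cached file
```

If a future feather version cannot read an old cache file, deleting the file forces a rebuild from the rdf data.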
For a 10.5 MB txt file created from `getAndProcessAllSlots`, the read and write times for the text file are 0.90 and 1.73 s, respectively. For a feather file, they are 0.02 and 0.01 s, respectively.
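A comparison like this can be reproduced roughly as follows. This is only a sketch: it uses synthetic data with made-up column names rather than real `getAndProcessAllSlots` output, and it assumes base `write.table`/`read.table` for the text file, so the timings will not match the numbers above.

```r
library(feather)

# Synthetic stand-in for processed slot data; column names are illustrative only.
set.seed(1)
n <- 200000
df <- data.frame(
  Trace = sample.int(100, n, replace = TRUE),
  Year = sample(2017:2060, n, replace = TRUE),
  Variable = sample(c("slotA", "slotB"), n, replace = TRUE),
  Value = runif(n),
  stringsAsFactors = FALSE
)

txt_path <- tempfile(fileext = ".txt")
feather_path <- tempfile(fileext = ".feather")

# Time the text file round trip.
txt_write <- system.time(
  write.table(df, txt_path, sep = "\t", row.names = FALSE)
)
txt_read <- system.time(
  df_txt <- read.table(txt_path, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)
)

# Time the feather round trip.
feather_write <- system.time(write_feather(df, feather_path))
feather_read <- system.time(df_feather <- read_feather(feather_path))

# Elapsed wall-clock seconds for each step.
timings <- c(
  txt_write = txt_write[["elapsed"]],
  txt_read = txt_read[["elapsed"]],
  feather_write = feather_write[["elapsed"]],
  feather_read = feather_read[["elapsed"]]
)
print(timings)
```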
When the next version of data.table is released, there will be a faster way to write the text files.
The new version of `data.table` has `fwrite` available.
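For reference, a minimal sketch of writing and re-reading a text file with `fwrite` and `fread`, assuming a data.table version that includes `fwrite`. The data and the tab separator are illustrative assumptions, not the settings this repository uses.

```r
library(data.table)

# Small illustrative table; real slot data would be much larger.
dt <- data.table(Trace = 1:3, Value = c(10.5, 11.2, 9.8))

txt_path <- tempfile(fileext = ".txt")

# fwrite is the fast writer; fread is the matching fast reader.
fwrite(dt, txt_path, sep = "\t")
dt_back <- fread(txt_path)
```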
See https://github.com/BoulderCodeHub/Process-CRSS-Res/issues/31