BoulderCodeHub / RWDataPlyr

R package to read and manipulate data from RiverWare™

speed up read/write of txt files #27

Closed: rabutler closed this issue 7 years ago

rabutler commented 8 years ago

See https://github.com/BoulderCodeHub/Process-CRSS-Res/issues/31

rabutler commented 8 years ago

The txt files are just a convenient intermediate step: they save time because you do not have to re-read the rdf files every time.

The feather package can write and read this data extremely fast.

Per the feather blog post:

Feather is not designed for long-term data storage. At this time, we do not guarantee that the file format will be stable between versions. Instead, use Feather for quickly exchanging data between Python and R code, or for short-term storage of data frames as part of some analysis.

But that is fine, because we have the original data in rdf form, and the code can always be re-run to create the intermediate data.
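The cache-or-rebuild idea described above can be sketched as follows. This is a minimal Python illustration, not RWDataPlyr's R code: `parse_rdf` is a hypothetical stand-in for the slow rdf read, the slot name and values are made up, and the standard-library `pickle` module stands in for a fast binary format such as feather. Because the cached file is disposable, it is simply rebuilt from the original data whenever it is missing.

```python
import os
import pickle
import tempfile


def parse_rdf(path):
    """Hypothetical stand-in for the slow step of reading an rdf file.

    Returns illustrative (slot, timestep, value) rows; a real reader
    would parse the file at `path`.
    """
    return [("ExampleSlot", t, 100.0 + t) for t in range(12)]


def load_with_cache(rdf_path, cache_path):
    """Return processed rows, reading the rdf only when no cache exists.

    The cache is treated as disposable: if it is missing, it is rebuilt
    from the original rdf data and written out for the next run.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    rows = parse_rdf(rdf_path)
    with open(cache_path, "wb") as f:
        pickle.dump(rows, f)
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = os.path.join(tmp, "slots.pkl")
        first = load_with_cache("model.rdf", cache)   # parses, then writes cache
        second = load_with_cache("model.rdf", cache)  # reads the cache only
        print(first == second)
```

A fuller version might also rebuild the cache when the rdf file is newer than the cached copy, so stale intermediate data is never reused.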

For a 10.5 MB txt file created by getAndProcessAllSlots, reading and writing the text file take 0.90 s and 1.73 s, respectively. For a feather file of the same data, reading and writing take 0.02 s and 0.01 s, respectively (roughly 45x faster reads and 170x faster writes).
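A comparison like this can be reproduced with a small harness that writes and reads the same table in each format and times both steps. The sketch below is a Python analogue under stated assumptions: the standard-library `csv` module stands in for the txt file, `pickle` stands in for feather (neither is what RWDataPlyr uses), and the rows are synthetic. It will not reproduce the figures above; absolute times depend on the machine and the formats involved.

```python
import csv
import os
import pickle
import tempfile
import time


def make_rows(n):
    """Synthetic (slot, timestep, value) rows; purely illustrative data."""
    return [("Slot%d" % (i % 50), i, float(i) * 0.5) for i in range(n)]


def time_call(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def write_csv(path, rows):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["slot", "timestep", "value"])
        writer.writerows(rows)


def read_csv(path):
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        # Text loses types, so every field must be parsed back.
        return [(s, int(t), float(v)) for s, t, v in reader]


def write_pickle(path, rows):
    with open(path, "wb") as f:
        pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)


def read_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def benchmark(rows, directory):
    """Return {format: (write_s, read_s)}, checking each round trip."""
    formats = {
        "csv": (write_csv, read_csv, "data.txt"),
        "pickle": (write_pickle, read_pickle, "data.pkl"),
    }
    results = {}
    for name, (writer, reader, filename) in formats.items():
        path = os.path.join(directory, filename)
        _, write_s = time_call(lambda: writer(path, rows))
        loaded, read_s = time_call(lambda: reader(path))
        if loaded != rows:
            raise ValueError(name + " round trip changed the data")
        results[name] = (write_s, read_s)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, (w, r) in benchmark(make_rows(200_000), tmp).items():
            print("%-6s write %.3f s  read %.3f s" % (name, w, r))
```

The read side of the text format illustrates the main cost: every field has to be parsed and converted back into a typed value, work that a binary format largely skips.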

rabutler commented 8 years ago

When the next version of data.table is released, it will provide a faster way to write the text files.

rabutler commented 7 years ago

The new version of data.table now provides fwrite.