bwlewis / lz4

LZ4 Data Compression and Decompression for R

RDS support? #2

Open eddelbuettel opened 8 years ago

eddelbuettel commented 8 years ago

Is there a way to shoehorn this into standard R serialization to a file, and then read it back in blazingly fast from a compressed file?

Did some casual experiments (checked by @joshuaulrich) to see the impact of compression vs. no compression, as the readr package now defaults to no compression for RDS. That gives it roughly 4x faster reads at roughly 2x the file size (in some casual tests on my box; ditto for Josh).

I have the feeling we can do better, and I have been meaning to look into RDS file creation from the C(++) side anyway...

bwlewis commented 8 years ago

Yes and no. My immediate need was to store compressed (non-RDS) serialized R objects in Redis. This is what I do:

redisSet("key", lzCompress(serialize(some_object, NULL)))

One can of course direct the output to a file instead of Redis.
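Directing the same pattern at a file might look like the sketch below: serialize to a raw vector, compress it, and write the bytes with `writeBin()`. To keep it runnable without the lz4 package, base R's `memCompress()`/`memDecompress()` (gzip/zlib, not lz4) stand in for `lzCompress()`; the helper names `save_blob` and `load_blob` are hypothetical.

```r
# Hypothetical helpers: serialize -> compress -> write bytes, and the reverse.
# NOTE: memCompress()/memDecompress() are base R stand-ins, NOT lz4.
save_blob <- function(object, path, type = "gzip") {
  blob <- memCompress(serialize(object, NULL), type = type)  # raw vector
  writeBin(blob, path)                                       # raw bytes to disk
  invisible(length(blob))                                    # compressed size
}

load_blob <- function(path, type = "gzip") {
  blob <- readBin(path, what = "raw", n = file.size(path))   # read all bytes
  unserialize(memDecompress(blob, type = type))              # back to an R object
}
```

Swapping in lz4 would mean replacing the compress/decompress calls with the lz4 package's functions; the file format is then whatever bytes those produce, not RDS.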

To get this to work with RDS, one approach would be to implement the lz4 streaming API (see issue #1) and then perhaps an R connection interface on top of it. None of that should be too difficult, and I plan to implement it someday...

However, right now it is already possible to save RDS files with lz4 compression without the lz4 package at all, by using the command-line lz4 program instead. See the "Parallel compression" section of the help for R's save, which shows how to pipe save's output through arbitrary external programs.

Note that this approach actually works very well, thanks to the extra parallelism the pipeline introduces: R's serialization overlaps with lz4's compression.
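Applied to RDS, the pipe trick might look like the sketch below. It assumes a `lz4` command-line tool on the PATH that reads stdin and accepts gzip-style `-c` (write to stdout) and `-dc` (decompress to stdout) flags; the helper names are hypothetical. Since `saveRDS()` ignores its `compress` argument when given a connection, the external program does all the compression, and any stdin/stdout compressor (gzip, pigz, zstd) can be swapped in via the command strings.

```r
# Hypothetical helpers: stream an uncompressed RDS through an external
# compressor. Assumes the commands exist and read stdin / write stdout.
saveRDS_pipe <- function(object, path, compress_cmd = "lz4 -c") {
  con <- pipe(paste(compress_cmd, ">", shQuote(path)), "wb")
  on.exit(close(con))           # close() waits for the child process to finish
  saveRDS(object, file = con)   # serialization overlaps with compression
  invisible(path)
}

readRDS_pipe <- function(path, decompress_cmd = "lz4 -dc") {
  con <- pipe(paste(decompress_cmd, shQuote(path)), "rb")
  on.exit(close(con))
  readRDS(con)                  # reads the decompressed stream from stdout
}
```

For example, `saveRDS_pipe(df, "df.rds.lz4")` and later `df2 <- readRDS_pipe("df.rds.lz4")`; passing `"gzip -c"` / `"gzip -dc"` gives an ordinary gzip-compressed RDS instead.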

eddelbuettel commented 8 years ago

Ohhh. Thanks for that pointer. Will do some timing.

eddelbuettel commented 8 years ago

On the sample data.frame (5 columns, 1e6 rows), reading from an uncompressed file is still fastest. lz4 helps, but at the cost of a less compressed (larger) file.
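A minimal timing harness for this kind of comparison might look like the sketch below: it writes the same data frame as default (gzip) and uncompressed RDS and reports file size and mean `readRDS()` time for each. The function name `time_rds_reads` is hypothetical, and it reproduces none of the numbers in this thread; results depend on the data, disk, and R version.

```r
# Hypothetical harness: compare file size and read time for gzip vs. no
# compression. Only base R is used; timings are wall-clock (elapsed) seconds.
time_rds_reads <- function(df, reps = 3L) {
  paths <- c(gzip = tempfile(fileext = ".rds"),
             none = tempfile(fileext = ".rds"))
  saveRDS(df, paths[["gzip"]], compress = TRUE)   # R's default gzip RDS
  saveRDS(df, paths[["none"]], compress = FALSE)  # readr-style uncompressed RDS
  read_sec <- vapply(paths, function(p) {
    elapsed <- system.time(for (i in seq_len(reps)) readRDS(p))[["elapsed"]]
    elapsed / reps                                # mean seconds per read
  }, numeric(1))
  data.frame(format   = names(paths),
             bytes    = unname(file.size(paths)),
             read_sec = unname(read_sec))
}
```

Calling it on a 5-column, 1e6-row data.frame like the one above gives a like-for-like comparison; a piped lz4 variant can be added as a third row by timing the corresponding read in the same way.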