gallantlab / cottoncandy

sugar for s3
http://gallantlab.github.io/cottoncandy/
BSD 2-Clause "Simplified" License

numcodecs compression #60

Closed marklescroart closed 6 years ago

marklescroart commented 6 years ago

Compression with numcodecs is working, though it may be unkind to memory. As far as I can tell, numcodecs does not support compression of file streams, only whole buffers, so data is probably being copied in RAM. The code needs review by people better versed in streams and file-like objects than me. This will only be an issue if numcodecs compression is used; I would probably take it even with a hit to RAM, because compression is important for big files (still not tested in production).
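The memory concern above can be sketched with the stdlib zlib module (not numcodecs itself, just an illustration): whole-buffer compression holds the input and output in RAM at once, whereas a streaming compressor only needs one chunk at a time. The function names here are hypothetical, not cottoncandy's API.

```python
import io
import zlib

def compress_whole_buffer(data: bytes) -> bytes:
    # Whole-buffer compression (what numcodecs codecs do via encode()):
    # the full input and its compressed output coexist in RAM.
    return zlib.compress(data)

def compress_stream(fileobj, chunk_size=1 << 20):
    # Streaming compression: only one chunk plus its compressed
    # output are resident at a time, so peak memory stays bounded.
    compressor = zlib.compressobj()
    out = io.BytesIO()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        out.write(compressor.compress(chunk))
    out.write(compressor.flush())
    out.seek(0)
    return out

data = b"cottoncandy " * 10000
streamed = compress_stream(io.BytesIO(data)).read()
assert zlib.decompress(streamed) == data
assert zlib.decompress(compress_whole_buffer(data)) == data
```

Both paths round-trip the same bytes; the difference is only in peak memory, which is why a buffer-only codec API forces extra copies for large uploads.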

One keyword argument to upload_raw_array() has changed: gzip=True is now compression='gzip', with other values available for compression. The code will still recognize and handle gzip=True, with a deprecation warning.
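A minimal sketch of that backward-compatibility shim (the helper name and default are hypothetical, not cottoncandy's actual internals):

```python
import warnings

def resolve_compression(compression='gzip', gzip=None):
    # Hypothetical helper mirroring the keyword migration: map the
    # deprecated gzip=True/False onto the new compression= keyword,
    # warning the caller once.
    if gzip is not None:
        warnings.warn(
            "gzip= is deprecated; use compression='gzip' instead",
            DeprecationWarning,
        )
        compression = 'gzip' if gzip else None
    return compression
```

Old call sites like upload_raw_array(arr, gzip=True) then keep working while nudging users toward compression='gzip'.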

Some compression algorithms in numcodecs (Zlib, LZ4) still fail on large (> 2 GB) arrays. BZ2 is very slow but compresses more (by default); Zstd seems to offer the best tradeoff between encoding/decoding speed and compression ratio. These algorithms take parameters to trade off compression speed against compression ratio, but there is currently no way to specify them through cottoncandy (each algorithm just uses its defaults). This could be added in the future, but the defaults seem OK.
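The speed/ratio tradeoff mentioned above can be demonstrated with stdlib zlib levels standing in for the per-codec parameters (e.g. Zstd level, BZ2 block size) that numcodecs exposes but cottoncandy currently leaves at defaults:

```python
import time
import zlib

# ~1 MB of repetitive data: compressible enough that the level
# visibly affects both output size and encoding time.
data = (b"0123456789abcdef" * 4096) * 16

for level in (1, 6, 9):
    t0 = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - t0
    ratio = len(compressed) / len(data)
    print(f"level={level} ratio={ratio:.4f} time={elapsed * 1000:.2f} ms")
```

Higher levels spend more CPU for a smaller output, which is exactly the knob that could be surfaced in cottoncandy later.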

marklescroart commented 6 years ago

Merging in changes so PR can be evaluated in a branch of the main repo.