charlesdaniels / bitshuffle

BSD 3-Clause "New" or "Revised" License

Support for more compression types #2

Open charlesdaniels opened 6 years ago

charlesdaniels commented 6 years ago

Compression other than bz2 should be supported. We don't need to go overboard, but it's sufficiently easy to compress bytes() in Python that we may as well support a few more. I would suggest gzip, lz4, and an option to disable compression entirely (e.g. for input data that is already compressed).

When this feature is implemented, the compatibility level counter should be incremented.

I would make use of function pointers, e.g.:


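import bz2

# Select the compression function by name; whichever one is chosen is
# later called as compress_data(data, compressionlevel).
compress_data = None
if compressiontype == "bz2":
    compress_data = bz2.compress
elif compressiontype == "lz4":
    compress_data = ...
...

if compress_data is None:
    # crash the program with an error
    ...

...

compressed_data = compress_data(data, compressionlevel)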

Note that we will need at least one wrapper function to "compress" data for the uncompressed type, and we might also need wrappers for any compression functions that don't support compression levels (or don't accept the level as the second positional argument).
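One way the wrapper idea could look is a lookup table instead of an if/elif chain. This is only a sketch, not the project's actual code: the names COMPRESSORS, compress_none, and compress are hypothetical, the "none" key for the uncompressed type is an assumption, and gzip.compress is only available in Python 3.

```python
import bz2
import gzip


def compress_none(data, level):
    # Hypothetical pass-through "compressor". The level is ignored; it is
    # accepted only so every table entry shares the (data, level) signature.
    return data


# Hypothetical table mapping a compression type name to a callable that is
# invoked as f(data, level). bz2.compress and gzip.compress both accept the
# compression level as their second positional argument.
COMPRESSORS = {
    "bz2": bz2.compress,
    "gzip": gzip.compress,
    "none": compress_none,
}


def compress(data, compressiontype="bz2", compressionlevel=9):
    try:
        compress_data = COMPRESSORS[compressiontype]
    except KeyError:
        # Fail loudly on an unknown type rather than guessing.
        raise ValueError("unsupported compression type: %r" % (compressiontype,))
    return compress_data(data, compressionlevel)
```

With this layout, adding a new type means adding one table entry, plus a small wrapper if the underlying function takes its level somewhere other than the second positional argument.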

jyn514 commented 6 years ago

Added support for gzip in https://github.com/charlesdaniels/bitshuffle/commit/e37c44f0d7bd9bf21b9581129ff84fe922cc4879

charlesdaniels commented 6 years ago

This looks good, with the exception of a few nits I commented on in e37c44f.

jyn514 commented 6 years ago

Note: lzma (the compression behind .xz files) is supported natively only in Python 3; to use it in Python 2, a user would have to install lzma from backports, which involves compiling C code and is generally a pain.

I'm OK with only supporting lzma on Python 3 if you are.

Note also that lz4 is available through pip but is not part of the standard library.
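If optional compressors are ever enabled, one possible approach is to register them only when their module imports successfully. The following is a rough sketch under stated assumptions: available_compressors is a hypothetical helper, the "lzma" and "lz4" keys are made up for illustration, and the lz4 branch assumes the pip package exposes lz4.frame.compress with a compression_level keyword.

```python
import bz2


def available_compressors():
    # Hypothetical helper: returns {name: f(data, level)} for every
    # compressor that can be imported on the running interpreter.
    table = {"bz2": bz2.compress}

    try:
        import lzma  # standard library on Python 3 only
    except ImportError:
        pass
    else:
        # lzma.compress takes the level as the "preset" keyword, not as the
        # second positional argument, so it needs a wrapper.
        table["lzma"] = lambda data, level: lzma.compress(data, preset=level)

    try:
        import lz4.frame  # third-party package, installed through pip
    except ImportError:
        pass
    else:
        # Assumption: lz4.frame.compress accepts compression_level.
        table["lz4"] = lambda data, level: lz4.frame.compress(
            data, compression_level=level
        )

    return table
```

A caller could then report a clear error when a requested type is missing from the returned table, instead of failing at import time.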

charlesdaniels commented 6 years ago

I say hold off on lzma for now then, unless you are strongly compelled to do so for some reason. Once Python 2 is deprecated in two more years, we'll turn this on. 

On Thu, 2018-02-22 at 02:38 +0000, Joshua Nelson wrote:

Note: lzma (the compression behind .xz files) is supported natively only in Python 3; to use it in Python 2, a user would have to install lzma from backports, which involves compiling C code and is generally a pain. I'm OK with only supporting lzma on Python 3 if you are.