rhpvorderman closed this 2 years ago.
One possible improvement: when decompressing, a void * buffer is created here and is later copied into a bytes object with PyBytes_FromStringAndSize.
Instead, you can allocate the bytes object first and decompress directly into its buffer:
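PyObject *return_value = PyBytes_FromStringAndSize(NULL, decompressed_size);
if (return_value == NULL) {
    return NULL;  // allocation failed; a MemoryError is already set
}
void *decompressed_data = (void *)PyBytes_AS_STRING(return_value);
// Decompression set up here
enum libdeflate_result result = libdeflate_gzip_decompress(
    decompressor, data.buf, data.len, decompressed_data, decompressed_size, &decompressed_size);
// error-handling code here (check result; Py_DECREF(return_value) on failure)
return return_value;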
This way the output buffer is allocated only once, as part of the bytes object, and no copy is required.
Thanks for the suggestion! It wasn't quite that simple, since decompressed_size isn't known before decompression, but there is a _PyBytes_Resize function, which is almost certainly better than a copy.
I wrote this to use in https://github.com/imsweb/pzip, which compresses (and encrypts) in chunks, so I had no need for a streaming interface; libdeflate is very well suited for this case.
> It wasn't quite that simple, since decompressed_size isn't known before decompression
Well, it should be equal to the ISIZE field from the gzip trailer; otherwise the gzip data is corrupt. (Strictly, ISIZE holds the uncompressed size modulo 2^32, so this holds for members smaller than 4 GiB.) So you can initialize the buffer with the correct size up front. The nice thing is that _PyBytes_Resize returns early when the size is already correct, so no resizing happens in the normal case.
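A minimal sketch of reading that size hint, assuming the input holds exactly one gzip member (for concatenated members, the last four bytes only describe the last one). The function name gzip_isize is hypothetical; its result is the kind of value that would be passed to PyBytes_FromStringAndSize(NULL, ...) before calling libdeflate_gzip_decompress.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read ISIZE from the end of a single-member gzip
 * stream (RFC 1952). The trailer is CRC32 followed by ISIZE, both stored
 * as 4-byte little-endian integers, so ISIZE occupies the last 4 bytes.
 * ISIZE is the uncompressed length modulo 2^32, so it is an exact size
 * hint only for members below 4 GiB.
 * Returns 0 on success, -1 if the buffer is shorter than a 10-byte
 * header plus an 8-byte trailer. */
static int gzip_isize(const uint8_t *buf, size_t len, uint32_t *isize_out)
{
    if (len < 10 + 8) {
        return -1;
    }
    const uint8_t *p = buf + len - 4;
    *isize_out = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);
    return 0;
}
```

Since a corrupt or truncated stream can carry any ISIZE value, a sanity bound (or a fallback to a growable buffer) is still prudent before trusting it for a large allocation.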
> I wrote this to use in https://github.com/imsweb/pzip, which compresses (and encrypts) in chunks, so I had no need for a streaming interface; libdeflate is very well suited for this case.
Ah, very useful. Chunked compression is also used by a bioinformatics format called BAM. It uses BGZF (block gzip format), which is essentially a series of concatenated gzip blocks. The size of each compressed block is stored in the BC subfield of the gzip EXTRA field, while the length of the decompressed data is stored in ISIZE. This is very useful, as you know the exact buffer sizes in advance.
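As a rough illustration of how those BGZF sizes can be read, here is a sketch that locates the BC subfield in a gzip header and returns the total compressed block size. It follows the BGZF layout from the SAM/BAM specification (FEXTRA flag set; BC subfield with SLEN 2 holding BSIZE, the total block size minus 1). The name bgzf_block_size is hypothetical, and the code does not validate the rest of the block.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find the BGZF "BC" extra subfield and report the
 * total size of the compressed block (BSIZE + 1).
 * Returns 0 on success, -1 if the buffer is not a gzip header with
 * FEXTRA set or contains no valid BC subfield. */
static int bgzf_block_size(const uint8_t *buf, size_t len, size_t *block_size_out)
{
    /* Fixed header: ID1 ID2 CM FLG MTIME(4) XFL OS, then XLEN(2). */
    if (len < 12 || buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8) {
        return -1;
    }
    if ((buf[3] & 0x04) == 0) { /* FEXTRA flag not set */
        return -1;
    }
    size_t xlen = (size_t)buf[10] | ((size_t)buf[11] << 8);
    size_t end = 12 + xlen;
    if (end > len) {
        return -1;
    }
    /* Walk the extra subfields: SI1 SI2 SLEN(2), then SLEN payload bytes. */
    size_t pos = 12;
    while (pos + 4 <= end) {
        uint8_t si1 = buf[pos];
        uint8_t si2 = buf[pos + 1];
        size_t slen = (size_t)buf[pos + 2] | ((size_t)buf[pos + 3] << 8);
        if (pos + 4 + slen > end) {
            return -1;
        }
        if (si1 == 'B' && si2 == 'C' && slen == 2) {
            size_t bsize = (size_t)buf[pos + 4] | ((size_t)buf[pos + 5] << 8);
            *block_size_out = bsize + 1; /* BSIZE stores total size minus 1 */
            return 0;
        }
        pos += 4 + slen;
    }
    return -1;
}
```

Combined with the ISIZE in the block's last four bytes, this gives both the input and output buffer sizes before any decompression happens.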
FYI: I just got a notification (I co-maintain the conda feedstock for libdeflate) that libdeflate v1.9 has been released: https://github.com/ebiggers/libdeflate/releases/tag/v1.9.
Hi, I work on python-isal, which wraps ISA-L. It also aims to accelerate compression and decompression, and it supports streaming.
Unfortunately, ISA-L only works well on x86-64 (Intel, AMD), so it is much more limited than libdeflate in that respect.
Given that you probably work on this library because of your own compression/decompression needs, I wanted to let you know about python-isal. I also wanted to say hi, as another developer working on Python bindings for a deflate-compatible compression library.