adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org
MIT License

Improving zlib / gzip functionality #6284

Open gamblor21 opened 2 years ago

gamblor21 commented 2 years ago

Issue to track improvements that can be made to the zlib library, which now has preliminary support:

gamblor21 commented 1 year ago

Just saw this and it's worth noting: https://github.com/micropython/micropython/pull/11879 is working on adding compression.

jimmo commented 1 year ago

@gamblor21 Some notes based on our experiences with this in MicroPython:

Implement decompressing data streams via the zlib.decompressobj style function like in CPython.

The API for {decompress,compress}obj seems backwards from how I imagine most people want to use this (i.e. reading/writing compressed data to a stream/socket/file). I'm not convinced there's any value in implementing these for a microcontroller target.

GzipFile provides a much better interface.

Unfortunately GzipFile does not provide anything other than gzip compression, and implicitly uses the highest possible value of the window size. So providing the useful functionality while matching CPython exactly seems impossible without some compromises.
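To illustrate the difference between the two interfaces, here is a minimal sketch using CPython's standard zlib and gzip modules (not MicroPython or CircuitPython code). The helper names decompress_push and decompress_pull and the 64-byte chunk size are made up for this example. With decompressobj the caller has to read chunks and feed them in; with GzipFile the stream is wrapped once and then read like a file.

```python
import gzip
import io
import zlib

payload = b"hello from a microcontroller " * 20
compressed = gzip.compress(payload)


def decompress_push(stream, chunk_size=64):
    """decompressobj style: the caller drives the loop and feeds chunks in."""
    # wbits=31 selects the gzip container with the maximum (32 KiB) window.
    d = zlib.decompressobj(31)
    out = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        out += d.decompress(chunk)
    out += d.flush()
    return bytes(out)


def decompress_pull(stream):
    """GzipFile style: wrap the stream, then read decompressed bytes."""
    with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
        return gz.read()


result_push = decompress_push(io.BytesIO(compressed))
result_pull = decompress_pull(io.BytesIO(compressed))
```

Note that GzipFile takes no format or window-size argument, which is the limitation described above; in CPython those are only reachable through the wbits parameter of the zlib functions.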

Expand zlib with other functions like crc32, similar to what PR #1274 was looking to do.

We already provide binascii.crc32 instead (and it appears that CircuitPython does too).

Fortunately zlib.crc32 and binascii.crc32 share the same method signature so it would be very easy to just add the existing method to the zlib globals dict.

Because we're looking at now providing our zlib module in Python, it can just forward the binascii version.
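A small sketch of what that forwarding could look like, checked here against CPython's own zlib and binascii. The wrapper function below is hypothetical and only stands in for what a pure-Python zlib module might export; it is not the actual MicroPython or CircuitPython source.

```python
import binascii
import zlib

data = b"CircuitPython"

# In CPython both functions accept (data, value=0) and return the same
# unsigned 32-bit checksum.
crc_zlib = zlib.crc32(data)
crc_binascii = binascii.crc32(data)

# The optional second argument carries a running CRC across chunks.
running = binascii.crc32(b"Python", binascii.crc32(b"Circuit"))


# Hypothetical forwarding function for a pure-Python zlib module.
def crc32(data, value=0):
    return binascii.crc32(data, value)
```

In a pure-Python module, a plain `from binascii import crc32` would have the same effect without the extra function call.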

Expand for compression functionality. Compression does not exist in the used uzlib library.

See https://github.com/micropython/micropython/pull/11879#issuecomment-1608959558 in particular for details about Damien's lz77 compressor implementation that we've added to our fork of uzlib.

Implement the gzip CPython library (in CPython this is done in pure Python, so it could be done the same way).

Unfortunately I don't think this can be done efficiently with just the API provided by zlib.

gamblor21 commented 9 months ago

From the MicroPython 1.21 release notes:

...the zlib C module has been removed and replaced with a new MicroPython-specific deflate module and DeflateIO class that is optimised to provide efficient streaming compression and decompression. The zlib (and gzip) modules are now implemented in pure Python on top of the deflate module.

Leaving this here as a note for myself, or others who may take this on after CP merges with MP 1.21: this may also result in a space savings by moving some functionality to a Python library. (Also see jimmo's comment above, which talks about this in more detail.)
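For anyone picking this up, a rough sketch of a helper built on the new module. It assumes the MicroPython 1.21 API (deflate.DeflateIO(stream, format) and the deflate.GZIP constant) and falls back to CPython's gzip module when that import is not available, so it can be tried on a desktop. The name gunzip is made up for this example, and nothing here reflects how CircuitPython will actually integrate it.

```python
import io

try:
    # MicroPython 1.21+ (assumed API: DeflateIO(stream, format) and GZIP).
    from deflate import DeflateIO, GZIP
except ImportError:
    DeflateIO = None  # not on MicroPython; use the CPython fallback below


def gunzip(data):
    """Decompress a complete gzip byte string (hypothetical helper)."""
    stream = io.BytesIO(data)
    if DeflateIO is not None:
        # DeflateIO wraps the stream; reads return decompressed bytes.
        return DeflateIO(stream, GZIP).read()
    # CPython fallback so the sketch can run off-device.
    import gzip
    with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
        return gz.read()
```

Because DeflateIO wraps any stream object, the same pattern should apply to a file or socket in place of the BytesIO used here.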