add ZStandard compression

colour-science / flask-compress

Compress responses of your Flask application.

MIT License

117 stars 27 forks source link

add ZStandard compression #45

Closed redorff closed 6 months ago

alexprengere commented 6 months ago

Looks great! Any reason why pyzstd was chosen over zstandard? httpx landed the equivalent feature a few weeks ago and went with zstandard.

redorff commented 6 months ago

I got a little scared by their multiple warnings about thread safety...

Unless stated otherwise, ZstdCompressor and ZstdDecompressor instances cannot be used for temporally overlapping (de)compression operations. i.e. if you start a (de)compression operation on an instance or a helper object derived from it, it isn’t safe to start another (de)compression operation from the same instance until the first one has finished. ZstdCompressor and ZstdDecompressor instances have no guarantees about thread safety. Do not operate on the same ZstdCompressor and ZstdDecompressor instance simultaneously from different threads. It is fine to have different threads call into a single instance, just not at the same time.

and as you probably want to have multiple thread on your webserver...

pyzstd is not warning about potential thread issues... but I admit it does not imply that there are no such issues...

alexprengere commented 6 months ago

AFAICT, regarding the warning:

if you start a (de)compression operation on an instance or a helper object derived from it, it isn't safe to start another (de)compression operation from the same instance until the first one has finished.

Even if flask-compress is used in a threaded context, only a single thread will ever process a single response, so as long as the ZstdDecompressor object is not shared across threads, it should be fine. A similar pattern is used for gzip decompression.

So in a nutshell, looking at the compress function relevant part:

 elif algorithm == 'zstd':
     # The zstandard.ZstdCompressor() should be created here, so it is not shared

For context, there is relevant information regarding the various zstd implementations here.

redorff commented 6 months ago

Yes. Ok.

Reusing a ZstdCompressor or ZstdDecompressor instance for multiple operations is faster than instantiating a new ZstdCompressor or ZstdDecompressor for each operation.

Would it also work to store the object once for good in self ?

elif algorithm == 'zstd':
    if self.zstd is None:
        self.zstd = zstandard.ZstdCompressor(level=app.config['COMPRESS_ZSTD_LEVEL'])
    return self.zstd.compress(response.get_data())

alexprengere commented 6 months ago

As the Compress object is global to the process (in most cases), I think this is what the warning is warning about 😉. Flask-compress code code could add thread locking to prevent race conditions, but I would rather have a first version that is "obviously correct", then if this turns out to be slow we can look into further optimizations, like carefully re-using ZstdCompressor objects.

I did some tests on my not-so-recent hardware, and I measure the creation time of ZstdCompressor() to be around 900ns, which is not super-fast, but in the context of a HTTP webserver, I think sub-ms overhead is reasonable.

redorff commented 6 months ago

Ok. I'll fix it as you suggested then. Do you know how to get rid of the failing test https://results.pre-commit.ci/run/github/142751498/1714036047.24xoLujVSeeGm64hBfSY7w ?

alexprengere commented 6 months ago

This pre-commit check is not part of the workflow file, so I think this must be something enabled at org level. @KelSolaar @MichaelMauderer perhaps you may help (I pinged you as I think you have the admin rights)

alexprengere commented 6 months ago

Thanks a lot! I will release soon.