Closed redorff closed 6 months ago
I got a little scared by their multiple warnings about thread safety...
Unless stated otherwise, ZstdCompressor and ZstdDecompressor instances cannot be used for temporally overlapping (de)compression operations. i.e. if you start a (de)compression operation on an instance or a helper object derived from it, it isn’t safe to start another (de)compression operation from the same instance until the first one has finished. ZstdCompressor and ZstdDecompressor instances have no guarantees about thread safety. Do not operate on the same ZstdCompressor and ZstdDecompressor instance simultaneously from different threads. It is fine to have different threads call into a single instance, just not at the same time.
and as you probably want to have multiple thread on your webserver...
pyzstd is not warning about potential thread issues... but I admit it does not imply that there are no such issues...
AFAICT, regarding the warning:
if you start a (de)compression operation on an instance or a helper object derived from it, it isn't safe to start another (de)compression operation from the same instance until the first one has finished.
Even if flask-compress is used in a threaded context, only a single thread will ever process a single response, so as long as the ZstdDecompressor
object is not shared across threads, it should be fine. A similar pattern is used for gzip decompression.
So in a nutshell, looking at the compress
function relevant part:
elif algorithm == 'zstd':
# The zstandard.ZstdCompressor() should be created here, so it is not shared
For context, there is relevant information regarding the various zstd implementations here.
Yes. Ok.
Reusing a ZstdCompressor or ZstdDecompressor instance for multiple operations is faster than instantiating a new ZstdCompressor or ZstdDecompressor for each operation.
Would it also work to store the object once for good in self ?
elif algorithm == 'zstd':
if self.zstd is None:
self.zstd = zstandard.ZstdCompressor(level=app.config['COMPRESS_ZSTD_LEVEL'])
return self.zstd.compress(response.get_data())
As the Compress
object is global to the process (in most cases), I think this is what the warning is warning about 😉.
Flask-compress code code could add thread locking to prevent race conditions, but I would rather have a first version that is "obviously correct", then if this turns out to be slow we can look into further optimizations, like carefully re-using ZstdCompressor
objects.
I did some tests on my not-so-recent hardware, and I measure the creation time of ZstdCompressor()
to be around 900ns, which is not super-fast, but in the context of a HTTP webserver, I think sub-ms
overhead is reasonable.
Ok. I'll fix it as you suggested then. Do you know how to get rid of the failing test https://results.pre-commit.ci/run/github/142751498/1714036047.24xoLujVSeeGm64hBfSY7w ?
This pre-commit check is not part of the workflow file, so I think this must be something enabled at org level. @KelSolaar @MichaelMauderer perhaps you may help (I pinged you as I think you have the admin rights)
Thanks a lot! I will release soon.
Looks great! Any reason why pyzstd was chosen over zstandard? httpx landed the equivalent feature a few weeks ago and went with zstandard.