PostHog / posthog

πŸ¦” PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com

feat: using gzip by hand in the replay pipeline #23479

Closed: pauldambra closed this 12 hours ago

pauldambra commented 4 days ago

we have a 10MB limit on messages in the replay kafka topic and we have gzip compression enabled on that topic

we did this to offload compression to MSK (probably... it was a while ago πŸ™ˆ), but the gzip compression is actually done in the producer, so nothing is offloaded to MSK

we thought something along the lines of "10MB of data compresses to about 1MB, so the 10MB limit effectively lets folks send us ~100MB of data in one message, and that won't happen"

but

since the producer gzips the data, and might batch messages together before compressing them, kafka checks the message size limit before compression (i think, based on some googling), so we're not actually allowing 100MB, we're allowing 10MB of uncompressed data

this seems to be true, since the messages that get rejected are already ~10MB before compression

and

replay inlines css files, so ~30 times a day a posthog.com full snapshot + its css goes over this limit and gets dropped

so, it's not massively unusual for us to see messages >10MB
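to make the above concrete, here's a minimal sketch (not code from this PR) of a producer configured the way the topic effectively behaves today; the topic name is made up, and kafka-python is used purely for illustration:

```python
# a minimal sketch, assuming kafka-python: with compression_type="gzip" the compression
# happens inside the producer (not on MSK), but the max_request_size check in send() is
# applied to the serialized, *uncompressed* payload, so a 10.1MB uncompressed message is
# rejected even if it would gzip down to ~1MB.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    compression_type="gzip",             # compression is done client-side
    max_request_size=10 * 1024 * 1024,   # ~10MB, checked before compression
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# "session_recording_events" is a hypothetical topic name for this sketch
producer.send("session_recording_events", {"snapshot": "..."})
producer.flush()
```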


ok, so what?

we purposefully stopped splitting individual items ("chunking") to fit them into kafka, because it made the already very stateful mr blobby even more stateful

i really really don't want to go back to arbitrary chunking of replay messages

really really

really


this pr

we'll probably get slightly worse compression overall since we'll now always be compressing individual messages instead of letting the producer (potentially) compress across several messages

but from the perspective of a 10.1MB uncompressed message that would have been dropped, this is infinitely better
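a rough sketch of the "gzip by hand" idea, assuming the replay payload is a JSON-serializable dict; `compress_replay_payload` and the size constant are illustrative names, not the PR's actual code:

```python
import gzip
import json

KAFKA_MAX_MESSAGE_BYTES = 10 * 1024 * 1024  # the topic's 10MB limit


def compress_replay_payload(payload: dict) -> bytes:
    """Gzip the serialized payload ourselves, before it reaches the producer,
    so the size check sees the ~1MB compressed bytes rather than the 10MB+
    uncompressed JSON."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    compressed = gzip.compress(raw)
    if len(compressed) > KAFKA_MAX_MESSAGE_BYTES:
        # still too big even after compression: handle/drop as before
        raise ValueError(f"compressed replay payload is {len(compressed)} bytes")
    return compressed
```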


in deciding what to do here i tested every possible option (all three of them)

| option | average speed | average size reduction |
| --- | --- | --- |
| protobuf | didn't check | slightly bigger* |
| msgpack | 0.13s for ~10MB** | 20% smaller |
| gzip | 0.6s for ~10MB** | 90% smaller |

\* i didn't write a protobuf schema, maybe i should have, but that felt like complexity i'd like to avoid

\*\* average of operating with python on ~30 example files on my M3 MBP while running a tonne of electron apps and pycharm and a bunch of other things, so treat the timings as representative comparisons, not predictions
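for reference, the numbers above came from this kind of ad-hoc measurement; the script below is an illustrative reconstruction of the approach, not the exact benchmark that produced the table:

```python
import gzip
import json
import time

import msgpack  # pip install msgpack


def measure(path: str) -> None:
    """Compare msgpack vs gzip on one example replay file: output size and time taken."""
    with open(path, "rb") as f:
        raw = f.read()
    payload = json.loads(raw)

    start = time.perf_counter()
    packed = msgpack.packb(payload)
    msgpack_secs = time.perf_counter() - start

    start = time.perf_counter()
    gzipped = gzip.compress(raw)
    gzip_secs = time.perf_counter() - start

    print(
        f"{path}: raw={len(raw)}B "
        f"msgpack={len(packed)}B ({msgpack_secs:.2f}s) "
        f"gzip={len(gzipped)}B ({gzip_secs:.2f}s)"
    )
```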

tested gathering and playing recordings with the setting on and off

even though gzip is a chunk slower than msgpack, the savings are so much larger that it's worth it, especially since the instance is already spending the time running this compression anyway
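testing with the setting on and off implies the read path has to accept both forms; a sketch of how a consumer can tell them apart using gzip's two magic bytes (names here are illustrative, not the PR's actual code):

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of any gzip stream


def decode_replay_message(value: bytes) -> dict:
    """Accept both plain-JSON (setting off) and hand-gzipped JSON (setting on) messages."""
    if value[:2] == GZIP_MAGIC:
        value = gzip.decompress(value)
    return json.loads(value)
```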

things i didn't do

start messing around with the kafka client to alter its behavior which i probably could do but feels like an excellent way to confuse everyone

sentry-io[bot] commented 4 days ago

πŸ” Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

πŸ“„ File: posthog/api/capture.py

| Function | Unhandled Issue |
| --- | --- |
| `get_event` | UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 18016-18017: illegal UTF-16 surrogate ... Event Count: 4 |

Did you find this useful? React with a πŸ‘ or πŸ‘Ž