dragonflydb / dragonfly

A modern replacement for Redis and Memcached
https://www.dragonflydb.io/

Support compression on serialization of big values #3324

Closed · kostasrim closed this 2 months ago

kostasrim commented 2 months ago

Currently the RDB loader protocol does not support compressed data, and loading breaks.

Previously, we only called FlushToSink in SaveEpilogue and before SaveBody (to flush previous entries). With the current changes, however, we call FlushToSink whenever we have serialized X bytes. This doesn't work with the protocol implemented in rdb_load, because we now split, say, a list into multiple chunks, and LoadKeyValue on RdbLoader expects to read a string (for the values) but instead gets a compressed blob and fails with an unrecognized RDB type.

Extend the protocol so that LoadKeyValuePair in the rdb_loader supports loading compressed data.
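
A minimal sketch of the failure mode, with hypothetical constants standing in for the real RDB type bytes (this is illustrative, not Dragonfly's actual loader code):

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical type bytes; the real RDB opcodes differ.
enum : uint8_t { kRdbString = 0, kRdbList = 1, kCompressedChunk = 200 };

struct Source {
  std::vector<uint8_t> bytes;
  size_t pos = 0;
  uint8_t ReadType() { return bytes.at(pos++); }
};

// The loader dispatches on a type byte. Once the serializer flushes a big
// value in compressed chunks, the next byte read here is a compression
// header rather than a value type, so the dispatch fails.
void LoadKeyValuePair(Source& src) {
  switch (src.ReadType()) {
    case kRdbString: /* parse a string value */ break;
    case kRdbList:   /* parse a list value */   break;
    default:
      throw std::runtime_error("unrecognized rdb type");
  }
}
```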

romange commented 2 months ago

@kostasrim I suggest using RDB_OPCODE_DF_MASK for that. It's a meta opcode that can say in advance what to expect in the key/value pair that is going to be parsed next. Right now we only have the DF_MASK_FLAG_STICKY bit, so you can extend it with compression bits like DF_MASK_COMPRESSED_ZSTD or DF_MASK_COMPRESSED_LZ4, and then the replica side knows to treat the value accordingly.
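
A sketch of how the mask bits could be laid out and consumed on the replica side; the new bit names follow the comment above, but the bit positions and handler shape are assumptions:

```cpp
#include <cstdint>

constexpr uint8_t DF_MASK_FLAG_STICKY = 1 << 0;      // existing bit
constexpr uint8_t DF_MASK_COMPRESSED_ZSTD = 1 << 1;  // proposed
constexpr uint8_t DF_MASK_COMPRESSED_LZ4 = 1 << 2;   // proposed

// On the replica side, the mask read ahead of the entry tells the loader
// how to treat the blob that follows.
void ApplyMask(uint8_t mask) {
  if (mask & DF_MASK_COMPRESSED_ZSTD) {
    // decompress the next blob with zstd before parsing it as a value
  } else if (mask & DF_MASK_COMPRESSED_LZ4) {
    // decompress with lz4 instead
  }
  if (mask & DF_MASK_FLAG_STICKY) {
    // mark the loaded key as sticky, as today
  }
}
```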

kostasrim commented 2 months ago

@romange LIT

chakaz commented 2 months ago

@kostasrim does that mean that, once https://github.com/dragonflydb/dragonfly/pull/3241 is merged, older versions of Dragonfly won't be able to read RDB/DFS files saved by newer Dragonfly versions? If so, we need to disable that for at least one version to allow rollbacks, IMO.

kostasrim commented 2 months ago

@chakaz #3241 won't affect compression, but this PR will. If we extend the protocol as Roman suggested, older versions won't be able to read the new format and will break.

chakaz commented 2 months ago

For any such breaking change, we should first ship the read path with support for the new format while keeping the write path disabled by default (toggle-able via a flag). It should stay like that for at least one version; only then can we enable it by default (or even remove the flag, if that makes sense).
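
As a sketch of that rollout pattern, assuming an Abseil flag (which Dragonfly uses for configuration); the flag name is hypothetical:

```cpp
#include "absl/flags/flag.h"

// Version N: the read path always understands compressed big values, but
// writing them is off by default so that version N-1 can still load dumps.
ABSL_FLAG(bool, compress_big_values, false,
          "Serialize big values as compressed chunks (breaks pre-N readers)");

bool ShouldCompressBigValues() {
  return absl::GetFlag(FLAGS_compress_big_values);
}
// Version N+1: flip the default to true (or remove the flag) once rollback
// to pre-N versions no longer needs to be supported.
```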

adiholden commented 2 months ago

I synced with Shahar, and he brought up that there should be an API for continuous (streaming) compression. I worked on the CompressorImpl class in the past, and I don't think we need breaking changes. If we change CompressorImpl to create a compression context and free it only once we have compressed the full entry, I believe this should work. This amounts to breaking the compression into many frames that are sent one after the other, with the last frame emitted once we finish serializing the entry.
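
As a rough sketch of that idea using zstd's streaming API (zstd is one of the compressors in play here); the wrapper class name and chunking policy are hypothetical, not the actual CompressorImpl:

```cpp
#include <zstd.h>

#include <string>

// One compression context lives for the duration of a single entry and is
// freed once the last chunk has been compressed.
class EntryStreamCompressor {
 public:
  EntryStreamCompressor() : cctx_(ZSTD_createCCtx()) {}
  ~EntryStreamCompressor() { ZSTD_freeCCtx(cctx_); }

  // Compresses one serialized chunk; `last` finalizes the stream. Each
  // returned block can be flushed to the sink immediately. Real code should
  // check ZSTD_isError() on the return value and, when ending the stream,
  // loop until the returned hint reaches 0.
  std::string CompressChunk(const std::string& chunk, bool last) {
    std::string out(ZSTD_compressBound(chunk.size()) + 64, '\0');
    ZSTD_inBuffer in{chunk.data(), chunk.size(), 0};
    ZSTD_outBuffer ob{out.data(), out.size(), 0};
    ZSTD_compressStream2(cctx_, &ob, &in, last ? ZSTD_e_end : ZSTD_e_flush);
    out.resize(ob.pos);
    return out;
  }

 private:
  ZSTD_CCtx* cctx_;
};
```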

kostasrim commented 2 months ago

We decided to disable compression for big values, and we will revisit this as an optimization if needed in the future. I am closing this for now.