dragonflydb / dragonfly

A modern replacement for Redis and Memcached
https://www.dragonflydb.io/

extend snapshot format with the vector record #486

Closed. romange closed this issue 1 year ago.

romange commented 1 year ago

Currently the RDB format uses pretty naive compression with its lzf* code. This wastes too much CPU and achieves little: when we save a snapshot with "SAVE DF" to a local SSD, we are bottlenecked on CPU, not on I/O.

We should introduce RDB_TYPE_VEC, an uber record that "hosts" n records. Those n records will be serialized using the same compression context into a single blob. In fact, two blobs: one for the concatenated keys and one for the concatenated values.

The record should answer the following requirements:

  1. Its compressed blobs should be no bigger than kMaxCompress bytes, where kMaxCompress is a manageable constant like 65KB.
  2. Its uncompressed working buffer should have sane limits, i.e. be no bigger than kMaxUncompressBytes. See https://github.com/romange/gaia/blob/master/util/coding/block_compressor.h#L10 for an example of how we limit decompression resources. The point is that decompression algorithms back-reference the already decompressed buffer, so we should limit the reference window size during compression to keep those buffers from growing too large (see the sketch after this list).
  3. The compressed blob should be P percent and k bytes smaller than the total size of the n uncompressed records. If it is not, we should discard the compressed blob and store all records as-is, without compression, similarly to how it's done here: https://github.com/dragonflydb/dragonfly/blob/main/src/server/rdb_save.cc#L648
  4. We should use the zstd compression algorithm, as it is the most balanced compression algorithm available today.
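
To make points 1-3 concrete, here is a minimal sketch of the per-blob compression step. kWindowLog, kMinRatioPct (P) and kMinSavedBytes (k) are hypothetical constants chosen for illustration, not existing Dragonfly code:

```cpp
#include <zstd.h>

#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

constexpr size_t kMaxCompress = 65536;  // requirement 1: cap on a compressed blob
constexpr int kWindowLog = 16;          // requirement 2: 64KB reference window
constexpr size_t kMinRatioPct = 90;     // requirement 3: must shrink to <= 90% (P)
constexpr size_t kMinSavedBytes = 64;   // requirement 3: ...and save >= 64 bytes (k)

// Returns the compressed blob, or nullopt when compression does not pay off
// and the caller should store the records uncompressed. The same ZSTD_CCtx is
// reused across blobs, matching the "same compression context" idea above.
std::optional<std::string> CompressBlob(ZSTD_CCtx* cctx, std::string_view src) {
  std::string dst(ZSTD_compressBound(src.size()), '\0');
  ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);
  ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, kWindowLog);
  size_t n = ZSTD_compress2(cctx, dst.data(), dst.size(), src.data(), src.size());
  if (ZSTD_isError(n) || n > kMaxCompress) return std::nullopt;
  // Fall back to raw storage unless we save both P percent and k bytes.
  if (n * 100 > src.size() * kMinRatioPct || src.size() - n < kMinSavedBytes)
    return std::nullopt;
  dst.resize(n);
  return dst;
}
```

Capping ZSTD_c_windowLog bounds the back-reference distance on both sides: the compressor never emits a match farther back than 2^kWindowLog bytes, so the decompressor's working buffer stays correspondingly small.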

We should start with a design document explaining what an RDB_TYPE_VEC record looks like in serialized form, so that it is possible to deserialize it back. We should compress keys and values separately because each has different biases. We should also write the lengths of the keys and values of all the records so that we can restore them from the uncompressed blobs.
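
For illustration only, the serialized form could look roughly like this; the exact encoding is what the design document should pin down:

```
RDB_TYPE_VEC                    ; opcode
n                               ; number of records, length-encoded
type[0] .. type[n-1]            ; record type of each entry (all strings at first)
key_len[0] .. key_len[n-1]      ; uncompressed key lengths, to split the key blob
val_len[0] .. val_len[n-1]      ; uncompressed value lengths, to split the value blob
key_blob_size, key_blob         ; zstd-compressed concatenation of the n keys
val_blob_size, val_blob         ; zstd-compressed concatenation of the n values
```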

For simplicity, we can assume that we use RDB_TYPE_VEC only for gathering string types, though we should also serialize record types so that we can easily extend this to other types later.
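
Once both blobs are decompressed, restoring the individual entries is just a walk over the stored lengths. A rough sketch, with all names hypothetical:

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

struct Entry {
  std::string_view key, value;
};

// Splits the two decompressed blobs back into n (key, value) pairs using the
// per-record lengths stored in the RDB_TYPE_VEC header.
std::vector<Entry> SplitBlobs(std::string_view key_blob, std::string_view val_blob,
                              const std::vector<size_t>& key_lens,
                              const std::vector<size_t>& val_lens) {
  std::vector<Entry> out;
  out.reserve(key_lens.size());
  size_t koff = 0, voff = 0;
  for (size_t i = 0; i < key_lens.size(); ++i) {
    out.push_back({key_blob.substr(koff, key_lens[i]), val_blob.substr(voff, val_lens[i])});
    koff += key_lens[i];
    voff += val_lens[i];
  }
  return out;
}
```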

romange commented 1 year ago

Another constraint: we should only gather small key/value entries into this vector record. If an entry is large (say, bigger than 1KB), it can be compressed separately; it has enough meat on its own. This brings us to another point: single records should also get a different compression algorithm, but that can be a separate issue.
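
As a sketch of that gathering policy (kBigValueThreshold is a hypothetical constant, not an existing setting):

```cpp
#include <cstddef>
#include <string_view>

constexpr size_t kBigValueThreshold = 1024;  // 1KB, per the comment above

// Small entries are batched into the pending RDB_TYPE_VEC record; larger ones
// have enough meat to be compressed individually on the existing path.
bool ShouldBatchIntoVec(std::string_view key, std::string_view value) {
  return key.size() + value.size() <= kBigValueThreshold;
}
```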