RDB store/load compressed data

RedisTimeSeries / RedisTimeSeries

Time Series data structure for Redis

https://redis.io/docs/stack/timeseries/

RDB store/load compressed data #337

Closed gkorland closed 4 years ago

gkorland commented 4 years ago

Store data to disk in compressed mode

K-Jo commented 4 years ago

@gkorland Can we discuss the pros and cons of this? @danni-m can you chime in?

danni-m commented 4 years ago

Let's look at the current serialization:

Cons:
takes more space
takes more time to load and dump

Pros:
Forward and backward compatible
RDB is detached from the compression algorithm, which means we don't need to worry about backward compatibility in the compression.
Similar to other Redis data structures

With compressed RDB:

Cons:
we need to make sure the algorithm is backward compatible for future versions

Pros:
RDB size is lower when the chunk is full, but can be more expensive when chunks are not fully utilized.
faster RDB load/dump
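
The size trade-off in the last list can be sketched with some back-of-the-envelope arithmetic. The numbers here are assumptions for illustration only, not measurements from RedisTimeSeries: a fixed 4096-byte chunk buffer that is dumped whole, and 16 bytes per sample (int64 timestamp plus float64 value) when samples are written one by one.

```python
import struct

# Assumed values for illustration; not taken from the RedisTimeSeries source.
CHUNK_SIZE_BYTES = 4096   # fixed size of one compressed chunk buffer
SAMPLE_FORMAT = "<qd"     # per-sample layout: int64 timestamp, float64 value
SAMPLE_SIZE = struct.calcsize(SAMPLE_FORMAT)


def per_sample_size(num_samples: int) -> int:
    """Bytes written when every sample is serialized individually."""
    return num_samples * SAMPLE_SIZE


def raw_chunk_size(num_samples: int) -> int:
    """Bytes written when the whole chunk buffer is dumped, however full it is."""
    return CHUNK_SIZE_BYTES


def break_even_samples() -> int:
    """Smallest sample count at which dumping the raw chunk is no larger."""
    return -(-CHUNK_SIZE_BYTES // SAMPLE_SIZE)  # ceiling division


if __name__ == "__main__":
    for n in (10, break_even_samples(), 2000):
        print(f"{n:5d} samples: per-sample={per_sample_size(n):6d} B, "
              f"raw chunk={raw_chunk_size(n):5d} B")
```

Under these assumptions a chunk holding only a handful of samples costs more on disk when dumped raw, while a well-filled chunk costs far less. The real break-even point depends on the actual chunk size and on how well the data compresses.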

gkorland commented 4 years ago

I think that, especially if we want to support RoF (Redis on Flash), we must make sure that load/store is fast and that the data on disk is not 20 times bigger than in memory.

gkorland commented 4 years ago

As for backward compatibility, I think it's manageable with the right tests and RDB versions.
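
One way to picture "the right tests and RDB versions" is a loader that dispatches on an encoding version stored with the payload. The sketch below is only an illustration of that idea, not the module's actual format: the version constants, the header layout, and the function names are hypothetical, and `zlib` stands in for the real time-series compression.

```python
import struct
import zlib

# Hypothetical encoding versions, for illustration only.
ENC_VER_SAMPLES = 1     # samples written one by one
ENC_VER_COMPRESSED = 2  # all samples written as one compressed blob

HEADER = "<BI"   # assumed header: 1-byte version, 4-byte sample count
SAMPLE = "<qd"   # assumed sample: int64 timestamp, float64 value


def save_samples(samples, enc_ver):
    """Serialize (timestamp, value) pairs using the requested encoding version."""
    header = struct.pack(HEADER, enc_ver, len(samples))
    body = b"".join(struct.pack(SAMPLE, ts, value) for ts, value in samples)
    if enc_ver == ENC_VER_SAMPLES:
        return header + body
    if enc_ver == ENC_VER_COMPRESSED:
        return header + zlib.compress(body)  # zlib is a stand-in compressor
    raise ValueError(f"unknown encoding version {enc_ver}")


def load_samples(payload):
    """Read the version from the header and pick the matching decoder."""
    enc_ver, count = struct.unpack_from(HEADER, payload, 0)
    body = payload[struct.calcsize(HEADER):]
    if enc_ver == ENC_VER_COMPRESSED:
        body = zlib.decompress(body)
    elif enc_ver != ENC_VER_SAMPLES:
        raise ValueError(f"unknown encoding version {enc_ver}")
    samples = list(struct.iter_unpack(SAMPLE, body))
    if len(samples) != count:
        raise ValueError("sample count does not match header")
    return samples
```

A compatibility test then amounts to keeping payloads written by every older version and asserting that the current loader still returns the same samples for each of them.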

ashtul commented 4 years ago

I ran a simple test of saving and reloading about 200MB in both RedisBloom, which loads raw data, and RedisTimeSeries, which currently processes the data sample by sample. Bloom was about 28 times faster (0.3 vs 8.4 seconds). TimeSeries will take longer because it pushes the chunks into the dictionary, so we can't expect to reach that speed, but it should be within the same range.

ashtul commented 4 years ago

Checked a few more settings. The previous result (8.4 s) was for compressed floats. Uncompressed floats took 2.8 s (about 9 times longer than Bloom). Compressed integers took about 30 s, or 100 times longer than Bloom.
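
For reference, the slowdown factors quoted in the two comments above follow directly from the reported timings. This small sketch just redoes that arithmetic; the dictionary keys and function name are illustrative labels.

```python
# Timings in seconds as reported in the comments above (~200MB save and reload).
BLOOM_SECONDS = 0.3
TIMESERIES_SECONDS = {
    "compressed floats": 8.4,
    "uncompressed floats": 2.8,
    "compressed integers": 30.0,
}


def slowdown(seconds: float, baseline: float = BLOOM_SECONDS) -> float:
    """How many times longer a run took than the RedisBloom baseline."""
    return seconds / baseline


if __name__ == "__main__":
    for name, seconds in TIMESERIES_SECONDS.items():
        print(f"{name}: {slowdown(seconds):.1f}x slower than Bloom")
```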

knguyenphi commented 4 years ago

I'm using RedisTimeSeries to store IoT data (Unix timestamp, integer value). My RDB file (uncompressed) is about 12 times my memory size (compressed). It would be wonderful if we could store/load compressed data straight to/from RDB (or maybe another file), or have a mode setting in redis.conf where we can do this without backward compatibility. I can run an instance of Redis just for time series data that doesn't need backward compatibility.