apache / kvrocks

Apache Kvrocks is a distributed key value NoSQL database that uses RocksDB as storage engine and is compatible with Redis protocol.
https://kvrocks.apache.org/
Apache License 2.0

Large .sst and .log files (unsure about compression feature) #2365

alija83 opened this issue 1 month ago (status: open)

alija83 commented 1 month ago


Motivation

I was experimenting with kvrocks (version 2.8.0) today, and my goal was to enable compression and reduce logging. I am not sure whether I have set the configuration correctly, but here is what I noticed.

My goal is to use kvrocks to store large volumes of key-value pairs while keeping that data compressed, so that it uses as little disk space as possible.

Here is my configuration:

dir /kvrocks/data

# General
port 6376
bind 0.0.0.0

rocksdb.compression zstd

backup-dir /kvrocks/backup
#log-dir /kvrocks
log-level error
log-dir stdout
log-retention-days 0
pidfile /kvrocks/kvrocks.pid

I also tried the lz4 option for rocksdb.compression, but nothing changed: the .sst files do not look compressed, and the .log files are quite large as well.

cd db
/kvrocks/data/db # du -hs *
4.0K     000019.sst
624.0K   000605.sst
83.9M    002080.sst
70.4M    002082.log
102.6M   002083.sst
16.0G    archive

cd archive
ls -alsrht

64544 -rw-r--r-- 1 root root 63.0M Jun 14 20:59 002162.log
64448 -rw-r--r-- 1 root root 62.9M Jun 14 20:59 002169.log
64696 -rw-r--r-- 1 root root 63.2M Jun 14 20:59 002165.log
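A quick tally of the du -hs numbers above shows where the space is actually going. This is only a sketch: it uses just the sizes printed in the issue and assumes du's K/M/G suffixes are 1024-based (as with GNU and BusyBox du -h).

```python
from __future__ import annotations

# Sizes copied from the `du -hs *` output in /kvrocks/data/db.
DU_OUTPUT = {
    "000019.sst": "4.0K",
    "000605.sst": "624.0K",
    "002080.sst": "83.9M",
    "002082.log": "70.4M",
    "002083.sst": "102.6M",
    "archive": "16.0G",
}

# Multipliers that convert a du -h suffix into KiB (assumed 1024-based).
UNITS = {"K": 1, "M": 1024, "G": 1024 ** 2}


def to_kib(size: str) -> float:
    """Convert a du -h size such as '83.9M' into KiB."""
    return float(size[:-1]) * UNITS[size[-1]]


def space_breakdown(entries: dict[str, str]) -> tuple[float, float]:
    """Return (MiB outside archive/, fraction of the total taken by archive/)."""
    sizes = {name: to_kib(s) for name, s in entries.items()}
    total = sum(sizes.values())
    live = total - sizes["archive"]
    return live / 1024, sizes["archive"] / total


live_mib, archive_share = space_breakdown(DU_OUTPUT)
print(f"outside archive/: {live_mib:.1f} MiB")          # outside archive/: 257.5 MiB
print(f"archive/ share of total: {archive_share:.1%}")  # archive/ share of total: 98.5%
```

By this count, the .sst files plus the active .log take roughly 257 MiB, while about 98% of the directory is the archive/ folder of old .log files, which is what the maintainers' replies address.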

127.0.0.1:6376> keys '*' pattern
16521) ....

It has 16521 records.

16 GB is just too much for that.

Solution

I am not sure how this should work or be implemented, but as a user I expected to see the .sst files compressed and to be able to disable the logs if desired.

Instead of seeing:

/kvrocks/data # du -hs *
22.5G    db

it should have been something like 25MB.


PragmaTwice commented 1 month ago

There are basically two points in this issue:

cc @git-hulk

git-hulk commented 1 month ago

As for the .log files, they are actually the WAL (write-ahead log) of RocksDB, and you can configure it via options such as rocksdb.wal_ttl_seconds or rocksdb.wal_size_limit_mb.

Yes, you can reduce those two values if you would like to keep fewer logs.
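To illustrate how those two options interact, here is a small Python model of the retention rules RocksDB documents for its WAL_ttl_seconds and WAL_size_limit_MB options, which rocksdb.wal_ttl_seconds and rocksdb.wal_size_limit_mb appear to map to. It is a simplified sketch, not kvrocks or RocksDB code: ArchivedWal and wal_files_to_purge are hypothetical names, the file ages are made up, and the periodic check interval is ignored.

```python
from dataclasses import dataclass


@dataclass
class ArchivedWal:
    # Hypothetical record for one file in db/archive/.
    name: str
    size_mb: float
    age_seconds: float


def wal_files_to_purge(files, wal_ttl_seconds, wal_size_limit_mb):
    """Names of archived WAL files one purge pass would delete (simplified model)."""
    if wal_ttl_seconds == 0 and wal_size_limit_mb == 0:
        # Both 0: WAL files are deleted as soon as possible and never kept in archive/.
        return [f.name for f in files]
    purged = set()
    if wal_ttl_seconds > 0:
        # TTL rule (applied first when both are set): drop files older than the TTL.
        purged.update(f.name for f in files if f.age_seconds > wal_ttl_seconds)
    if wal_size_limit_mb > 0:
        # Size rule: delete the oldest remaining files until the total fits the limit.
        remaining = sorted(
            (f for f in files if f.name not in purged),
            key=lambda f: f.age_seconds,
            reverse=True,  # oldest first
        )
        total = sum(f.size_mb for f in remaining)
        for f in remaining:
            if total <= wal_size_limit_mb:
                break
            purged.add(f.name)
            total -= f.size_mb
    return [f.name for f in files if f.name in purged]


# Sizes roughly match the archive listing in the issue; the ages are hypothetical.
archive = [
    ArchivedWal("002162.log", 63.0, 3600),
    ArchivedWal("002165.log", 63.2, 1800),
    ArchivedWal("002169.log", 62.9, 600),
]
print(wal_files_to_purge(archive, wal_ttl_seconds=1200, wal_size_limit_mb=0))
# ['002162.log', '002165.log']
print(wal_files_to_purge(archive, wal_ttl_seconds=0, wal_size_limit_mb=150))
# ['002162.log']
```

Lowering either value shrinks archive/. Setting both to 0 stops archiving entirely, but kvrocks replication may rely on the archived WAL for incremental sync, so that setting is worth checking against your replication setup first.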