apache / kvrocks

Apache Kvrocks is a distributed key value NoSQL database that uses RocksDB as storage engine and is compatible with Redis protocol.
https://kvrocks.apache.org/
Apache License 2.0

Kvrocks & IngestExternalFile #2458

Open matan129 opened 1 month ago

matan129 commented 1 month ago

Motivation

Hi folks,

First of all - I just wanted to say that this is an awesome project 🙂

Secondly -

I wondered whether it's possible to load data to Kvrocks via RocksDB's IngestExternalFile.

The use case is real-world.

I currently work on a system that relies on (non-distributed) RocksDB, and we'd like to possibly start using Kvrocks instead. Every once in a while, we run an offline, "bulky" Spark process which essentially generates a complete view of the RocksDB database. This is done by creating SST files directly, which is pretty cool. The system then downloads these files locally and just points RocksDB to use them. This way, we can leverage Spark's super-scalable compute to create a dataset (of ~20B tiny records) which would otherwise take a long, long time to write to an empty RocksDB database.

Q: Since Kvrocks uses RocksDB as its backend, I wondered - how hard would it be to do something like this?

Thanks!


Solution

I assume that a solution would involve the following components:

  1. An offline library to create Kvrocks-compatible SSTs (i.e. conforming to this)
  2. A server API which can be given a list of SSTs to download and create a Kvrocks set from, using RocksDB's IngestExternalFile.
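
To make the first component a bit more concrete, here is a minimal sketch of the preparation step that would come before writing SSTs. It relies on one RocksDB fact: an SST writer expects keys in strictly increasing order under the comparator (bytewise by default). Everything else is an assumption for illustration: `plan_sst_batches` and `max_per_file` are hypothetical names, the records are assumed to be already encoded in whatever key/value layout Kvrocks expects (not shown here), and the real SST writing and the IngestExternalFile call are left out.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[bytes, bytes]  # (already-encoded key, already-encoded value)


def plan_sst_batches(records: Iterable[Record], max_per_file: int) -> Iterator[List[Record]]:
    """Yield per-file batches: each batch is sorted by key and the batches
    cover disjoint, ascending key ranges.

    Hypothetical helper; the encoding of keys and values is out of scope.
    """
    if max_per_file <= 0:
        raise ValueError("max_per_file must be positive")

    # Keep only the last value seen for each key, so that keys inside a
    # batch are strictly increasing (no duplicates).
    latest: Dict[bytes, bytes] = {}
    for key, value in records:
        latest[key] = value

    # Python orders bytes lexicographically by unsigned byte value, which
    # is the same ordering a bytewise comparator uses.
    ordered = sorted(latest.items(), key=lambda kv: kv[0])

    # Slice the sorted run into consecutive chunks; consecutive slices of a
    # sorted, de-duplicated sequence cannot overlap in key range.
    for start in range(0, len(ordered), max_per_file):
        yield ordered[start:start + max_per_file]


if __name__ == "__main__":
    sample = [(b"b", b"2"), (b"a", b"1"), (b"c", b"3"), (b"a", b"9")]
    for i, batch in enumerate(plan_sst_batches(sample, max_per_file=2)):
        print(i, batch)
```

This in-memory version is only meant to show the ordering constraint. For ~20B records the same idea would presumably be expressed in Spark as a range partition by key followed by a sort within each partition, with one SST written per partition.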

git-hulk commented 1 month ago

Hi @matan129, thanks for raising this discussion.

Yes, other users have also proposed supporting external file ingestion: #1301, #1628. The solution you described is the right way to implement this feature. But AFAIK, no community volunteer is working on it for now. You're welcome to contribute if you're willing to do that.