matan129 opened this issue 1 month ago (status: Open)
Hi @matan129 Thanks for raising this discussion.
Yes, other users have also proposed support for ingesting external files: #1301, #1628. The solution you described is the right way to implement this feature. But AFAIK, no community volunteer is working on it for now. You're welcome to contribute if you're willing to do that.
Search before asking
Motivation
Hi folks,
First of all - I just wanted to say that this is an awesome project 🙂
Secondly -
I wondered whether it's possible to load data into Kvrocks via RocksDB's IngestExternalFile.
The use case is real-world.
I currently work on a system that relies on (non-distributed) RocksDB, and we'd like to possibly start using Kvrocks instead. Every once in a while, we use an offline, "bulky" Spark process which essentially generates a complete view of the RocksDB database. This is done by creating SST files directly, which is pretty cool*. The system then downloads these files locally and just points RocksDB to use them. This way, we can leverage Spark's super-scalable compute to create a dataset (of ~20B tiny records) which would otherwise take a long, long time to write to an empty RocksDB database.
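To make the offline step a bit more concrete, here is a minimal Python sketch of the preparation such a job has to do before any SST file is written. RocksDB's `SstFileWriter` only accepts keys in strictly ascending order, so records need to be deduplicated, sorted bytewise, and split into per-file chunks. The names `plan_sst_chunks` and `key_ranges` are hypothetical, and the sketch assumes RocksDB's default bytewise comparator. It does not call RocksDB itself: a real pipeline would hand each chunk to `SstFileWriter` and then pass the resulting files to `IngestExternalFile`. It also says nothing about how Kvrocks encodes keys internally, which the generated files would presumably have to match.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[bytes, bytes]  # (key, value)


def plan_sst_chunks(records: Iterable[Record], max_records_per_file: int) -> List[List[Record]]:
    """Group records into sorted, non-overlapping chunks, one chunk per planned SST file.

    Hypothetical helper: it only plans the layout and does not write any file.
    """
    if max_records_per_file <= 0:
        raise ValueError("max_records_per_file must be positive")

    # Keep only the last value seen for each key, since an SST file
    # cannot contain the same key twice.
    latest: Dict[bytes, bytes] = {}
    for key, value in records:
        latest[key] = value

    # Python compares bytes lexicographically by unsigned byte value, which is
    # assumed here to match RocksDB's default bytewise comparator.
    ordered = sorted(latest.items())

    # Consecutive slices of a sorted sequence have disjoint key ranges.
    return [
        ordered[i:i + max_records_per_file]
        for i in range(0, len(ordered), max_records_per_file)
    ]


def key_ranges(chunks: List[List[Record]]) -> List[Tuple[bytes, bytes]]:
    """Return the (smallest_key, largest_key) pair covered by each chunk."""
    return [(chunk[0][0], chunk[-1][0]) for chunk in chunks]


if __name__ == "__main__":
    sample = [(b"user:2", b"bob"), (b"user:1", b"alice"), (b"user:3", b"carol")]
    for low, high in key_ranges(plan_sst_chunks(sample, max_records_per_file=2)):
        print(low, high)
```

In a Spark job the same effect would typically come from range-partitioning and sorting within partitions, so that each task emits one file covering its own key range.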
Q: Since Kvrocks uses RocksDB as its backend, I wondered - how hard would it be to do something like this?
Thanks!
Solution
I assume that a solution would involve the following components:
Are you willing to submit a PR?