apache / kvrocks

Apache Kvrocks is a distributed key value NoSQL database that uses RocksDB as storage engine and is compatible with Redis protocol.
https://kvrocks.apache.org/
Apache License 2.0

Proposal: add a new command for polling new updates by sequence #2469

Closed: git-hulk closed this issue 1 month ago

git-hulk commented 1 month ago

Motivation

RocksDB provides the GetUpdatesSince API, which lets us poll write batches starting from a given sequence number. Kvrocks already depends on this mechanism to implement partial sync (PSYNC). The official migration tool kvrocks2redis also uses it to fetch new updates after parsing the entire DB, but it has to run alongside the DB directory. Scenarios such as CDC (Change Data Capture) have the same requirement, and it would be too troublesome to run an agent alongside each Kvrocks node.
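To make the polling semantics concrete, here is a toy in-memory model of reading batches by sequence number. It is only a sketch: `Batch`, `updates_since`, and `max_batches` are hypothetical names, and the model assumes each batch covers the sequence range `[start_seq, start_seq + count)` and that polling starts at the batch containing the requested sequence, or at the first batch after it.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Batch(NamedTuple):
    start_seq: int  # sequence number of the first write in the batch
    count: int      # number of writes in the batch
    payload: bytes  # opaque serialized batch (hypothetical placeholder)


def updates_since(log: List[Batch], seq: int, max_batches: int) -> List[Batch]:
    """Return up to max_batches batches, starting with the batch that
    contains seq, or the first batch after seq if none contains it."""
    starts = [b.start_seq for b in log]
    i = bisect_right(starts, seq) - 1  # last batch starting at or before seq
    if i < 0:
        i = 0  # seq is older than the log: start from the oldest batch
    elif seq >= log[i].start_seq + log[i].count:
        i += 1  # seq lies past the end of that batch
    return log[i:i + max_batches]


def next_sequence(batches: List[Batch], current: int) -> int:
    """Sequence number to request on the next poll."""
    if not batches:
        return current
    last = batches[-1]
    return last.start_seq + last.count
```

A consumer would call `updates_since` in a loop, feeding the result of `next_sequence` into the next call, which is roughly what a remote polling command would let a CDC client do without local access to the DB directory.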

As far as I know, some users have a similar requirement [1]. So I propose adding a new command for this purpose:

POLLUPDATES <Sequence Number> [MAX <N>] [STRICT] [FORMAT <RAW>]

We can also extend it with more arguments such as TIMEOUT/MIN, etc.
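As an illustration of how a client might issue the proposed command, the sketch below builds the argument list from the syntax above and encodes it as a RESP array of bulk strings, which is how Redis-protocol clients send commands. The helper names `pollupdates_args` and `encode_resp` are hypothetical, and the command itself is only a proposal at this point.

```python
from typing import List, Optional


def pollupdates_args(seq: int, max_n: Optional[int] = None,
                     strict: bool = False,
                     fmt: Optional[str] = None) -> List[str]:
    """Build the argument list following the proposed syntax:
    POLLUPDATES <Sequence Number> [MAX <N>] [STRICT] [FORMAT <RAW>]"""
    args = ["POLLUPDATES", str(seq)]
    if max_n is not None:
        args += ["MAX", str(max_n)]
    if strict:
        args.append("STRICT")
    if fmt is not None:
        args += ["FORMAT", fmt]
    return args


def encode_resp(args: List[str]) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)
```

For example, `encode_resp(pollupdates_args(100, max_n=16, fmt="RAW"))` yields the bytes a client would write to the socket for `POLLUPDATES 100 MAX 16 FORMAT RAW`.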

[1] https://www.revenuecat.com/blog/engineering/how-we-replicate-kvrocks-dataset/

PragmaTwice commented 1 month ago

Good idea! Some comments:

git-hulk commented 1 month ago

What's the output (and output format) of this command?

My initial thought is to support the raw batch (hex format) first, and then support the optional FORMAT argument in a following PR.
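A rough sketch of what a consumer of the raw format could do with one returned item. It assumes the reply carries the hex encoding of a serialized RocksDB WriteBatch, whose representation starts with a 12-byte header: an 8-byte little-endian sequence number followed by a 4-byte little-endian record count. The function name `parse_batch_header` is hypothetical, and the records after the header are left undecoded.

```python
import struct
from typing import Tuple

# WriteBatch header: fixed64 sequence + fixed32 count, both little-endian.
HEADER = struct.Struct("<QI")


def parse_batch_header(hex_batch: str) -> Tuple[int, int, bytes]:
    """Decode a hex-encoded batch and split off its header.

    Returns (sequence, record_count, remaining_record_bytes)."""
    raw = bytes.fromhex(hex_batch)
    if len(raw) < HEADER.size:
        raise ValueError("batch is shorter than the 12-byte header")
    seq, count = HEADER.unpack_from(raw)
    return seq, count, raw[HEADER.size:]
```

With the header decoded, a client can compute the next sequence to request as `seq + count` without understanding the individual records.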

How to get the sequence number (by commands)?

We can currently get the sequence number from the INFO command. Let's see whether it is necessary to add a dedicated command for this.

POLLING UPDATES seems a little weird, could it be something like POLLUPDATES?

Sure, POLLUPDATES is good since I cannot foresee any other behaviors except updates for now.

PragmaTwice commented 1 month ago

My initial thought is to support the raw batch (hex format) first, and then support the optional FORMAT argument in a following PR.

Maybe we can add a RAW flag now, to make it extensible?

We can currently get the sequence number from the INFO command. Let's see whether it is necessary to add a dedicated command for this.

I think it's hard to use since users need to parse the output of INFO manually to get it. Maybe a separate command is better.
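To show what that manual parsing involves, here is a minimal sketch that extracts a sequence number from INFO-style text (`key:value` lines grouped under `# Section` headers). The field name `sequence` is an assumption for illustration, as are the helper names `parse_info` and `sequence_from_info`.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Parse INFO-style output into a flat dict of key -> value."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# Section" headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def sequence_from_info(text: str, field: str = "sequence") -> int:
    """Return the sequence number; `field` is an assumed field name."""
    return int(parse_info(text)[field])
```

Every client would need some variant of this, and it would break if the field were renamed or moved, which is the argument for a dedicated command.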

git-hulk commented 1 month ago

Maybe we can add a RAW flag now, to make it extensible?

Sure, I have updated this.

I think it's hard to use since users need to parse the output of INFO manually to get it. Maybe a separate command is better.

What about adding a SEQUENCE command?