git-hulk opened 3 weeks ago
It seems related to logical logging. cc @mapleFU.
Could you elaborate on the detailed design of the "event" field?
How do we define a new event? E.g. for the ZSET data type, could you list all events, for every write command, here?
Also, the "subkey" is tied to a different encoding for each data type. If we store the "subkey" here, it's too "low level" to serve as a recorder of structured semantics. Could you also list all the "subkey"s for all ZSET commands?
IMHO we should have a clear picture of the layered (or leveled) semantics of execution in Kvrocks:
Today we are talking about the middle-level operations.
This middle layer needs to have a clear design to meet three goals:
As you can see, this is a complex task. I hope we can have a clearer design: for example, for each redis data type, what is the detailed design of its middle-layer operator?
Then, we can talk about implementations.
By the way, would it be better to make RDB have the same structure? Or are they totally different?
> By the way, would it be better to make RDB have the same structure? Or are they totally different?
RDB is for the serialization of data, not for the serialization of execution, so I don't think it's suitable here.
> How do we define a new event? E.g. for the ZSET data type, could you list all events, for every write command, here?
It generally contains: ADD|DELETE|SET. For example, the ZPOP command should generate a DELETE event.
> If we store the "subkey" here, it's too "low level" to serve as a recorder of structured semantics. Could you also list all the "subkey"s for all ZSET commands?
What I mean by subkey here depends on the data type. For the Hash type, as we can see in the example, the subkey is the field f0. And for the ZSet type, the subkey would be the member.
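Putting the two answers above together, here is a rough sketch of how ZSET write commands could map to events under this design; the names are illustrative assumptions, not actual Kvrocks code:

```cpp
#include <optional>
#include <string>

// Assumed event kinds from this proposal; a sketch only, not Kvrocks code.
enum class Event { ADD, DELETE, SET };

// Map a ZSET write command to the event it would emit per touched member.
std::optional<Event> ZSetCommandEvent(const std::string &cmd) {
  if (cmd == "zadd") return Event::ADD;     // or SET if the member already exists
  if (cmd == "zincrby") return Event::SET;  // carries the member's new score
  if (cmd == "zrem") return Event::DELETE;
  if (cmd == "zpopmin" || cmd == "zpopmax") return Event::DELETE;
  return std::nullopt;  // others (ZRANGESTORE, ZINTERSTORE, ...) need more thought
}

// In every case the subkey would be the affected member: "m0" for
// `ZADD key 1.5 m0`, or the popped member for ZPOPMIN/ZPOPMAX.
```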
I don't quite understand the meaning of "low level" here. Could you elaborate a bit on what you mean?
> Structured: The middle-layer operators must have clear semantics, retaining high-level data structure information, rather than fragmenting into RocksDB reads and writes.
The change stream event now maps to what the command changed instead of RocksDB reads/writes. That's why we have key and subkey.
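To illustrate what "structured" avoids (with a deliberately simplified key encoding, not the real Kvrocks layout): a single HSET turns into several RocksDB writes, and the change stream should collapse them back into one logical event rather than exposing the raw puts:

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified illustration of the RocksDB-level writes behind `HSET my_hash f0 v0`.
// The real Kvrocks encoding differs; this only shows the fragmentation.
std::vector<std::pair<std::string, std::string>> RawWritesForHset() {
  return {
      {"my_hash", "<hash metadata: version, size, ...>"},  // metadata key
      {"my_hash|v1|f0", "v0"},                             // per-field subkey
  };
}

// The change stream event keeps the structure instead:
//   {type: VALUE, event: SET, key: "my_hash", subkey: "f0", value: "v0"}
```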
> As you can see, this is a complex task. I hope we can have a clearer design: for example, for each redis data type, what is the detailed design of its middle-layer operator?
Yes, I agree we should take care of the design before starting the implementation. We can reach a consensus on the high-level proposal, and then go into the details if all of us feel good about it.
> It generally contains: ADD|DELETE|SET. For example, the ZPOP command should generate a DELETE event.
How about commands like ZINTERSTORE, ZMPOP, and ZPOPMIN? I'm still not sure the current design is general enough. In my view, it could easily become hard to model some complex scenarios.
E.g. for ZADD, the value should be a member and a score, but here you use just a single std::string value. How is it encoded? And we'd need to decode it to retrieve the score. IMHO it seems like a weird design.
Also, I think it's better to generalize it to logical logging instead of limiting it to migration/sync.
@PragmaTwice I have changed the value to a payload, which is a union type.
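Roughly like the following sketch; std::variant here is just one way to express the union, and the exact fields are still up for discussion:

```cpp
#include <string>
#include <variant>

// Hypothetical sketch of the payload as a union type.
struct ZSetEntry {
  std::string member;
  double score;
};

using Payload = std::variant<std::string,  // plain value, e.g. a hash field value
                             ZSetEntry>;   // member + score for ZSET events

// A `ZADD key 1.5 m0` event can then carry {member: "m0", score: 1.5}
// directly, without encoding/decoding the score inside a single string.
```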
My current point is: if we cannot use it to model all operations, e.g. ZREMRANGE..., we'd better keep the current (Redis commands) form to keep the implementation unified and easy for users to understand.
But we can continue to work on it and figure out a better design.
Got your point. I agree that using the commands would make it easy to implement the change logs in a unified way. My concern is that we'd need to rewrite many commands like HINCRBY/HINCRBYFLOAT to HSET to simplify the downstream implementation.
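For example (a hypothetical helper, not existing Kvrocks code), the extractor would have to resolve the increment into an absolute value so that downstream consumers only ever see a plain SET:

```cpp
#include <charconv>
#include <cstdint>
#include <string>

// Hypothetical: compute the value a SET event would carry for an HINCRBY,
// assuming the extractor can read the field's old value.
std::string RewriteHincrbyToSet(const std::string &old_value, int64_t delta) {
  int64_t old_num = 0;
  std::from_chars(old_value.data(), old_value.data() + old_value.size(), old_num);
  return std::to_string(old_num + delta);  // absolute value after the increment
}

// `HINCRBY my_hash f0 3` with f0 == "10" would then emit:
//   {type: VALUE, event: SET, key: "my_hash", subkey: "f0", value: "13"}
```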
I am not very familiar with the internals of kvrocks, nor with C++. However, I personally think it is only necessary to hook into rocksdb itself. Whether RESP is needed, or which data structure from https://kvrocks.apache.org/community/data-structure-on-rocksdb should be exposed, depends on the user's needs (simply put, when performance matters most, one will be more inclined to consume the data-structure-on-rocksdb layout, or the raw KV of rocksdb, directly). Only the parsing method for the business-level data structures needs to be provided for users. In addition, kvrocks may in the future have its own protocol and data structures that redis does not have, and exposing the structure on rocksdb directly avoids this risk. And even if one only wants to use it as redis, as long as this structure is fully compatible with redis, it can be freely converted through a RESP parser.
Motivation
Currently, the write batch extractor plays a key role in cluster migration and the sync tool, and we would also like to introduce it into our change stream mechanism.
But it might have these (potential) issues:
This was also mentioned in issue #2583. To improve this situation, I would like to propose a new design for the write batch extractor.
Solution
Introduce an intermediate representation format for the parsed result, like `ChangeStreamEvent`, which will contain the following fields:

- type: `VALUE | COMMANDS`
- event: `ADD | DELETE | SET`, if the event type is `VALUE`
- key / subkey / value: the affected key, subkey, and value

For example, it will emit an event after receiving the command `HSET my_hash f0 v0` (sketched below). Then, the cluster migration / sync tool / polling updates can generate their own format based on the change stream event.
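A sketch of what the event could look like in C++ (the field names follow this proposal; the exact layout is an illustrative assumption):

```cpp
#include <cstdint>
#include <string>

// Sketch only: names follow this proposal, not final Kvrocks code.
struct ChangeStreamEvent {
  enum class Type : uint8_t { VALUE, COMMANDS };
  enum class Event : uint8_t { ADD, DELETE, SET };  // meaningful when type == VALUE

  Type type;
  Event event;
  std::string key;     // user-level key, e.g. "my_hash"
  std::string subkey;  // hash field / zset member, e.g. "f0"
  std::string value;   // e.g. "v0" (the discussion above revises this to a union-typed payload)
};

// `HSET my_hash f0 v0` would emit:
//   {type: VALUE, event: SET, key: "my_hash", subkey: "f0", value: "v0"}
```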
TODOs
- Refactor the `WriteBatchExtractor` to allow transforming it into `ChangeStreamEvent`
- Implement the `ChangeStreamEvent`
- Replace the `WriteBatchExtractor` output with the `ChangeStreamEvent`
What do you think? Please don't hesitate to leave comments if you have any thoughts or suggestions.