Open ieQu1 opened 1 year ago
I assume one needed step is to update the replicants periodically. Maybe, each time the delta set tables rotate, the cores could broadcast the new logical timestamp to the replicants?
Also, I guess clock skew would come into play here: although the timestamps would be just counters, each core node could diverge on what the current timestamp is. That is, each core might be at a different "time". So replicants might need to keep track of the current timestamp per core rather than just per table.
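The per-core bookkeeping suggested above could be sketched like this (a Python toy model; the class and function names are illustrative, not Mria's API):

```python
# Hypothetical sketch: a replicant keeping one logical timestamp per core
# node, since each core rotates its delta set tables independently and may
# therefore be at a different "time".

class ReplicantClock:
    def __init__(self):
        # core node name -> last logical timestamp broadcast by that core
        self.core_timestamps = {}

    def on_rotation_broadcast(self, core, timestamp):
        # Cores broadcast a new logical timestamp each time their delta set
        # tables rotate; the clock only ever moves forward.
        current = self.core_timestamps.get(core, -1)
        if timestamp > current:
            self.core_timestamps[core] = timestamp

    def timestamp_of(self, core):
        return self.core_timestamps.get(core)

clock = ReplicantClock()
clock.on_rotation_broadcast("core1@host", 7)
clock.on_rotation_broadcast("core2@host", 5)
clock.on_rotation_broadcast("core1@host", 6)  # stale broadcast, ignored
```

Tracking per core (rather than per table) keeps the map small and tolerates cores rotating at different rates.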
Currently replicants have to copy the entire contents of the tables when they reconnect, even after a short downtime. With a large enough volume of data this may hinder cluster recovery after a disaster or maintenance. Moreover, it makes it almost impossible to rebalance the load on the core nodes.
Initially we tried to solve this problem by persisting the transaction log, so the replicants recovering after a reconnect could replay it instead of going through the entire bootstrap procedure.
That approach proved to hurt performance too much to be practical. In addition, flapping client connections can often generate
`delete -> add -> delete -> ...`
loops in the transaction log, making it larger than the table itself and making the whole idea of replaying the transaction log questionable.

Below I describe an alternative approach: instead of trying to avoid bootstrap, we could speed it up.
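A toy illustration of why replaying a persisted log can be worse than copying the table (plain Python dict/list stand-ins, nothing Mria-specific):

```python
# A single flapping client produces a delete -> add -> delete -> ... loop,
# so the log grows without bound while the table holds at most one record.

log = []
table = {}

def apply_op(op, key, value=None):
    log.append((op, key, value))  # persisted transaction log keeps every op
    if op == "add":
        table[key] = value
    elif op == "delete":
        table.pop(key, None)

# One client reconnecting 100 times:
for attempt in range(100):
    apply_op("add", "session:client1", {"attempt": attempt})
    apply_op("delete", "session:client1")

print(len(log))    # 200 log entries to replay
print(len(table))  # 0 records actually left in the table
```

Replaying this log does 200 operations to reproduce an empty table; copying the table does none.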
The idea is to maintain rotating `set`-like tables (plain ets or plain rocksdb) that store the following records:

`{{Table, Key}, X}`

where `X` is the value of a counter. Rotating to the next delta set table increments `X` by 1. The `mria_rlog_server` process, as it processes intercepted transactions, writes each affected key to the current delta set table. Existing keys are simply overwritten.
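The bookkeeping described above could look roughly like this (a hedged Python model; a dict stands in for the ets/rocksdb delta set tables, and the names are illustrative):

```python
# Model of the delta set bookkeeping: every intercepted transaction records
# (Table, Key) -> X, where X is the current counter, and rotation bumps X.

class DeltaSet:
    def __init__(self):
        self.x = 0          # logical timestamp / rotation counter
        self.deltas = {}    # (table, key) -> X at which the key last changed

    def on_transaction(self, table, key):
        # A transaction touched (table, key): overwrite the entry with the
        # current counter value, so only the latest X per key is kept.
        self.deltas[(table, key)] = self.x

    def rotate(self):
        # Jumping to the next delta set table increments X by 1.
        self.x += 1

ds = DeltaSet()
ds.on_transaction("routes", "topic/a")
ds.rotate()
ds.on_transaction("routes", "topic/a")  # overwritten with the new X
ds.on_transaction("routes", "topic/b")
```

Because keys are overwritten rather than appended, the delta set stays bounded by the number of distinct keys, unlike the transaction log.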
On reconnect, the bootstrap procedure skips the `clear_table` command (https://github.com/emqx/mria/blob/main/src/mria_bootstrapper.erl#L240), so the replicant preserves its local data.

Pitfalls:
The `X` counter doesn't change during bootstrap; it only increments by 1 when jumping to the next table.
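Under these assumptions, the incremental bootstrap itself reduces to a counter comparison. A sketch (assumed semantics, not Mria's actual protocol): the replicant reports the last counter value it is caught up to, and the core resends only keys whose recorded counter is newer.

```python
# Compute the keys a core must resend to a reconnecting replicant, given
# the delta set entries {(table, key): X} and the replicant's last known
# counter value x_last. All names here are illustrative.

def keys_to_resend(deltas, x_last):
    """Return the (table, key) pairs that changed after x_last."""
    return sorted(k for k, x in deltas.items() if x > x_last)

deltas = {
    ("routes", "topic/a"): 3,
    ("routes", "topic/b"): 5,
    ("routes", "topic/c"): 6,
}
print(keys_to_resend(deltas, 4))
# [('routes', 'topic/b'), ('routes', 'topic/c')]
```

A replicant that was never connected (or whose `x_last` is older than the oldest retained delta set table) would still fall back to the full bootstrap.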