droundy / equilia

A fun database

sketch out an eventually consistent replication model with watermark #15

Open droundy opened 1 year ago

droundy commented 1 year ago

I think we could have one or two tables holding information about sharding and replication, including watermark information about how recent our data is from each server accepting inserts.

Probably need to have a received timestamp column on every table with a guarantee of monotonicity.
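
A rough sketch of how a per-server monotonic received timestamp could be produced, in case it helps the discussion. Everything here is hypothetical (`ReceivedClock`, `stamp`, `stamp_at` are illustration names, not anything in equilia), and it assumes timestamps are microseconds since the Unix epoch stored as `u64`. If the wall clock stalls or steps backwards, the stamp is bumped past the last one handed out.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper: hands out strictly increasing "received" timestamps
/// (microseconds since the Unix epoch) for one server.
pub struct ReceivedClock {
    last_micros: u64,
}

impl ReceivedClock {
    pub fn new() -> Self {
        ReceivedClock { last_micros: 0 }
    }

    /// Stamp an insert using the system clock.
    pub fn stamp(&mut self) -> u64 {
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_micros() as u64)
            .unwrap_or(0); // clock set before 1970: fall back to the counter
        self.stamp_at(now)
    }

    /// Stamp from an explicit clock reading (useful for testing).
    /// The result is always greater than every previous stamp.
    pub fn stamp_at(&mut self, wall_micros: u64) -> u64 {
        let next = wall_micros.max(self.last_micros + 1);
        self.last_micros = next;
        next
    }
}
```

This only gives monotonicity within a single server, which seems to be what the received column needs; it says nothing about ordering across servers.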

droundy commented 1 year ago

I'm thinking one table would hold the IP addresses and maybe public keys of all servers in the cluster, as well as a Uuid for each, and perhaps what roles each plays.
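
To make that concrete, one row of such a table might look roughly like the following. This is only a sketch: the type and field names are made up, the `u128` id is a stand-in for a real Uuid type so the example needs no external crate, and the two roles are my guess at the minimum needed.

```rust
use std::net::IpAddr;

/// Hypothetical roles a server can play in the cluster.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    AcceptsInserts,
    ReadOnly,
}

/// Hypothetical row in the cluster-membership table.
#[derive(Debug, Clone)]
pub struct ServerRow {
    pub id: u128,                    // stand-in for a Uuid
    pub address: IpAddr,             // where to reach the server
    pub public_key: Option<Vec<u8>>, // "maybe public keys": optional raw bytes
    pub roles: Vec<Role>,
}

/// Ids of the servers that accept inserts from clients.
pub fn insert_accepting_servers(table: &[ServerRow]) -> Vec<u128> {
    table
        .iter()
        .filter(|s| s.roles.contains(&Role::AcceptsInserts))
        .map(|s| s.id)
        .collect()
}
```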

droundy commented 1 year ago

I'm also thinking the db (or each table?) could have a configuration for the duration of an "epoch", which would be the granularity with which a watermark is kept. Each server would set (and replicate) its epoch, which would cover all the changes inserted on that server that were received before that time by the server's clock. Replication might proceed by epoch (with one transfer of inserts per epoch), and the db might be partitioned by epoch up until the watermark, which would be the latest epoch that has been received from all servers accepting inserts. I'm imagining that some servers might be read-only and never directly accept inserts from clients; these servers wouldn't have to propagate their changes to other servers and wouldn't contribute to the watermark.
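
A minimal sketch of the epoch and watermark arithmetic, under my own assumptions: received timestamps and the epoch duration are plain `u64` values in the same unit, server ids are `u128`, and each insert-accepting server reports the last epoch it has fully replicated. The names `epoch_of` and `watermark` are hypothetical.

```rust
use std::collections::HashMap;

/// Epoch index that a received timestamp falls into.
/// `epoch_len` is the configured epoch duration and must be nonzero.
pub fn epoch_of(received: u64, epoch_len: u64) -> u64 {
    received / epoch_len
}

/// Watermark: the latest epoch received from every insert-accepting server,
/// i.e. the minimum of their last completed epochs.
/// `completed` maps server id to last completed epoch; `writers` lists the
/// servers that accept inserts (read-only servers are simply left out).
/// Returns None if there are no writers or any writer has not reported yet.
pub fn watermark(completed: &HashMap<u128, u64>, writers: &[u128]) -> Option<u64> {
    let mut lowest: Option<u64> = None;
    for id in writers {
        let epoch = *completed.get(id)?; // a missing writer means no watermark
        lowest = Some(match lowest {
            Some(l) => l.min(epoch),
            None => epoch,
        });
    }
    lowest
}
```

One consequence worth noting: a single slow or unreachable insert-accepting server holds the watermark back for everyone, since it is a minimum.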

I'm imagining that there would be a query mode in which only changes up to the watermark are incorporated, so that clients could get a consistent result regardless of which server is queried, provided they check the watermark.
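
As a sketch of what that query mode might mean in practice, assuming every row carries the epoch in which it was received (the `Row`, `ReadMode`, and `visible` names are invented for illustration):

```rust
/// Hypothetical stored row, tagged with the epoch in which it was received.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub received_epoch: u64,
    pub value: String,
}

/// Hypothetical read modes.
pub enum ReadMode {
    /// Everything this server currently has, consistent or not.
    Latest,
    /// Only changes in epochs up to and including the given watermark.
    AtWatermark(u64),
}

/// Rows a query is allowed to see in the given mode.
pub fn visible<'a>(rows: &'a [Row], mode: &ReadMode) -> Vec<&'a Row> {
    rows.iter()
        .filter(|row| match mode {
            ReadMode::Latest => true,
            ReadMode::AtWatermark(w) => row.received_epoch <= *w,
        })
        .collect()
}
```

Two servers that have both received every epoch up to the same watermark would then return the same rows in `AtWatermark` mode, even if one of them also holds newer, not yet fully replicated inserts.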

guerinoni commented 1 year ago

Are you imagining a cluster with a leader and other replicas attached? Or some other architecture? Because with a leader, every write should pass through it.

droundy commented 1 year ago

Sorry I missed this last month!

I'm actually imagining a cluster with no leader, so inserts could go anywhere, including changes to the database schema. This obviously limits what changes are permitted.