ActorDB-geo - Githubissues

spog commented 7 years ago

While exploring your impressive project, I dared to think about a potential geo-replication scenario in ActorDB.

Basic idea is to have all data available at all locations:

All nodes active/writable
Maybe “eventualy” consistent locations
Potentially via per cluster geo-replication (see attached picture)

I thought such geo-scenario might work for ActorDB, since a particular actor (i.e. a mobile user) typically operates from one location at the time. However groups of mobile users might be active at different locations simultaneously.

I am more a system level programmer and I do not have a lot of knowledge regarding databases, so please excuse me if you find this post irrelevant.

SergejJurecko commented 7 years ago

Geo replication where all locations are active and serving data can be achieved by spreading out the individual clusters. So if 3 nodes in a cluster, place each in a separate location. This will of course mean writes will have a larger latency. But if you send queries to nodes where the user is closest, reads will be generally quick (unless safe flag is used, then it will have to verify with other nodes).

What you have in mind is another layer on top lets say locations. So we would have: nodes -> clusters -> locations

Some clusters would be active in one location and other locations would just be backups for that cluster. We were thinking about something similar.

The main question is what happens when a location goes offline. To make it work seamless is quite a difficult task. It also requires clients to have location reconnect logic in them. Writes must also replicate across at least two locations (so latency).

Eventually consistent locations mean when one goes offline and you get some writes to a new location, you may have just thrown away some writes from that offline location. You told the client write was complete, but if it has not managed to replicate to another location it may still disappear.

You either sacrifice latency or you sacrifice consistency. I guess the best option would be to allow both scenarios and leave it up to the user.

spog commented 7 years ago

Thank you very much for your response.

Would it help adding internal location information:

To each node of a cluster in case of the currently possible scenario of spreading 3 node clusters over 3 locations to automate using closest node for optimised reads?
To each cluster (same for all nodes of a cluster) in case of additional "locations" layer to help automate client reconnection logic?

A bold suggestion for location info:

location ID (i.e. 1, 2, 3)
connection priority order (i.e.: if client's closest is location 1, then next closest is 3 and next closest is 2, ... for all locations)

And regarding locations consistency I agree that this requirement is highly application dependent.

thanks again, Samo

SergejJurecko commented 7 years ago

One relevant point for this discussion is that if you have a 3 node cluster and all your client connections are to one node, replication leaders for actors will mostly live on that node. Since we use raft any node can either be a leader (executes SQL, reads and writes) or a follower (passively receives data from leader on writes).

spog commented 7 years ago

... so nodes would not be naturally load balanced?

SergejJurecko commented 7 years ago

The other option would mean 2/3 of queries would be proxied by the server you are talking to. Thus increasing latency and have a bigger chance of something failing.

catroot commented 7 years ago

To support cross DC replication we need extend the stripe of clusters with parity. Not looking to extra network latency, DC can have good uplinks and writes may be not very intensive and massive. There are another solutions for realtime data processing.

I like idea of DC redundancy - and here is my point of view. As i can see, in order to achieve a true DC redundancy and failover ActorDB need to support 1 thing:

Ability to rebalance shards (automated after configurable timeout to wait the lost cluster goes alive and manually forced - e.q. changing redundancy level or kick off a whole cluster if needed - e.q. 7 clusters setup can be shrinked to 5 clusters or 6 in case if N-2 redundancy will be required or something like this) This mean sightly different logic - the setup will be 3 cluster by 3 nodes per cluster distributed across different DC can deliver redundancy level enought to survive in case of failure at 1 DC + 1 node per alive DC. clusters must store data with parity like raid5 does, nodes operates in mirror mode as it is. if setup has only 2 clusters - the data will be simply striped between them, as it is for now.

With this we can change setup config on the fly - adding/removing clusters, their nodes and manipulate redundancy level from dangerous stripe to the N - any_level, achieving read speed improvement with required level of safety.

SergejJurecko commented 7 years ago

Thank you for the information. We will keep it in mind for future versions

biokoda / actordb

ActorDB-geo #35