ShindouMihou / RoseDB

A super simple, fast NoSQL database made in Java.

Replication/Load-balancing #9

Open ShindouMihou opened 3 years ago

ShindouMihou commented 3 years ago

WARNING

This is a concept from someone who only started taking programming seriously a year or two ago; I have been writing code since my early days, but mainly for fun. I want to learn, which is why I am leaving this concept here for others to improve on and for myself to learn from. Thank you.

Idea

Here is my general idea for the future: supporting load-balancing with the database. As I am still very immature at Java, feel free to share suggestions and opinions that could improve this (a code demonstration is even better, but purely optional).

Specification

What if the main process disconnects?

Protocol

Data Assignment to Node

More details are to be added and this concept will slowly be improved over time. Please note that THIS IS STILL A CONCEPT IDEA AND HAS PLENTY OF FLAWS.

ShindouMihou commented 3 years ago

Second Concept

A second concept has appeared! This time it's a bit more simplified: we could go with a load-balancing route where one or two servers receive requests and forward them to the servers in charge. The load balancer will also be in charge of identifying which server holds a given piece of data through a hash.

Diagram

[Diagram: Load Balancing (1), image not included]

Load Balancer's Purpose

The load balancer's purpose is to distribute requests evenly between all the nodes, whether it be a simple GET request or an extra complicated aggregated filter request.

To further lower the load on the servers, we could also cache the responses, but I doubt that would have much effect since all the items inside each server are already cached there, which means the process of retrieving a request is simply Load Balancer -> Node -> Internal Cache (if it exists), else -> Read from file.
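A minimal sketch of that retrieval path, assuming a hypothetical Node class with an in-memory cache (the class and method names here are illustrative, not RoseDB's actual API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the path: Load Balancer -> Node -> Internal Cache -> Read from file.
public class Node {

    // Items already read once stay cached in memory.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public String get(String key) {
        // Serve from the internal cache if the item exists there,
        // otherwise fall back to reading it from the file.
        return cache.computeIfAbsent(key, this::readFromFile);
    }

    private String readFromFile(String key) {
        // Placeholder for the actual on-disk read.
        return "value-of-" + key;
    }
}
```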

Hash Identification

To help identify which server is responsible for which data, we could assign an identifier, or rather a hash, to each of them. This itself has problems, which will be written down below, but to simplify: you can think of the hash space as 128 values being divided among x servers.

For example, say we have 10 nodes and a hash space of 128 values. Each node would be assigned a division of roughly 128 / 10 = 12 values, with the last node absorbing the remainder. Node 1 will be responsible for any data whose hash falls within 0-11, Node 2 will be responsible for 12-23, and so forth.

A request for an item whose key hashes into the 0-11 division will head to Node 1, and so forth.
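A minimal sketch of that range-based assignment, assuming a 128-value hash space and an illustrative HashRouter class (nodes are 0-indexed here, so "Node 1" above corresponds to index 0):

```java
// Range-based hash assignment: a 128-value hash space split evenly
// across the nodes, with the last node absorbing any remainder.
public class HashRouter {

    private static final int HASH_SPACE = 128;
    private final int nodeCount;

    public HashRouter(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    // Maps a key to a value in [0, 128).
    private int hash(String key) {
        return Math.floorMod(key.hashCode(), HASH_SPACE);
    }

    // Returns the node index (0-based) responsible for the key.
    public int nodeFor(String key) {
        int division = HASH_SPACE / nodeCount;          // e.g. 128 / 10 = 12
        return Math.min(hash(key) / division, nodeCount - 1);
    }
}
```

With 10 nodes the division is 12, so hashes 0-11 map to index 0, hashes 12-23 map to index 1, and the leftover range 120-127 folds into the last node.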

BUT WAIT, how are you going to do aggregation here?

Collection Partitioning

This is where collection partitioning comes in: instead of distributing individual items among shards, we apply the key hash to the collection itself, and the same concept kicks in.
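Building on the hypothetical HashRouter sketched above, collection partitioning just means hashing the collection's name instead of each item's key, so every item in a collection lands on the same node:

```java
public class PartitionExample {
    public static void main(String[] args) {
        HashRouter router = new HashRouter(10);

        // Every item in the "users" collection routes to the same node,
        // because the collection name, not the item key, is hashed.
        int node = router.nodeFor("users");
        System.out.println("Collection 'users' lives on node " + node);
    }
}
```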

BUT WAIT AGAIN, how will you do a database-wide aggregation or a database-wide filter?

If the host insists on item-wide distribution, then we will respect their choice and have the application run in that configuration. Collection-wide aggregations will be handled by the load balancer, which will ask for all the data from all servers and compile it into a single response that is sent back to the client after all nodes respond. The same goes for collection-wide distribution.
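A rough sketch of that compile-after-all-nodes-respond flow, using CompletableFuture for the fan-out (NodeClient and its filter method are assumptions for illustration, not an existing interface):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Scatter-gather: the load balancer asks every node, waits for all
// responses, and compiles them into a single reply for the client.
public class LoadBalancer {

    private final List<NodeClient> nodes;

    public LoadBalancer(List<NodeClient> nodes) {
        this.nodes = nodes;
    }

    public CompletableFuture<List<String>> aggregate(String filter) {
        // Fan the request out to every node in parallel.
        List<CompletableFuture<List<String>>> requests = nodes.stream()
                .map(node -> node.filter(filter))
                .collect(Collectors.toList());

        // Compile all partial results once every node has responded.
        return CompletableFuture.allOf(requests.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> requests.stream()
                        .flatMap(request -> request.join().stream())
                        .collect(Collectors.toList()));
    }

    // Hypothetical client interface for talking to a node.
    public interface NodeClient {
        CompletableFuture<List<String>> filter(String filter);
    }
}
```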

What happens if a node dies?

Yes, this is also a concern of mine, but if a node (which is basically a separate server) goes down, then all the data inside that node WILL BE UNAVAILABLE. The entire concept of load balancing is to balance the load between all the servers; having a primary server that takes in all the data and redistributes it when an entire node goes down doesn't seem like a proper load-balancing concept to me. Feel free to leave your opinions on this, but I am going to go with this route: you must get the node back up, or the data over there will definitely be unavailable.

This is the end of the concept for now; it will be improved on over time.