archethic-foundation / archethic-node

Official Archethic Blockchain node, written in Elixir
GNU Affero General Public License v3.0
72 stars 22 forks source link

Support read quorum for P2P #332

Closed ghost closed 2 years ago

ghost commented 2 years ago

Problem to solve

In distributed system when we are dealing with several replicas sharding the same information, during the read time we want to avoid as much as possible stale data, to be up to date.

Current implementation request most of the time the nearest node but this can make consistency issues and retrieved not accurate data.

Solution

First proposal

We can implement a read quorum policy to query few nodes and except a set of nodes to respond accordingly to the policy.

Usually in distributed system the rule of quorum is: reads + writes > replicas

Where :

In Archethic's case, the writes are the confirmations coming from the atomic commitment from a validation node point of view, where the beacon chain and the client received a replication attestation. (subtree of the replication tree)

Then we can deduct a number reads from the election algorithm of the replicas and the validation nodes: reads = writes - replicas

But sometimes we also want to apply some acceptance rules from the client request, then we should also support extensibility by given an acceptance function for a set of replies (winner can be the longest chain, or the latest update)

Second proposal

While the first approach seems logical in most of distributed system and databases, as the number of replicas will be high, in term of performance the cost of consistent reads becomes high (can be N/2 or N/3).

Also the integration becomes more complex as we have to define conflict resolutions functions and setup to determine the number of replication attestations for each information. However sometimes we are requesting N request because there are not transaction validations for some queries.

Also because Archethic leverages atomic commitment we can use it in the queries as well. Then we can introduce atomic reads to be sure all the nodes requested have to return the same results otherwise we pick others or return failure.

With this approach we can also leverage the network coordinates to ensure the data loading from the nearest nodes

Remarks:

Problem with this approach is we can have huge amount of failures, as the consistency is not guarantee in a distributed system because of lateness

Third proposal

Another interesting approach is monotonic quorum reads where in 2 successive quorum reads, it’s guaranteed the 2nd one won’t get something older than the 1st one.

Integrations:

In all the cases, the function must be integrated in the following modules:

Finally we need to:

apoorv-2204 commented 2 years ago

What do you think about mnesia?

maybe I think: We dropped it because it uses table to store data if too big its fragmented?

ghost commented 2 years ago

I don't think Mnesia here will help, mnesia is a distributed database in erlang, here we just want to improve read queries for P2P to aggregate and perform a client choice on the answers, to avoid stale data

internet-zero commented 2 years ago

Hey team! Please add your planning poker estimate with ZenHub @apoorv-2204 @blackode @imnik11 @Neylix @roychowdhuryrohit-dev @samuel-uniris