archethic-foundation / archethic-node

Official Archethic Blockchain node, written in Elixir

Update validation node election to handle bandwidth #908

Open Neylix opened 1 year ago

Neylix commented 1 year ago

Is your feature request related to a problem?

Currently, the validation node election is based on the daily shared secret and the geo patch. If a node with a slow internet connection has to validate a full-sized transaction (3 MB), it needs to send this transaction to its replication shard, which can contain up to 66 nodes. If the node has an upload speed of 1 Mb/s, it takes 24 seconds to send the message to 1 node, and about 26 minutes for 66 nodes. This does not take into account that the node can validate many transactions at the same time.
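The figures above can be checked with a small sketch. The module and function names are hypothetical and only illustrate the arithmetic, taking 1 MB as 8 Mb as the text does:

```elixir
defmodule UploadTime do
  @moduledoc "Illustrative upload time estimate, not part of archethic-node."

  # Seconds needed to send a message of `size_megabytes` to `nb_nodes` peers,
  # one after the other, at `upload_mbps` megabits per second.
  def seconds(size_megabytes, upload_mbps, nb_nodes) do
    size_megabytes * 8 * nb_nodes / upload_mbps
  end
end

# 3 MB to a single node at 1 Mb/s
IO.inspect(UploadTime.seconds(3, 1, 1))
# prints 24.0

# 3 MB to 66 nodes at 1 Mb/s, converted to minutes
IO.inspect(UploadTime.seconds(3, 1, 66) / 60)
# prints 26.4
```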

Describe the solution you'd like

With the implementation of the network patch (NP), we will know the bandwidth of a node (the last digit of the network patch represents the bandwidth). This NP will be integrated in the beacon summary (so it is a fixed value, common to all nodes). We can then adapt the validation node election based on the transaction size and the bandwidth of the nodes.

(The following calculation assumes connection speeds in Mb/s, as mainstream speed test tools report them, but we could use bytes/s to simplify the calculation and size conversion.)

Let's assume this bandwidth digit calculation (expressed in megabits per second):

- 0: < 1 Mb/s
- 1: < 5 Mb/s
- 2: < 10 Mb/s
- 3: < 20 Mb/s
- ...
- E: < 1000 Mb/s
- F: >= 1000 Mb/s
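As a rough sketch of how a measured speed could be turned into such a digit: the function below takes the list of upper bounds as an argument, because the issue only gives the first four bounds and the last one. `BandwidthDigit` and `from_speed/2` are hypothetical names, not existing archethic-node functions.

```elixir
defmodule BandwidthDigit do
  @moduledoc "Illustrative mapping from an upload speed to a bandwidth digit."

  # `upper_bounds` is the ordered list of exclusive upper bounds in Mb/s,
  # one per digit. The digit is the index of the first bound the speed is
  # below. If the speed is below none of them, the digit is the list length,
  # which would be 15 (F) for a complete table of 15 bounds.
  def from_speed(speed_mbps, upper_bounds) do
    Enum.find_index(upper_bounds, &(speed_mbps < &1)) || length(upper_bounds)
  end
end

# Only the bounds spelled out in the issue; the remaining ones are not given.
known_bounds = [1, 5, 10, 20]

IO.inspect(BandwidthDigit.from_speed(0.5, known_bounds))
# prints 0
IO.inspect(BandwidthDigit.from_speed(7, known_bounds))
# prints 2
```

Passing the bounds in keeps the sketch independent of whatever table is finally chosen.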

We can implement this calculation algorithm to determine the needed upload speed:

tx_size_bytes = Transaction.serialize(tx) |> byte_size()
# Convert the size from bytes to megabits
tx_size_megabits = (tx_size_bytes / (1024 * 1024)) * 8
total_size_for_replication_message = tx_size_megabits * nb_replication_node_per_shard
# Let's assume we want all the messages to be sent within 5 seconds
single_tx_upload_speed = total_size_for_replication_message / 5
# Let's assume a node validates an average of 3 big transactions at the same time
needed_upload_speed = single_tx_upload_speed * 3

For a 3 MB transaction and the full number of replication nodes, the calculation would be:

total_size_for_replication_message = (3 * 8) * 66
# 1584
needed_upload_speed = (1584 / 5) * 3
# 950.4, so about 950 Mb/s

So for a 3 MB transaction, only nodes with E or F bandwidth digits would be selected to validate this transaction.

For a simple transfer transaction (average size of about 700 bytes):

total_size_for_replication_message = (0.0007 * 8) * 66
# 0.3696
needed_upload_speed = (0.3696 / 5) * 3
# 0.22 Mb/s

In this case all nodes can handle a simple transaction.
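The algorithm and the two worked examples can be reproduced with a self-contained sketch that takes the serialized size in bytes instead of calling `Transaction.serialize/1`. The module name and the `:seconds` and `:concurrent_txs` options are hypothetical; their defaults are simply the assumptions stated above (5 seconds, 3 big transactions at once).

```elixir
defmodule BandwidthEstimate do
  @moduledoc "Illustrative version of the needed upload speed calculation."

  @bytes_per_megabyte 1024 * 1024
  @bits_per_byte 8

  # Upload speed in Mb/s needed to send a transaction of `tx_size_bytes`
  # to `nb_replication_nodes` within `:seconds`, while `:concurrent_txs`
  # transactions of that size are validated at the same time.
  def needed_upload_speed(tx_size_bytes, nb_replication_nodes, opts \\ []) do
    seconds = Keyword.get(opts, :seconds, 5)
    concurrent_txs = Keyword.get(opts, :concurrent_txs, 3)

    tx_size_megabits = tx_size_bytes / @bytes_per_megabyte * @bits_per_byte
    tx_size_megabits * nb_replication_nodes / seconds * concurrent_txs
  end
end

# Full-sized 3 MB transaction, 66 replication nodes
three_mb = 3 * 1024 * 1024
IO.inspect(Float.round(BandwidthEstimate.needed_upload_speed(three_mb, 66), 1))
# prints 950.4

# Simple 700-byte transfer, 66 replication nodes
IO.inspect(Float.round(BandwidthEstimate.needed_upload_speed(700, 66), 2))
# prints 0.21
```

The second result is slightly below the 0.22 Mb/s above because the worked example rounds 700 bytes to 0.0007 MB.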

Additional context

Sending a 3 MB transaction to 66 replication nodes (with 3 validation nodes) requires a lot of bandwidth. We could have an algorithm that increases the number of validation nodes in order to reduce the number of replication nodes per shard, so that no more than 500 Mb/s is needed for the transaction. With 6 validation nodes and the full number of replication nodes, each shard holds 33 nodes, so 475 Mb/s is needed with the previous calculation.
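One possible shape for such an algorithm, as a minimal sketch: it assumes the replication nodes are split evenly between validation nodes, so the 66 nodes per shard with 3 validation nodes correspond to 198 replication nodes in total. That total is derived from the figures above, and the module and function names are hypothetical.

```elixir
defmodule ValidationSizing do
  @moduledoc "Illustrative search for the number of validation nodes."

  # Assumptions taken from the issue, not protocol constants.
  @seconds_budget 5
  @concurrent_txs 3

  # Upload speed in Mb/s needed by one validation node for its shard.
  def needed_upload_speed(tx_megabits, shard_size) do
    tx_megabits * shard_size / @seconds_budget * @concurrent_txs
  end

  # Smallest number of validation nodes (starting at `min_validators`) for
  # which each shard needs at most `limit_mbps`. Returns nil if no number
  # of validation nodes up to `total_replication_nodes` satisfies the limit.
  def min_validation_nodes(tx_megabits, total_replication_nodes, limit_mbps, min_validators \\ 3) do
    min_validators
    |> Stream.iterate(&(&1 + 1))
    |> Stream.take_while(&(&1 <= total_replication_nodes))
    |> Enum.find(fn nb_validators ->
      shard_size = ceil(total_replication_nodes / nb_validators)
      needed_upload_speed(tx_megabits, shard_size) <= limit_mbps
    end)
  end
end

# 3 MB = 24 Mb, 198 replication nodes in total, 500 Mb/s limit
IO.inspect(ValidationSizing.min_validation_nodes(24, 198, 500))
# prints 6
```

With 4 or 5 validation nodes the shards would hold 50 or 40 nodes, which still needs 720 or 576 Mb/s under the same assumptions, so 6 is the first count that fits under the limit.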

samuelmanzanera commented 1 year ago

Interesting subject and solutions proposed :+1:

But what happens if the number of nodes able to replicate 3 MB doesn't match the required number of replicas? I guess we should take slower nodes, but this would have an impact on latency.

In that case, the solution proposed in the “Additional context” section could be leveraged to reduce the shard size by partitioning the replication nodes across more validation nodes.