ethersphere / bee

Bee is a Swarm client implemented in Go. It's the basic building block for the Swarm network: a private, decentralized, and self-sustaining network for permissionless publishing and access to your (application) data.
https://www.ethswarm.org
BSD 3-Clause "New" or "Revised" License

reduce number of connections #4695

Open istae opened 1 month ago

istae commented 1 month ago

Currently on mainnet, the average number of connections is around 200 per node. This is mainly due to the high storage radius and the high number of nodes relative to the radius, so the max connections per bin should be adjusted.

We can achieve this by reducing the oversaturation peer count in kademlia from 20 to 16.

janos commented 1 month ago

May I ask what is the reason to reduce the number of connections? The number of connections influences the network topology, with the consequence of changing the number of hops required for a chunk to reach the desired node. It would be good to measure the benefit with respect to download and upload performance (speed and resource consumption) before settling on the saturation peer count. I would even say that a dynamic saturation peer count, based on network conditions, would be a good thing to have, but that would be a somewhat larger feature to add.

istae commented 1 month ago

That reminds me of an old branch from 3 years ago :) https://github.com/ethersphere/bee/pull/2530 The simple reason is to reduce or limit the connection count while hopefully not hurting performance.

I believe we can achieve better performance by having a higher peer count for shallower bins and fewer peers for deeper bins. This may seem counterintuitive, but on average half of all chunk requests will go to bin 0, because relative to your address, half of the network falls in your bin 0.

So instead of having a constant 16 suffix addresses (4 bits) to balance the bins, imagine that we have 64 balanced addresses for bin 0, then 32 for bin 1, 16 for bin 2, and 8 for the rest of the bins. Request hops could be drastically reduced, and the total connection count would be lower than 200, as sketched below.
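To make the arithmetic concrete, here is a minimal sketch of the proposed per-bin targets (the storage radius of 10 is an assumption for illustration, not a measured value):

```go
package main

import "fmt"

func main() {
	// Hypothetical per-bin connection targets from the proposal above:
	// 64 balanced addresses for bin 0, 32 for bin 1, 16 for bin 2,
	// and 8 for every remaining bin up to an assumed storage radius.
	const radius = 10 // assumption for illustration
	total := 0
	for bin := 0; bin < radius; bin++ {
		var target int
		switch bin {
		case 0:
			target = 64
		case 1:
			target = 32
		case 2:
			target = 16
		default:
			target = 8
		}
		total += target
	}
	fmt.Println("total connections outside the neighbourhood:", total) // 168
}
```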

janos commented 1 month ago

That is a very interesting approach. I am not sure why https://github.com/ethersphere/bee/pull/2530 was abandoned, as it has potential. I believe that the research team should validate it before making changes to the topology. There is also a possibility that reducing the number of peers in bins results in the same or even higher syncing activity, as the number of hops is increased. But any assumptions should be validated by measuring changes in syncing and retrieval times and system resource consumption.

ldeffenb commented 1 month ago

> May I ask what is the reason to reduce the number of connections?

I believe the origin of this request is due to some home routers becoming completely saturated and leaving the local network unusable when multiple bee nodes are run.

janos commented 1 month ago

> I believe the origin of this request is due to some home routers becoming completely saturated and leaving the local network unusable when multiple bee nodes are run.

A configurable maximum number of connected peers would be good to have for such situations, so users can fine-tune it based on their resources. Kademlia already has SaturationPeers in its Options struct; it is just not configurable via CLI flags or configuration files.
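For illustration, a minimal sketch of what exposing that option might look like (the flag name and the stand-in Options type here are hypothetical; only the existence of SaturationPeers in kademlia's Options struct is confirmed above):

```go
package main

import (
	"flag"
	"fmt"
)

// Options mirrors the relevant part of kademlia's Options struct
// (stand-in definition for illustration; only the SaturationPeers
// field is confirmed by the discussion above).
type Options struct {
	SaturationPeers int
}

func main() {
	// Hypothetical flag; bee does not currently expose this setting.
	saturation := flag.Int("kademlia-saturation-peers", 20, "max connected peers per shallow bin")
	flag.Parse()

	opts := Options{SaturationPeers: *saturation}
	fmt.Printf("kademlia would be constructed with SaturationPeers=%d\n", opts.SaturationPeers)
}
```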

I still believe that reducing the number of peers is not the best solution; rather, the issue should be addressed in the syncing protocols.

zelig commented 1 month ago

This should definitely be a SWIP first.

Actually, the peer count in the PO bins ought to be chosen to reflect 1) the connectivity restrictions of a node and 2) the throughput requirements. 1) should be calibrated or bounded by a configurable constant non-negative integer (yes, in fact setting it to 0 should create a non-connected local client). 2) should be informed by the distribution of chunk push and retrieve requests.

(2) works the following way. Consider a random sample of N swarm chunks. When you take each chunk's PO with respect to a particular address, the POs follow a reverse exponential scale: in PO bin i you will have `2^{-(i+1)}N` chunks. Therefore, when a node sends requests for these N chunks, the number of requests that should be routed to peers in PO bin i is `2^{-(i+1)}N`. The chunks falling into a bin are uniformly distributed, so as long as the peers in the bin are balanced, each gets the same number of requests.
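A quick worked example of that distribution (N and the number of bins shown are arbitrary):

```go
package main

import "fmt"

func main() {
	// For N random chunks, the expected number whose proximity order
	// with respect to our address equals i is N / 2^(i+1):
	// half land in bin 0, a quarter in bin 1, and so on.
	const N = 1 << 20 // 1,048,576 sample chunks
	for i := 0; i < 8; i++ {
		expected := N >> (i + 1)
		fmt.Printf("bin %d: ~%d requests\n", i, expected)
	}
}
```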

Our ultimate goal is to handle requests most efficiently, i.e., to potentially max out the throughput of each peer when there are many requests. Under the naive assumption that the throughput of each peer connection is constant (or at least follows a distribution independent of the peer), the best strategy to maximize throughput is a uniform distribution of requests over peers.

You need to connect to each node within the neighbourhood designated by the storage depth D, i.e., cca `S·2^{-D}` peers, where S is the network size. If the maximum number of peers is M, you have `R = M - S·2^{-D}` connections to allocate to the first D PO bins. So if you operate a swarm node serving local API calls and do not participate in forwarding, then to guarantee a uniform distribution you need `2^{-(i+1)}·R` peers in PO bin i. The same applies if the node is serving requests to light clients with 1 connection.

Interestingly, if you are a node operator not using the API, i.e., only doing forwarding for full nodes with saturated Kademlia tables, then the distribution of requests is constant across bins, so you need to put R/(D-1) peers in each of the bins 1, ..., D-1.
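Putting the two cases together, a sketch of the resulting allocation (S, D and M are assumed numbers; the formulas are the ones above):

```go
package main

import (
	"fmt"
	"math"
)

// allocate splits the remaining budget R over bins 0..D-1, following
// the two cases described above.
func allocate(R, D int, forwardingOnly bool) []int {
	peers := make([]int, D)
	for i := range peers {
		if forwardingOnly {
			// Forwarding-only node: request load is constant across
			// bins, so spread R evenly over bins 1..D-1.
			if i >= 1 {
				peers[i] = R / (D - 1)
			}
		} else {
			// Request-originating node: load in bin i is proportional
			// to 2^{-(i+1)}, so allocate peers proportionally.
			peers[i] = int(math.Round(float64(R) / math.Pow(2, float64(i+1))))
		}
	}
	return peers
}

func main() {
	const (
		S = 10000 // assumed network size
		D = 10    // assumed storage depth
		M = 200   // assumed max connections
	)
	neighbourhood := S >> D // cca S*2^{-D} neighbours you must connect to
	R := M - neighbourhood  // connections left for the first D PO bins
	fmt.Println("originator:", allocate(R, D, false))
	fmt.Println("forwarder: ", allocate(R, D, true))
}
```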

So, now go figure.

zelig commented 1 month ago

The ultimate solution, though, should probably be directly driven by throughput: if some peers max out, then we open a connection to a new peer that is the deepest-PO sister of that node.
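A rough sketch of what such a throughput-driven trigger could look like (the Peer type and its metrics are assumptions, not existing bee APIs):

```go
package main

import "fmt"

// Peer is a stand-in type; the throughput and capacity metrics are
// assumptions, not existing bee APIs.
type Peer struct {
	Addr       string
	PO         int     // proximity order bin of the peer
	Throughput float64 // observed throughput
	Capacity   float64 // estimated connection capacity
}

func main() {
	connected := []Peer{
		{Addr: "a", PO: 2, Throughput: 9.1, Capacity: 10},
		{Addr: "b", PO: 2, Throughput: 10.0, Capacity: 10},
	}
	for _, p := range connected {
		if p.Throughput >= p.Capacity {
			// A real node would now dial a new candidate: a known but
			// unconnected address that is the deepest-PO sister of the
			// saturated peer (longest shared prefix, differing beyond it).
			fmt.Printf("peer %s (bin %d) is maxed out: dial its deepest new sister\n", p.Addr, p.PO)
		}
	}
}
```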