Here is the `/status` output for each of the radius 4 nodes/neighborhoods:

    "peer": "480...",
    "proximity": 0,
    "beeMode": "full",
    "reserveSize": 4193860,
    "reserveSizeWithinRadius": 3426466,
    "pullsyncRate": 0,
    "storageRadius": 4,
    "connectedPeers": 41,
    "neighborhoodSize": 0,
    "batchCommitment": 2757492736,
    "isReachable": true

    "peer": "b80...",
    "proximity": 0,
    "beeMode": "full",
    "reserveSize": 4193515,
    "reserveSizeWithinRadius": 3430038,
    "pullsyncRate": 0,
    "storageRadius": 4,
    "connectedPeers": 41,
    "neighborhoodSize": 0,
    "batchCommitment": 2757492736,
    "isReachable": true

    "peer": "c80...",
    "proximity": 0,
    "beeMode": "full",
    "reserveSize": 4176960,
    "reserveSizeWithinRadius": 3810215,
    "pullsyncRate": 0,
    "storageRadius": 4,
    "connectedPeers": 41,
    "neighborhoodSize": 0,
    "batchCommitment": 2757492736,
    "isReachable": true

    "peer": "d00...",
    "proximity": 0,
    "beeMode": "full",
    "reserveSize": 4093335,
    "reserveSizeWithinRadius": 4043948,
    "pullsyncRate": 0,
    "storageRadius": 4,
    "connectedPeers": 41,
    "neighborhoodSize": 2,
    "batchCommitment": 2757492736,
    "isReachable": true

    "peer": "def...",
    "proximity": 0,
    "beeMode": "full",
    "reserveSize": 4159288,
    "reserveSizeWithinRadius": 4043952,
    "pullsyncRate": 0,
    "storageRadius": 4,
    "connectedPeers": 37,
    "neighborhoodSize": 2,
    "batchCommitment": 2757492736,
    "isReachable": true

    "peer": "e80...",
    "proximity": 0,
    "beeMode": "full",
    "reserveSize": 4181723,
    "reserveSizeWithinRadius": 3733244,
    "pullsyncRate": 0,
    "storageRadius": 4,
    "connectedPeers": 41,
    "neighborhoodSize": 0,
    "batchCommitment": 2757492736,
    "isReachable": true
If you compare those `reserveSizeWithinRadius` values to the radius 5 nodes in the attached `/status/peers` file, you'll notice that the radius 4 nodes have nearly full reserves while the radius 5 nodes are only about half full, consistent with a recent radius increase that didn't land uniformly across the swarm.
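For anyone who wants to tabulate this themselves, here is a minimal sketch (not part of bee) that groups the `/status/peers` snapshots by storage radius and averages the reserve fill. The field names are taken from the output above; the `snapshots` wrapper field, the port, and the response shape are assumptions on my part, so adjust to whatever the API actually returns.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type snapshot struct {
	Peer                    string `json:"peer"`
	StorageRadius           uint8  `json:"storageRadius"`
	ReserveSize             uint64 `json:"reserveSize"`
	ReserveSizeWithinRadius uint64 `json:"reserveSizeWithinRadius"`
}

type peersResponse struct {
	Snapshots []snapshot `json:"snapshots"` // wrapper field name is an assumption
}

func main() {
	// Port and path depend on your node configuration.
	resp, err := http.Get("http://localhost:1633/status/peers")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var pr peersResponse
	if err := json.NewDecoder(resp.Body).Decode(&pr); err != nil {
		panic(err)
	}

	// Group peers by their reported storage radius and average the
	// reserveSizeWithinRadius for each group.
	count := map[uint8]int{}
	total := map[uint8]uint64{}
	for _, s := range pr.Snapshots {
		count[s.StorageRadius]++
		total[s.StorageRadius] += s.ReserveSizeWithinRadius
	}
	for r, n := range count {
		fmt.Printf("radius %d: %d peers, avg reserveSizeWithinRadius %d\n", r, n, total[r]/uint64(n))
	}
}
```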
If this can happen on testnet and persist for several days (as it did, until I noticed), then it can certainly happen on mainnet and go unnoticed across 1,024, 2,048, or even 4,096 neighborhoods.
Interestingly, salud allows a peer's storage radius to be one less than the network radius (scroll right to see the `-1`): https://github.com/ethersphere/bee/blob/97e7ee699be3b4325a233b1ca2dc177cd88f17e1/pkg/salud/salud.go#L203 But it requires the node's own radius to be exactly equal to the network radius: https://github.com/ethersphere/bee/blob/97e7ee699be3b4325a233b1ca2dc177cd88f17e1/pkg/salud/salud.go#L225
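To spell out the asymmetry, here is my paraphrase of the two linked checks; this is not the actual salud.go code and the identifiers are made up for illustration.

```go
package saludsketch

// A peer is allowed to trail the observed network radius by one
// (the extra check guards against uint8 underflow when the radius is 0).
func peerRadiusAcceptable(peerRadius, networkRadius uint8) bool {
	return networkRadius == 0 || peerRadius >= networkRadius-1
}

// The node itself, however, must match the network radius exactly, which is
// what marks a correctly sized radius 4 neighborhood unhealthy once the rest
// of the swarm has moved to radius 5.
func selfRadiusAcceptable(selfRadius, networkRadius uint8) bool {
	return selfRadius == networkRadius
}
```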
### Context
v2.1.0 (and earlier)
### Summary
Several of my Sepolia testnet nodes are not participating in the storage compensation rounds. All of these nodes have storage radius 4 while the rest of the swarm has increased to radius 5. Radius 4 is CORRECT for these lesser-populated neighborhoods.
The nodes are logging:
and
### Expected behavior
If a node has the same storage radius as its neighborhood peers, it should be considered healthy, regardless of what the radius is in other neighborhoods.
### Actual behavior
Because other neighborhoods in the swarm have increased to radius 5, nodes in the lesser-populated radius 4 neighborhoods are not participating in storage compensation.
### Steps to reproduce
Fire up a node in one of the lesser-populated, radius 4 Sepolia testnet neighborhoods. Specifically (at this point in time): 0x480, 0xb80, 0xc80, 0xd00, 0xdef, 0xe80.
### Possible solution
Use a neighborhood-local radius calculation for the health check rather than the swarm-wide radius, which may differ. A rough sketch of the idea follows.
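A minimal sketch of what I mean, assuming the node can see its neighborhood peers' storage radii from the same status snapshots; the types and function names below are mine, not bee's actual API.

```go
package nbhdhealth

type peerSnapshot struct {
	Proximity     uint8 // proximity order of the peer relative to this node
	StorageRadius uint8 // storage radius the peer reports
}

// neighborhoodRadius returns the storage radius reported by the plurality of
// peers inside this node's own neighborhood (proximity >= our storage radius),
// falling back to our own radius when no such peers are known.
func neighborhoodRadius(self uint8, peers []peerSnapshot) uint8 {
	votes := map[uint8]int{}
	for _, p := range peers {
		if p.Proximity >= self {
			votes[p.StorageRadius]++
		}
	}
	best, bestVotes := self, 0
	for r, n := range votes {
		if n > bestVotes {
			best, bestVotes = r, n
		}
	}
	return best
}

// selfHealthy compares the node's radius against its neighborhood consensus
// rather than the swarm-wide radius, so a correctly radius-4 neighborhood is
// still considered healthy after denser neighborhoods move to radius 5.
func selfHealthy(self uint8, peers []peerSnapshot) bool {
	return self == neighborhoodRadius(self, peers)
}
```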
Here's the `/status/peers` output of one of the affected nodes: 4635-status-peers.txt