ethersphere / bee

Bee is a Swarm client implemented in Go. It’s the basic building block for the Swarm network: a private, decentralized, and self-sustaining network for permissionless publishing and access to your (application) data.
https://www.ethswarm.org
BSD 3-Clause "New" or "Revised" License
1.44k stars, 338 forks

Strange pullsync/reserve growth #4634

Open ldeffenb opened 3 months ago

ldeffenb commented 3 months ago

Context

2.0.0

Summary

Starting a new node on the Sepolia testnet, I see the reserve grow along a non-linear curve. I would expect it to climb directly to the current swarm reserve level and then flatten, not approach that level asymptotically.

[Graph: reserve growth of the new node]

Expected behavior

Expected a steady climb at a constant pull rate until the reserve was full.

Actual behavior

See the graph above; it doesn't make sense to me.

Steps to reproduce

The Sepolia testnet is in a unique position right now to show this. No data is going into the swarm, and the current radius is still zero, so every node in the swarm should hold the full contents. Completely pullsyncing from any single node should therefore fill the new node's reserve entirely. That just doesn't seem to be happening.

Any pullsync issue like this one, seen in a radius-zero swarm, could also explain the mismatched reserve contents indicated by the differing hashes revealed in the mainnet Schelling game.
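The radius-zero argument can be made concrete. This is a minimal sketch, assuming the Kademlia-style proximity order (the number of leading bits two addresses share) used in Swarm; `proximity` and `inReserve` are hypothetical helpers, not Bee's API, and real addresses are 32 bytes rather than the 1 to 2 bytes used here. A node is responsible for a chunk when their proximity is at least the radius, so at radius 0 every chunk qualifies for every node.

```go
package main

import (
	"fmt"
	"math/bits"
)

// proximity returns how many leading bits a and b have in common
// (their proximity order). Identical inputs return the full bit length.
func proximity(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if x := a[i] ^ b[i]; x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return n * 8
}

// inReserve reports whether a node with the given overlay address should
// store the chunk at the given radius (hypothetical helper).
func inReserve(overlay, chunk []byte, radius int) bool {
	return proximity(overlay, chunk) >= radius
}

func main() {
	overlay := []byte{0x00, 0xff}
	far := []byte{0x80, 0x00}  // differs in the very first bit: proximity 0
	near := []byte{0x01, 0x00} // shares 7 leading bits: proximity 7
	for _, radius := range []int{0, 1} {
		fmt.Printf("radius %d: far=%v near=%v\n", radius,
			inReserve(overlay, far, radius), inReserve(overlay, near, radius))
	}
}
```

At radius 0 both chunks belong in the reserve; at radius 1 only the nearer one does.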

Possible solution

Don't have one yet, sorry. But fire up a brand-new Sepolia testnet node and watch its reserve metrics to see whether you observe the same thing I'm seeing.

ldeffenb commented 3 months ago

It also seems pointless for a node that is freshly pullsyncing to offer all of its "new" chunks to the other peers, only to have them (as expected) not wanted because those peers already have them. The yellow lines are the new node.
[Graph: chunks offered by each peer]

I removed the new node from the graph so you can see the degradation in the offers coming from the peers. Could this be due to the new pullsync throttle that I seem to remember being mentioned? If so, I would still have expected a lower, flat rate, not a continuing decline in offers.
[Graph: chunks offered by each peer, new node excluded]
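The wasted offers follow from how an offer/want exchange works. This is a minimal sketch, assuming a simplified exchange in which the offering side lists chunk addresses and the receiver answers with a bit vector marking the ones it lacks; `Address`, `want` and `wanted` are hypothetical names, not Bee's pullsync types. A fully synced peer answers a fresh node's offer with an empty bit vector, so each such offer costs a round trip and transfers nothing.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Address is a simplified stand-in for a chunk address (hypothetical).
type Address string

// want returns a bit vector in which bit i is set when the receiver does not
// yet hold offer[i] and therefore wants it delivered.
func want(offer []Address, have map[Address]bool) []byte {
	bv := make([]byte, (len(offer)+7)/8)
	for i, a := range offer {
		if !have[a] {
			bv[i/8] |= 1 << uint(i%8)
		}
	}
	return bv
}

// wanted counts the set bits, i.e. how many offered chunks will be transferred.
func wanted(bv []byte) int {
	n := 0
	for _, b := range bv {
		n += bits.OnesCount8(b)
	}
	return n
}

func main() {
	offer := []Address{"a", "b", "c"}
	fullPeer := map[Address]bool{"a": true, "b": true, "c": true}
	freshNode := map[Address]bool{"a": true}
	fmt.Println("established peer wants:", wanted(want(offer, fullPeer))) // 0
	fmt.Println("fresh node wants:", wanted(want(offer, freshNode)))      // 2
}
```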

ldeffenb commented 3 months ago

And no, the host is not CPU-, disk-, memory-, or network-bandwidth-bound.

ldeffenb commented 3 months ago

Every node in the testnet swarm should have all of the chunks, but will the new node ever get the final few? Film at 11.
[Graph: reserve growth of the new node]

ldeffenb commented 3 months ago

It did finally finish and reach the swarm's reserve level, but those last 15 minutes were agonizing to watch.
[Graph: final 15 minutes of reserve growth]

ldeffenb commented 3 months ago

And just to complete the picture, here's the final reserve growth graph. I'm working on firing up another new node and may enhance the logging in the pullsync area to get a better idea of what is happening.
[Graph: final reserve growth]
Is there possibly an off-by-one error somewhere? Or something that can cause individual chunks to be skipped or never offered? I know every node may hold the chunks in different orders and/or in different bins, but something is definitely not working as I would expect.

ldeffenb commented 3 months ago

I just discovered the historical vs. live breakdown of bee_puller_synced_chunks, so here are those graphs:
[Graph: bee_puller_synced_chunks, historical vs. live]
Zooming in on the final 15 minutes shows that it was historical pulls that delivered the final needed chunks.
[Graph: bee_puller_synced_chunks, final 15 minutes]

ldeffenb commented 3 months ago

I may understand part of the shape of the reserve growth. Bins are pulled in parallel from a single peer, and all peers are started at the same time. Higher-numbered bins have fewer chunks in them, so they finish first. Lower-numbered bins have more chunks (fewer bits to match), so they take longer to finish. As bin/peer pullers finish, the parallelism tapers off, the pull rate drops, and fewer chunks arrive in the reserve.

But that understanding doesn't explain the long delay in getting those final few chunks. Hopefully my increased logging in the pullsync/puller area will shed some light on it when this peer (a nuked copy of the one described above) finishes its pull. It is currently at 800K out of a 1.3M target, and its growth rate is beginning to taper off.