ethersphere / bee

Bee is a Swarm client implemented in Go. It’s the basic building block for the Swarm network: a private, decentralized, and self-sustaining network for permissionless publishing and access to your (application) data.
https://www.ethswarm.org
BSD 3-Clause "New" or "Revised" License
1.44k stars 338 forks

"Shallow" receipt that isn't shallow #4690

Open ldeffenb opened 1 month ago

ldeffenb commented 1 month ago

Context

Bee 2.1.0-rc2 and earlier

Summary

The Sepolia testnet is in a state where many neighborhoods have split, raising their storage radius from 4 to 5. Some (5 of the original 16, to be exact) have not yet filled their reserve enough to split, and some of these may not split by the time my OSM tile loading completes.

The problem is that the pusher node's own neighborhood DID increase to storage radius/depth 5. So now every chunk pushed into a neighborhood that has NOT yet split logs "pusher: shallow receipt depth 4, want at least 5", and the push of that chunk is needlessly retried. This causes extra swarm traffic and even triggers extra swap compensation cheques because of the superfluous retries.

Expected behavior

The chunks ARE arriving at their desired destination neighborhood and depth, so the logs and retries should not be happening.

Actual behavior

Logs, retries, and generally unnecessary traffic into the swarm.

Steps to reproduce

Set up a swarm where some neighborhoods are fuller than others, and push data into it until the fuller neighborhoods split while the less full ones have still not filled their reserve. Push from a node in a neighborhood that has a filled reserve and has already increased its storage radius.

Possible solution

Somehow the pusher needs to be aware of the actual storage radius/depth in the target neighborhood and remove the assumption that all neighborhoods are at the same storage radius/depth as the pushing node.

ldeffenb commented 1 month ago

Consider this set of Prometheus graphs: (image) I started enough new nodes to cover the missing neighborhoods, but the shallow receipts still keep happening. The failed send attempts, however, have gone away. I suspect that the retries are finally routing through one of the other depth 4 neighborhoods, which then reports that "it worked" without considering the receipt shallow, because the routing peer is still at depth 4, as is the target.

ldeffenb commented 1 month ago

This issue will happen even on mainnet and may last a LONG time if the rate of new data is slower than my OSM push. One neighborhood may transition, and I could see it taking days or even weeks for the other neighborhoods to transition as well. During that rolling change of the swarm's storage radius/depth, there will be lots and lots of unnecessary push retries, depending on the depth of the pushing nodes.