Open ldeffenb opened 1 month ago
Consider this set of Prometheus graphs. I started enough new nodes to cover the missing neighborhoods, but the shallow receipts still keep happening, although the failed send attempts have gone away. I suspect that the retries are finally routing through one of the other depth 4 neighborhoods, which then reports "it worked" without considering the receipt shallow, because the routing peer is still at depth 4, as is the target.
This issue will happen on mainnet too, and may last a LONG time if the rate of new data is slower than my OSM push. One neighborhood may transition, and I could see it taking days or even weeks for the other neighborhoods to transition as well. During that rolling change of the swarm's storage radius/depth, there will be lots and lots of unnecessary push retries, depending on the depth of the pushing nodes.
Context
Bee 2.1.0-rc2 and earlier
Summary
The Sepolia testnet is in a state where many neighborhoods have split from storage radius 4 to 5. Some (5 of the original 16, to be exact) have not yet filled their reserve enough to split. Some of these may not actually split by the time my OSM tile loading completes.
The problem is that the pusher node's neighborhood DID increase to storage radius/depth 5. So now, all chunks being pushed into the neighborhoods that have NOT yet split are logging
pusher: shallow receipt depth 4, want at least 5
and needlessly retrying the push of those chunks. This is causing extra swarm traffic and actually triggering extra swap compensation cheques due to the superfluous retries.
Expected behavior
The chunks ARE arriving at their desired destination neighborhood and depth, so the logs and retries should not be happening.
Actual behavior
Logs, retries, and generally unnecessary traffic into the swarm.
Steps to reproduce
Set up a swarm where some neighborhoods are fuller than others, and push data into that swarm until the fuller neighborhoods split while the less full neighborhoods have still not filled their reserve. Push from a neighborhood that has a filled reserve and has already increased its storage radius.
Possible solution
Somehow the pusher needs to be aware of the actual storage radius/depth in the target neighborhood and remove the assumption that all neighborhoods are at the same storage radius/depth as the pushing node.
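To make the idea concrete, here is a minimal Go sketch of the two ways a receipt could be judged. It is not Bee's actual pusher code: the `receipt` type, its `radius` field, and both function names are hypothetical, and it assumes the receipt can carry the storage radius that the storing node reports for its own neighborhood.

```go
package main

import "math/bits"

// proximity returns the number of leading bits shared by two addresses,
// limited to the bit length of the shorter address.
func proximity(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if x := a[i] ^ b[i]; x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return n * 8
}

// receipt is a hypothetical, simplified push receipt (assumption, not Bee's type).
type receipt struct {
	storer []byte // overlay address of the node that stored the chunk
	radius int    // storage radius the storer reports for its own neighborhood
}

// shallowByLocalRadius models the behavior described in this issue:
// the receipt is judged against the pushing node's own storage radius.
func shallowByLocalRadius(chunk []byte, r receipt, localRadius int) bool {
	return proximity(chunk, r.storer) < localRadius
}

// shallowByStorerRadius models the proposed alternative: the receipt is
// judged against the radius of the neighborhood that stored the chunk.
func shallowByStorerRadius(chunk []byte, r receipt) bool {
	return proximity(chunk, r.storer) < r.radius
}
```

With a chunk and storer that share 4 leading bits, a pusher at radius 5 would flag the receipt as shallow under the first check, while the second check accepts it when the storer's neighborhood is still at radius 4. One open design question with this approach is how far the pusher should trust a radius reported by the storer itself.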