ethersphere / bee

Bee is a Swarm client implemented in Go. It’s the basic building block for the Swarm network: a private, decentralized, and self-sustaining network for permissionless publishing and access to your (application) data.
https://www.ethswarm.org
BSD 3-Clause "New" or "Revised" License

Pullsync ignores stamp timestamp changes #4704

Closed ldeffenb closed 2 months ago

ldeffenb commented 4 months ago

Context

Bee 2.1.0 (and earlier)

Summary

Related to #4703. Pullsync only pulls chunks that are not already in the reserve, and it decides that based solely on address and batch. If content is re-uploaded, the chunk's stamp timestamp is updated and the newly stamped chunk is pushed. Even after #4703 is fixed, pullsync will still not propagate the current timestamp for the chunk/stamp.

Expected behavior

Stamp timestamps should match across all nodes and reserves.

Actual behavior

Because pullsync only requests chunks that are not already in the reserve, judged by address and batch, a chunk whose stamp timestamp has changed is never pulled again, so the new timestamp does not spread through the storing neighborhood. Only the nodes that received the chunk directly via push (presuming #4703 is fixed) will have the current stamp timestamp.

https://github.com/ethersphere/bee/blob/8c61408db7cd63227cfe161967e3b70f90b9fde3/pkg/pullsync/pullsync.go#L185-L192
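
To illustrate the behavior described above, here is a minimal sketch, not Bee's actual code. The types `reserveKey`, `offer`, and `reserve` and the method `wanted` are hypothetical names invented for this example; the only thing taken from the issue is that the "do we already have it?" check looks at address and batch alone.

```go
package main

import "fmt"

// reserveKey is a hypothetical lookup key: address and batch ID only,
// matching the check described in this issue.
type reserveKey struct {
	Address [32]byte
	BatchID [32]byte
}

// offer is what a hypothetical upstream peer advertises for one chunk.
type offer struct {
	Key       reserveKey
	Timestamp uint64 // stamp timestamp; NOT part of the lookup key
}

// reserve is a toy in-memory reserve: key -> locally stored stamp timestamp.
type reserve map[reserveKey]uint64

// wanted returns the offers that would be requested when the decision
// depends only on whether the (address, batch) pair is already stored.
func (r reserve) wanted(offers []offer) []offer {
	var out []offer
	for _, o := range offers {
		if _, have := r[o.Key]; !have {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	key := reserveKey{Address: [32]byte{1}, BatchID: [32]byte{2}}
	local := reserve{key: 100} // stored with an old timestamp

	// The peer offers the same chunk and batch with a newer timestamp.
	offers := []offer{{Key: key, Timestamp: 200}}

	// Nothing is requested, so the local timestamp stays at 100.
	fmt.Println(len(local.wanted(offers))) // prints 0
}
```

In this toy model the newer timestamp never reaches the node, because the timestamp takes no part in the lookup.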

Steps to reproduce

I'll leave this to your imagination.

Possible solution

The pullsync protocol needs to be updated to communicate not only the batch, but the entire stamp so that the pulling node can determine if the local stamp timestamp is current and/or update to the latest timestamp.
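
A rough sketch of what such a check could look like on the pulling side, assuming the offer carried the stamp's index and timestamp. The `stamp` type and `needsPull` function are hypothetical, and "pull when the offered timestamp is newer or the index differs" is only one possible policy, not something the protocol defines today.

```go
package main

import "fmt"

// stamp is a hypothetical, reduced view of a postage stamp for a chunk
// that is already matched by address and batch.
type stamp struct {
	Index     uint64 // position within the batch (bucket and slot)
	Timestamp uint64 // time of stamping
}

// needsPull reports whether an offered stamp should trigger a pull.
// local is nil when the chunk is not in the reserve at all.
func needsPull(local *stamp, offered stamp) bool {
	if local == nil {
		return true // not stored yet: today's behavior
	}
	if local.Index != offered.Index {
		return true // re-stamped into a different index
	}
	return offered.Timestamp > local.Timestamp // newer stamp for the same index
}

func main() {
	local := &stamp{Index: 7, Timestamp: 100}

	fmt.Println(needsPull(local, stamp{Index: 7, Timestamp: 100})) // false
	fmt.Println(needsPull(local, stamp{Index: 7, Timestamp: 200})) // true
	fmt.Println(needsPull(local, stamp{Index: 9, Timestamp: 100})) // true
	fmt.Println(needsPull(nil, stamp{Index: 7, Timestamp: 100}))   // true
}
```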

ldeffenb commented 4 months ago

It's worse than just the timestamp. Pullsync will also not pull a re-stamped chunk that happens to have been re-stamped into a different INDEX of the batch's bucket. This is a critical flaw in mutable stamps.

And worse, since many chunks exist in the swarm whose stamps were forgotten in the great batch purge, those chunks CANNOT be reliably re-stamped with the original stamp because they may end up at different indices which will not be properly propagated to nodes still holding the original stamped chunks.

zelig commented 3 months ago

@ldeffenb you are spot on. I pointed out this issue in this writeup (https://hackmd.io/@zelig/SJVY-ACKa), which also explains how it can lead to reserve mismatches to this day, ever since the 'great batch purge'. Instead of the batchID, the pullsync offers should indeed use the hash of the serialised stamp. Nodes should store this hash in the localstore alongside the stamp, so that it can be used to detect restampings on a differing index.
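
A minimal sketch of the stamp-hash idea, under stated assumptions. The `stamp` type, its field layout (32-byte batch ID, 8-byte index, 8-byte timestamp, 65-byte signature), and the use of SHA-256 are all assumptions made for illustration; the real serialisation and hash function would be whatever Bee settles on.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// stamp is a hypothetical postage stamp; the field sizes are assumed.
type stamp struct {
	BatchID   [32]byte
	Index     uint64
	Timestamp uint64
	Signature [65]byte
}

// marshal serialises the stamp as batchID | index | timestamp | signature.
func (s stamp) marshal() []byte {
	buf := make([]byte, 32+8+8+65)
	copy(buf[0:32], s.BatchID[:])
	binary.BigEndian.PutUint64(buf[32:40], s.Index)
	binary.BigEndian.PutUint64(buf[40:48], s.Timestamp)
	copy(buf[48:], s.Signature[:])
	return buf
}

// hash returns a digest of the serialised stamp. SHA-256 is a stand-in
// here, chosen only because it is in the Go standard library.
func (s stamp) hash() [32]byte {
	return sha256.Sum256(s.marshal())
}

func main() {
	original := stamp{BatchID: [32]byte{1}, Index: 7, Timestamp: 100}

	restamped := original
	restamped.Index = 9 // same batch, different index

	// Same batch ID, but the stamp hashes differ, so an offer keyed on
	// the stamp hash would expose the restamping.
	fmt.Println(original.BatchID == restamped.BatchID) // true
	fmt.Println(original.hash() == restamped.hash())   // false
}
```

Keying offers on this hash instead of the batch ID would make a change in index or timestamp visible to the pulling node without sending the full stamp in every offer.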

istae commented 2 months ago

https://github.com/ethersphere/bee/pull/4717