ethersphere / bee

Bee is a Swarm client implemented in Go. It's the basic building block for the Swarm network: a private, decentralized, and self-sustaining network for permissionless publishing and access to your (application) data.
https://www.ethswarm.org
BSD 3-Clause "New" or "Revised" License

Retrieval Redundancy Level not set (defaults to PARANOID) #4694

Closed ldeffenb closed 1 week ago

ldeffenb commented 4 months ago

Context

2.1.0-rc2 (and probably 2.0.* as well)

Summary

See #4693 for details, but tracking that down revealed that SetLevelInContext is never invoked by the /bytes API; therefore, when the joiner invokes GetLevelFromContext, it gets the default value of 4 (PARANOID).

Expected behavior

If the Swarm-Redundancy-Strategy is specified on /bytes, then I would expect it to be honored by the joiner.

Actual behavior

My hacked logs (see here) prove that the joiner is using rLevel = 4.

Steps to reproduce

Add your own logs, retrieve something with the /bytes API specifying Swarm-Redundancy-Strategy, and verify that it doesn't get to the joiner. Or just grep the source code for SetLevelInContext and GetLevelFromContext. The file pipeline invokes the former, but that isn't invoked for the root chunk.

Possible solution

Have the API invoke SetLevelInContext before creating the joiner, perhaps as in this hack that only handles NONE: https://github.com/ldeffenb/bee/blob/ed323f9561f1ae541895153383fcb6edc1d142fc/pkg/api/bzz.go#L546

ldeffenb commented 4 months ago

In fact, it appears that the API doesn't have a way to specify the retrieval redundancy level at all, only the strategy for prefetching and/or whether to fall back from that strategy.

This is decidedly bad IMHO.

ldeffenb commented 4 months ago

I added this workaround: https://github.com/ldeffenb/bee/blob/ed323f9561f1ae541895153383fcb6edc1d142fc/pkg/api/bzz.go#L546

ldeffenb commented 4 months ago

Do you really want this to default to PARANOID? It will cause LOTS of canceled retrievals when accessing /bzz files from a browser. Just hit the following URL on a sepolia testnet node and watch the retrieval failed debug logs. http://localhost:1633/bzz/fdfd170f73953bc262d936d3a5329b787980335dc0547032bb2a6239ebe95a76/14/coverage.png

zelig commented 3 weeks ago

Do you really want this to default to PARANOID? It will cause LOTS of canceled retrievals when accessing /bzz files from a browser. Just hit the following URL on a sepolia testnet node and watch the retrieval failed debug logs. http://localhost:1633/bzz/fdfd170f73953bc262d936d3a5329b787980335dc0547032bb2a6239ebe95a76/14/coverage.png

Yes, this is kind of deliberate. The idea is that by default (when we do not know the encoding level for sure), we really try. But it should surely be overridden by specifying the "SWARM-REDUNDANCY-LEVEL" header to the appropriate expectation, including the case that "SWARM-REDUNDANCY-LEVEL=0" should suppress the default level 4. If it turns out that replica requests are fired too often even though the original is found, and there are therefore too many cancellations, we could introduce a new header called SWARM-REPLICA-FALLBACK-INTERVAL which would wait before replicas belonging to a higher level are requested. Or maybe even better, strategies similar to the ones for erasure codes could be used here: