ethersphere / bee

Bee is a Swarm client implemented in Go. It’s the basic building block for the Swarm network: a private, decentralized, and self-sustaining network for permissionless publishing and access to your (application) data.
https://www.ethswarm.org
BSD 3-Clause "New" or "Revised" License

Add option to skip traversal with stewardship endpoint #3246

Open agazso opened 2 years ago

agazso commented 2 years ago

Summary

It would be good if the /stewardship GET endpoint had an optional parameter that skips the traversal of the data, so that it would be possible to check the availability of a single chunk only.

More context in #3205

Motivation

I wanted to write a tool that checks whether the individual chunks of a dataset are available on the network, and I wanted to use the /stewardship GET endpoint for that. However, it turned out that the endpoint has additional logic: it recognizes root chunks, intermediate chunks, and manifest root chunks, and then traverses all the chunks that belong to the dataset. As a result, the checks can become very expensive and require additional logic on the user's side to differentiate between the kinds of chunks.

Implementation

There could be an optional query parameter (e.g. traverse=false, skipTraversal, or something like that) which, when specified, would skip the traversal logic and simply try to fetch the given chunk from the network.

I created an example implementation that does this in the https://github.com/ethersphere/bee/tree/feat/stewardship-skip-traversal branch, but I understand that it is not production quality, so I don't expect it to be merged.
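As a rough illustration of the proposal from the client side, here is a minimal Go sketch that builds the request URL. The `traverse` query parameter is hypothetical (it is only one of the names floated above and is not part of the Bee API), the helper name `stewardshipURL` is made up for this example, and `http://localhost:1633` is assumed to be the node's API address.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// stewardshipURL builds the URL for GET /stewardship/{reference}.
// When traverse is false it appends "traverse=false", a HYPOTHETICAL
// query parameter taken from this proposal; Bee does not define it.
func stewardshipURL(apiBase, reference string, traverse bool) (string, error) {
	u, err := url.Parse(apiBase)
	if err != nil {
		return "", err
	}
	// Join onto any existing base path and keep the result absolute.
	u.Path = path.Join("/", u.Path, "stewardship", reference)
	if !traverse {
		q := u.Query()
		q.Set("traverse", "false")
		u.RawQuery = q.Encode()
	}
	return u.String(), nil
}

func main() {
	// Placeholder reference; a real one is a hex-encoded chunk address.
	s, err := stewardshipURL("http://localhost:1633", "abcd", false)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Leaving the parameter off when `traverse` is true keeps the request identical to what the endpoint accepts today, so existing clients would see no change in behaviour.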

ldeffenb commented 2 years ago

Actually, there are 3 different use cases for the /stewardship API, both GET and PUT.

1) Current operation which traverses an entire manifest if the reference "smells" like one, and also traverses all of the chunks of a non-manifest /bytes reference (BMT joiner). Really only useful for small manifests or files.
2) An option that only does the full /bytes reference (BMT joiner), but does NOT traverse the manifest. Useful for clients that do their own explicit mantaray manifest processing. (https://github.com/ethersphere/mantaray-js)
3) The option described above which only checks the exact specified chunk. Useful for clients that do their own BMT processing. (https://github.com/fairDataSociety/bmt-js)

Both @mfw78 and I are doing 2 with our large manifests on the swarm.

istae commented 1 year ago

Since it is actually possible to retrieve single chunks using the /stewardship endpoint, we will for now close this issue.

ldeffenb commented 1 year ago

I disagree. If you hit the /stewardship endpoint with a chunk address that happens to be the root reference of a mantaray manifest, it will traverse and process the ENTIRE manifest. Unless I'm missing something in the API that constrains it to a single chunk?

istae commented 1 year ago

I see the point now. No, we do not have a query for this yet. It should be trivial to add though.