ethersphere / swarm

Swarm: Censorship resistant storage and communication infrastructure for a truly sovereign digital society
https://swarm.ethereum.org/
GNU Lesser General Public License v3.0
489 stars, 112 forks

repair globally pinned content without manifest #2193

Closed mortelli closed 4 years ago

mortelli commented 4 years ago

The tests run during phase 3 of Global Pinning have shown that the current implementation is not viable: it depends on manifests, which are chunks that get garbage-collected on non-pinner nodes.

This means that when a requested chunk cannot be fetched and the manifest is not present, the publisher cannot be extracted, so the whole content repair process fails.

For now we will leave the issue of garbage-collected manifest chunk repair for later theoretical discussion and resolution, and implement publisher values through query params instead of extracting publishers from manifests.

Puts #2182 on hold.

mortelli commented 4 years ago

Since the solution to this problem will replace the publisher manifest code, I have made a copy of the current GP implementation branch (named global-pinning-manifest-publisher) so that we can get back to it eventually if we need to.

zelig commented 4 years ago

The 'publisher-in-manifest' approach is indeed not compatible with the indiscriminate garbage collection approach in testing. However, I think it is a typical use case that a file is accessed through a manifest which is much more recent than the file (which eventually needs global pinning to recover). Practically, this means that the manifest is accessed through either a feed or ENS/RNS, e.g., myblog.eth/some/old/entry/which/has/been/gc-ed. Therefore I think we should keep this mode of getting the publisher.

Since you are right that it is not going to work with the testing approach, I suggest we update the user story acceptance criteria with the explicit assumption that the manifest through which the file is accessed is available at the time the recovery is tested. To make the test respect this assumption, we create the manifest after we have triggered the GC with a mass upload.

As for alternative approaches:

mortelli commented 4 years ago

> The 'publisher-in-manifest' approach is indeed not compatible with the indiscriminate garbage collection approach in testing. However, I think it is a typical use case that a file is accessed through a manifest which is much more recent than the file (which eventually needs global pinning to recover). Practically, this means that the manifest is accessed through either a feed or ENS/RNS, e.g., myblog.eth/some/old/entry/which/has/been/gc-ed. Therefore I think we should keep this mode of getting the publisher.
>
> Since you are right that it is not going to work with the testing approach, I suggest we update the user story acceptance criteria with the explicit assumption that the manifest through which the file is accessed is available at the time the recovery is tested. To make the test respect this assumption, we create the manifest after we have triggered the GC with a mass upload.
>
> As for alternative approaches:
>
> * Instead of a global CLI option for the publisher, it should be a query parameter or header value of the download API, which is compatible with any out-of-band communication of the publisher or with permanent links.

We've decided to go with the query parameter option for now.

The manifest implementation can be revisited later, probably on bee.

mortelli commented 4 years ago

completed through #2202