ethersphere / bee

Bee is a Swarm client implemented in Go. It's the basic building block for the Swarm network: a private, decentralized, and self-sustaining network for permissionless publishing and access to your (application) data.
https://www.ethswarm.org
BSD 3-Clause "New" or "Revised" License

Self-repairing of erasure coded uploads by reconstructing and reuploading lost chunks periodically #4607

Open NoahMaizels opened 6 months ago

NoahMaizels commented 6 months ago

Summary

Achieve self-repairing uploads by checking erasure-coded uploads for lost chunks and regenerating the original data chunks and parity chunks for re-upload.

If some chunks from an erasure-coded upload become irretrievable for some specified amount of time, but enough chunks remain to reconstruct the original data, then we should be able to reconstruct the data and parity chunks and re-upload any that are missing, giving us self-repairing uploads.

Potential additional related feature: if an unhealthy neighborhood is identified, a nonce could be iterated on the missing chunk when it is re-uploaded, which updates its address and lands it in a different, healthy neighborhood.
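The nonce-iteration idea is speculative (a plain content-addressed chunk cannot change its address without changing its payload, so this would need something like a wrapped or single-owner chunk), but the mining loop itself is easy to sketch. Everything below is illustrative: `mineNonce`, the SHA-256 stand-in for Swarm's Keccak-based hashing, and the one-byte prefix target are all assumptions, not Bee code.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// mineNonce iterates a nonce until the hash of (nonce || payload) falls into
// the desired neighborhood, identified here by its first prefixBits bits.
// SHA-256 stands in for Swarm's Keccak-based BMT hash; this is a sketch only.
func mineNonce(payload []byte, target byte, prefixBits uint) (uint64, [32]byte) {
	for nonce := uint64(0); ; nonce++ {
		var buf [8]byte
		binary.BigEndian.PutUint64(buf[:], nonce)
		addr := sha256.Sum256(append(buf[:], payload...))
		if addr[0]>>(8-prefixBits) == target>>(8-prefixBits) {
			return nonce, addr
		}
	}
}

func main() {
	// Re-home a chunk into the neighborhood whose addresses start with 0b0101.
	nonce, addr := mineNonce([]byte("chunk payload"), 0x50, 4)
	fmt.Printf("nonce=%d address=%x\n", nonce, addr[:4])
}
```

With a 4-bit target prefix this terminates after ~16 attempts on average; a real neighborhood depth would need proportionally more iterations.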

Motivation

This means lower levels of erasure coding should provide relatively higher levels of data protection than without this feature: as long as there is no significant loss of chunks within a short time, the missing chunks can be re-uploaded before the data becomes unrecoverable.

Implementation

The interval at which the checks for lost chunks run should probably be adjustable, as should the time limit after which a chunk is considered "lost", since a chunk might be only temporarily irretrievable.

istae commented 6 months ago

It would not be periodic; the self-repair would occur when the file is downloaded by the user.

ldeffenb commented 6 months ago

Where are you getting the stamp to push the repaired/replaced chunks back into the swarm if it is downloaded on a non-uploading node?

tmm360 commented 6 months ago

The repair operation could also be done manually, specifying the batch to use.

ldeffenb commented 6 months ago

> the repair operation could be done also manually, specifying the batch to use.

Possibly tied in to `-X PUT /stewardship`, which already provides for an explicit batch as well as a batch lookup if executed on the original uploading node?

tmm360 commented 6 months ago

Yes, I think that `-X PUT /stewardship` could be a good expression of the intention to repair content on the network. At that point the command could be executed by anyone, even without the full content pinned locally, by trying anyway to recover the missing chunks. It could retrieve the locally missing chunks, and once it has enough, start to re-upload everything again, including the parity chunks.
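To make the manual-batch variant concrete, here is a sketch of building such a request against a local Bee node. `PUT /stewardship/{reference}` is Bee's existing re-upload endpoint; passing the batch via the `swarm-postage-batch-id` header is an assumption drawn from this discussion, not confirmed API behavior.

```go
package main

import (
	"fmt"
	"net/http"
)

// newRepairRequest builds a stewardship re-upload request for the given
// content reference. Supplying an explicit batch via the
// swarm-postage-batch-id header is an assumption from this thread, not a
// documented parameter of the endpoint.
func newRepairRequest(beeAPI, reference, batchID string) (*http.Request, error) {
	url := fmt.Sprintf("%s/stewardship/%s", beeAPI, reference)
	req, err := http.NewRequest(http.MethodPut, url, nil)
	if err != nil {
		return nil, err
	}
	if batchID != "" {
		req.Header.Set("swarm-postage-batch-id", batchID)
	}
	return req, nil
}

func main() {
	req, err := newRepairRequest("http://localhost:1633", "36b7efd9", "f1e4ff75")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
	// A real call would then be: http.DefaultClient.Do(req)
}
```

The reference and batch ID above are placeholder hex, and 1633 is Bee's default API port.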

istae commented 5 months ago

The idea is that when the original uploader fetches the content, if some of the data chunks were recovered by the erasure coding, the missing chunks would be re-uploaded with the original stamp.

The stewardship endpoint is one way, for sure, but the repair mechanism outlined in the task is seamless and requires no extra steps. The repair occurs naturally as part of downloading the data.

ldeffenb commented 5 months ago

Call me dense, but if the chunk was missing and was re-created by the repair mechanism, where does the original stamp come from? My understanding is that a stamp consists of three parts: a batch (presuming all chunks of a reference are on the same batch, which is sometimes true, but will a retriever have ANY stamps? They exist only on the uploading node and in the various reserves, and the retrieval protocol doesn't currently pass stamps); a bin (which can be calculated from the re-generated chunk's reference with the current stamper); and an index within that bin (which only the original uploader and any reserve know, but since the chunk wasn't retrieved it must not have been in any reserve, and the retrieving/reconstructing node is very unlikely to be the original uploading node).
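The three parts can be sketched as data. The field names below are illustrative, not Bee's actual types; the bucket computation assumes the postage bucket is the first 16 bits of the chunk address, which is why that part is recomputable from a re-generated chunk while the in-bucket index is not.

```go
package main

import "fmt"

// Stamp sketches the three components discussed above. Names are
// illustrative, not Bee's actual types.
type Stamp struct {
	BatchID [32]byte // which postage batch pays for the chunk
	Bucket  uint32   // derivable from the chunk address (see bucketOf)
	Index   uint32   // slot within the bucket: known only to the stamper
}

const bucketDepth = 16 // assumed collision-bucket depth

// bucketOf derives the bucket from the first bucketDepth bits of the chunk
// address; no secret knowledge is needed for this part of the stamp.
func bucketOf(addr []byte) uint32 {
	return uint32(addr[0])<<8 | uint32(addr[1])
}

func main() {
	addr := []byte{0xAB, 0xCD, 0x00, 0x01}
	fmt.Printf("bucket=%#04x\n", bucketOf(addr))
}
```

The `Index` field is the crux of the objection above: it lives only with the original stamper, so a third-party repairer cannot reconstruct a valid stamp from the chunk alone.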

ldeffenb commented 5 months ago

Ah, I just re-read your post and had missed the "original uploader fetches..." part. OK, the original uploader should have the original stamp for a re-generated chunk and can therefore re-push that chunk into the swarm with the original stamp. But if the original uploader has the content pinned, then retrievals would never fail (unless they're forced to come from the swarm, as with GET /stewardship), so there would never be any recovery necessary.

NoahMaizels commented 5 months ago

I don't think they should need it pinned, right? They would just need the stamp. As long as the content is still retrievable, and enough data and parity chunks remain that the original data is reconstructable, they can retrieve it, reconstruct the original content, use the completed original data to regenerate any missing parity chunks, and then re-upload the missing parity or data chunks.

One other thought: it would be nice if there were a way to check whether any of the chunks are no longer retrievable without actually downloading all of them, for example by sending a request to each chunk's neighborhood and getting back a cryptographic proof that the chunk exists in that neighborhood. I guess maybe some kind of zero-knowledge proof?

zelig commented 3 months ago

> One thought I had though is it would be nice if there is a way for them to check if any of the chunks are no longer retrievable without needing to actually download all the chunks, like if there were some way to send a request for some sort of cryptographic proof to each neighborhood for every chunk and then get as response a proof that the chunk exists in that neighborhood. I guess maybe some kind of zero knowledge proof?

I have long wanted this feature. It would be great for data availability sampling too. But you don't need ZK for this; you can just return an inclusion proof. Don't forget, this is why we use the BMT hash for content addressing.

zelig commented 3 months ago

This feature should not be part of a normal download, even for the uploader. It assumes too much; it's not natural. It should go either through stewardship or with an explicit header.