diba-io / carbonado

An apocalypse-resistant data storage format for the truly paranoid.
MIT License

Swap zfec and bao #2

Closed by cryptoquick 1 year ago

cryptoquick commented 1 year ago

Originally, the bao step came before zfec in encoding, because of a concern that storage providers might want to scrub their data regularly. The problem is that storage providers are incentivized to use a modified version of this library that strips the zfec encoding. If they were more responsible, they'd be using a filesystem such as btrfs instead, with the volume configured to scrub regularly. Sure, Carbonado could try to run a scrub in btrfs and disable zfec encoding, but that excludes ZFS users, and even if we included ZFS as well, we'd still be relying on storage providers being responsible. Given the choice between more paranoia and another form of optimization, more paranoia is preferred, within reason. Some might consider it unreasonable to encode all data in a way that doubles its size; for them we could develop a "Carbonado Lite" if there's demand for it, but that should be a separate format.

Regardless, by swapping the FEC and bao encoding, we could also move the integrity check and perform it one chunk at a time, so that a map of errors can be produced and the damaged chunks omitted. This is smarter than the computationally intensive combinations algorithm used at present. In addition, the perverse incentive described above is mitigated by being able to perform stream verification beforehand. This also requires the padding bytes to be random, generated by a CSPRNG, and the amount of padding must never be revealed (in the clear, at least) to storage providers.
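To make the per-chunk idea concrete, here is a rough sketch in Python, not Carbonado's actual code. It assumes a plain per-chunk digest using BLAKE2b from the standard library as a stand-in for bao/BLAKE3 verification, and every name in it (`CHUNK_SIZE`, `pad_random`, `error_map`, `intact_chunks`) is hypothetical. It pads with CSPRNG bytes, checks each chunk independently to build the error map, and keeps only the intact chunks, which is what would be handed to the FEC decoder.

```python
import hashlib
import secrets

CHUNK_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to follow


def pad_random(data, multiple):
    """Pad `data` to a multiple of `multiple` bytes using CSPRNG output.

    Returns (padded_data, pad_len). The caller must keep pad_len private,
    for example inside an encrypted header (an assumption of this sketch).
    """
    pad_len = (-len(data)) % multiple
    return data + secrets.token_bytes(pad_len), pad_len


def split_chunks(data, size):
    """Split `data` into consecutive chunks of `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_digest(chunk):
    # Stand-in only: BLAKE2b from the standard library, NOT bao/BLAKE3.
    return hashlib.blake2b(chunk, digest_size=32).digest()


def error_map(chunks, expected_digests):
    """Return one boolean per chunk: True means the chunk failed its check."""
    return [chunk_digest(c) != d for c, d in zip(chunks, expected_digests)]


def intact_chunks(chunks, errors):
    """Keep (index, chunk) pairs for chunks that passed verification."""
    return [(i, c) for i, (c, bad) in enumerate(zip(chunks, errors)) if not bad]


# Encode side: pad, split, and record a digest per chunk.
padded, pad_len = pad_random(b"abcdefghij", CHUNK_SIZE)
chunks = split_chunks(padded, CHUNK_SIZE)
digests = [chunk_digest(c) for c in chunks]

# Decode side: one chunk was damaged in storage.
damaged = list(chunks)
damaged[1] = b"XXXX"
errors = error_map(damaged, digests)
usable = intact_chunks(damaged, errors)
```

Because each chunk is checked on its own, building the map costs one pass over the data, and the decoder can be told exactly which chunk indices to skip rather than trying combinations of chunks until one decodes cleanly.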