diba-io / carbonado

An apocalypse-resistant data storage format for the truly paranoid.
MIT License

Headers #6

Closed cryptoquick closed 1 year ago

cryptoquick commented 1 year ago

Each of the four formatting steps should be individually configurable, so that any of them can be switched on or off. Each should also be built in as a conditionally compiled feature.

These options should be tracked, perhaps in a 4-bit bitmask generated at compile time. The bitmask can then be added to the magic number, and also to the bech32m filename.
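
A minimal sketch of how such a mask could be derived at compile time. The Cargo feature names (`encryption`, `compression`, `verifiability`, `error-correction`) and the bit order are assumptions for illustration, not names taken from the crate; the features would need to be declared in `Cargo.toml`.

```rust
// One bit per formatting step. The bit order is an assumption for this sketch.
pub const ENCRYPTION: u8 = 1 << 0;
pub const COMPRESSION: u8 = 1 << 1;
pub const VERIFIABILITY: u8 = 1 << 2;
pub const ERROR_CORRECTION: u8 = 1 << 3;

/// Fold four on/off switches into a 4-bit mask.
/// It is a `const fn`, so it can be evaluated at compile time.
pub const fn format_mask(
    encryption: bool,
    compression: bool,
    verifiability: bool,
    error_correction: bool,
) -> u8 {
    (encryption as u8) * ENCRYPTION
        | (compression as u8) * COMPRESSION
        | (verifiability as u8) * VERIFIABILITY
        | (error_correction as u8) * ERROR_CORRECTION
}

/// Mask of the steps compiled into this build.
/// The feature names below are hypothetical.
pub const COMPILED_FORMAT: u8 = format_mask(
    cfg!(feature = "encryption"),
    cfg!(feature = "compression"),
    cfg!(feature = "verifiability"),
    cfg!(feature = "error-correction"),
);
```

With every feature enabled, `COMPILED_FORMAT` would be 15; with none enabled, 0.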

This should also make #5 easier to debug.

cryptoquick commented 1 year ago

We could use a bitmask matrix to indicate which steps of the storage format are used to encode something. This would allow skipping certain steps if desired. Formats can then be referred to by their bitmask: Carbonado 0 means no compression, encryption, stream verification, or error correction, and Carbonado 15 means all of them. If we used a full byte in a magic number header, that would also leave room for future formats. Using a varint instead would future-proof this even more.
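
To illustrate the varint idea, here is a sketch that stores the format number as an unsigned LEB128 varint, so formats 0 to 127 still take a single byte while larger numbers remain representable. LEB128 is one common varint scheme chosen for this example; the issue does not say which encoding would be used, and the function names are hypothetical.

```rust
/// Human-readable name for a format number, e.g. 15 -> "c15".
pub fn format_name(number: u32) -> String {
    format!("c{}", number)
}

/// Encode a format number as an unsigned LEB128 varint:
/// 7 payload bits per byte, high bit set on every byte except the last.
pub fn encode_format(mut number: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (number & 0x7F) as u8;
        number >>= 7;
        if number == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}

/// Decode a varint from the front of `bytes`.
/// Returns the format number and how many bytes were consumed,
/// or `None` if the input ends before the varint does.
pub fn decode_format(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    // A u32 needs at most 5 varint bytes.
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        value |= u32::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

All sixteen formats discussed here fit in one byte under this scheme, so the cost of the extra flexibility is zero for c0 through c15.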

cryptoquick commented 1 year ago

Bitmask for Carbonado formats c0-c15

| Format | Encryption | Compression | Verifiability | Error correction | Use-cases |
|--------|------------|-------------|---------------|------------------|-----------|
| c0 | | | | | Marks a file as scanned by Carbonado |
| c1 | :white_check_mark: | | | | Encrypted incompressible throwaway append-only data streams such as CCTV footage |
| c2 | | :white_check_mark: | | | Rotating public logs |
| c3 | :white_check_mark: | :white_check_mark: | | | Private archives |
| c4 | | | :white_check_mark: | | Unencrypted incompressible data such as NFT/UDA image assets |
| c5 | :white_check_mark: | | :white_check_mark: | | Private media backups |
| c6 | | :white_check_mark: | :white_check_mark: | | Compiled binaries |
| c7 | :white_check_mark: | :white_check_mark: | :white_check_mark: | | Full drive backups |
| c8 | | | | :white_check_mark: | ??? |
| c9 | :white_check_mark: | | | :white_check_mark: | ??? |
| c10 | | :white_check_mark: | | :white_check_mark: | ??? |
| c11 | :white_check_mark: | :white_check_mark: | | :white_check_mark: | Encrypted device-local Catalogs |
| c12 | | | :white_check_mark: | :white_check_mark: | Publicly-available archival media |
| c13 | :white_check_mark: | | :white_check_mark: | :white_check_mark: | Georedundant private media backups |
| c14 | | :white_check_mark: | :white_check_mark: | :white_check_mark: | Source code, token genesis |
| c15 | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | Contract data |
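
The format numbers above read as a 4-bit mask. A small sketch of decoding a format number back into its steps, assuming the bit values implied by the single-step rows (encryption = 1, compression = 2, verifiability = 4, error correction = 8); the `STEPS` table and `steps` function are names made up for this example.

```rust
/// Bit value and name of each step, in the column order of the table.
/// The bit values are inferred from the single-step formats c1, c2, c4 and c8.
pub const STEPS: [(u8, &str); 4] = [
    (1, "encryption"),
    (2, "compression"),
    (4, "verifiability"),
    (8, "error correction"),
];

/// List the steps enabled in a format number (only the low 4 bits are inspected).
pub fn steps(format: u8) -> Vec<&'static str> {
    STEPS
        .iter()
        .filter(|(bit, _)| (format & bit) != 0)
        .map(|&(_, name)| name)
        .collect()
}
```

For example, `steps(13)` yields encryption, verifiability, and error correction, matching the c13 row.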

Verifiability is needed to pay others for storing or hosting your files, but it inhibits use-cases for mutable or append-only data other than snapshots, since the hash would change so frequently. Bao encoding has only a small overhead, about 5% at most.

Any data that is verifiable but also unencrypted is instead signed by the local key. This is good for signed compiled binaries or hosted webpages.
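
The signing rule can be written as a predicate on the format number. This is a sketch under the assumption that encryption is bit 1 and verifiability is bit 4, as the c0-c15 numbering suggests; `is_signed` is a hypothetical helper name.

```rust
// Bit values assumed from the c0-c15 numbering.
pub const ENCRYPTION: u8 = 1;
pub const VERIFIABILITY: u8 = 4;

/// True when a format is verifiable but not encrypted,
/// which is the case described as signed by the local key.
pub fn is_signed(format: u8) -> bool {
    (format & VERIFIABILITY) != 0 && (format & ENCRYPTION) == 0
}
```

Under these assumptions the signed formats are c4, c6, c12, and c14.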

Estimated encoding overhead

| Encoding | Cost | Details |
|----------|------|---------|
| Encryption | ~200 bytes | AES-GCM authenticated encryption |
| Compression | Variable | -80% for contracts, -20% for code, +~100 bytes if incompressible |
| Verifiability | ~5% | Bao encoding |
| Error correction | 200% | 4/8 ZFEC encoding |
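
A rough calculator for these estimates. It rests on several assumptions that the issue does not state: the steps are applied in the order compression, encryption, verifiability, error correction; the 200% figure is read as the encoded output being twice the size of its input (4-of-8 erasure coding produces 8 shares, each a quarter of the input); and the bit values follow the c0-c15 numbering. The ~100 byte penalty for incompressible data is ignored.

```rust
/// Estimate the encoded size in bytes for a given format number.
///
/// `compressed_percent` is the size after compression as a percentage of the
/// input, e.g. 20 for contracts (-80%) or 80 for code (-20%). It is only used
/// when the compression bit is set.
pub fn estimated_size(format: u8, input_len: u64, compressed_percent: u64) -> u64 {
    let mut size = input_len;
    if format & 2 != 0 {
        // Compression: variable, supplied by the caller.
        size = size * compressed_percent / 100;
    }
    if format & 1 != 0 {
        // Encryption: roughly 200 bytes of fixed overhead.
        size += 200;
    }
    if format & 4 != 0 {
        // Verifiability: roughly 5% on top.
        size += size / 20;
    }
    if format & 8 != 0 {
        // Error correction: output assumed to be twice the input.
        size *= 2;
    }
    size
}
```

For a 1,000,000 byte contract encoded as c15 with 80% compression, this gives 200,000 bytes after compression, 200,200 after encryption, 210,210 after Bao, and 420,420 after error correction.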

Every format has a magic number and a Carbonado header that includes the information needed for that specific format.
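
As a sketch of what a reader and writer for such a header could look like, assuming a layout of magic bytes followed by a single format byte. The magic value `CARBONADO` and the layout are placeholders for illustration, not the crate's actual header definition.

```rust
/// Placeholder magic bytes; the real value is not given in this issue.
pub const MAGIC: &[u8] = b"CARBONADO";

/// Build a minimal header: the magic bytes followed by one format byte.
pub fn write_header(format: u8) -> Vec<u8> {
    let mut header = MAGIC.to_vec();
    header.push(format);
    header
}

/// Read the format byte from the front of `bytes`.
/// Returns `None` if the magic does not match or the format byte is missing.
pub fn read_format(bytes: &[u8]) -> Option<u8> {
    let rest = bytes.strip_prefix(MAGIC)?;
    rest.first().copied()
}
```

A reader can check the magic first and then use the format byte to decide which decoding steps to run, or fail early if a required step was not compiled in.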