BLAKE3-team / BLAKE3

the official Rust and C implementations of the BLAKE3 cryptographic hash function

Use of incremental/chunk-based verification #82

Open 17dec opened 4 years ago

17dec commented 4 years ago

BLAKE3, by virtue of being a tree-based hash, supports incremental verification of data. But I don't see a way to actually make use of this feature with the current API.

I'm thinking of using BLAKE3 for a BitTorrent-like data exchange platform where metadata (including hashes) is exchanged in advance while the file contents can be exchanged in no particular order. To support this I'd need to be able to:

  1. Extract intermediate tree nodes at a particular level (e.g. hashes for all 128KiB chunks).
  2. Verify the list of intermediate tree nodes against the root hash.
  3. Verify chunks of data against the intermediate tree nodes.

What would be the best way to approach this? Is an API planned that exposes the tree-based features of BLAKE3?

oconnor663 commented 4 years ago

The short answer is, take a look at Bao: https://github.com/oconnor663/bao

Bao implements the BitTorrent-like aspects of BLAKE3. You start with the regular BLAKE3 hash of an entire file, and then you can stream that file incrementally, through an encoding format that interleaves subtree hashes and chunks of content bytes. The Bao API also defines a "slices" concept (sub-sections of the encoding, extracted so that they can be verified in isolation), which might be particularly useful for BitTorrent-style applications.
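For a rough idea of the slice workflow, here's a sketch against the bao crate's API (exact names and signatures may shift between versions, so treat it as illustrative):

```rust
use std::io::{Cursor, Read};

fn main() -> std::io::Result<()> {
    let input = vec![0u8; 1_000_000];

    // Encode the whole file up front; `hash` is the regular BLAKE3 hash of `input`.
    let (encoded, hash) = bao::encode::encode(&input);

    // The sender extracts a self-contained slice covering bytes 500_000..500_000+65_536.
    let (slice_start, slice_len) = (500_000u64, 65_536u64);
    let mut extractor =
        bao::encode::SliceExtractor::new(Cursor::new(&encoded), slice_start, slice_len);
    let mut slice = Vec::new();
    extractor.read_to_end(&mut slice)?;

    // The receiver verifies the slice against nothing but the root hash.
    let mut decoder =
        bao::decode::SliceDecoder::new(Cursor::new(&slice), &hash, slice_start, slice_len);
    let mut verified = Vec::new();
    decoder.read_to_end(&mut verified)?;
    assert_eq!(&verified[..], &input[500_000..][..65_536]);
    Ok(())
}
```

The key property is that the receiver needs nothing but the root hash to verify any slice.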

The relationship between the bao crate and the blake3 crate is currently a little bit hacky. Like you mentioned, Bao needs to get its hands on interior subtree hashes ("subtree chaining values"), which the public blake3 API doesn't provide. To solve this without duplicating too much code, blake3 exposes a public-but-undocumented guts module (https://github.com/BLAKE3-team/BLAKE3/blob/master/src/guts.rs). Because it's a very minimal interface, it doesn't take advantage of the fanciest, highest-performance SIMD optimizations that the regular hash API does. At some point in the future, we'll probably want to stabilize a richer guts API that's on par in terms of performance. (The main thing is that you need to hash batches of chunks together. https://github.com/oconnor663/blake2_simd does something similar in its `many` APIs.) In the meantime, the performance of the guts API is about on par with SHA-2, instead of several times faster than that, depending on what SIMD extensions the CPU supports. If you're streaming bytes over the public internet, it's almost certainly fast enough.
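To make the shape of that interface concrete, here's a minimal sketch using the guts module as it exists today (details subject to change): it reproduces the regular BLAKE3 hash of a two-chunk input from the two subtree chaining values.

```rust
use blake3::guts::{parent_cv, ChunkState, CHUNK_LEN};

fn main() {
    let input = vec![0xabu8; 2 * CHUNK_LEN];

    // Hash each 1024-byte chunk as a non-root subtree, keyed by its chunk counter.
    let mut left = ChunkState::new(0);
    left.update(&input[..CHUNK_LEN]);
    let left_cv = left.finalize(false);

    let mut right = ChunkState::new(1);
    right.update(&input[CHUNK_LEN..]);
    let right_cv = right.finalize(false);

    // Combine the two chaining values; for a two-chunk input this parent is the root.
    let root = parent_cv(&left_cv, &right_cv, true);
    assert_eq!(root, blake3::hash(&input));
}
```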

There's another incremental/tree-ish aspect of BLAKE3 that no one's implemented yet. We could maintain the hash of a large file, and update that hash when the file is edited in place. The Bao encoding format is a decent fit for this, and we could probably add such a feature to the bao crate without needing to rearchitect much. I'm not a filesystems guy, but I suspect a lot of filesystems do something similar to this internally. (Whether they could be simplified by basing themselves on BLAKE3, I don't know enough to say. I suspect filesystems need to do a lot of explicit block management anyway, so an abstraction that hides away the block/chunk concept might not be very helpful.) Other than that, I don't know of any very compelling use cases.

cesarb commented 4 years ago

> but I suspect a lot of filesystems do something similar to this internally

Most filesystems don't even care about the hash of the data, only its location on the disk. Those that do care about the hash (as an additional integrity check, the main identifier is still the location on the disk) store the hash of each block/extent/whatever separately in the metadata, and each hash is fully recomputed whenever the corresponding block/extent/whatever changes (there is no incremental update). There is no tree structure of the hashes, only of the locations on the disk (and sometimes not even that; FAT32 and its relatives use a linked list).

17dec commented 4 years ago

@oconnor663 Thanks for the quick response. The guts.rs API looks like it will do what I need. I see that there are fewer parameters than in BLAKE2's tree mode, so that will certainly simplify tree construction. The lack of SIMD optimizations is unfortunate but not a deal breaker at the moment. Since I won't be using the 1024-byte chunks directly (other than to construct parent nodes at some level), I'm sure the optimizations can be implemented at some point. Parallelism is still possible, at least.

I have enough to experiment with for the foreseeable future. :)

oconnor663 commented 4 years ago

I was wondering whether you might be able to use the Bao API directly. Your metadata could store a regular BLAKE3 hash of the target file, and then your client could use that hash to verify arbitrary slices. You'd probably use the "outboard" encoding format, so that senders have direct access to the raw original file. (Constructing the outboard tree is fast enough that you could consider building it on-demand in temporary storage, to reduce the chance that it could ever be corrupted.) It's possible that unless you want to do something very complicated, like efficient recovery of a partially corrupted file, or storing an incomplete outboard tree on the sender, you'd never need to lift the hood to deal with subtree hashes explicitly?

17dec commented 4 years ago

Eh, I was too used to constructing my own trees (using TTH or BLAKE2 before) to even consider alternatives. I was also under the impression that Bao only supported streaming from the start and interleaving hashes with data, but I see it's far more flexible than that. Interesting.

So, after a better look:

On the other hand, using Bao to remove the need to distribute intermediate hash nodes could certainly simplify things, and having 1024-byte granularity on data verification does have some advantages, despite the overhead. I may end up constructing my own trees anyway, but I'll let this idea sink in for a bit.

cesarb commented 4 years ago

I would recommend against chunk-level (1024-byte) granularity. Having 16-chunk (16384-byte) granularity instead not only divides the space used by the metadata by 16, but also makes sure you can always use the most accelerated SIMD path, even with large 512-bit vectors like AVX-512. If you want to future-proof against 1024-bit vectors, you could even go to 32-chunk (32768-byte) granularity.

Above this, it's a trade-off between the metadata overhead and how much needs to be transferred before being verified. (You might also want bigger blocks if you want to take advantage of multiple threads, for instance a 64-chunk (65536-byte) block could use 4 threads with AVX-512.)
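To put rough numbers on the metadata side of that trade-off (assuming 32-byte hashes and a 1 GiB file; the snippet is just illustrative arithmetic):

```rust
// Leaf-hash metadata needed for a 1 GiB file at different block granularities.
const FILE_SIZE: u64 = 1 << 30; // 1 GiB
const HASH_LEN: u64 = 32;

fn leaf_metadata(block_size: u64) -> u64 {
    (FILE_SIZE / block_size) * HASH_LEN
}

fn main() {
    println!("1 KiB blocks:   {} bytes", leaf_metadata(1 << 10)); // 32 MiB
    println!("16 KiB blocks:  {} bytes", leaf_metadata(1 << 14)); //  2 MiB
    println!("256 KiB blocks: {} bytes", leaf_metadata(1 << 18)); // 128 KiB
}
```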

oconnor663 commented 4 years ago

That's an excellent point. There's nothing forcing a Bao-like application to use the 1 KiB chunk size. It can use any power-of-two multiple of that, without breaking compatibility with BLAKE3. The bao crate itself doesn't expose a parameter for this, but if it looks like we'd have a use case, I'd happily add one. (Alternatively, Bao could "standardize" 16 KiB or maybe 64 KiB chunks, to avoid the configurable parameter. But on first blush that feels more restrictive than necessary.)

One way to bring the guts API up to speed would be to just have people use the regular Hasher API instead for these larger chunks. We'd need a new parameter somewhere to say "this isn't the root, don't finalize it," but otherwise all the code is reusable without any changes. That wouldn't help anyone who wanted to go full speed with 1 KiB chunks though. Still rolling it over in my head.

oconnor663 commented 4 years ago

> You mention that constructing an outboard tree is fast, but it still involves hashing the entire file, doesn't it? I'm sure caching could be useful here, but it's a scary DoS vector for senders with a lot of large files.

You're right, we'd need to be careful about that.

> this allows senders to be dumb HTTP servers that happen to support range requests

If you wrap a std::io::Seek interface around an HTTP connection somehow, then you could use the regular bao::decode::Decoder's seeking support without needing to do any slicing on the server side. I've never implemented it myself, but it should Just Work. That said, it turns every underlying seek into a network round trip. I highlight underlying because 1 seek from the caller's perspective turns into log(N) seeks through the encoding, one for each right child on the path from the root to the leaf you're interested in. The real advantage of slicing is getting rid of these round trips, in cases where slices are small enough that the seek overhead matters.
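For a back-of-the-envelope feel for that log(N) factor (illustrative only, and an upper bound of roughly one underlying seek per tree level):

```rust
// Upper bound on encoding-level seeks per caller seek: about one per tree level.
fn levels(file_len: u64, chunk_len: u64) -> u32 {
    let num_chunks = ((file_len + chunk_len - 1) / chunk_len).max(1);
    64 - (num_chunks - 1).leading_zeros() // ceil(log2(num_chunks))
}

fn main() {
    // A 1 GiB file with 1 KiB chunks has 2^20 chunks, so about 20 levels,
    // i.e. up to ~20 underlying seeks (and round trips) per caller seek.
    println!("{}", levels(1 << 30, 1 << 10)); // 20
}
```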

(Alternatively, if your dumb HTTP server has a way to batch multiple reads at different offsets, we could come up with fancy ways to leverage that.)

> using TTH

I've read about Tiger, and everyone who writes papers on tree hashes mentions it, but I've never seen it in the wild! Do you have code on GitHub? :-D

cesarb commented 4 years ago

> We'd need a new parameter somewhere to say "this isn't the root, don't finalize it," but otherwise all the code is reusable without any changes.

I'd simply add a new finalize variant: besides finalize(&self) -> Hash and finalize_xof(&self) -> OutputReader, I'd add something like finalize_subtree(&self) -> Hash. The only question is how to prevent misuse; as the BLAKE3 paper says, setting the ROOT flag is important, but this function would make it easy to not set that flag. Perhaps doing something like what I did in my experimental Vulkan branch, which is to have an alternative LessSafeHasher and expose that particular finalize_subtree variant only on that hasher? That way, normal users of Hasher would not be tempted to call that variant, only the ones who need the incremental hashing would have to be careful.
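Roughly the shape I have in mind, as a purely hypothetical sketch (none of these names exist in the crate today):

```rust
use blake3::Hash;

/// Hypothetical: a separate hasher type that can produce non-root subtree
/// chaining values, so ordinary `Hasher` users never see the footgun.
pub struct LessSafeHasher {
    // same internal state as `Hasher`, plus the counter of its first chunk
}

impl LessSafeHasher {
    /// `chunk_counter` is the index of this subtree's first chunk in the whole input.
    pub fn new(chunk_counter: u64) -> Self {
        unimplemented!("sketch only: {}", chunk_counter)
    }

    pub fn update(&mut self, input: &[u8]) -> &mut Self {
        unimplemented!("sketch only: {} bytes", input.len())
    }

    /// Finalize *without* the ROOT flag, yielding a subtree chaining value.
    pub fn finalize_subtree(&self) -> Hash {
        unimplemented!("sketch only")
    }
}
```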

17dec commented 4 years ago

> I've read about Tiger, and everyone who writes papers on tree hashes mentions it, but I've never seen it in the wild! Do you have code on GitHub? :-D

It's used in Direct Connect - one of the early-2000s file sharing networks that is still somehow used and maintained by a bunch of people. I'm the maintainer of ncdc, hence my experience with it. We've discussed migrating to a more modern (faster + more secure) hash in recent years - BLAKE3 being an excellent contender - but Tiger has been holding up surprisingly well so far.

> One way to bring the guts API up to speed

Alternatively, the following would work just as well for me:

```rust
fn hash_32k_block(chunk_counter: u64, buf: &[u8]) -> Hash
```

With the limitation that it should only be used for exact, non-root 32k blocks; the existing guts API can fill in the gaps to hash a full file. (32k is just an example - large enough to offer the advantages of SIMD, small enough to be useful in practical applications. Honestly, I doubt I'll need blocks smaller than 256k, and even that is pretty small for P2P file sharing.)
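Something like the following rough sketch is the kind of thing I mean, built on the current guts primitives (one chunk at a time, so no SIMD batching yet; the name subtree_cv and the power-of-two restriction are just my own assumptions):

```rust
use blake3::guts::{parent_cv, ChunkState, CHUNK_LEN};
use blake3::Hash;

/// Chaining value of one full, non-root block of chunks. `buf.len()` must be a
/// power-of-two multiple of CHUNK_LEN, and `chunk_counter` (the index of the
/// block's first chunk in the whole file) must be aligned to the block size,
/// so that the block really is a subtree of the file's hash tree.
fn subtree_cv(chunk_counter: u64, buf: &[u8]) -> Hash {
    assert!(buf.len() >= CHUNK_LEN && buf.len().is_power_of_two());
    if buf.len() == CHUNK_LEN {
        let mut chunk = ChunkState::new(chunk_counter);
        chunk.update(buf);
        return chunk.finalize(false); // never the root here
    }
    let mid = buf.len() / 2;
    let left = subtree_cv(chunk_counter, &buf[..mid]);
    let right = subtree_cv(chunk_counter + (mid / CHUNK_LEN) as u64, &buf[mid..]);
    parent_cv(&left, &right, false)
}
```

The existing parent_cv can then combine these block-level chaining values up the tree, with only the topmost parent finalized as the root.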

A higher-level misuse-resistant API would be even better, of course. :)

oconnor663 commented 4 years ago

@cesarb, finalize_subtree is a great idea. No constructor bloat, and no need to decide one way or the other when you create the hasher. As for safety, maybe instead of making it a method, we could make it a standalone function. Then we could relegate it to the undocumented (or maybe in the future, documented-but-full-of-warnings) guts module, and most callers would be unlikely to use it accidentally. (IDE autocompletion doesn't suggest hidden items, right? What's the best practice for that sort of thing? Anyway, not making it a method should solve most of the problem.)

Speaking of warnings, there will be quite a few attached to this function, similar to what @17dec described:

  1. You have to use a consistent chunk size. If your first chunk is 16 KiB, then all of them except maybe the last one have to be 16 KiB.
  2. You have to be careful about short inputs, so that you don't do non-root finalization on something that actually is the root. If your first slice from the caller is $CHUNK_SIZE or less, you have to buffer it.

Luckily it's relatively easy to test these cases. If you have a test vector of the relevant length, you just need to make sure you get the same root hash that BLAKE3 does.
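For example, a sketch of such a test for caveat 2, reusing the guts names from earlier in the thread (a single-chunk input is the root, so it has to be finalized as such):

```rust
use blake3::guts::{ChunkState, CHUNK_LEN};

#[test]
fn single_chunk_input_is_the_root() {
    let input = vec![0x42u8; CHUNK_LEN]; // fits in one chunk, so it IS the root
    let mut chunk = ChunkState::new(0);
    chunk.update(&input);
    // Root finalization matches the regular BLAKE3 hash...
    assert_eq!(chunk.finalize(true), blake3::hash(&input));
    // ...while non-root finalization gives a subtree chaining value that doesn't.
    assert_ne!(chunk.finalize(false), blake3::hash(&input));
}
```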

fadedbee commented 3 years ago

Posted a link to this issue to https://crypto.stackexchange.com/q/86508/17505

qti3e commented 1 year ago

Hey everyone, because of the requirements of our delivery protocol in Fleek Network (still in the develop branch/WIP), we also needed incremental verification and hashing. We handle most of the verification and tree-pruning logic in our own codebase, but we have this fork of BLAKE3 that adds minimal functionality to let us achieve what we wanted.

Here is a link to the source code for said functionality, an incremental BLAKE3 hasher that also preserves the complete tree (in blocks of 256 chunks):

https://github.com/fleek-network/BLAKE3/blob/master/src/ursa.rs

I also saw some comments in guts.rs about how it might make sense to provide an incremental version; well, here it is.

Our protocol delivers content in blocks of 256 KiB, so that's what the constants at the top of the file reflect, but a generic implementation could take BLOCK_SIZE_IN_CHUNK_LOG_2 as a const parameter on HashTreeBuilder<const S: usize>, etc.

Let me know if you think having a more generic implementation of this here as a PR makes sense.