BLAKE3-team / BLAKE3

the official Rust and C implementations of the BLAKE3 cryptographic hash function
Apache License 2.0
5.07k stars 346 forks source link

Add function to efficiently hash entire non-root subtrees to guts #329

Open rklaehn opened 1 year ago

rklaehn commented 1 year ago

provides an efficient way to compute the hash for multiple chunks that do not start at chunk 0, as part of a larger tree.

This is useful for crates like bao, abao and bao-tree when using chunk groups.

Adding this required to disable some assertions that contained assumptions that are no longer true.

This gives a performance improvement of 4x when computing an outboard with a chunk group size of 16kb in bao-tree compared to using blake3::guts::ChunkState and blake3::guts::parent_cv to compute a hash for a chunk group.

UPDATE:

this works in the case of bao-tree, but the signature is not good since it can't work in the general case. E.g.

let data = [0u8; 2048];
let hash = hash_block(1, &data, false);

This does not make any sense at all, since there is no single root for chunks 1 and 2. I guess you could put in an assert to catch things like this?

rklaehn commented 1 year ago

I am not sure what is the best way to expose the required method to guts. It does not matter as long as there is a way to hash a non-root subtree that starts at a non-zero chunk.

rklaehn commented 1 year ago

I am not quite sure, but it seems that what this blake3 fork is doing would also be covered with this: https://github.com/fleek-network/BLAKE3/commit/0e15418f195d138a52f5b86210d721fcb4f370d5

oconnor663-travel commented 1 year ago

I'm out of the country and mostly offline until August 20, so I won't be able to take a proper look until then. But you might be interested in the new "guts" API I'm in the middle of:

Apologies for the big pile of undocumented code there :) But my hope is that I can port BLAKE3's existing SIMD implementations to that expanded API and then release that as a (permanently unstable) crate that other low-level callers can depend on. Here are a couple other projects I hope to support with the same API:

rklaehn commented 1 year ago

@oconnor663-travel this looks awesome. No stress, we have forked the crate and are depending on that one now, so there is not much urgency. Once a crate is out that allows us to compute subtree hashes efficiently, we will yank that one and go back to upstream.

Happy to close this if there is something better in the works. It would be nice if there was an example or a fn for the hash_subtree use case implemented in this PR.

Enjoy being offline.