RustCrypto / hashes

Collection of cryptographic hash functions written in pure Rust
1.81k stars 245 forks source link

tiger: implement TTH algorithm #494

Open baodrate opened 1 year ago

baodrate commented 1 year ago

Implements the Tiger Tree Hash algorithm as mentioned here.

I'm a rust newbie, so implementation (and general Rust) advice is welcome.

The digest crate doesn't seem to be happy with block sizes > 128, but TTH operates on 1024K data blocks, so I emulated it internally. Doesn't feel like the best way to do this but I couldn't figure out a more elegant method.

I didn't attempt any optimization so performance isn't great (although much better than I expected), approx 20-30% slower than https://github.com/rhash/RHash.

baodrate commented 1 year ago

I think it should be possible to implement this algorithm without storing all leaves, i.e. you could collapse the tree and only store number of nodes proportional to the tree height.

That's a good point. I opted for the easier implementation to get some feedback, but I think the constant-memory version shouldn't be too hard. I would have to give it a input limit. Assuming 2^64B is a reasonable size limit, that's gives a max tree height of 54 (with 1024B blocks) and a minimum space requirement of Tiger::OutputSize * 54 ~= 1.3KiB. Which seems reasonable?

IIUC the main reason to use TTH is to be able to parallelize processing of data blocks. It may be worth to use rayon for it.

I'm a little unsure about this suggestion. Do you mean this library should depend on rayon? Would multi-threading behavior be configurable by the user? Or is this something we would allow the user to handle by e.g. exposing hash_nodes, so the user can calculate the partial trees in parallel and pass the collection of hashes to hash_nodes to get the hash of the full tree?

newpavlov commented 1 year ago

I think the constant-memory version shouldn't be too hard.

You can continue to use dynamic allocation, the point is to reduce amount of used memory.

Do you mean this library should depend on rayon?

Yes, but rayon can be optional, disabled-by-default dependency.

Would multi-threading behavior be configurable by the user?

No need for separate configuration knobs, rayon defaults should be fine.

BTW, can you add link(s) for TTH test vector source?

baodrate commented 1 year ago

You can continue to use dynamic allocation, the point is to reduce amount of used memory

I've updated the algorithm to reduce memory usage. I've also swapped out Vec for arrayvec as per my previous comment. If this isn't acceptable I can switch back to using Vec.

BTW, can you add link(s) for TTH test vector source?

TTH tests were calculated by hand by hashing individual blocks with the tiger implementation in this repo, and confirmed with rhash and tthsum. I can look for more authoritative tests. But TTH doesn't seem to be well-documented (rather, it seems to be the consensus of various P2P file-sharing software).

baodrate commented 1 year ago

I gave integrating rayon a shot, but I couldn't get a measurable difference in performance. edit: I've profiled the hasher and it does seem like there should be room for rayon to do its thing, I just can't seem to get it to parallelize properly. Will keep working on it

note to self: the blake3 crate (https://github.com/BLAKE3-team/BLAKE3) implements digest and supports (optionally) rayon multithreading

edit: added rayon-multithreading modeled after the blake3 crate (-SIMD stuff) and I still can't beat the performance of the naive implementation. I'm so bad at this dudes 😭

TTH/reference/TigerTreeCore
    time:   [3.7648 µs 3.7824 µs 3.8038 µs]
    thrpt:  [514.47 MiB/s 517.37 MiB/s 519.81 MiB/s]
TTH/serial/compress_subtree<Serial>
    time:   [3.8919 µs 3.9154 µs 3.9416 µs]
    thrpt:  [496.48 MiB/s 499.81 MiB/s 502.83 MiB/s]
TTH/rayon/compress_subtree<Rayon>/1
    time:   [20.298 µs 20.656 µs 21.018 µs]
    thrpt:  [93.110 MiB/s 94.742 MiB/s 96.408 MiB/s]
TTH/rayon/compress_subtree<Rayon>/2
    time:   [14.996 µs 15.198 µs 15.420 µs]
    thrpt:  [126.91 MiB/s 128.77 MiB/s 130.49 MiB/s]
TTH/rayon/compress_subtree<Rayon>/4
    time:   [3.6958 µs 3.7084 µs 3.7219 µs]
    thrpt:  [525.79 MiB/s 527.71 MiB/s 529.51 MiB/s]
TTH/rayon/compress_subtree<Rayon>/8
    time:   [3.7162 µs 3.7411 µs 3.7685 µs]
    thrpt:  [519.29 MiB/s 523.10 MiB/s 526.59 MiB/s]
TTH/rayon/compress_subtree<Rayon>/16
    time:   [3.6799 µs 3.6958 µs 3.7153 µs]
    thrpt:  [526.73 MiB/s 529.50 MiB/s 531.80 MiB/s]