BLAKE3-team / BLAKE3

the official Rust and C implementations of the BLAKE3 cryptographic hash function
Apache License 2.0

Make `--no-mmap` calls still use parallelism when file sizes are large #361

Open ultrabear opened 8 months ago

ultrabear commented 8 months ago

This change uses two buffers of 1 MiB each: while one buffer is being filled from the OS, the other is hashed using update_rayon. On my machine (Ryzen 2600) this is around twice as fast as just using update_reader for a 1 GiB file, and about half as fast as using mmap.
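The double-buffering idea can be sketched with only the standard library. This is a minimal illustration under stated assumptions, not the code in this PR: `fill_buffer` and `double_buffered` are hypothetical names, the `consume` closure stands in for the hashing step, and a scoped thread is spawned per chunk for simplicity where a real implementation would likely reuse a worker.

```rust
use std::io::{self, Read};
use std::thread;

/// Buffer size taken from the PR description (1 MiB per buffer).
const BUF_LEN: usize = 1 << 20;

/// Hypothetical helper: read until `buf` is full or EOF, retrying on EINTR.
/// Returns the number of bytes actually placed in `buf`.
fn fill_buffer<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => break, // EOF
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Hypothetical driver: while `consume` processes the front buffer on the
/// calling thread, a scoped thread fills the back buffer; then they swap.
/// Returns the total number of bytes passed to `consume`.
fn double_buffered<R, F>(mut reader: R, mut consume: F) -> io::Result<u64>
where
    R: Read + Send,
    F: FnMut(&[u8]),
{
    let mut front = vec![0u8; BUF_LEN];
    let mut back = vec![0u8; BUF_LEN];
    let mut total = 0u64;

    // Prime the first buffer before any overlap is possible.
    let mut len = fill_buffer(&mut reader, &mut front)?;
    while len > 0 {
        let next_len = thread::scope(|s| {
            // Read the next chunk in the background...
            let handle = s.spawn(|| fill_buffer(&mut reader, &mut back));
            // ...while the current chunk is consumed (hashed) here.
            consume(&front[..len]);
            handle.join().expect("reader thread panicked")
        })?;
        total += len as u64;
        std::mem::swap(&mut front, &mut back);
        len = next_len;
    }
    Ok(total)
}
```

Assuming the blake3 crate is built with its `rayon` feature, the closure passed as `consume` could be something like `|chunk| { hasher.update_rayon(chunk); }`, so that each full buffer is hashed in parallel while the next one is read.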

The code also accounts for small files: if a file is under 1 MiB, it falls back to update_reader. This ensures the change is always at least neutral in performance. Because the 1 MiB threshold overshoots the actual point where update_rayon becomes faster, we never see cases where it is slower.
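The fallback rule amounts to a size check before picking a code path. A rough sketch follows; the names `Strategy`, `choose_strategy`, and `PARALLEL_THRESHOLD` are made up for illustration, and treating an unknown length (for example a pipe) as "small" is an assumption of this sketch, not something the PR states.

```rust
/// Threshold from the PR description: files under 1 MiB use the serial path.
const PARALLEL_THRESHOLD: u64 = 1 << 20;

/// Hypothetical enum naming the two code paths described above.
#[derive(Debug, PartialEq, Eq)]
enum Strategy {
    /// Plain `update_reader`-style hashing.
    Serial,
    /// Double-buffered reads with parallel hashing.
    DoubleBuffered,
}

/// Pick a path from the file length, if it is known.
/// Assumption: an unknown length falls back to the serial path.
fn choose_strategy(file_len: Option<u64>) -> Strategy {
    match file_len {
        Some(len) if len >= PARALLEL_THRESHOLD => Strategy::DoubleBuffered,
        _ => Strategy::Serial,
    }
}
```

In practice the length would come from something like `file.metadata()?.len()`, with `None` passed when metadata is unavailable.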

Currently the code uses the read_chunks crate, which I made to handle EINTR and to try to fully fill the read buffer. If this is approved to merge, I would want to take the function it calls and copy it into this project somewhere instead of adding an extra dependency.

Some crude benchmarks below, hashing a gibibyte of random data (b3sum 1.5.0 vs 03e0949d13cebe3c04e1c908d25cf1e22bc71623):

# this PR
[b3sum]$ time ./target/release/b3sum --no-mmap gigafile
303966b0ba3c0766247f911d8f7dd172cffa1952bf1106f801fcf7e1455ce5c0  gigafile

real    0m0.253s
user    0m1.234s
sys     0m0.501s

# unmodified binary
[b3sum]$ time b3sum --no-mmap gigafile
303966b0ba3c0766247f911d8f7dd172cffa1952bf1106f801fcf7e1455ce5c0  gigafile

real    0m0.570s
user    0m0.477s
sys     0m0.091s

# unmodified binary, with mmap enabled
[b3sum]$ time b3sum gigafile
303966b0ba3c0766247f911d8f7dd172cffa1952bf1106f801fcf7e1455ce5c0  gigafile

real    0m0.126s
user    0m1.067s
sys     0m0.103s