RustCrypto / hashes

Collection of cryptographic hash functions written in pure Rust

sha2: wasm32 simd128 backends #562

Open max-te opened 4 months ago

max-te commented 4 months ago

This PR ports the AVX implementation of SHA-512 to simd128 and adds wasm32 testing in CI using wasmtime. Since wasm has no runtime feature detection, the backend is selected at compile time and is only used if the -C target-feature=+simd128 flag is set.
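As a rough illustration of what compile-time selection looks like, here is a minimal sketch that is not taken from the PR. The names BACKEND, sigma0_scalar and sigma0_pair are hypothetical. It assumes the usual cfg(target_feature = "simd128") gate and uses the SHA-512 σ0 function from FIPS 180-4 as a stand-in for the real compression code. simd128 has no 64-bit lane rotate, so the rotation is emulated with two shifts and an OR.

```rust
/// Name of the backend chosen at compile time (hypothetical helper).
pub const BACKEND: &str = if cfg!(all(target_arch = "wasm32", target_feature = "simd128")) {
    "wasm32-simd128"
} else {
    "soft"
};

/// Scalar SHA-512 small sigma0: ROTR^1(x) ^ ROTR^8(x) ^ SHR^7(x).
pub fn sigma0_scalar(x: u64) -> u64 {
    x.rotate_right(1) ^ x.rotate_right(8) ^ (x >> 7)
}

/// simd128 path: only compiled when built with -C target-feature=+simd128.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
pub fn sigma0_pair(a: u64, b: u64) -> (u64, u64) {
    use core::arch::wasm32::*;

    // Rotate each 64-bit lane right by n bits using shifts and an OR.
    fn rotr(x: v128, n: u32) -> v128 {
        v128_or(u64x2_shr(x, n), u64x2_shl(x, 64 - n))
    }

    let v = u64x2(a, b);
    let r = v128_xor(v128_xor(rotr(v, 1), rotr(v, 8)), u64x2_shr(v, 7));
    (u64x2_extract_lane::<0>(r), u64x2_extract_lane::<1>(r))
}

/// Portable fallback: used on every other target or when simd128 is off.
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
pub fn sigma0_pair(a: u64, b: u64) -> (u64, u64) {
    (sigma0_scalar(a), sigma0_scalar(b))
}
```

Both definitions of sigma0_pair return the same values, so callers do not need to know which one was compiled in. Without the flag, the simd128 function is simply not compiled.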

Benchmarks on AMD Ryzen 9 7950X3D, running with wasmtime, with simd:

❯ RUSTFLAGS="-C target-feature=+simd128" cargo +nightly bench --target wasm32-wasi
[...]
running 8 tests
test sha256_10    ... bench:         164 ns/iter (+/- 5) = 60 MB/s
test sha256_100   ... bench:       1,645 ns/iter (+/- 35) = 60 MB/s
test sha256_1000  ... bench:      16,421 ns/iter (+/- 641) = 60 MB/s
test sha256_10000 ... bench:     163,818 ns/iter (+/- 1,016) = 61 MB/s
test sha512_10    ... bench:          17 ns/iter (+/- 0) = 588 MB/s
test sha512_100   ... bench:         166 ns/iter (+/- 1) = 602 MB/s
test sha512_1000  ... bench:       1,491 ns/iter (+/- 5) = 670 MB/s
test sha512_10000 ... bench:      14,670 ns/iter (+/- 59) = 681 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out

without simd:

❯ RUSTFLAGS="-C target-feature=-simd128" cargo +nightly bench --target wasm32-wasi
[...]
running 8 tests
test sha256_10    ... bench:         168 ns/iter (+/- 0) = 59 MB/s
test sha256_100   ... bench:       1,667 ns/iter (+/- 8) = 59 MB/s
test sha256_1000  ... bench:      16,387 ns/iter (+/- 76) = 61 MB/s
test sha256_10000 ... bench:     163,630 ns/iter (+/- 680) = 61 MB/s
test sha512_10    ... bench:         111 ns/iter (+/- 0) = 90 MB/s
test sha512_100   ... bench:       1,093 ns/iter (+/- 5) = 91 MB/s
test sha512_1000  ... bench:      10,740 ns/iter (+/- 22) = 93 MB/s
test sha512_10000 ... bench:     107,104 ns/iter (+/- 302) = 93 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out

max-te commented 4 months ago

I also ended up porting the AVX SHA-256 implementation from https://github.com/aws-samples/sha2-with-c-intrinsic/blob/master/src/sha256_compress_x86_64_avx.c to simd128 and updated this PR. Here are the updated benchmarks with simd:

test sha256_10    ... bench:          22 ns/iter (+/- 0) = 454 MB/s
test sha256_100   ... bench:         215 ns/iter (+/- 2) = 465 MB/s
test sha256_1000  ... bench:       1,959 ns/iter (+/- 8) = 510 MB/s
test sha256_10000 ... bench:      19,401 ns/iter (+/- 22) = 515 MB/s
test sha512_10    ... bench:          17 ns/iter (+/- 0) = 588 MB/s
test sha512_100   ... bench:         164 ns/iter (+/- 0) = 609 MB/s
test sha512_1000  ... bench:       1,476 ns/iter (+/- 2) = 677 MB/s
test sha512_10000 ... bench:      14,513 ns/iter (+/- 18) = 689 MB/s
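
To make the numbers easier to compare, here is a small sketch of the arithmetic behind them. It assumes the MB/s column is the input size divided by the time per iteration, truncated to an integer, which is consistent with every line above. The helper names mb_per_s and speedup are hypothetical.

```rust
/// Throughput in MB/s (10^6 bytes per second), truncated to an integer.
/// bytes / ns is GB/s, so multiplying by 1000 gives MB/s.
pub fn mb_per_s(bytes: u64, ns_per_iter: u64) -> u64 {
    bytes * 1_000 / ns_per_iter
}

/// Speedup of an optimized run relative to a baseline run of the same input.
pub fn speedup(baseline_ns: u64, optimized_ns: u64) -> f64 {
    baseline_ns as f64 / optimized_ns as f64
}
```

For example, mb_per_s(10_000, 14_513) reproduces the 689 MB/s reported for sha512_10000. Comparing the 10,000-byte runs without simd against the updated simd runs gives roughly 7.4x for SHA-512 (107,104 ns vs 14,513 ns) and roughly 8.4x for SHA-256 (163,630 ns vs 19,401 ns).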