RustCrypto / hashes

Collection of cryptographic hash functions written in pure Rust

sha2: wasm32 simd128 backends #562

Closed max-te closed 3 weeks ago

max-te commented 9 months ago

This PR ports the AVX implementation of SHA-512 to simd128. It also implements, in simd128, the related SHA-256 algorithm from https://github.com/aws-samples/sha2-with-c-intrinsic/blob/master/src/sha256_compress_x86_64_avx.c. It adds wasm32 testing to CI using wasmtime. Since wasm has no runtime feature detection, this backend is used only when the -C target-feature=+simd128 flag is set.
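
For readers unfamiliar with compile-time backend selection, here is a minimal sketch of what "only used if the flag is set" means in practice. It is not the crate's actual selection code: `selected_backend` and the returned labels are hypothetical names chosen for illustration. The only real mechanism it relies on is the `cfg!` macro with the `target_arch` and `target_feature` predicates.

```rust
/// Hypothetical helper (not part of sha2): reports which backend a build
/// of this snippet would pick. The choice is fixed at compile time, because
/// wasm offers no way to probe for simd128 while the module is running.
pub fn selected_backend() -> &'static str {
    // `cfg!` expands to a constant `true` or `false` during compilation.
    if cfg!(all(target_arch = "wasm32", target_feature = "simd128")) {
        "wasm32-simd128"
    } else {
        // Any other target, or wasm32 built without +simd128.
        "soft"
    }
}
```

Built for wasm32 with `-C target-feature=+simd128`, the first branch is taken; without the flag, or on a non-wasm target, the function returns `"soft"`.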

Benchmarks on AMD Ryzen 9 7950X3D, running with wasmtime 26.0.0 (c92317bcc 2024-10-22) on rustc 1.84.0-nightly (b3f75cc87 2024-11-02):

+ RUSTFLAGS='-C target-feature=+simd128'
+ cargo +nightly bench -q --bench mod --target wasm32-wasip1

running 8 tests
test sha256_10    ... bench:          18.71 ns/iter (+/- 1.62) = 555 MB/s
test sha256_100   ... bench:         167.94 ns/iter (+/- 0.62) = 598 MB/s
test sha256_1000  ... bench:       1,656.93 ns/iter (+/- 142.75) = 603 MB/s
test sha256_10000 ... bench:      15,601.30 ns/iter (+/- 1,268.65) = 640 MB/s
test sha512_10    ... bench:          14.35 ns/iter (+/- 0.09) = 714 MB/s
test sha512_100   ... bench:         137.37 ns/iter (+/- 0.87) = 729 MB/s
test sha512_1000  ... bench:       1,261.63 ns/iter (+/- 105.65) = 793 MB/s
test sha512_10000 ... bench:      12,434.24 ns/iter (+/- 24.46) = 804 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 4.40s

+ RUSTFLAGS='-C target-feature=-simd128'
+ cargo +nightly bench -q --bench mod --target wasm32-wasip1

running 8 tests
test sha256_10    ... bench:         155.59 ns/iter (+/- 1.08) = 64 MB/s
test sha256_100   ... bench:       1,539.48 ns/iter (+/- 9.18) = 64 MB/s
test sha256_1000  ... bench:      15,207.34 ns/iter (+/- 81.67) = 65 MB/s
test sha256_10000 ... bench:     151,547.98 ns/iter (+/- 1,170.30) = 65 MB/s
test sha512_10    ... bench:          98.59 ns/iter (+/- 0.45) = 102 MB/s
test sha512_100   ... bench:         980.99 ns/iter (+/- 3.43) = 102 MB/s
test sha512_1000  ... bench:       9,622.94 ns/iter (+/- 29.97) = 103 MB/s
test sha512_10000 ... bench:      95,977.25 ns/iter (+/- 310.30) = 104 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 6.55s

+ RUSTFLAGS='--cfg sha2_backend="soft" -C target-feature=+simd128'
+ cargo +nightly bench -q --bench mod --target wasm32-wasip1

running 8 tests
test sha256_10    ... bench:         142.07 ns/iter (+/- 13.71) = 70 MB/s
test sha256_100   ... bench:       1,404.58 ns/iter (+/- 10.83) = 71 MB/s
test sha256_1000  ... bench:      14,823.81 ns/iter (+/- 1,346.05) = 67 MB/s
test sha256_10000 ... bench:     139,001.94 ns/iter (+/- 978.58) = 71 MB/s
test sha512_10    ... bench:          90.39 ns/iter (+/- 7.82) = 111 MB/s
test sha512_100   ... bench:         893.20 ns/iter (+/- 72.22) = 111 MB/s
test sha512_1000  ... bench:       8,812.46 ns/iter (+/- 878.60) = 113 MB/s
test sha512_10000 ... bench:      87,887.02 ns/iter (+/- 394.70) = 113 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 8.62s
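
To compare the three runs above, the ratios can be computed directly from the ns/iter columns. The sketch below is illustrative only; `mb_per_s` and `speedup` are hypothetical helper names. The MB/s column appears to be derived from the ns/iter value truncated to an integer, which is an inference from the printed numbers, not something stated in this PR.

```rust
/// Throughput in MB/s, computed the way the bench output above appears to
/// derive it (an assumption): truncate ns/iter to an integer, then
/// bytes * 1000 / ns using integer division.
pub fn mb_per_s(bytes: u64, ns_per_iter: f64) -> u64 {
    let ns = (ns_per_iter as u64).max(1); // avoid division by zero
    bytes * 1000 / ns
}

/// Ratio of a baseline time to an improved time; above 1.0 means faster.
pub fn speedup(baseline_ns: f64, improved_ns: f64) -> f64 {
    baseline_ns / improved_ns
}
```

For example, with the sha256_10000 rows, `speedup(151_547.98, 15_601.30)` is roughly 9.7 (simd128 backend versus a build with -simd128), and `speedup(139_001.94, 15_601.30)` is roughly 8.9 (simd128 backend versus the soft backend compiled with +simd128).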

max-te commented 9 months ago

I also ended up porting the SHA-256 algorithm from https://github.com/aws-samples/sha2-with-c-intrinsic/blob/master/src/sha256_compress_x86_64_avx.c and updated this PR. Here are updated benchmarks with simd:

test sha256_10    ... bench:          22 ns/iter (+/- 0) = 454 MB/s
test sha256_100   ... bench:         215 ns/iter (+/- 2) = 465 MB/s
test sha256_1000  ... bench:       1,959 ns/iter (+/- 8) = 510 MB/s
test sha256_10000 ... bench:      19,401 ns/iter (+/- 22) = 515 MB/s
test sha512_10    ... bench:          17 ns/iter (+/- 0) = 588 MB/s
test sha512_100   ... bench:         164 ns/iter (+/- 0) = 609 MB/s
test sha512_1000  ... bench:       1,476 ns/iter (+/- 2) = 677 MB/s
test sha512_10000 ... bench:      14,513 ns/iter (+/- 18) = 689 MB/s

CryZe commented 1 month ago

What's the status of this?

max-te commented 4 weeks ago

This is awaiting review.

@newpavlov Do you mind taking a look at this?

CryZe commented 3 weeks ago

> What are the advantages of the explicit SIMD backend in the SHA256 case? It has the same performance as the soft backend.

Did you look at the second comment in this PR? It seems the initial version of this PR did not use a SIMD algorithm for SHA-256, but that changed the next day, as the second comment indicates.

Unless, of course, you did some more benchmarking and it's indeed not faster anymore.

newpavlov commented 3 weeks ago

Ah, I indeed missed the second comment. I think it's worth updating the OP, since its text will be included in the merge commit message.

newpavlov commented 3 weeks ago

@max-te I think we are good to merge. But could you measure the performance of the software backend with the simd128 target feature enabled, on the same hardware? You can do it with this command:

RUSTFLAGS='--cfg sha2_backend="soft" -C target-feature=+simd128' cargo +nightly bench --target wasm32-wasi

I would like to add these results to the merge commit message.

max-te commented 3 weeks ago

Sure, I added that benchmark to the PR description and updated the other ones.

newpavlov commented 3 weeks ago

Thank you!