RustCrypto / asm-hashes

Assembly implementations of cryptographic hash functions

POC: AVX2 sha256 #32

Closed haraldh closed 3 years ago

haraldh commented 3 years ago

Here is a quick POC for sha256 with SIMD... let me know if you accept code under the OpenIB.org BSD license.

Before:

test bench1_10    ... bench:          39 ns/iter (+/- 2) = 256 MB/s
test bench2_100   ... bench:         349 ns/iter (+/- 23) = 286 MB/s
test bench3_1000  ... bench:       3,412 ns/iter (+/- 11) = 293 MB/s
test bench4_10000 ... bench:      34,084 ns/iter (+/- 3,183) = 293 MB/s

After:

test bench1_10    ... bench:          27 ns/iter (+/- 1) = 370 MB/s
test bench2_100   ... bench:         232 ns/iter (+/- 5) = 431 MB/s
test bench3_1000  ... bench:       2,082 ns/iter (+/- 85) = 480 MB/s
test bench4_10000 ... bench:      20,543 ns/iter (+/- 2,686) = 486 MB/s
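
For reading the numbers above: the MB/s column is consistent with integer division of bytes by nanoseconds, assuming the input sizes are 10, 100, 1000 and 10000 bytes as the bench names suggest. A small sketch (the helper names `throughput_mb_s` and `speedup` are made up for illustration, not part of any crate):

```rust
/// Throughput in MB/s for `bytes` processed in `ns_per_iter` nanoseconds.
/// bytes/ns is GB/s, so multiplying by 1000 gives MB/s (truncated).
fn throughput_mb_s(bytes: u64, ns_per_iter: u64) -> u64 {
    bytes * 1000 / ns_per_iter
}

/// Ratio of the old time to the new time for the same input.
fn speedup(before_ns: u64, after_ns: u64) -> f64 {
    before_ns as f64 / after_ns as f64
}
```

With the 10000-byte figures, `speedup(34_084, 20_543)` comes out at roughly 1.66x, although the +/- spread on those two runs is fairly large.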

tarcieri commented 3 years ago

Thanks for opening this PR! It'd be great to have AVX2 support for SHA-256. Unfortunately, I think the licensing is going to pose a problem for us.

Offhand I'm not sure where to get an implementation which is license compatible (I just checked the public domain sources in the eBACS SUPERCOP repo but there doesn't appear to be one).

tarcieri commented 3 years ago

Also note that where possible we prefer core::arch intrinsics over ASM, as they're easier to maintain and generally safer. All of our current AVX2 backends are written that way. Here's an example:

https://github.com/RustCrypto/universal-hashes/blob/master/poly1305/src/backend/avx2/helpers.rs

Not sure how interested you are in this particular problem but if you'd really like to dig into it, that'd be great.
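
To give a rough idea of what the core::arch style looks like, here is a minimal sketch that applies the SHA-256 small sigma0 function (rotr 7 ^ rotr 18 ^ shr 3) to eight message words at once with AVX2 intrinsics, with runtime feature detection and a scalar fallback. It is only an illustration and not a full SHA-256 backend; the function names `sigma0`, `sigma0_avx2` and `sigma0_scalar` are hypothetical and do not come from any RustCrypto crate.

```rust
/// Scalar reference: SHA-256 small sigma0 applied to each of 8 words.
fn sigma0_scalar(w: [u32; 8]) -> [u32; 8] {
    w.map(|x| x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3))
}

/// AVX2 version: the same function on all 8 lanes of one 256-bit register.
/// Safety: the caller must make sure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sigma0_avx2(w: [u32; 8]) -> [u32; 8] {
    use core::arch::x86_64::*;

    // Unaligned load of the 8 input words.
    let x = _mm256_loadu_si256(w.as_ptr() as *const __m256i);
    // AVX2 has no 32-bit rotate, so build each rotate from two shifts.
    let r7 = _mm256_or_si256(_mm256_srli_epi32::<7>(x), _mm256_slli_epi32::<25>(x));
    let r18 = _mm256_or_si256(_mm256_srli_epi32::<18>(x), _mm256_slli_epi32::<14>(x));
    let s3 = _mm256_srli_epi32::<3>(x);
    let y = _mm256_xor_si256(_mm256_xor_si256(r7, r18), s3);

    let mut out = [0u32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, y);
    out
}

/// Dispatch: use AVX2 when the running CPU has it, else the scalar path.
pub fn sigma0(w: [u32; 8]) -> [u32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound because AVX2 support was just checked at runtime.
            return unsafe { sigma0_avx2(w) };
        }
    }
    sigma0_scalar(w)
}
```

Compared with a separate .S file, the compiler type-checks the vector operations and handles register allocation, and the scalar path keeps the code usable on targets without AVX2.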

haraldh commented 3 years ago

I am sorry, it was just a POC... I don't have the time to dig into that further.

newpavlov commented 3 years ago

The licensing issue is not critical (e.g. the currently used assembly files are licensed under MIT only), but we certainly would like to keep licensing of crates simple if possible. Also, I agree with @tarcieri about using intrinsics. We have to use asm files for ARM only because the relevant intrinsics are currently unstable.