debris / tiny-keccak

An implementation of Keccak derived functions specified in FIPS-202, SP800-185 and KangarooTwelve
Creative Commons Zero v1.0 Universal

clean up some unsafety, help LLVM elide bounds checks #5

Closed: rphmeier closed this 8 years ago

rphmeier commented 8 years ago

This actually made things faster (reproducible on my machine; it may be worth checking on other machines).
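For readers unfamiliar with the technique named in the title, here is a minimal sketch of the kind of change that tends to let LLVM drop bounds checks. It is not the code from this PR; `column_parity` and `xor_in` are hypothetical helpers written only for illustration. The idea is to give the optimizer lengths it can reason about, either through fixed-size arrays or through one re-slice before the loop.

```rust
/// Hypothetical helper: column parities of a 5x5 Keccak-style state.
/// Both the array length (25) and the loop bounds (5) are compile-time
/// constants, so every index `x + 5 * y` is provably less than 25.
fn column_parity(state: &[u64; 25]) -> [u64; 5] {
    let mut c = [0u64; 5];
    for x in 0..5 {
        for y in 0..5 {
            c[x] ^= state[x + 5 * y];
        }
    }
    c
}

/// Hypothetical helper: XOR `src` into the front of `dst`.
/// Re-slicing `dst` once up front performs a single length check
/// (it panics if `dst` is shorter than `src`); zipping the two slices
/// then needs no per-element index check at all.
fn xor_in(dst: &mut [u8], src: &[u8]) {
    let dst = &mut dst[..src.len()];
    for (d, s) in dst.iter_mut().zip(src) {
        *d ^= *s;
    }
}
```

Whether the checks are really removed depends on the compiler version and optimization level, so the generated assembly or the benches are the only reliable confirmation.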

Benches After:

running 2 tests
test bench_sha3_256_input_4096_bytes ... bench:      21,322 ns/iter (+/- 2,177) = 192 MB/s
test keccakf_u64                     ... bench:         676 ns/iter (+/- 58) = 36 MB/s

Before:

running 2 tests
test bench_sha3_256_input_4096_bytes ... bench:      24,544 ns/iter (+/- 706) = 166 MB/s
test keccakf_u64                     ... bench:         771 ns/iter (+/- 104) = 32 MB/s

Comparison After:

running 4 tests
test rust_crypto_sha3_256_input_32_bytes   ... bench:       1,822 ns/iter (+/- 721) = 17 MB/s
test rust_crypto_sha3_256_input_4096_bytes ... bench:      52,218 ns/iter (+/- 3,668) = 78 MB/s
test tiny_keccak_sha3_256_input_32_bytes   ... bench:         705 ns/iter (+/- 39) = 45 MB/s
test tiny_keccak_sha3_256_input_4096_bytes ... bench:      21,379 ns/iter (+/- 2,600) = 191 MB/s

Before:

running 4 tests
test rust_crypto_sha3_256_input_32_bytes   ... bench:       1,822 ns/iter (+/- 421) = 17 MB/s
test rust_crypto_sha3_256_input_4096_bytes ... bench:      52,126 ns/iter (+/- 3,583) = 78 MB/s
test tiny_keccak_sha3_256_input_32_bytes   ... bench:         808 ns/iter (+/- 64) = 39 MB/s
test tiny_keccak_sha3_256_input_4096_bytes ... bench:      25,114 ns/iter (+/- 3,347) = 163 MB/s

It would be nice to get rid of the unsafety in as_bytes_slice, but I don't see a way to do that without likely hurting performance.
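As a point of comparison, here is a rough sketch of what a safe alternative could look like on current Rust. It assumes that as_bytes_slice is used to view the 25 u64 state lanes as bytes; that is an assumption about the crate's internals, and `xor_bytes_into_lanes` and `lanes_to_bytes` are hypothetical names, not functions in tiny-keccak. The sketch converts through explicit little-endian byte arrays, which matches Keccak's lane byte order, instead of reinterpreting memory.

```rust
/// Hypothetical helper: XOR up to 200 input bytes into the state lanes.
/// A short final chunk is zero-padded, which leaves the lane's upper
/// bytes unchanged under XOR.
fn xor_bytes_into_lanes(state: &mut [u64; 25], input: &[u8]) {
    assert!(input.len() <= 200, "input larger than the 200-byte state");
    for (lane, chunk) in state.iter_mut().zip(input.chunks(8)) {
        let mut buf = [0u8; 8];
        buf[..chunk.len()].copy_from_slice(chunk);
        *lane ^= u64::from_le_bytes(buf);
    }
}

/// Hypothetical helper: copy the first `out.len()` state bytes into `out`.
fn lanes_to_bytes(state: &[u64; 25], out: &mut [u8]) {
    assert!(out.len() <= 200, "output larger than the 200-byte state");
    for (lane, chunk) in state.iter().zip(out.chunks_mut(8)) {
        let bytes = lane.to_le_bytes();
        let n = chunk.len();
        chunk.copy_from_slice(&bytes[..n]);
    }
}
```

Whether this keeps up with the unsafe version is exactly the open question above; it would need to be run through the same benches before drawing any conclusion.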