RustCrypto / hashes

Collection of cryptographic hash functions written in pure Rust
1.81k stars 245 forks source link

Performance issue on SHA3-Keccak256 since v0.10.7 #468

Closed Slixe closed 1 year ago

Slixe commented 1 year ago

Hey there,

After an update to the last version for this crate, I've found a strange problem, my performance have decreased to half of what I get with v0.10.6.

I'm using a Ryzen 3700X with 32GB on Debian. In case you believe its my fault, here is what I saw: To see the performance issue you can run cargo run --release --bin xelis_miner -- --benchmark --iterations=1000000 on this commit: https://github.com/xelis-project/xelis-blockchain/commit/7a31b0bf92167fdba283351bab61f4943484e19b And on the previous commit to compare.

Thanks

newpavlov commented 1 year ago

Could you run your benchmark with keccak downgraded to an older version? Changes in sha3 v0.10.7 are either disabled by default (the asm feature), or purely additive, so they should not influence performance. It also could be a compiler regression, so I suggest running benchmarks with older Rust versions.

tarcieri commented 1 year ago

I'm very surprised to hear about any performance regressions on x86/x86_64, as all of the relevant changes I can think of were ARM-related

tarcieri commented 1 year ago

@Slixe what would be really helpful is if you could git bisect the changes in this repo and see if you can narrow down where it occurred

Slixe commented 1 year ago

Could you run your benchmark with keccak downgraded to an older version? Changes in sha3 v0.10.7 are either disabled by default (the asm feature), or purely additive, so they should not influence performance. It also could be a compiler regression, so I suggest running benchmarks with older Rust versions.

I've tried with the keccak v0.1.2 on sha3 v0.10.7 and still the same. I've also tried with lower version of digest too, same result.

I've tried also on rust toolchain 1.69.0 (latest), 1.68, 1.65, 1.64, same for these versions...

Performances are still much better on sha3 v0.10.6, and I don't understand why.

Slixe commented 1 year ago

@Slixe what would be really helpful is if you could git bisect the changes in this repo and see if you can narrow down where it occurred

According to the git bisect, the bad commit is 9b218cf4b8b6a3c5dc33bd7772ece7051a5d43d6 where my performance are bad.

This change is what reduced my performances: https://github.com/RustCrypto/hashes/commit/9b218cf4b8b6a3c5dc33bd7772ece7051a5d43d6#diff-90cc61bbc907b3ec5bb54982793be9757502097a40a475873b6bab46d5874889L19

If I put keccak::f1600(&mut self.state); again in absorb_block my performances are good again.

tarcieri commented 1 year ago

Interesting /cc @aewag

aewag commented 1 year ago

Hmm, my first guess: maybe it is because the round_count is no longer a const and this hinders the compiler to do some optimizations.

Could you change the call within absorb_block to the keccak_p as following:

keccak::keccak_p(&mut self.state, DEFAULT_ROUND_COUNT);

This should result in the same speed as with using f1600.

Slixe commented 1 year ago

No, still not, it's improving only by ~1-5% but still not as much as I had before the update.

aewag commented 1 year ago

Thanks. Just looked again into it. The issue lies within the generic approach of keccak_p. If keccak_p is called LaneSize is evaluated at runtime. Thus, the high performance difference.

I just wrote a fix and tested it. My approach would be to add functions for keccak_p<S> and S out of [200, 400, 800, 1600]. Similar to the keccak_f routines.

aewag commented 1 year ago

Following are the benchmarks evaluated with keccak_p and p1600 functions:

current master of sha3 crate:

test sha3_256_10    ... bench:         141 ns/iter (+/- 45) = 70 MB/s
test sha3_256_1000  ... bench:      14,652 ns/iter (+/- 7,731) = 68 MB/s
test sha3_256_10000 ... bench:     144,600 ns/iter (+/- 53,622) = 69 MB/s

with p1600 routines added in keccak crate and used in sha3 crate:

running 3 tests
test sha3_256_10    ... bench:          40 ns/iter (+/- 2) = 250 MB/s
test sha3_256_1000  ... bench:       3,985 ns/iter (+/- 271) = 250 MB/s
test sha3_256_10000 ... bench:      39,327 ns/iter (+/- 1,277) = 254 MB/s

@tarcieri If the draft PR in the sponges directory looks good to you, I would add, after a keccak crate release, a PR to the hashes repository.

tarcieri commented 1 year ago

@aewag can you update this PR to use keccak from git so @Slixe can confirm the issue is fixed?

tarcieri commented 1 year ago

Fixed in v0.10.8: #474