RustCrypto / stream-ciphers

Collection of stream cipher algorithms

Salsa20 SSE2 version #328

Closed by oxarbitrage 8 months ago

oxarbitrage commented 1 year ago

Part of https://github.com/RustCrypto/stream-ciphers/issues/50

Hello, I have been trying to optimize Salsa20 for SSE2 for a while, so I decided to publish a PR even if:

With that said, feel free to discard the PR if you think it is not worth doing. Otherwise, I am totally open to suggestions and feedback to make this mergeable into the project. If so, please review carefully, as I might be missing important stuff. Thanks!

With the current Salsa20 implementation, I get this output from the benches on my computer:

running 4 tests
test salsa20_bench1_16b   ... bench:          25 ns/iter (+/- 0) = 640 MB/s
test salsa20_bench2_256b  ... bench:         348 ns/iter (+/- 11) = 735 MB/s
test salsa20_bench3_1kib  ... bench:       1,375 ns/iter (+/- 113) = 744 MB/s
test salsa20_bench4_16kib ... bench:      23,348 ns/iter (+/- 2,823) = 701 MB/s

While with this PR:

running 4 tests
test salsa20_bench1_16b   ... bench:          22 ns/iter (+/- 0) = 727 MB/s
test salsa20_bench2_256b  ... bench:         304 ns/iter (+/- 3) = 842 MB/s
test salsa20_bench3_1kib  ... bench:       1,203 ns/iter (+/- 34) = 851 MB/s
test salsa20_bench4_16kib ... bench:      20,548 ns/iter (+/- 2,458) = 797 MB/s

These numbers vary a bit from run to run, but the performance increase is consistent.
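
For anyone who wants to recheck the MB/s column and the size of the improvement, both can be derived from the ns/iter figures above. This is a minimal sketch, assuming MB means 10^6 bytes and that the harness truncates the integer division (both inferred from the printed numbers, not taken from the PR). The function names are hypothetical.

```rust
/// Throughput in MB/s (assumed: 10^6 bytes per second) for `bytes`
/// processed in `ns` nanoseconds. Hypothetical helper, not part of the PR.
fn throughput_mb_s(bytes: u64, ns: u64) -> u64 {
    // bytes per nanosecond equals GB/s, so scale by 1000 to get MB/s.
    // Integer division truncates toward zero.
    bytes * 1000 / ns
}

/// Relative reduction in time per iteration, in percent.
/// Hypothetical helper, not part of the PR.
fn time_saved_percent(before_ns: u64, after_ns: u64) -> f64 {
    (before_ns as f64 - after_ns as f64) / before_ns as f64 * 100.0
}

fn main() {
    // (label, buffer size in bytes, ns/iter before, ns/iter with the PR),
    // copied from the two bench outputs in this thread.
    let rows: [(&str, u64, u64, u64); 4] = [
        ("16b", 16, 25, 22),
        ("256b", 256, 348, 304),
        ("1kib", 1024, 1_375, 1_203),
        ("16kib", 16_384, 23_348, 20_548),
    ];

    for &(label, bytes, before, after) in rows.iter() {
        println!(
            "{:>6}: {} MB/s -> {} MB/s ({:.1}% less time per iteration)",
            label,
            throughput_mb_s(bytes, before),
            throughput_mb_s(bytes, after),
            time_saved_percent(before, after),
        );
    }
}
```

With the figures quoted in this thread, the time per iteration drops by roughly 12 to 13 percent at every buffer size.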

tarcieri commented 8 months ago

Apologies for the delayed review on this. I've glanced through it a few times and it seems mostly reasonable. I'd like to do a more detailed review before merging, though, and it's been hard to justify prioritizing given the relatively meager performance gains.

tarcieri commented 8 months ago

Reviewed this again. It seems reasonable enough, and we're about to start making breaking changes anyway, so I'd like to get it in before that.