argon2: optimize with AVX2 SIMD

RustCrypto / password-hashes

Password hashing functions / KDFs

677 stars 84 forks source link

argon2: optimize with AVX2 SIMD #440

Closed dyc3 closed 1 year ago

dyc3 commented 1 year ago

I've tried my hardest to get this to work, but I think I'm in a little over my head. The most I've managed to do is port the macros from the optimized reference version.

~~My main problem is that I have no idea how these indexes into state work:~~

~~https://github.com/P-H-C/phc-winner-argon2/blob/92cd2e1/src/opt.c#L94-L102~~

If someone could give me a hand, I'd be happy to fix/finish this. This is the first time I've ever tried to do stuff with SIMD, so bear with me as I'm still learning.

related to #104

dyc3 commented 1 year ago

Ok, I was able to verify that I ported all the macros correctly (at least as far as I could tell), and I fixed some stuff. I'm still not entirely sure where I'm going wrong here, as the tests are still failing.

dyc3 commented 1 year ago

I found out there was a way to tell the compiler to generate code based on a target feature being enabled with the #[target_feature(enable = "avx2")] attribute. With this method, it likely be safe to do the same thing with avx512, but I don't have a CPU that supports it to make sure. It also completely sidesteps the code gen problems I was having before (demonstrated in https://godbolt.org/z/TM94EbjKf).

https://godbolt.org/z/843z7eGYb demonstrates the code gen using this method, which is implemented in 70e34de.

Running benchmarks on my Ryzen 9 3900X, it shows about a 20-30% performance improvement. Would this solution be acceptable? I can clean up the commit history here if it matters.

newpavlov commented 1 year ago

So the compress function implementations are identical and the only difference is #[target_feature(enable = "avx2")], correct? I don't think we need to duplicate the function implementations. You could try to mark the compress function as #[inline(always)] and then introduce an AVX2 wrapper around it.