Open tarcieri opened 3 years ago
👋 Hey there, would it be possible to assign this issue to me? I'd like to implement it as part of a work project and have some allocated time.
We expect this to provide a fairly significant speedup for x86 users whose systems were made in the last 10 years. My preliminary benchmarking against the C library shows a 40% speed improvement at 64 MiB, with the benefit shrinking in step with the memory size as it goes down.
I don't plan to port the AVX512 implementation right away since I don't have a way to test it and, IMO, it's of limited usefulness in the correct scenario where passwords are hashed on the client, since most consumer CPUs don't support it anyway.
@complexspaces sounds great! Even the AVX2 implementation would be fantastic.
FWIW we use the `cpufeatures` crate elsewhere to autodetect AVX2 and use it when available.
Given that #440 has been merged, is this still an issue? If so, what needs to be done to address it?
`opt.c` contains a natively AVX2/AVX512-optimized implementation it would be nice to eventually port over.

As an update on #408, the current argon2 AVX2 implementation is now only 2-3x slower than the Go implementation even for high `Parallelism` costs, and (as expected) beats the Go implementation when `Parallelism = 1`. Tested on an AMD Ryzen 7 5700G, best of 3 runs.
With `Memory = 128 MiB`, `Time = 64`:

Parallelism | Rust Time | Go Time | C Time |
---|---|---|---|
1 | 3.36 s | 4.02 s | 2.62 s |
2 | 3.41 s | 2.22 s | 1.68 s |
4 | 3.45 s | 1.48 s | 1.31 s |
8 | 3.49 s | 1.24 s | 1.26 s |
16 | 3.52 s | 1.21 s | 1.24 s |

With `Memory = 64 MiB`, `Time = 32`:

Parallelism | Rust Time | Go Time | C Time |
---|---|---|---|
1 | 805 ms | 965 ms | 621 ms |
2 | 825 ms | 527 ms | 414 ms |
4 | 833 ms | 348 ms | 309 ms |
8 | 847 ms | 298 ms | 298 ms |
16 | 853 ms | 287 ms | 291 ms |
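
To make the Rust-vs-Go comparison easier to read off, the ratio for each row of the 64 MiB table can be computed with a few lines of Rust. This is just arithmetic on the numbers posted above; the `slowdown`, `ROWS_64_MIB` and `print_slowdowns` names are made up for this sketch.

```rust
/// Ratio of the Rust time to the Go time; values above 1.0 mean Rust is slower.
fn slowdown(rust_ms: f64, go_ms: f64) -> f64 {
    rust_ms / go_ms
}

/// (parallelism, rust_ms, go_ms) rows copied from the
/// `Memory = 64 MiB`, `Time = 32` table.
const ROWS_64_MIB: [(u32, f64, f64); 5] = [
    (1, 805.0, 965.0),
    (2, 825.0, 527.0),
    (4, 833.0, 348.0),
    (8, 847.0, 298.0),
    (16, 853.0, 287.0),
];

/// Print one ratio per row of the table.
fn print_slowdowns() {
    for (p, rust_ms, go_ms) in ROWS_64_MIB {
        println!("p = {p:>2}: Rust/Go = {:.2}", slowdown(rust_ms, go_ms));
    }
}
```

The ratio is below 1 at `Parallelism = 1` and stays between 2 and 3 at `Parallelism = 16`, which lines up with the summary above.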

The current `argon2` crate implementation is a translation of `ref.c` from the reference implementation: https://github.com/P-H-C/phc-winner-argon2/blob/92cd2e1/src/ref.c

It could be improved by translating `opt.c` instead, which provides e.g. SIMD support: https://github.com/P-H-C/phc-winner-argon2/blob/92cd2e1/src/opt.c