Closed TomMettam closed 4 years ago
Hi!
Well, it should be not exponentially, but linearly more expensive: N threads X operations = N X operations in one thread.
It's interesting that increasing the number of threads doesn't make it worse on web, I don't know why. The parameters I used are Iterations=128, Memory=16384 KiB, Type=Argon2i: -t 128 -m 14
.
Here's the time I get, while results are absolutely the same:
Native | WASM | |
---|---|---|
1 thread | 1.0s | 5.0s |
2 threads | 1.1s | 5.1s |
4 threads | 1.2s | 5.1s |
8 threads | 1.7s | 5.1s |
16 threads | 2.0s | 5.1s |
128 threads | 4.2s | 5.2s |
1024 threads | 26s | 7.3s |
I need to investigate how threading works in argon2 and on which parameters we can benefit from it. I tried increasing the number of iterations, but it behaves the same way, that's very strange.
More iterations, less memory, Argon2i, Iterations=10000, Memory=1024 KiB: -t 10000 -m 10 -i (results are the same again). |
Native | WASM | |
---|---|---|---|
1 thread | 4.1s | 22s | |
2 threads | 6.8s | 23s | |
4 threads | 9.5s | 23s | |
8 threads | 16s | 24s |
I'm not sure I get the point of having threading support in Argon2 🤔 when do we need it?
If you're interested, I've just built a version with SIMD, check it out, it's 2x faster: https://antelle.net/argon2-browser/ (works only in Chrome with a flag enabled; not for production) For threading support – I need some input here. I don't see it being helpful right now, but I'll play more with it, maybe I'll figure it out.
Just noticed how the native version behaves without threading support:
echo -n password | argon2 somesalt -t 10000 -m 10 -i -p 4
Type: Argon2i
Iterations: 10000
Memory: 1024 KiB
Parallelism: 4
Hash: 93c411bb2d52104d943d13b30665095ba7e1ed81e0a95fb1899547ea4987e3e1
Encoded: $argon2i$v=19$m=1024,t=10000,p=4$c29tZXNhbHQ$k8QRuy1SEE2UPROzBmUJW6fh7YHgqV+xiZVH6kmH4+E
1.446 seconds
Verification ok
And with threading support:
echo -n password | argon2 somesalt -t 10000 -m 10 -i -p 4
Type: Argon2i
Iterations: 10000
Memory: 1024 KiB
Parallelism: 4
Hash: 93c411bb2d52104d943d13b30665095ba7e1ed81e0a95fb1899547ea4987e3e1
Encoded: $argon2i$v=19$m=1024,t=10000,p=4$c29tZXNhbHQ$k8QRuy1SEE2UPROzBmUJW6fh7YHgqV+xiZVH6kmH4+E
6.834 seconds
Verification ok
Okay I see, it reports CPU time, not wallclock: https://github.com/P-H-C/phc-winner-argon2/issues/204.
Threaded version is actually quite easy to build, however it doesn't give any improvement while the file size grows significantly (2x wasm and 4x js), you can also try doing it by adding -s USE_PTHREADS=1
:
39K argon2.wasm
60K argon2.js
At the moment I don't think we should release it, but I'll keep an eye on its progress.
Hi! Well, it should be not exponentially, but linearly more expensive: N threads X operations = N X operations in one thread. It's interesting that increasing the number of threads doesn't make it worse on web, I don't know why. The parameters I used are Iterations=128, Memory=16384 KiB, Type=Argon2i:
-t 128 -m 14
.Here's the time I get, while results are absolutely the same:
Native WASM 1 thread 1.0s 5.0s 2 threads 1.1s 5.1s 4 threads 1.2s 5.1s 8 threads 1.7s 5.1s 16 threads 2.0s 5.1s 128 threads 4.2s 5.2s 1024 threads 26s 7.3s I need to investigate how threading works in argon2 and on which parameters we can benefit from it. I tried increasing the number of iterations, but it behaves the same way, that's very strange.
I confirm, in my implementation I had tried too and the results are very closed even with more threads.
I do think this is completely wanted. Argon2 is designed to limit hardware brute-forcing. Check this article guys, very instructive: https://eprint.iacr.org/2016/104.pdf In the section 2: "Security Requirements of Password-Hashing Schemes".
From my understanding, the parallelism is here to enforce the hardware to use more resources, even if this is not speeding up the process.
So, activating the multi-threading is important in a way that you enforce someone trying to attack you to use more resources.
@glihm it depends how calculations are made: either we set a predefined complexity and then threading support should make it faster (that's how I assume it works), or complexity grows with increasing parallelism (then single-threaded implementation would take more time for higher parallelism values).
With the native implementation, increasing the amount of parallelism is intended to keep the time constant but increase complexity.
It's very strange to me that using a single thread with a larger parallelism value would not incur a larger time penalty. Perhaps the threading isn't being utilised correctly (so it's always using one thread)?
I will do some experiments and research on this topic.
increasing the amount of parallelism is intended to keep the time constant but increase complexity
Is it stated anywhere in docs? I see that it's the opposite: increasing parallelism allows implementations to calculate the hash using several threads, then how they make use of this possibility it's up to them. For example, in WASM we're not using threads at all, so the performance remains constant. The native tool uses them with different degree of success.
Found it in docs:
So, it's like I describe it: it's number of chains that can be run. The complexity is exactly the same, however required work can be splitted.
@TomMettam I've pushed my threading experiments to this branch, it contains a working example that can run up to 8 threads, you can play with it if you want. Compile it, break it, love it, hate it, let me know if you manage to get anything interesting out of it. I've also uploaded it here if you just want to run it in your browser and compare performance: https://share.antelle.net/argon2-threads/, spoiler: it's slow 🐌 (≈2x) and big 🥴 (≈+40k).
I believe this is now resolved and we can close the issue. The branch is here, and I'll return to it once a year or if there's any news about threads in WASM.
@TomMettam I've pushed my threading experiments to this branch, it contains a working example that can run up to 8 threads, you can play with it if you want. Compile it, break it, love it, hate it, let me know if you manage to get anything interesting out of it. I've also uploaded it here if you just want to run it in your browser and compare performance: https://share.antelle.net/argon2-threads/, spoiler: it's slow snail (≈2x) and big woozy_face (≈+40k).
I realize this is not an active topic anymore, I just want to note that in my compiled test, the threaded version is significantly faster than the non-threaded version.
On latest master branch, vs threaded branch on latest emscripten docker image and using Firefox I get ~10 seconds, vs ~2.5 seconds on parallelism = 16, iterations = 10, memory = 1GiB on a 4 core, 8 thread system.
Any chance of re-enabling threads?
@quexten how's testing been on your branch? Are you using it in any projects?
Friendly bump.
@antelle is this project still maintained? I noticed there haven't been any commits in a while but sometimes that's normal.
Hello!
In issue #15 threads were disabled. My understanding is that this means that generating a hash with a parallelism value of > 1 becomes exponentially more expensive for the defender but cheaper for an attacker using a different library.
WebAssembly Threads are a lot more mature now, being enabled by default in Chrome since v74. Could the possibility of re-enabling threads be investigated?