antelle / argon2-browser

Argon2 library compiled for browser runtime
https://antelle.net/argon2-browser
MIT License
363 stars 78 forks source link

Investigate possibility of re-enabling threads? #52

Closed TomMettam closed 3 years ago

TomMettam commented 3 years ago

Hello!

In issue #15 threads were disabled. My understanding is that this means that generating a hash with a parallelism value of > 1 becomes exponentially more expensive for the defender but cheaper for an attacker using a different library.

WebAssembly Threads are a lot more mature now, being enabled by default in Chrome since v74. Could the possibility of re-enabling threads be investigated?

antelle commented 3 years ago

Hi! Well, it should be not exponentially, but linearly more expensive: N threads X operations = N X operations in one thread. It's interesting that increasing the number of threads doesn't make it worse on web, I don't know why. The parameters I used are Iterations=128, Memory=16384 KiB, Type=Argon2i: -t 128 -m 14.

Here's the time I get, while results are absolutely the same:

Native WASM
1 thread 1.0s 5.0s
2 threads 1.1s 5.1s
4 threads 1.2s 5.1s
8 threads 1.7s 5.1s
16 threads 2.0s 5.1s
128 threads 4.2s 5.2s
1024 threads 26s 7.3s

I need to investigate how threading works in argon2 and on which parameters we can benefit from it. I tried increasing the number of iterations, but it behaves the same way, that's very strange.

antelle commented 3 years ago
More iterations, less memory, Argon2i, Iterations=10000, Memory=1024 KiB: -t 10000 -m 10 -i (results are the same again). Native WASM
1 thread 4.1s 22s
2 threads 6.8s 23s
4 threads 9.5s 23s
8 threads 16s 24s

I'm not sure I get the point of having threading support in Argon2 🤔 when do we need it?

antelle commented 3 years ago

If you're interested, I've just built a version with SIMD, check it out, it's 2x faster: https://antelle.net/argon2-browser/ (works only in Chrome with a flag enabled; not for production) For threading support – I need some input here. I don't see it being helpful right now, but I'll play more with it, maybe I'll figure it out.

Just noticed how the native version behaves without threading support:

echo -n password | argon2 somesalt -t 10000 -m 10 -i -p 4
Type:       Argon2i
Iterations: 10000
Memory:     1024 KiB
Parallelism:    4
Hash:       93c411bb2d52104d943d13b30665095ba7e1ed81e0a95fb1899547ea4987e3e1
Encoded:    $argon2i$v=19$m=1024,t=10000,p=4$c29tZXNhbHQ$k8QRuy1SEE2UPROzBmUJW6fh7YHgqV+xiZVH6kmH4+E
1.446 seconds
Verification ok

And with threading support:

echo -n password | argon2 somesalt -t 10000 -m 10 -i -p 4
Type:       Argon2i
Iterations: 10000
Memory:     1024 KiB
Parallelism:    4
Hash:       93c411bb2d52104d943d13b30665095ba7e1ed81e0a95fb1899547ea4987e3e1
Encoded:    $argon2i$v=19$m=1024,t=10000,p=4$c29tZXNhbHQ$k8QRuy1SEE2UPROzBmUJW6fh7YHgqV+xiZVH6kmH4+E
6.834 seconds
Verification ok
antelle commented 3 years ago

Okay I see, it reports CPU time, not wallclock: https://github.com/P-H-C/phc-winner-argon2/issues/204.

antelle commented 3 years ago

Threaded version is actually quite easy to build, however it doesn't give any improvement while the file size grows significantly (2x wasm and 4x js), you can also try doing it by adding -s USE_PTHREADS=1:

 39K argon2.wasm
 60K argon2.js

At the moment I don't think we should release it, but I'll keep an eye on its progress.

glihm commented 3 years ago

Hi! Well, it should be not exponentially, but linearly more expensive: N threads X operations = N X operations in one thread. It's interesting that increasing the number of threads doesn't make it worse on web, I don't know why. The parameters I used are Iterations=128, Memory=16384 KiB, Type=Argon2i: -t 128 -m 14.

Here's the time I get, while results are absolutely the same:

Native WASM 1 thread 1.0s 5.0s 2 threads 1.1s 5.1s 4 threads 1.2s 5.1s 8 threads 1.7s 5.1s 16 threads 2.0s 5.1s 128 threads 4.2s 5.2s 1024 threads 26s 7.3s I need to investigate how threading works in argon2 and on which parameters we can benefit from it. I tried increasing the number of iterations, but it behaves the same way, that's very strange.

I confirm, in my implementation I had tried too and the results are very closed even with more threads.

I do think this is completely wanted. Argon2 is designed to limit hardware brute-forcing. Check this article guys, very instructive: https://eprint.iacr.org/2016/104.pdf In the section 2: "Security Requirements of Password-Hashing Schemes".

From my understanding, the parallelism is here to enforce the hardware to use more resources, even if this is not speeding up the process.

So, activating the multi-threading is important in a way that you enforce someone trying to attack you to use more resources.

antelle commented 3 years ago

@glihm it depends how calculations are made: either we set a predefined complexity and then threading support should make it faster (that's how I assume it works), or complexity grows with increasing parallelism (then single-threaded implementation would take more time for higher parallelism values).

TomMettam commented 3 years ago

With the native implementation, increasing the amount of parallelism is intended to keep the time constant but increase complexity.

It's very strange to me that using a single thread with a larger parallelism value would not incur a larger time penalty. Perhaps the threading isn't being utilised correctly (so it's always using one thread)?

TomMettam commented 3 years ago

I will do some experiments and research on this topic.

antelle commented 3 years ago

increasing the amount of parallelism is intended to keep the time constant but increase complexity

Is it stated anywhere in docs? I see that it's the opposite: increasing parallelism allows implementations to calculate the hash using several threads, then how they make use of this possibility it's up to them. For example, in WASM we're not using threads at all, so the performance remains constant. The native tool uses them with different degree of success.

antelle commented 3 years ago

Found it in docs: image

So, it's like I describe it: it's number of chains that can be run. The complexity is exactly the same, however required work can be splitted.

antelle commented 3 years ago

@TomMettam I've pushed my threading experiments to this branch, it contains a working example that can run up to 8 threads, you can play with it if you want. Compile it, break it, love it, hate it, let me know if you manage to get anything interesting out of it. I've also uploaded it here if you just want to run it in your browser and compare performance: https://share.antelle.net/argon2-threads/, spoiler: it's slow 🐌 (≈2x) and big 🥴 (≈+40k).

antelle commented 3 years ago

I believe this is now resolved and we can close the issue. The branch is here, and I'll return to it once a year or if there's any news about threads in WASM.

quexten commented 1 year ago

@TomMettam I've pushed my threading experiments to this branch, it contains a working example that can run up to 8 threads, you can play with it if you want. Compile it, break it, love it, hate it, let me know if you manage to get anything interesting out of it. I've also uploaded it here if you just want to run it in your browser and compare performance: https://share.antelle.net/argon2-threads/, spoiler: it's slow snail (≈2x) and big woozy_face (≈+40k).

I realize this is not an active topic anymore, I just want to note that in my compiled test, the threaded version is significantly faster than the non-threaded version.

On latest master branch, vs threaded branch on latest emscripten docker image and using Firefox I get ~10 seconds, vs ~2.5 seconds on parallelism = 16, iterations = 10, memory = 1GiB on a 4 core, 8 thread system.

Nantris commented 1 year ago

Any chance of re-enabling threads?

@quexten how's testing been on your branch? Are you using it in any projects?

Nantris commented 1 year ago

Friendly bump.

@antelle is this project still maintained? I noticed there haven't been any commits in a while but sometimes that's normal.