FiloSottile / age

A simple, modern and secure encryption tool (and Go library) with small explicit keys, no config options, and UNIX-style composability.
https://age-encryption.org
BSD 3-Clause "New" or "Revised" License
17.26k stars 506 forks

Make age parallel #109

Open paulmillr opened 4 years ago

paulmillr commented 4 years ago

If you encrypt files on a machine with tons of RAM and cores, age isn't any faster than it is on a basic, slow PC.

I think it would be great to utilize resources when they're available.

I tried this on Linux, both via piping and via -i -o, and saw only a tiny load on a single core.

RKinsey commented 4 years ago

The issue is that it isn't feasible to do that without overhauling Go's cryptography libraries (and it might be unsafe; I don't know enough about goroutine security to say for sure).

The only functions in age that actually handle the plaintext are EncryptOAEP/DecryptOAEP from crypto/rsa and Seal/Open from x/crypto/chacha20poly1305, neither of which is parallel. Both could be parallelized, but RSA generally hasn't been, because it needs a parallel-friendly modular exponentiation function. ChaCha is fairly easy to parallelize, but Go's implementation is handwritten assembly using vector instructions when available (unless you're using a purego build, gccgo, or an uncommon CPU architecture). I have a feeling that it probably outperforms a goroutine version, but maybe not.

xorhash commented 4 years ago

@RKinsey I'm not sure if this argument actually holds. internal/stream/stream.go seems to read and write in chunks of 64 KiB (plus 16 bytes of Poly1305 tag for each encrypted chunk). Therefore, there's parallelization potential there by queueing up the encryption/decryption of chunks (or multiples of chunks) between cores. Orchestrating the whole thing so that there's no bottleneck when reading or writing is another story though.

joonas-fi commented 4 years ago

Yeah @RKinsey was talking about the key-wrapping phase. The actual symmetric stream encryption is where the bulk of Age's work happens (at least on larger file sizes) and it looks like it could be parallelizable.

The stream is divided into fixed-size chunks of 64 KiB, and each chunk uses the same encryption key but, of course, a different nonce. The nonce is derived from the chunk number. It's a seekable stream and thus, in theory, easily parallelizable. In practice, though, the code would be more complex than it currently is, so it would need a pretty good test suite.
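
To make the idea concrete, here is a minimal sketch of chunk-level parallelism in Go. It is not age's code. All names (chunkNonce, sealParallel, openAll, newDemoAEAD) are hypothetical, the nonce layout (an 11-byte big-endian counter plus a final-chunk flag byte) is an assumption, and AES-GCM from the standard library stands in for ChaCha20-Poly1305 so the example has no external dependencies. Both ciphers sit behind the same cipher.AEAD interface with a 12-byte nonce and a 16-byte tag. The sketch also holds the whole input in memory, so it sidesteps the streaming and I/O orchestration problem mentioned above.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
	"sync"
)

// chunkSize is the plaintext size per chunk (64 KiB, as discussed above).
const chunkSize = 64 * 1024

// chunkNonce derives a 12-byte nonce from the chunk index.
// Assumed layout: 11-byte big-endian counter, then a flag byte that is 1
// for the final chunk and 0 otherwise.
func chunkNonce(index uint64, last bool) []byte {
	nonce := make([]byte, 12)
	binary.BigEndian.PutUint64(nonce[3:11], index)
	if last {
		nonce[11] = 1
	}
	return nonce
}

// sealParallel encrypts plaintext chunk by chunk on up to `workers` goroutines.
// Each chunk depends only on its index, so chunks can be sealed in any order.
// Assumption: the AEAD keeps no mutable state and is safe for concurrent use.
func sealParallel(aead cipher.AEAD, plaintext []byte, workers int) [][]byte {
	if workers < 1 {
		workers = 1
	}
	n := (len(plaintext) + chunkSize - 1) / chunkSize
	if n == 0 {
		n = 1 // an empty input still yields one (empty) final chunk
	}
	out := make([][]byte, n)
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				start := i * chunkSize
				end := start + chunkSize
				if end > len(plaintext) {
					end = len(plaintext)
				}
				// Each worker writes to a distinct slot, so no lock is needed.
				out[i] = aead.Seal(nil, chunkNonce(uint64(i), i == n-1), plaintext[start:end], nil)
			}
		}()
	}
	for i := 0; i < n; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

// openAll decrypts the chunks sequentially and concatenates the plaintext.
func openAll(aead cipher.AEAD, chunks [][]byte) ([]byte, error) {
	var plaintext []byte
	for i, c := range chunks {
		p, err := aead.Open(nil, chunkNonce(uint64(i), i == len(chunks)-1), c, nil)
		if err != nil {
			return nil, fmt.Errorf("chunk %d: %w", i, err)
		}
		plaintext = append(plaintext, p...)
	}
	return plaintext, nil
}

// newDemoAEAD returns AES-256-GCM with an all-zero key. Demo only: never
// use a fixed key, and note that age itself uses ChaCha20-Poly1305.
func newDemoAEAD() cipher.AEAD {
	block, err := aes.NewCipher(make([]byte, 32))
	if err != nil {
		panic(err)
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	return aead
}

func main() {
	aead := newDemoAEAD()
	msg := make([]byte, 3*chunkSize+100)
	for i := range msg {
		msg[i] = byte(i)
	}
	chunks := sealParallel(aead, msg, 4)
	got, err := openAll(aead, chunks)
	fmt.Println("chunks:", len(chunks), "round trip ok:", err == nil && string(got) == string(msg))
}
```

Because the nonce is a pure function of the chunk index, the parallel output is byte-for-byte identical to what a single worker would produce, which gives a simple property to test against. A real implementation would still need bounded buffering and an ordered writer to work on streams.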

Tronic commented 3 years ago

Just running chacha20-poly1305 in parallel for a few blocks easily more than doubles the speed. My own tool is written in Python and does 2.2 GB/s encryption and decryption (using 4 threads for chacha, otherwise single-threaded). It is a shame that the crypto libraries don't offer threaded implementations of these algorithms.

This is on a machine where age does 1 GB/s and rage only 400 MB/s.

paulmillr commented 3 years ago

@Tronic are you using the latest rage? The speed difference should be minimal right now.

Tronic commented 3 years ago

@paulmillr rage 0.7.0 on Windows.