FiloSottile / age

A simple, modern and secure encryption tool (and Go library) with small explicit keys, no config options, and UNIX-style composability.
https://age-encryption.org
BSD 3-Clause "New" or "Revised" License

cmd/age: Improve decision on when to buffer output #176

Closed · codesoap closed this 3 years ago

codesoap commented 3 years ago

Previously, output was buffered only when --armor was set and the output was a TTY. Whether the input was a TTY was not checked, decryption was not taken into account, and binary output was never buffered. This led to some problems:

1.: dd if=/dev/urandom bs=1M count=300 | age -r age1... -a caused fatal error: runtime: out of memory (increase count if it doesn't fail for you). No one will encrypt this much to stdout, but buffering is unnecessary here.

2.: Output was not buffered for age -d (when pasting armored input into the TTY). Admittedly this only happens when copying and pasting a lot of input (ca. 70k bytes for me), but it still seems wrong. Here is how I can make the problem visible:

yes 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.' \
| dd bs=1k count=70 \
| age -a -r age1... \
| xclip -i

# Paste with middle mouse button here:
age -d -i mykey

3.: Output was not buffered when encrypting and forcing binary output with age -r age1... -o -. Of course this is also something most people wouldn't do.

str4d commented 3 years ago

> Admittedly this only happens when copying and pasting a lot of input (ca. 70k bytes for me), but it still seems wrong.

This is because age chunks plaintext at 64kiB:

Output is always read in 64kiB chunks, so if there's going to be any buffering at all at decryption time, it should be a multiple of 64kiB. As for what that should be, it would be useful to know what kinds / sizes of workloads require this buffering; we obviously can't buffer every file in its entirety.

codesoap commented 3 years ago

> This is because age chunks plaintext at 64kiB

Sounds plausible to me.

> if there's going to be any buffering at all at decryption time, it should be a multiple of 64kiB

I'm afraid I don't yet understand why it should be a multiple of 64kiB. bytes.Buffer simply grows as needed. Why should I limit or complicate this? The buffer is supposed to buffer all output until the full input has been read.

> As for what that should be, it would be useful to know what kinds / sizes of workloads require this buffering; we obviously can't buffer every file in its entirety.

I don't think there are many real use cases, and the clipboard probably has its own limits, but using an explicit buffer seems cleaner to me than depending on some (finite) default buffer, which probably isn't guaranteed to be stable. Or is this 64kiB buffer guaranteed (if so, could you point me to its documentation)?