Consider replacing /dev/urandom by a system/library call when possible

LoupVaillant commented 1 year ago

Spotted here.

The problem with /dev/urandom is that it's a file. Many errors specific to reading files could happen, up to and including failing to fill the buffer. Even if we assume the return code is enough to detect all errors (which may be a safe assumption, I won't bet my hat), returning an error code is the wrong API in my opinion: what's the user to do when faced with the failure something as basic as the RNG? In practice I believe the only choice is to panic with a suitable error message.

What I'm saying is, what you really want is an interface like this:

void fillrandom(uint8_t *buf, size_t len);

_(Note my preference for size_t when it comes down to buffer sizes. Others would use ssize_t instead. The point is to use a type that can describe the whole range of possible sizes, and those are it. Though of course in practice you'll rarely do that on huge buffers, so if you're crystal clear that the admissible range is small, int or an unsigned can be enough.)_

That way users don't have to, and cannot forget to, check the return value. fillrandom() will either successfully complete, or panic. Now panicking is not nice, so we want to minimise that ever happening. On non-Windows systems my answer would be:

If arc4random_buf(3) is available, use it: it does exactly what we want and virtually never crashes.
On Linux systems unlucky enough to lack arc4random_buf(3), try getrandom(2) instead. This one can fail, so you may want to read the exact error and panic or retry accordingly.
If your system is lacking any sort of system call to get randomness, either use /dev/urandom or (and I would consider it a valid approach in 2023) refuse to compile altogether: properly handling file related errors and retries is no picnic. @fscoto once showed me a soberingly long example.

LoupVaillant commented 1 year ago

Just noticed that the error handling in the current version of fillrandom() is more lax than what you do when you read a regular file. If for instance it gives you only a partial read you will get an undetected half-filled buffer, and all the nasty consequences that go with it.

Here's a proposed quick (untested) fix that matches your policy in kr.c:

int
fillrand(void *buf, size_t len)        // `size_t` matches `fread(3)` better
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (!f) return 1;
    size_t r = fread(buf, len, 1, f);  // Retain size
    int error = r != len || ferror(f); // All possible errors
    fclose(f);                         // Maybe give up on checking this one?
    return error;                      // Return the trustworthy error
}

ewd340 commented 1 year ago

Thank you @LoupVaillant for taking the time to look into it. I really appreciate it!

We actually discussed entropy in issue 218 of Monocypher with some very valuable input from you and @fscoto. I learnt a lot from you guys.

Yes, I agree that using /dev/urandom should be a last resort, especially on systems that already have better to offer such as arc4random_buf(3) or even getrandom(2).

I actually borrowed this function from @skeeto who uses it in his monocrypt project and in a similar fashion in his enchive project. If I understood well, random(4) states that reading up to 256 bytes will return as much as asked for, and won't be interrupted by a signal handler. Now, even if we (in our case) do not ask for much, I do agree that if we can avoid using a file (/dev/urandom) altogether, that will indeed be better. So, IMHO, it would be better to check for the availability of arc4random_buf(3), then getrandom(2), then eventually fallback to /dev/urandom as a last resort as you rightly suggest.

Now, about the signature, I actually thought of rewriting this function to return an enum error instead of its current int return value which I kind of translate to an error code here, here, and here. Why would I opt for this? Two reasons actually:

Unify return types of function that do stuff and may eventually fail (arc4random_buf(3) won't, but it is not available everywhere). Sure, users won't have to remember the return type were it a void, but then again, the only user in this case is this app which opts for a unified convention of returning an error code.
Instead of a panic on the spot, IMHO it would be better to bail out to a central point of the app (the bail label) where we can do some cleaning before exiting.

I also share your point of view about size_t as well, being the one used by fread(3). Your revised version is much better as it handles the cases where the read len is smaller than what is requested.

int
fillrand(void *buf, size_t len)        // `size_t` matches `fread(3)` better
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (!f) return 1;
    size_t r = fread(buf, len, 1, f);  // Retain size
    int error = r != len || ferror(f); // All possible errors
    fclose(f);                         // Maybe give up on checking this one?
    return error;                      // Return the trustworthy error
}

I shall work on this as soon as possible. Again thanks a lot for your input and your work on Monocypher that made this project possible.

P.S. Regarding what the user can do when faced with failure as basic as the RNG. It is true, and the current error message that kr returns doesn't help either. Indeed, a message such as ("Could not fill a random buffer") is really obscure, and should be rewritten as well.

skeeto commented 1 year ago

Commenting since I wrote the original function.

only a partial read you will get an undetected half-filled buffer

That's already handled by the original, and your new version always reports an error when len > 1. On success, fread returns nmemb, the second size_t parameter. By using 1 as this argument, the fread return effectively becomes a "success" boolean and no further checks are necessary, not even ferror. You successfully got exactly as many bytes as you wanted, so there's no possible error that could matter to the application.

(I've recently realized this is a rather confusing aspect of fread. In February there was a Hacker News thread about one of my recent articles which mentioned fread, where everyone in the discussion was confused about this detail.)

In practice I believe the only choice is to panic with a suitable error message.

I generally agree, but this is in a thin platform shim that exists only to expose the platform entropy API through a uniform interface. Error printout would need to be duplicated in each implementation of this function — and you're suggesting adding even more such implementations. (I stuck to /dev/urandom so that just two implementations covers every OS from this century, even if less than perfectly optimal.) If panicking is important, then this function should be wrapped in the application where the panic can be done in a single place and with better context.

so if you're crystal clear that the admissible range is small, int or an unsigned can be enough

It's designed only for small requests, and even more, in practice len will be a compile-time constant (e.g. sizeof or a named constant). The latter is true both in the original context and in this program. Any modern compiler will give a compile-time diagnostic if there would be truncation.

Changing to size_t makes the Windows implementation more complicated because its interface takes an unsigned long (32-bits), not a size_t. It would either need to assert against large inputs or operate as a loop. Using int puts that burden on the caller, which in this program is no burden at all. (This is a situation I've also had to consider a number of times with WriteFile, which only accepts a 32-bit size even on 64-bit hosts.)

Though it wasn't the main reason for choosing int, I'm one of those people who prefers signed sizes — more so now than when I wrote that function — though I use ptrdiff_t rather than ssize_t. Unsigned sizes were one of the great early computing mistakes we're still paying for today, where an off-by-one into negatives counter-intuitively becomes a giant size rather than a negative that still compares intuitively. Unsigned types should generally be avoided outside of the few special cases where its properties are desirable (hashes, RNG, cryptography, octets, bit-oriented values). "Sizes shouldn't be negative" is an insufficient reason.

ewd340 commented 1 year ago

A pleasure to have you here, @skeeto! Thank you!

I reread fread(3) again and again, and you're absolutely right!

If an error occurs, or the end of the file is reached, the return value is a short item count (or zero).

It is talking about a short item count (the count is 1 in our case, shorter would mean 0). Simply said, what I got from it is that fread would read at most nmemb (that's its unit) items. Thus the original implementation either reads one item (sized len bytes) or zero items.

As for arc4random_buf(3), I recall having read a long exchange on the glibc mailing list, where it is said that: getrandom() and /dev/urandom are extremely fast and operate over per-cpu states locklessly. And that was backported to older versions of the kernel. There's also another discussion on lwn.

LoupVaillant commented 1 year ago

Oops, my bad for not reading fread(3) properly. I also understand the rationale for keeping the implementation as small as possible, even if reading a file is not necessarily ideal. I don't have a perfect answer there.

Same thing about returning an error or panicking on the spot. My programs tend to follow the convention that unrecoverable errors should panic if possible. This reduces the need for checking at every level. I've seen conventions that check at every level regardless of the recoverability of the error, and it tends to accrete quite a bit of boilerplate. Still, the BAIL pattern is a good one, and successfully reduces the error handling burden. No perfect answer here either, I guess we just have different preferences.

getrandom() and /dev/urandom are extremely fast and operate over per-cpu states locklessly.

Okay, that has the potential to squash most of my qualms about /dev/urandom. Reading those thread right now, thanks for the link.

LoupVaillant commented 1 year ago

~~Just read the beginning of the glibc thread, it's fucking terrifying, and would change my recommendations to something like:~~

~~If you're on a BSD system, use arc4random_buf(3). Not sure about MacOS though, I haven't studied it.~~
~~If you're on Linux avoid arc4random_buf(3) and use getrandom(2) instead.~~
~~If you're on Windows use it's thing.~~
~~If all else fails, maybe fallback to /dev/urandom. Whether you want that or not depends on your use case (I understand @skeeto's goal of easy portability).~~

@ewd340 do you recall how this thread ended? Did they at least committed to a secure arc4random(3) implementation? I can't recommend anything whose documentation suggests it may not be suitable for cryptographic use.

_(Edit: should have read the second link. I was scared for nothing. Long live arc4random_buf(3)!)_

One last thing: the present discussion convinced me to downgrade my appreciation of this issue down to a mere "nitpick", which would be okay to de-prioritize indefinitely — perhaps even close as "not worth fixing". Sorry for the noise, and keep up the good work.

ewd340 commented 1 year ago

Glad we're coming all to an agreement, and thank you both for taking the time to discuss this issue. It was not a noise, @LoupVaillant, au contraire! It was highly interesting and instructive.

So, for the time being, I'll just leave fillrand as it is. It is doing its job as optimally as one would want and IMHO there is no reason to worry if a read of /dev/urandom fails (if it ever fails) as long as we handle this by bailing out to a safe place and tell the user (in a better way than the current "Could not fill a random buffer" message).

I'll put this link to Myths about /dev/urandom that I discovered yesterday and that would —I hope— be interesting for a any future reader of this issue.

chkoreff commented 1 year ago

Yes, definitely not noise. I'm considering using getrandom(2) instead of /dev/urandom here:

https://github.com/chkoreff/Fexl/blob/master/src/crypto.c#L19

Thanks folks!

ewd340 / kr

Consider replacing /dev/urandom by a system/library call when possible #2