build-trust / ockam

Orchestrate end-to-end encryption, cryptographic identities, mutual authentication, and authorization policies between distributed applications – at massive scale.
https://ockam.io
Apache License 2.0

Define an API to allow pluggable custom Secure RNG implementations #2555

Open antoinevg opened 2 years ago

antoinevg commented 2 years ago

Summary

Implementation Constraints

Ecosystem Support

Related Issues:

mrinalwadhwa commented 2 years ago

My hunch is a vault should be initialized with a random number generator.

In pseudocode:

let random_number_generator = RandomNumberGenerator::create()
let vault = Vault::create(random_number_generator)

Where Vault::create accepts a trait.
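A minimal Rust sketch of how that pseudocode might look, assuming a hypothetical `SecureRng` trait and a simplified `Vault` (illustrative names only, not the actual Ockam APIs):

```rust
/// Anything that can fill a buffer with random bytes.
/// Hypothetical trait, not a real Ockam API.
pub trait SecureRng: Send + Sync {
    fn fill_bytes(&mut self, dest: &mut [u8]);
}

/// Simplified stand-in for Ockam's Vault.
pub struct Vault {
    rng: Box<dyn SecureRng>,
}

impl Vault {
    /// `create` accepts any implementation of the trait.
    pub fn create(rng: impl SecureRng + 'static) -> Self {
        Self { rng: Box::new(rng) }
    }

    pub fn random_bytes(&mut self, dest: &mut [u8]) {
        self.rng.fill_bytes(dest);
    }
}

// Example backend: a fixed-pattern stub standing in for a real RNG.
struct StubRng(u8);

impl SecureRng for StubRng {
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        dest.fill(self.0);
    }
}

fn main() {
    let mut vault = Vault::create(StubRng(7));
    let mut buf = [0u8; 4];
    vault.random_bytes(&mut buf);
    assert_eq!(buf, [7u8; 4]);
    println!("ok");
}
```

A platform could then plug in a hardware-backed implementation of the trait without the rest of the API changing.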

thomcc commented 2 years ago

getrandom (the randomness library used by rand to get system entropy) actually has a system for providing custom randomness backends on platforms without better options: https://github.com/rust-random/getrandom/blob/master/src/custom.rs.

Regardless of whether or not we use their hook[^1], the technique it uses (where you declare an extern "C"[^extrust] function which must be provided exactly once in the final build, or you get a linker error -- unless it's never called) works relatively well. You see the same pattern in the critical-section crate, which I believe is used somewhat widely on embedded: https://github.com/embassy-rs/critical-section/blob/main/src/lib.rs#L5-L23, and it's essentially the same system by which the global_allocator functions and panic hooks are provided[^others].
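A minimal sketch of that link-time hook pattern, with a hypothetical `ockam_custom_rng_fill` symbol (the `core_crate` module plays the role of the library crate; none of these names are real Ockam or getrandom APIs):

```rust
// Library side: declare a symbol the final binary must provide exactly once.
mod core_crate {
    extern "Rust" {
        // If no crate in the build defines this symbol (and it is actually
        // called), linking fails -- the error the comment above describes.
        fn ockam_custom_rng_fill(dest: &mut [u8]);
    }

    pub fn fill_random(dest: &mut [u8]) {
        unsafe { ockam_custom_rng_fill(dest) }
    }
}

// User side (e.g. an embedded application): provide the implementation.
// #[no_mangle] exports the symbol under its unmangled name so the
// declaration above resolves to it at link time.
#[no_mangle]
pub fn ockam_custom_rng_fill(dest: &mut [u8]) {
    // A real implementation would read a hardware RNG peripheral here;
    // this fixed pattern is a placeholder for demonstration only.
    dest.fill(0x42);
}

fn main() {
    let mut buf = [0u8; 4];
    core_crate::fill_random(&mut buf);
    assert_eq!(buf, [0x42u8; 4]);
    println!("ok");
}
```

Note there is no runtime registration and no extra parameter threaded through the API: the binding happens entirely at link time, which is where both the near-zero overhead and the poor misuse diagnostics come from.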

One downside with this is that it gives poor error messages if misused, but it would only be used in expert cases -- only advanced users on embedded systems should be swapping out their RNGs. The upside is that our APIs do not gain additional complexity for passing around the vault, when that complexity is not needed except for a small set of cases (embedded use). Another upside for embedded use is that it has basically no overhead beyond a function call (and with a -Clto build even that can be eliminated).

Why not bundle it with the Vault?

Use of the Vault is an option, but I think the way we split the vault traits is going to cause a lot of trouble in the future. Adding new vault traits doesn't meaningfully make this worse (we have enough already that it's not a huge difference), but the current state looks like it will probably need a bunch of changes to support the cases we seem to want to support[^2], so I'm not 100% convinced it's a solution.

Regardless, it's reasonable to expect there'd be some way of attaching it to the vault, so if we liked that design the details could be postponed. I still think it's not ideal to do it that way -- it means we must now arrange for the vault to be available everywhere randomness is needed, which may be... challenging. It may lead to us deciding not to use randomness in cases where it would be nice (but isn't required)[^addrrand], and it may lead to us not being able to implement traits that we otherwise would[^default].

In other words, easy access to random numbers feels like a good property for us to keep for various reasons, even though strictly speaking we could come up with a design where it is passed around explicitly. I suppose another way of putting this is that I agree with @antoinevg's assessment that ideally "Existing code usage, examples and behavior for std platforms needs to stay the same", which I think this would probably require (unless we're willing to split our API into separate std/no_std sections, which I would prefer we avoid).


P.S. I remember reading that most hardware RNG chips for embedded systems are not great at providing high quality entropy, and should just be used to seed a software PRNG (like ChaCha20 or some other CSPRNG) that stretches what they have (and perhaps to periodically reseed with something like prng.state = sha256(prng.state || hwrng.entropy)). I am not an expert here, though.
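That seed-and-stretch idea could be sketched roughly as follows. A toy mixing step and a toy generator stand in for SHA-256 and ChaCha20, so this shows only the structure (seed, stretch, periodically reseed) and is not cryptographically secure:

```rust
/// Toy stretched RNG: seeded from hardware entropy, reseeded periodically.
/// NOT secure -- real code must use a real hash and a real CSPRNG.
struct StretchedRng {
    state: u64, // real code: 256+ bits of CSPRNG state
    output_since_reseed: usize,
}

impl StretchedRng {
    /// How many output bytes to produce before mixing in fresh entropy.
    const RESEED_INTERVAL: usize = 1024;

    fn new(hw_entropy: u64) -> Self {
        Self { state: hw_entropy, output_since_reseed: 0 }
    }

    /// Stand-in for `state = sha256(state || hw_entropy)`.
    fn reseed(&mut self, hw_entropy: u64) {
        self.state = self
            .state
            .rotate_left(13)
            ^ hw_entropy.wrapping_mul(0x9E37_79B9_7F4A_7C15);
        self.output_since_reseed = 0;
    }

    /// Stand-in for one CSPRNG output block (here: an xorshift step).
    fn next_u64(&mut self, hw_rng: &mut impl FnMut() -> u64) -> u64 {
        if self.output_since_reseed >= Self::RESEED_INTERVAL {
            self.reseed(hw_rng());
        }
        self.state ^= self.state << 7;
        self.state ^= self.state >> 9;
        self.output_since_reseed += 8; // we emitted 8 bytes
        self.state
    }
}

fn main() {
    // Stand-in for a hardware entropy source.
    let mut hw = || 0x0123_4567_89AB_CDEF_u64;
    // Identically seeded generators stay in lockstep, including across
    // the reseed boundary (every 128 draws here).
    let mut a = StretchedRng::new(1);
    let mut b = StretchedRng::new(1);
    for _ in 0..300 {
        assert_eq!(a.next_u64(&mut hw), b.next_u64(&mut hw));
    }
    println!("ok");
}
```

The point of the structure is that the (possibly weak or slow) hardware source is only consumed occasionally, while output comes from the software generator it keeps topped up.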

[^addrrand]: For example, we use random addresses all over the place. I don't believe this should be strictly required for security, but I suspect it would be... more painful if not everything needed access to the vault.

[^extrust]: Actually, they could use extern "Rust" for this, and I would.

[^1]: And there are reasons not to, like it's only possible to replace the RNG if getrandom fails to recognize the target, which can be inflexible.

[^2]: The notion that we'll have these traits split the way they are, with some vaults providing them and others not, basically doesn't work in practice -- as it currently stands, everybody has to provide all the traits, and as soon as we look into supporting a subset it will fall apart somewhat. I'm looking into a design for this, but don't have it fully thought out yet.

[^others]: There are other cases where this is appropriate for us too, but IMO it should not be used too aggressively -- it can lead to a pretty bad experience when a user fails to set it up correctly, since it's not well integrated into the language yet (the feature for it would be existential types, basically, which has been RFCed a few times but never successfully).

IMO the long and short of this is that if we start using that technique, it should only be used in cases where no configuration is needed under std. And even then we should be somewhat careful. (That said, I'd be lying if I said this wasn't a design I was toying with for an async replacement.)

[^default]: For example, we wouldn't be able to provide a Default impl for something that needs randomness -- Default::default() takes no arguments, so it would be difficult to use it to initialize something that has random state inside it, which could make our API less idiomatic (and perhaps more verbose).

thomcc commented 2 years ago

> P.S. I remember reading that most hardware RNG chips for embedded systems are not great at providing high quality entropy, and should just be used to seed a software PRNG (like ChaCha20 or some other CSPRNG) that stretches what they have (and perhaps to periodically reseed with something like prng.state = sha256(prng.state || hwrng.entropy)). I am not an expert here, though.

Complicating the matter is that for persistent keys, it's possible that even hardware RNGs on beefier machines are insufficient without extra processing. This is apparently true even for RDRAND, according to https://www.intel.com/content/www/us/en/developer/articles/guide/intel-digital-random-number-generator-drng-software-implementation-guide.html.

That guide includes a couple of procedures; an interesting one is how they extract entropy by filling a big[^1] buffer with data from the hardware RNG, encrypting that data with AES in CBC-MAC mode, and using the final block as the key. The relevant sample code is https://gist.github.com/thomcc/5f115d9e1951eb538249a926b0b82e1c#file-gistfile1-c-L48-L165 for those who don't want to download anything.
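Structurally, that extraction step looks something like the sketch below, with a toy permutation standing in for AES-128. This only illustrates the CBC-MAC chaining shape (XOR each block into the running state, encrypt, keep the final block); it is not secure and is not the guide's actual code:

```rust
const BLOCK: usize = 16; // AES block size in bytes

/// Toy block "cipher": a keyed byte permutation, NOT AES. Placeholder only.
fn toy_block_encrypt(key: &[u8; BLOCK], block: &[u8; BLOCK]) -> [u8; BLOCK] {
    let mut out = [0u8; BLOCK];
    for i in 0..BLOCK {
        out[i] = block[i]
            .wrapping_add(key[(i * 7) % BLOCK])
            .rotate_left(3)
            ^ key[i];
    }
    out
}

/// CBC-MAC over `data`: chain every block through the cipher and return
/// the final block as the condensed key material.
fn cbc_mac_extract(key: &[u8; BLOCK], data: &[u8]) -> [u8; BLOCK] {
    assert!(data.len() % BLOCK == 0, "input must be block-aligned");
    let mut state = [0u8; BLOCK];
    for chunk in data.chunks_exact(BLOCK) {
        // XOR the next block into the chaining state (the CBC step) ...
        for i in 0..BLOCK {
            state[i] ^= chunk[i];
        }
        // ... then encrypt the state, so every input block influences
        // the final output.
        state = toy_block_encrypt(key, &state);
    }
    state
}

fn main() {
    let key = [0x2Bu8; BLOCK];
    // Large buffer standing in for raw hardware-RNG output
    // (512 * 16 bytes, the size the guide's sample uses).
    let mut raw = vec![0u8; 512 * BLOCK];
    for (i, b) in raw.iter_mut().enumerate() {
        *b = (i as u8).wrapping_mul(31); // placeholder "entropy"
    }
    let condensed = cbc_mac_extract(&key, &raw);
    assert_eq!(condensed.len(), BLOCK);
    println!("condensed to one {}-byte block", condensed.len());
}
```

The effect is many blocks of possibly-low-quality entropy being condensed into a single block, on the assumption that the total entropy across the whole buffer comfortably exceeds the block size.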

I wonder if we need to consider different RNG categories, e.g. something that provides true randomness vs something that implements a CSPRNG in hardware (where the latter is fine if it provides sufficient internal state, since if you have 256 bits of entropy in a CSPRNG it should be enough to last until the sun goes out).

[^1]: The size of which seems tricky to determine -- theirs is 512 * 16 bytes, because they're generating a 16-byte key and the hardware RNG reseeds itself after 512 bytes of entropy? Or something. I'd have to study it more closely.