QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

unprivilege the CPU's random number generator (RDRAND) / set kernel parameter "`random.trust_cpu=off`" #6941

Open adrelanos opened 3 years ago

adrelanos commented 3 years ago

Originally brought up by me in https://github.com/QubesOS/qubes-issues/issues/6174#issuecomment-936180012

[0.048xxx] random: crng done (trusting CPU's manufacturer)

This! I've just rechecked the failed log, and I don't see the "trusting CPU's manufacturer" part there. And indeed that CPU does not support RDRAND. This means the extreme issue I see applies only to quite old systems (and hopefully does not affect the majority of our users - even the good old x230 already has RDRAND). So, I'm lowering the priority. But it's still worth improving the situation.

Strongly discouraged to rely on RDRAND for security / entropy quality anyhow as per: https://www.whonix.org/wiki/Dev/Entropy#RDRAND

@marmarek https://github.com/QubesOS/qubes-issues/issues/6174#issuecomment-936226779:

Strongly discouraged to rely on RDRAND for security / entropy quality anyhow as per:

In the context of this issue, it is not a problem, because the stubdomain does not use the RNG for any security-critical task. There is no crypto involved, etc. One could argue it may make ASLR for qemu less effective, but we don't consider qemu trusted, so it is not a huge deal (and remember the RDRAND issues are still very hypothetical - see below).

In the broader context of RDRAND, I don't think we should worry about backdoors there. Or rather: if you consider intentional backdoors in your CPU a valid threat, throw away that CPU. There is really no difference in how such a hypothetical backdoor could work: whether it would be a predictable RDRAND, a reaction to some magic values passed to any other instruction, or anything else. We could worry about its effectiveness (unintentional bugs), which is indeed hard to reason about, since it is opaque.

Seems like I need to make a better argument.

Quote https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html

random.trust_cpu={on,off}

[KNL] Enable or disable trusting the use of the CPU's random number generator (if available) to fully seed the kernel's CRNG. Default is controlled by CONFIG_RANDOM_TRUST_CPU.
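
To see which setting a given command line requests, the parameter can be parsed out of the string. The following is a minimal sketch, not anything Qubes or the kernel ships: the function name `trust_cpu_from_cmdline` is hypothetical, the accepted spellings beyond `on` and `off` are an assumption, and so is the "last occurrence wins" rule.

```python
from typing import Optional

# Spellings accepted here are an assumption; the documentation quoted above
# only lists "on" and "off".
_TRUE = {"on", "1", "y", "yes"}
_FALSE = {"off", "0", "n", "no"}


def trust_cpu_from_cmdline(cmdline: str) -> Optional[bool]:
    """Return True/False if random.trust_cpu is set explicitly, else None.

    None means the build-time default (CONFIG_RANDOM_TRUST_CPU) applies.
    If the parameter appears more than once, the last occurrence is used
    (assumed here, not verified against the kernel source).
    """
    result = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "random.trust_cpu" and sep:
            value = value.lower()
            if value in _TRUE:
                result = True
            elif value in _FALSE:
                result = False
    return result
```

On a running system the input would be the contents of `/proc/cmdline`, for example `trust_cpu_from_cmdline(open("/proc/cmdline").read())`.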

The name of the kernel parameter random.trust_cpu is not ideal. There is no need to invoke big words such as "trust" or "backdoor" for the sake of this argument. Neither a question of trust nor a backdoor is required for this to be an issue. Even a bug that happened in the past would justify this change.

Ars Technica reported that AMD shipped Ryzen 3000 with a serious microcode bug in its random number generator. Lennart Poettering (@poettering) summarized the issue nicely.

Finally, AMD admits it's their fault, and they are preparing a BIOS update to fix RDRAND. You probably should avoid running a CONFIG_RANDOM_TRUST_CPU=y Linux kernel (Fedora) on a Ryzen system without that BIOS update, or all crypto keys generated are not as random as you hope.

That bug was fortunately discovered and publicized by a white hat. Given the large number of different CPU models and batches, it's not a good idea to rely on white hats to swiftly report such bugs.

Or this other bug: a kernel bug report from 2014, "rdrand instruction fails after resume on AMD family 22 CPU".

"D. J. Bernstein isn't a fan of RDRAND either." In the same mailing list thread someone else posted:

On https://spideroak.com/browse/share/UTwente/RNG/Tests/NIST-STS/ you can find the results of randomness tests of several random generators including RDRAND.

In the document No_of_failures_calculation.txt you can find the used testing method and the test results.

The actual number of failed tests of RDRAND deviates more than 4 sigma from the expected number of failed tests.

The used software can also be downloaded from the same link so these tests can be reproduced.

As you can also see, the XOR_SHIFT PRNG and the Picoquant PQRNG150 TRNG pass the tests with a number of failed tests within the 3 sigma deviation, so the tests seem to work fine.

I didn't verify the latter but for my part I've seen enough.
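
For readers unfamiliar with the "sigma" wording in the quoted mail: one common way to arrive at such a figure is to treat the number of failed tests as binomially distributed. The sketch below shows that calculation under stated assumptions. It is not the method from No_of_failures_calculation.txt (which was not checked here), the significance level `alpha=0.01` is assumed, and the figures in the usage note are made up.

```python
import math


def failure_sigma(n_tests: int, n_failed: int, alpha: float = 0.01) -> float:
    """Deviation of the observed failure count from its expectation, in sigmas.

    Assumes independent tests, each failing with probability alpha for a
    perfect generator, so the failure count is Binomial(n_tests, alpha).
    """
    expected = n_tests * alpha
    sigma = math.sqrt(n_tests * alpha * (1 - alpha))
    return (n_failed - expected) / sigma
```

With the made-up figures of 10,000 tests, 140 failures land just above 4 sigma, while 120 failures would be about 2 sigma and well within the 3 sigma band mentioned above.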

random.trust_cpu=on means that RDRAND has a privileged position within the Linux entropy gathering process.

random.trust_cpu=off makes it only a "normal" ("unprivileged") source of entropy among other sources (such as keyboard, mouse, CPU jitter, and the usual).
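
The difference between "privileged" and "unprivileged" can be illustrated with a toy model. This is only a sketch and not the kernel's actual algorithm: the class `ToyPool`, the 256-bit readiness threshold and the use of SHA-256 for mixing are all assumptions chosen for illustration. What it tries to capture is that the CPU's output is mixed into the pool either way, and only the amount of entropy credited for it changes.

```python
import hashlib


class ToyPool:
    """Toy model of an entropy pool: NOT the kernel's actual algorithm."""

    READY_BITS = 256  # hypothetical threshold for "crng init done"

    def __init__(self, trust_cpu: bool):
        self.trust_cpu = trust_cpu
        self.state = hashlib.sha256()
        self.credited_bits = 0

    def mix(self, data: bytes, credit_bits: int) -> None:
        # Every input is mixed in, whatever credit it receives.
        self.state.update(data)
        self.credited_bits = min(self.READY_BITS,
                                 self.credited_bits + credit_bits)

    def add_cpu_rng(self, data: bytes) -> None:
        # "Privileged": the CPU output alone counts as a full seed.
        # "Unprivileged": it is still mixed in but credited nothing.
        self.mix(data, len(data) * 8 if self.trust_cpu else 0)

    def add_other_source(self, data: bytes, credit_bits: int) -> None:
        # Keyboard, mouse, jitter and so on, with a caller-chosen credit.
        self.mix(data, credit_bits)

    @property
    def ready(self) -> bool:
        return self.credited_bits >= self.READY_BITS
```

In this model a pool created with `trust_cpu=True` is ready right after 32 bytes from the CPU, while a pool created with `trust_cpu=False` holds the same mixed state but has to wait for credit from other sources.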


Current kernel entropy sources in Qubes are:

Suggested kernel entropy sources:


random.trust_cpu=on advantages:

random.trust_cpu=off advantages:


security-misc does it. (#1885)

marmarek commented 3 years ago

random.trust_cpu=on advantages:

* Perhaps negligibly faster boot of dom0?

It doesn't matter that much for dom0 (which has a lot more entropy sources). It matters a lot for VMs. It speeds up VM boot quite significantly (besides #6174): all the things that happen before the startup script can handle the seed from dom0, for example generating UUIDs in mkswap and mkfs, glibc's implementation of quick sort, etc. Those are all relevant for fast VM startup, and not really relevant for overall security. Some of them we could probably even hardcode to a static value (like the swap UUID), but it is unlikely we'll manage to cover all the places.

An alternative could be seeding the kernel rng from dom0 earlier (maybe using some bootloader protocol?), but until that happens, the impact is significantly bigger than "negligibly faster boot of dom0".

adrelanos commented 3 years ago

An alternative could be seeding the kernel rng from dom0 earlier (maybe using some bootloader protocol?)

Potential options:

3hhh commented 3 years ago

As already discussed in #673: With kernel 5.6 or higher /dev/random will no longer block and thus not cause any delays anymore. That's true for both dom0 and VMs.
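
Whether a read of kernel randomness would stall at a given point during boot can be probed without blocking. The following is a minimal sketch, assuming Linux and Python 3.6 or later; the helper name `crng_ready` is hypothetical.

```python
import os


def crng_ready() -> bool:
    """Return True if the kernel CRNG is initialized, without blocking.

    os.getrandom() with GRND_NONBLOCK raises BlockingIOError instead of
    waiting when the kernel's random pool has not been initialized yet.
    """
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False
```

Running this from an early boot script in a VM, with and without random.trust_cpu=off, would be one way to measure how long the CRNG takes to become ready in each case.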

Until then, the kernel devs at least do not consider the blocking /dev/random behaviour buggy; rather, they blame whatever application pulls too much data from it during the boot process.

So disabling random.trust_cpu is less likely to have an impact with kernel 5.6+. Until then I kind of agree with @marmarek's stance "If you don't trust the CPU, you lost anyway." However, hiding bugs in the CPU RNG is likely easier than elsewhere (and it likely affects everything security relevant), i.e. I'd lean towards disabling it afterwards - even though it is just one entropy source out of potentially many that add to the kernel entropy.

EDIT: By "disabling" I meant random.trust_cpu=off, which essentially disables rdrand for the kernel.

adrelanos commented 3 years ago

I'd lean towards disabling it afterwards - even though it is just one entropy source out of potentially many that add to the kernel entropy.

This ticket argues for "unprivileging" RDRAND, not "disabling" RDRAND. There might be a case for "disable", but I am not sure that's what you meant?

(Mixing even fully compromised entropy sources is considered secure in the current Linux kernel implementation. Though, D. J. Bernstein disagrees with that: https://blog.cr.yp.to/20140205-entropy.html)

If one wanted to argue for "disable", I think that should be a separate ticket, since there seems to be less resistance to "unprivilege" (which would already be progress) than to "disable".

poettering commented 3 years ago

Not sure why I was CC'ed here, but scanning this I just wanted to mention that this already exists:

https://www.freedesktop.org/software/systemd/man/kernel-command-line.html#systemd.random-seed=
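
Per the linked man page, this option takes a base64-encoded seed on the kernel command line. A sketch of how such an argument could be generated on the host side follows. The function name `random_seed_cmdline_arg` and the 32-byte seed size are assumptions, and how dom0 would append the result to a VM's kernel command line is not covered here.

```python
import base64
import secrets


def random_seed_cmdline_arg(n_bytes: int = 32) -> str:
    """Build a systemd.random-seed= argument from fresh random bytes."""
    seed = secrets.token_bytes(n_bytes)
    return "systemd.random-seed=" + base64.b64encode(seed).decode("ascii")
```

One design point to keep in mind: anything passed this way is visible in `/proc/cmdline` inside the VM, so the seed must be unique per boot and should not be treated as a secret afterwards.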

3hhh commented 3 years ago

Mixing even fully compromised entropy sources is considered secure in the current Linux kernel implementation. Though, D. J. Bernstein disagrees with that: https://blog.cr.yp.to/20140205-entropy.html

Bernstein assumes that you have entropy sources that you trust and some that are less trustworthy. If that's true, his statement on "stick with the single one you trust and ditch all other input" is correct (and you only need 256 bits or so exactly once).

However the Linux guys assume that you don't want to ultimately trust any of the entropy sources available to you (or are too uninformed to make the decision) and thus live with a few potential attacks.

I believe the latter is a more realistic view atm (Linux runs on many "suboptimal" devices). If you build your own hardware RNG and use that, Bernstein's view is more accurate.
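
The remark that 256 bits are needed only once can be illustrated by stretching a single seed deterministically into as much output as required. This is only an illustration: the choice of SHAKE-256 and the function name `expand_seed` are assumptions made here, not what Bernstein's post or the Linux kernel uses, and this is not a vetted DRBG.

```python
import hashlib


def expand_seed(seed: bytes, n_bytes: int) -> bytes:
    """Stretch a 256-bit seed into n_bytes of output using SHAKE-256."""
    if len(seed) != 32:
        raise ValueError("expected a 256-bit (32-byte) seed")
    return hashlib.shake_256(seed).digest(n_bytes)
```

The output is only as unpredictable as the seed, which is why the argument hinges on having one source that is actually trusted.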