cgwalters commented 6 years ago

I was looking at an OpenShift install recently and noticed on the console that `random: crng done` came quite late, about 60s after boot. In this case I think it's a bug that the Terraform provider doesn't provide a virtio-rng device.

I thought there was a recent issue against Ignition, which I can't find now, about not having entropy before it tries to speak HTTPS early in the initramfs.

Anyways, here's the proposal; we add a security/entropy key that is a string, and Ignition would do RNDADDENTROPY on it.
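For readers unfamiliar with the ioctl mentioned above, here is a minimal sketch of what "doing RNDADDENTROPY" on a seed involves. It is an illustration, not Ignition code: the function names are hypothetical, the ioctl number is the value on x86-64 and arm64 Linux (some other architectures encode it differently), and crediting 8 bits per byte is an assumption that only holds if the seed really is uniformly random and secret.

```python
import fcntl
import struct

# _IOW('R', 0x03, int[2]) as encoded on x86-64/arm64 Linux (assumption:
# other architectures may use a different direction-bit layout).
RNDADDENTROPY = 0x40085203


def pack_rand_pool_info(seed: bytes, credit_bits: int = None) -> bytes:
    """Build a struct rand_pool_info: entropy_count (bits), buf_size (bytes), buf."""
    if credit_bits is None:
        # Assumption: the seed is full-entropy, so credit 8 bits per byte.
        credit_bits = len(seed) * 8
    return struct.pack("ii", credit_bits, len(seed)) + seed


def add_entropy(seed: bytes, device: str = "/dev/random") -> None:
    """Mix the seed into the kernel pool and credit it (needs CAP_SYS_ADMIN)."""
    with open(device, "wb") as dev:
        fcntl.ioctl(dev, RNDADDENTROPY, pack_rand_pool_info(seed))
```

Simply writing the bytes to `/dev/urandom` would mix them in without crediting any entropy; the ioctl is what lets the kernel count the seed toward initializing the CRNG.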

With the OpenShift installer, the Ignition configs are generated from a client machine at first, and then later by a machine config operator, so we're in a position where we can propagate entropy from the client all the way to nodes.

Now personally, I think it's broken for the hypervisor to not provide a random seed. This issue will also sort of solve itself over time as everyone upgrades to hardware with RDRAND and people enable the kernel config option to trust it. But even in that world, it's not going to hurt to add more entropy at system bootstrap time.

ajeddeloh commented 6 years ago

see also #645

cgwalters commented 6 years ago

> see also #645

Yes! Thanks, that's the one I was thinking of but for some reason failing to find.

bgilbert commented 6 years ago

I'm not optimistic that the issue will completely sort itself out; distros won't necessarily enable RANDOM_TRUST_CPU and platforms may not provide RDRAND access. Allowing Ignition configs to provide system entropy seems like a pretty large footgun though. Many Ignition configs are reused for multiple machines, passed through cloud providers, or passed over unencrypted connections. We can only securely provide entropy if it's generated on demand for each machine and passed to Ignition over HTTPS -- but HTTPS is off-limits because we need entropy to use it.

I'd favor having Ignition privately use RDRAND for TLS entropy, as described in #645, over the approach here. It doesn't work universally though, e.g. on GCE.

cgwalters commented 6 years ago

When we're talking about TLS, another issue is time; hence the recent creation of roughtime.

OK so the more I think about this, there are two cases:

Bare metal install
Cloud (or more generally "pre-provisioned")

In the bare metal case, there's no reason we can't take entropy from the running system and provide it to the installed one. This is actually the default behavior of Anaconda, which we currently need to undo in c-a.

For the dd install case, we'd need to mount the target FS and write /var/lib/systemd/random-seed before rebooting, so it wouldn't quite be dd anymore.
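As a rough illustration of that extra step, an installer could do something like the following after mounting the target filesystem. This is a sketch under assumptions: the function name, the 512-byte seed size, and the idea that the target root is mounted at a caller-supplied path are all hypothetical, and it takes the seed from the running system's own pool.

```python
import os


def write_random_seed(target_root: str, size: int = 512) -> str:
    """Write a fresh seed into <target_root>/var/lib/systemd/random-seed."""
    seed_path = os.path.join(target_root, "var/lib/systemd/random-seed")
    os.makedirs(os.path.dirname(seed_path), exist_ok=True)
    # Create the file owner-only so the seed is not readable by other users.
    fd = os.open(seed_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        os.fchmod(f.fileno(), 0o600)  # also fixes the mode of a pre-existing file
        f.write(os.urandom(size))     # entropy from the running (installer) system
        f.flush()
        os.fsync(f.fileno())          # make sure it hits disk before the reboot
    return seed_path
```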

Now, to the cloud case:

I think basically we need to trust the hypervisor. What's the value of TLS when talking to e.g. the EC2 metadata server? I seriously doubt that traffic has a chance of being intercepted.

And particularly in the qemu case where we pull the config out of read-only data provided directly by the hypervisor...reading an entropy key wouldn't involve any TLS at all right?

cgwalters commented 6 years ago

BTW https://github.com/systemd/systemd/pull/4513 is related here too.

And one other thing I was thinking about here is that today, systemd needs some random data for its internal hash tables at least. And that happens in pid 1 I believe even in the initramfs. So that's long before Ignition or systemd-load-random-seed for that matter. It looks like today it uses GRND_NONBLOCK but still.
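To make the `GRND_NONBLOCK` behavior concrete, here is a small sketch of the pattern: ask `getrandom()` for bytes without blocking, and fall back to `/dev/urandom` if the pool is not yet initialized. The function name and the fallback choice are assumptions for illustration, not a description of what systemd actually does.

```python
import os


def early_hash_seed(nbytes: int = 16):
    """Return (seed, pool_ready) without ever blocking on the kernel pool."""
    try:
        # Raises BlockingIOError (EAGAIN) if the CRNG is not initialized yet.
        return os.getrandom(nbytes, os.GRND_NONBLOCK), True
    except BlockingIOError:
        # /dev/urandom never blocks, but this early in boot its output
        # may be weak; the caller is told so via the second return value.
        with open("/dev/urandom", "rb") as f:
            return f.read(nbytes), False
```

The point of the "but still" above is visible in the fallback branch: not blocking avoids a hang, but the bytes obtained before the pool is ready are of unknown quality.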

Perhaps what we need to do is move the seed to /boot/random-seed and have it loaded by GRUB and passed on the kernel cmdline or so.

bgilbert commented 6 years ago

> For the dd install case, we'd need to mount the target FS and write /var/lib/systemd/random-seed before rebooting, so it wouldn't quite be dd anymore.

Ignition needs the entropy when fetching the config in the disks stage, but filesystems aren't mounted until just before the files stage. That's surmountable, but it also makes metal different from cloud, which is not ideal. On bare metal I think we can probably assume reasonably current hardware where RDRAND will exist.

> I think basically we need to trust the hypervisor. What's the value of TLS when talking to e.g. the EC2 metadata server? I seriously doubt that traffic has a chance of being intercepted.
>
> And particularly in the qemu case where we pull the config out of read-only data provided directly by the hypervisor...reading an entropy key wouldn't involve any TLS at all right?

Either case assumes that the entropy is stored persistently in the instance metadata, but that's not especially secure. For example, on EC2: by default any process on an instance can fetch its userdata, including from inside a container, and userdata is also available via the EC2 API.

I think https://github.com/coreos/ignition/issues/645#issuecomment-433435477 is probably the right approach, and it's also what systemd does. I'm really not in favor of an Ignition option that's easy to misuse with nigh-undetectable security consequences.

bgilbert commented 5 years ago

#670 landed, so closing as moot.

cgwalters commented 5 years ago

Reopening since I'd still like to consider this. Today in the machine-config-operator we have a "pointer" ignition config which just includes a pair of (CA, real Ignition url). And today, the MCS always dynamically generates that second (pointed-to) config. We're in a position to provide strong entropy to nodes early in the boot process.
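To sketch what that could look like: the pointer config below uses the spec 2.2.0 layout (`config.append` plus `security.tls.certificateAuthorities`), while the `entropy` key in the served config is purely the hypothetical key proposed in this issue and does not exist in any Ignition spec. The function names, base64 encoding, 64-byte size, and placement under `ignition.security` are all assumptions.

```python
import base64
import json
import os


def pointer_config(ca_source: str, config_url: str) -> dict:
    """Static pointer config: just a CA and the URL of the real config."""
    return {
        "ignition": {
            "version": "2.2.0",
            "config": {"append": [{"source": config_url}]},
            "security": {
                "tls": {"certificateAuthorities": [{"source": ca_source}]}
            },
        }
    }


def served_config(entropy_bytes: int = 64) -> dict:
    """Config generated per request, carrying a fresh seed each time."""
    seed = os.urandom(entropy_bytes)  # never reused across machines
    return {
        "ignition": {
            "version": "2.2.0",
            "security": {
                # HYPOTHETICAL key from this proposal; not a real spec field.
                "entropy": base64.b64encode(seed).decode("ascii"),
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(served_config(), indent=2))
```

Only the dynamically generated config carries a seed; putting one in the static pointer config would hand the same "entropy" to every node, which is the misuse discussed in this thread.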

> I'm really not in favor of an Ignition option that's easy to misuse with nigh-undetectable security consequences.

I think if we document that users shouldn't use it with static configurations, that'd probably be enough. I mean, there's lots of other dangerous things one can do in Ignition with systemd units too.

cgwalters commented 5 years ago

That said of course, https://lwn.net/Articles/802360/ will eventually make this a lot less bad.

And for sure in the MCO we could instead do something like write /var/lib/systemd/random-seed via Ignition and then trigger loading it as strong entropy once we're in the real root if we detect it's first boot. But, it seems nicer if Ignition had explicit support for this, as we'd have the entropy loaded before switching root.

bgilbert commented 5 years ago

> I think if we document that users shouldn't use it with static configurations, that'd probably be enough. I mean, there's lots of other dangerous things one can do in Ignition with systemd units too.

With any other Ignition feature, there's a straightforward way to use it which is correct, and the incorrect ways often blow up immediately. This feature can only be used correctly in a narrow set of circumstances, adds non-obvious complications to the cluster's threat model, and is subtly dangerous if used incorrectly.

bgilbert commented 3 years ago

This issue hasn't seen any traffic for a while, and I still think it's too obscure and dangerous to implement. I'll go ahead and close this out.

coreos / ignition

Add `entropy` key #653
