coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Initramfs network configuration #460

Closed jlebon closed 3 years ago

jlebon commented 4 years ago

We want to rework networking in the initramfs so that:

  1. we allow conditional networking (https://github.com/coreos/fedora-coreos-tracker/issues/443)
  2. we allow platform-specific network configs to be injected (e.g. https://github.com/coreos/afterburn/pull/379)
  3. we enable the live ISO + coreos-installer network config path (https://github.com/coreos/coreos-installer/issues/205)

Chatted with @dustymabe and @lucab about this, and the proposal we came up with is the following:

So the end state would look something like this:

cgwalters commented 4 years ago

I'm overall fine with this.

However, one thing I'd like to investigate at some point is whether we actually need to default to DHCP on platforms where metadata comes from the link local address - which is almost all the important cloud providers.

If all we need to do in the initramfs is bring up "the" NIC enough to fetch that, that would allow us to uniformly support encoding network config in Ignition.

dustymabe commented 4 years ago

> However, one thing I'd like to investigate at some point is whether we actually need to default to DHCP on platforms where metadata comes from the link local address - which is almost all the important cloud providers.

The proposal already includes:

> 2. we allow platform-specific network configs to be injected (e.g. https://github.com/coreos/afterburn/pull/379)

Which means each platform can have its own default, so we could do something clever.

> If all we need to do in the initramfs is bring up "the" NIC enough to fetch that, that would allow us to uniformly support encoding network config in Ignition.

I don't think we currently differentiate between initramfs networking and real root networking. That is, link-local may be enough for the initramfs to grab an Ignition config from the provider, but not enough for the real root. Another thing to consider is that your Ignition config could have remote references, which would need more than link-local networking.

Overall I think it's just safer to bring networking all the way up to the point where you can resolve hostnames and curl.

jlebon commented 4 years ago

> Which means each platform can have its own default, so we could do something clever.

It's a bit trickier than that though, because e.g. Ignition might be able to fetch the config over link-local, but still needs full networking to fetch remote resources specified in the config. And we only have a single synchronization point for yes/no to full networking.

So I think this is something like: on supported platforms, we always bring up networking enough for link-local. If Ignition needs full networking, it can request it. This is what this bit is about:

> Whether to only attempt fetches which can be performed offline. This currently only includes the "data" scheme. Other schemes will result in ErrNeedNet. In the future, we can improve on this by dropping this and just making sure that we canonicalize all "insufficient network"-related errors to ErrNeedNet. That way, distro integrators could distinguish between "partial" and full network bring-up.
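To make the offline-fetch behaviour quoted above concrete, here is a minimal Go sketch. It is not Ignition's actual code: `fetchOffline` is a hypothetical helper, the `ErrNeedNet` definition here only borrows the name from the quoted text, and the data URL handling is deliberately simplified (no base64 support). It only illustrates the idea that "data" URLs resolve without networking while every other scheme signals that networking is needed.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// ErrNeedNet borrows the name used in the quoted text; this definition
// is illustrative and is not taken from Ignition itself.
var ErrNeedNet = errors.New("resource requires networking")

// fetchOffline is a hypothetical helper. It resolves only "data" URLs,
// which need no network, and returns ErrNeedNet for any other scheme.
func fetchOffline(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme != "data" {
		return "", ErrNeedNet
	}
	// For "data:<mediatype>,<payload>" everything after the scheme is in
	// u.Opaque. Simplification: base64-encoded payloads are not handled.
	_, payload, found := strings.Cut(u.Opaque, ",")
	if !found {
		return "", fmt.Errorf("malformed data URL: %q", raw)
	}
	return url.PathUnescape(payload)
}

func main() {
	sources := []string{"data:,hello", "https://example.com/config.ign"}
	for _, src := range sources {
		body, err := fetchOffline(src)
		switch {
		case errors.Is(err, ErrNeedNet):
			// The caller can now decide to request full network bring-up.
			fmt.Printf("%s: needs full networking\n", src)
		case err != nil:
			fmt.Printf("%s: error: %v\n", src, err)
		default:
			fmt.Printf("%s: fetched offline: %q\n", src, body)
		}
	}
}
```

The point of returning a distinct sentinel error, rather than a generic failure, is that the caller (the single yes/no synchronization point mentioned above) can tell "this config needs the network" apart from "this config is broken".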

dustymabe commented 4 years ago

I had a chat with some of the openstack provisioning folks today (aka OpenShift IPI). Out of that I had a specific question:

jlebon commented 4 years ago

> Will the work for this issue mean that we won't attempt to bring up the network if an openstack config drive is provided that has no remote references?

Short answer: yes.

Long answer: yes, but if possible, I would strongly advise using the metal image instead and the new coreos-installer to inject the Ignition config. While all the images are just one transform step away from each other right now, that may not always be the case (see e.g. https://github.com/coreos/fedora-coreos-config/pull/407). I think we want to reserve the right to change that.

Also, the OpenStack Ignition provider in particular is not great because it has to query both the config drive and the metadata server in parallel, and we've established that in general it's not a good idea to have this sort of timeout (see a lot of discussions around this in https://github.com/coreos/ignition/issues/928).

Your question though made me realize that I need to adapt the Ignition OpenStack provider code in light of the fetch-offline work so that we don't just error out if the metadata server fails, but signal neednet.
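As a rough illustration of that adaptation (a sketch only, not the actual provider code): the names `fetchConfig`, `source`, and `errUnreachable` below are hypothetical, and the two sources are tried sequentially for simplicity, whereas the real provider queries them in parallel. The idea is that in offline mode a metadata server failure is reported as `ErrNeedNet` instead of a hard error.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNeedNet borrows the name from the fetch-offline work; this
// definition is illustrative only.
var ErrNeedNet = errors.New("config fetch requires networking")

// errUnreachable is a hypothetical error used to simulate a source
// that cannot provide a config.
var errUnreachable = errors.New("source unreachable")

// source is a hypothetical abstraction over one place a config may
// come from (config drive or metadata server).
type source func() ([]byte, error)

// fetchConfig tries the config drive first. If that fails it tries the
// metadata server; when running offline, a metadata failure is turned
// into ErrNeedNet so the caller can bring up networking and retry.
func fetchConfig(configDrive, metadata source, offline bool) ([]byte, error) {
	if cfg, err := configDrive(); err == nil {
		return cfg, nil
	}
	cfg, err := metadata()
	if err != nil && offline {
		return nil, ErrNeedNet
	}
	return cfg, err
}

func main() {
	noDrive := func() ([]byte, error) { return nil, errUnreachable }
	noMetadata := func() ([]byte, error) { return nil, errUnreachable }

	_, err := fetchConfig(noDrive, noMetadata, true)
	if errors.Is(err, ErrNeedNet) {
		fmt.Println("no config drive and no network yet: signal neednet")
	}
}
```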

jlebon commented 3 years ago

This is done now:

See also https://github.com/coreos/fedora-coreos-tracker/pull/691.