coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

initramfs: support configuring network on DHCP-less platforms #111

Open bgilbert opened 5 years ago

bgilbert commented 5 years ago

On Fedora CoreOS, Ignition performs the first-boot configuration by starting from a running vanilla image and fetching the configuration to apply. In some cases that configuration comes from a network resource (e.g. a link-local metadata service, a cloud bucket, or a cluster service like the OpenShift MachineConfigServer). This works fine in most cases, specifically on all platforms where DHCP is available and NetworkManager can configure the network in the initramfs before Ignition runs.

There are, however, a few platforms where a machine is expected to configure its own network from other hints (i.e. not via DHCP):

In Container Linux, Afterburn runs in the initramfs, queries the metadata service, and writes networkd units for use by the real root filesystem. (On Packet they're written into /etc on first boot; on DO they're written into /run on every boot.) However, this introduces other problems for the normal Ignition flow, such as https://github.com/coreos/bugs/issues/2205, and in general it cannot work for cluster services like MachineConfigServer.
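The Container Linux flow described above (read addresses from the metadata service, write networkd units) can be sketched roughly as follows. This is not Afterburn's code: the metadata layout (`interfaces` → `public` → `mac`/`ipv4`) is an assumption loosely modeled on a DigitalOcean-style JSON document, and the function names and unit filenames are hypothetical.

```python
import ipaddress


def networkd_unit(mac: str, ip: str, netmask: str, gateway: str) -> str:
    """Render a systemd-networkd .network unit for one statically addressed NIC."""
    # IPv4Interface accepts a dotted netmask and normalizes it to a prefix length.
    iface = ipaddress.IPv4Interface(f"{ip}/{netmask}")
    lines = [
        "[Match]",
        f"MACAddress={mac.lower()}",
        "",
        "[Network]",
        f"Address={iface.with_prefixlen}",
        f"Gateway={gateway}",
    ]
    return "\n".join(lines) + "\n"


def units_from_metadata(metadata: dict) -> dict:
    """Map hypothetical unit filenames to unit contents.

    ASSUMPTION: `metadata` follows the DigitalOcean-like layout described in the
    lead-in; real metadata services differ and need their own parsing.
    """
    units = {}
    for idx, nic in enumerate(metadata.get("interfaces", {}).get("public", [])):
        v4 = nic["ipv4"]
        name = f"00-public-{idx}.network"  # hypothetical naming scheme
        units[name] = networkd_unit(
            nic["mac"], v4["ip_address"], v4["netmask"], v4["gateway"]
        )
    return units
```

Note that writing these units into /etc or /run only affects the real root; nothing here brings up the initramfs network that Ignition needs, which is the gap this issue is about.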

We'll need better functionality in Fedora CoreOS so that NetworkManager in the initramfs can properly configure the network in those cases before Ignition runs. We should also account for transitioning that configuration to the real root (with a teardown in between, at root-pivot time).
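One possible shape for the hand-off to the real root is to copy the NetworkManager keyfiles generated in the initramfs into the real root's configuration directory just before the initramfs network is torn down. A minimal sketch, assuming the profiles are `*.nmconnection` files under something like `/run/NetworkManager/system-connections` and the real root's target is something like `/sysroot/etc/NetworkManager/system-connections`; both paths and the function name are assumptions for illustration.

```python
import shutil
from pathlib import Path
from typing import List


def propagate_profiles(src: Path, dst: Path) -> List[str]:
    """Copy initramfs-generated keyfiles into the real root (hypothetical helper).

    ASSUMPTION: src/dst are the directories named in the lead-in; callers pass
    them explicitly so the sketch stays testable.
    """
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for profile in sorted(src.glob("*.nmconnection")):
        target = dst / profile.name
        if target.exists():
            # Do not clobber a profile already provided for the real root
            # (e.g. one written by Ignition).
            continue
        shutil.copy2(profile, target)
        # Keyfiles are expected to be private to root, so restrict to 0600.
        target.chmod(0o600)
        copied.append(profile.name)
    return copied
```

Skipping profiles that already exist is a design choice here: it lets explicit real-root configuration win over whatever the initramfs generated.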

EDIT(lucab): reworded for clarity and expanded to reference all platforms where we have the same kind of troubles.

dustymabe commented 4 years ago

We could possibly start using nm-cloud-setup for this. See #320.

IIUC nm-cloud-setup talks to NetworkManager via D-Bus and dynamically applies configuration from cloud providers to the system. This would work for configuring the network in the real root. It currently would not work for applying the configuration in the initramfs (where Ignition runs), because NetworkManager runs in oneshot mode there and talking to it via D-Bus is not an option. Work is underway to run NetworkManager in non-oneshot mode in the initramfs (so that it can be reached via D-Bus), so we'll hopefully get this in the future. The drawback is that, for now, an Ignition config that fetches things from remote resources won't work on these platforms (this was a problem in CL and it was never solved).

Summary of the short-term strategy:

Longer term we'll hopefully be able to use nm-cloud-setup in the initramfs and get the full networking applied there as well.

We also need glue code for Packet and DO to be written for nm-cloud-setup. The current implementations for AWS, GCP, and Azure are here: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/tree/master/clients/cloud-setup

lucab commented 4 years ago

There is also nm-initrd-generator, which performs similar lifting of external configuration details in the initramfs. Specifically, it currently knows about DeviceTree and iBFT sources. See https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/tree/1.26.2/src/initrd.

cverna commented 4 years ago

@thom311 would you be able to help us with this problem and weigh in on the design with your NM expertise?

thom311 commented 4 years ago

Dusty is correct that nm-cloud-setup in its current form cannot run in the initrd because D-Bus is not available there. @bengal is currently working on having D-Bus in the initrd and running NetworkManager as a systemd service (rather than having it spawned by dracut). In our opinion, D-Bus is very valuable, and we would rather put effort into having D-Bus available everywhere than try to come up with solutions that work without it. I am optimistic that this will work out (eventually).

DigitalOcean does not provide DHCP for official images (#71 (comment)); instead, network configuration is provided by an HTTP service on a link-local address.

nm-cloud-setup (which currently supports only EC2/AWS and GCP) works as follows: when it runs, it tries to fetch the network configuration from the HTTP server. This implies that the HTTP server is already reachable. With EC2/AWS/GCP that works because NetworkManager automatically activates a profile that has DHCP enabled. nm-cloud-setup then fetches secondary IP addresses and configures some routes.
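The ordering constraint described above can be illustrated with a small, hypothetical helper: it refuses to query metadata until some profile has already given the interface an address, then computes which metadata-reported addresses still need to be added. This is not nm-cloud-setup's code; the function name and the `fetch_metadata` callable are assumptions standing in for the real HTTP query.

```python
import ipaddress
from typing import Callable, List


def plan_secondary_addresses(
    active: List[str], fetch_metadata: Callable[[], List[str]]
) -> List[str]:
    """Return metadata-reported addresses (CIDR strings) not yet configured.

    ASSUMPTION: `fetch_metadata` is a stand-in for querying the provider's
    metadata server and returns addresses as CIDR strings.
    """
    if not active:
        # No profile is active yet (e.g. no DHCP lease), so the HTTP
        # metadata server cannot be reached at all.
        raise RuntimeError("no active address; activate a profile first")
    have = {ipaddress.ip_interface(a) for a in active}
    missing = []
    for cidr in fetch_metadata():
        iface = ipaddress.ip_interface(cidr)
        if iface not in have and iface not in missing:
            missing.append(iface)
    return [str(i) for i in missing]
```

On DHCP platforms the `active` list is filled by the automatically activated DHCP profile; on DHCP-less platforms it stays empty, which is why this flow breaks there.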

With DigitalOcean, the current approach would not work directly, because something would first have to activate a profile with (IPv4?) link-local addressing enabled.
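For the DigitalOcean case, the missing first step could be a profile that enables only IPv4 link-local addressing, so that a link-local metadata service might become reachable before any cloud-specific code runs. A minimal sketch that renders such a NetworkManager keyfile; the profile id scheme and the function name are hypothetical, and whether the DigitalOcean metadata service is actually reachable this way is exactly the open question raised above.

```python
def link_local_bootstrap_profile(ifname: str) -> str:
    """Render a keyfile that brings up IPv4 link-local addressing only.

    ASSUMPTION: the id naming scheme is hypothetical; IPv6 is disabled just
    to keep the bootstrap profile minimal.
    """
    lines = [
        "[connection]",
        f"id=bootstrap-{ifname}",
        "type=ethernet",
        f"interface-name={ifname}",
        "",
        "[ipv4]",
        # Self-assigned 169.254.0.0/16 address; no DHCP involved.
        "method=link-local",
        "",
        "[ipv6]",
        "method=disabled",
    ]
    return "\n".join(lines) + "\n"
```

A generator could drop this into the initramfs, let NetworkManager activate it, and only then run the code that fetches the real configuration.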

On the other hand, there is also nm-initrd-generator, which runs in the initrd, parses the dracut kernel command line, and pre-generates profiles (which NetworkManager then activates). Of course, there could be other similar generator tools. While we want nm-initrd-generator to cover many useful scenarios, it does nothing that couldn't be done by an out-of-tree generator tool.
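To make the "out-of-tree generator" idea concrete, here is a toy generator that handles only the static IPv4 form of dracut's `ip=` argument (`ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<client_hostname>:<interface>:{none|off|...}`) and emits a NetworkManager keyfile. It is a sketch, not nm-initrd-generator's logic: DHCP variants, IPv6 (whose bracketed addresses this naive `:` split would break), the MTU/MAC fields, and the hostname are deliberately ignored, and the function name is hypothetical.

```python
import ipaddress


def profile_from_ip_arg(arg: str) -> str:
    """Convert a static IPv4 dracut `ip=` argument into a NetworkManager keyfile."""
    if not arg.startswith("ip="):
        raise ValueError("not an ip= argument")
    fields = arg[len("ip="):].split(":")
    if len(fields) < 7:
        raise ValueError("only the static 7-field form is handled in this sketch")
    client, _peer, gateway, netmask, _hostname, ifname, autoconf = fields[:7]
    if autoconf not in ("none", "off"):
        raise ValueError(f"autoconf method {autoconf!r} is not handled in this sketch")
    if not ifname:
        raise ValueError("an explicit interface name is required in this sketch")
    # IPv4Interface accepts either a dotted-quad netmask or a prefix length.
    iface = ipaddress.IPv4Interface(f"{client}/{netmask}")
    address = iface.with_prefixlen + (f",{gateway}" if gateway else "")
    lines = [
        "[connection]",
        f"id={ifname}",
        "type=ethernet",
        f"interface-name={ifname}",
        "",
        "[ipv4]",
        "method=manual",
        f"address1={address}",  # keyfile syntax: address/prefix[,gateway]
        "",
        "[ipv6]",
        "method=disabled",
    ]
    return "\n".join(lines) + "\n"
```

For example, `ip=192.168.1.10::192.168.1.1:255.255.255.0:node1:ens3:none` yields a profile for `ens3` with `address1=192.168.1.10/24,192.168.1.1`.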

I think we can solve the issues for the different environments, possibly with a combination of nm-initrd-generator and nm-cloud-setup. But the environments also seem different enough that we should look at them one by one.