flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0

Flatcar Beta 3913.1.0 with systemd 255 enables DHCP rapid commit by default #1438

Open · daMupfel opened this issue 2 months ago

daMupfel commented 2 months ago

Description

The new Flatcar Beta release 3913.1.0 updates systemd to version 255. This version adds support for DHCP rapid commit (the RapidCommit= setting), which appears to be enabled by default:

RapidCommit=

    Takes a boolean. The DHCPv4 client can obtain configuration parameters from a DHCPv4 server through a rapid two-message exchange (discover and ack). When the rapid commit option is set by both the DHCPv4 client and the DHCPv4 server, the two-message exchange is used. Otherwise, the four-message exchange (discover, offer, request, and ack) is used. The two-message exchange provides faster client configuration. See [RFC 4039](https://tools.ietf.org/html/rfc4039) for details. Defaults to true when Anonymize=no and neither AllowList= nor DenyList= is specified, and false otherwise.

    Added in version 255.
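The defaulting and negotiation rules in the excerpt above can be summarized as a small Python sketch. It models only what the documentation states: the client-side default depends on Anonymize=, AllowList= and DenyList=, and the two-message exchange is used only when both client and server set the rapid commit option. The function names and parameters are hypothetical illustrations, not part of systemd, and treating an explicit RapidCommit= value as always taking precedence is an assumption.

```python
from typing import List, Optional


def effective_rapid_commit(
    rapid_commit: Optional[bool],
    anonymize: bool = False,
    allow_list: Optional[List[str]] = None,
    deny_list: Optional[List[str]] = None,
) -> bool:
    """Hypothetical model of the client-side RapidCommit= value."""
    if rapid_commit is not None:
        # Assumption: an explicit RapidCommit= setting is used as-is.
        return rapid_commit
    # Documented default: true when Anonymize=no and neither
    # AllowList= nor DenyList= is specified, false otherwise.
    return not anonymize and not allow_list and not deny_list


def dhcpv4_exchange(client_rapid_commit: bool, server_rapid_commit: bool) -> List[str]:
    """Return the DHCPv4 message sequence for the given option settings."""
    if client_rapid_commit and server_rapid_commit:
        # Both sides set the rapid commit option: two-message exchange.
        return ["DISCOVER", "ACK"]
    # Otherwise the classic four-message exchange is used.
    return ["DISCOVER", "OFFER", "REQUEST", "ACK"]


# systemd 255 defaults against a server that also sets the option:
print(dhcpv4_exchange(effective_rapid_commit(None), server_rapid_commit=True))
# → ['DISCOVER', 'ACK']

# With the workaround RapidCommit=false on the client:
print(dhcpv4_exchange(effective_rapid_commit(False), server_rapid_commit=True))
# → ['DISCOVER', 'OFFER', 'REQUEST', 'ACK']
```

The sketch does not model a server that mishandles the option (as suspected for CloudSigma); it only shows why setting RapidCommit=false on the client forces the four-message exchange regardless of what the server supports.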

Our cloud provider, CloudSigma, appears to have a faulty implementation of DHCPv4 rapid commit; as a result, our machines no longer get an IP address.

For existing servers, this can be fixed by copying the default config /usr/lib/systemd/network/zz-default.network into a config file of our own and adapting its [DHCPv4] section as follows:

[DHCPv4]
RoutesToDNS=false
RapidCommit=false
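The manual workaround above can be scripted. The following Python sketch rewrites the contents of the default config so that its [DHCPv4] section contains RapidCommit=false, replacing any existing RapidCommit= line there and adding the section if it is missing. It assumes the copy is written to /etc/systemd/network/ under the same file name, relying on systemd-networkd letting a file in /etc override an identically named file in /usr/lib; the function names and that target path are illustrative assumptions, not something Flatcar ships. It works line by line instead of using configparser, because .network files may legitimately repeat keys and sections.

```python
import re
from pathlib import Path

SRC = Path("/usr/lib/systemd/network/zz-default.network")
# Assumption: an identically named file in /etc overrides the /usr/lib copy.
DST = Path("/etc/systemd/network/zz-default.network")

SECTION_RE = re.compile(r"^\s*\[([^\]]+)\]\s*$")
RAPID_COMMIT_RE = re.compile(r"^\s*RapidCommit\s*=")


def disable_rapid_commit(text: str) -> str:
    """Return `text` with RapidCommit=false set in the [DHCPv4] section."""
    out = []
    in_dhcpv4 = False
    seen_dhcpv4 = False
    for line in text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            in_dhcpv4 = match.group(1).strip() == "DHCPv4"
            out.append(line)
            if in_dhcpv4 and not seen_dhcpv4:
                # Put the override right after the first [DHCPv4] header.
                out.append("RapidCommit=false")
            seen_dhcpv4 = seen_dhcpv4 or in_dhcpv4
            continue
        if in_dhcpv4 and RAPID_COMMIT_RE.match(line):
            continue  # drop any pre-existing RapidCommit= in [DHCPv4]
        out.append(line)
    if not seen_dhcpv4:
        # No [DHCPv4] section yet: append one.
        out.extend(["", "[DHCPv4]", "RapidCommit=false"])
    return "\n".join(out) + "\n"


def install_override(src: Path = SRC, dst: Path = DST) -> None:
    """Write a patched copy of `src` to `dst` (hypothetical helper, run as root)."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(disable_rapid_commit(src.read_text()))
```

On an affected node, install_override() would be run as root, followed by restarting systemd-networkd or rebooting. Note that a full copy in /etc also masks any later change Flatcar makes to zz-default.network; a drop-in such as /etc/systemd/network/zz-default.network.d/*.conf containing only the [DHCPv4] section would avoid that, although that variant was not tested in this thread.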

Impact

Machines do not get an IP address. Because the CloudInit process for CloudSigma requires an assigned lease, this also means that the whole setup no longer works.

Environment and steps to reproduce

  1. Upload the current Beta Flatcar CloudSigma vendor image to CloudSigma
  2. Create a new machine
  3. Observe that no public IP is assigned and the CloudInit process never runs

Expected behavior

The server is set up correctly, with an IP address and the CloudInit config applied.

Additional information

We are also in discussions with CloudSigma about fixing their DHCP implementation, but it is not yet clear when or how this will be resolved.

This is not really a bug on Flatcar's side, but rather a breaking change for us, because the effective network configuration is different in the new version.

The question is how this could be fixed (if you are open to doing it on the Flatcar side). I currently see the following options:

I would appreciate some feedback on this, and I can probably provide a PR if you are fine with one of the proposed solutions :).

jepio commented 2 months ago

Add a custom network config file to the vendored CloudSigma image

This would definitely be a good idea if the default does not cause widespread problems on other platforms.

t-lo commented 2 months ago

@jepio If it is added only to oem-cloudsigma, it shouldn't affect other platforms, should it? And the way I read the summary, the issue potentially affects all CloudSigma deployments.

@daMupfel I would argue that this should be implemented as an OEM sysext, so that the change is also distributed to existing nodes when they update (@pothos please keep me honest). An OEM sysext would also allow the config to be changed in future updates if required. Since sysexts cover /usr, the config should go to /usr/lib/systemd/network/. This is slightly (but only slightly) more complicated than just dropping a config file into the oem-cloudsigma provider. The biggest challenge is introducing an OEM sysext to the CloudSigma image, since as far as I can tell this image does not use OEM sysexts yet. But that shouldn't keep you from working on a PR: OEM sysexts are used for most other images, and the concept should be easily portable to CloudSigma.

pothos commented 2 months ago

I think the OEM sysext might get loaded too late? For most clouds, the small network config files are part of the base image because they need to be in bootengine and in init.

t-lo commented 2 months ago

Hmm, good point. Re-reading the summary, it states that the bootstrap configuration fails, so this is required in the initrd. No sysext then.

daMupfel commented 2 months ago

Hi, thanks for the feedback so far :).

If I add it to the OEM image, it won't be updated on existing installations (the OEM partition seems to keep the state of the original install), is that correct? At least that is what I have observed so far. If so, are there any options to make this work for existing installations when they update?

daMupfel commented 2 weeks ago

I opened a PR for this issue in flatcar/scripts. It probably won't fix existing installations (during an update), but we can fix those manually in our system quite easily. Please let me know if you think this is a good solution.