canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

Ubuntu: cloud-init.service order After=NetworkManager.service not possible with Before=sysinit.target #4101

Open ubuntu-server-builder opened 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #2015949

Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = None
date_created = 2023-04-12T03:50:22.316150+00:00
date_fix_committed = None
date_fix_released = None
id = 2015949
importance = medium
is_complete = False
lp_url = https://bugs.launchpad.net/cloud-init/+bug/2015949
milestone = None
owner = chad.smith
owner_name = Chad Smith
private = False
status = triaged
submitter = chad.smith
submitter_name = Chad Smith
tags = []
duplicates = []

Launchpad user Chad Smith(chad.smith) wrote on 2023-04-12T03:50:22.316150+00:00

For Ubuntu Desktop images which prefer NetworkManager as the primary network configuration service, provide a mechanism by which cloud-init.service can be ordered After=NetworkManager.service and/or NetworkManager-wait-online.service.

Use case: The Ubuntu desktop live installer ISO prefers NetworkManager as the primary network backend. In these cases cloud-init must be ordered After=NetworkManager.service to avoid DNS-related bugs, such as LP: #2008952, during datasource discovery and user-data download.

Issue: Upstream Ubuntu packaging of the systemd cloud-init.service file declares ordering as After=systemd-networkd-wait-online.service[1] and Before=sysinit.target[2]. Adding a new After=NetworkManager.service creates a systemd ordering cycle, which results in cloud-init.service being kicked out of the desired systemd boot targets. The cycle arises because NetworkManager.service declares After=dbus.socket, which is incompatible with cloud-init.service declaring Before=sysinit.target.

Fix Proposal: A short-term fix has been released that provides an override for cloud-init.service in the livecd-rootfs project[3].

Mid-term need is to provide an environmental artifact or mechanism at systemd-generator timeframe to allow cloud-init.service to order After=NetworkManager.service and drop Before=sysinit.target for that use-case.

Since NetworkManager.service is After=sysinit.target due to its After=dbus.service ordering, cloud-init.service would have to drop its Before=sysinit.target declaration to avoid systemd ordering cycles punting cloud-init out of the boot target.
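To make the cycle concrete, here is a minimal Python sketch that models the ordering directives described above as a directed graph and searches it for a cycle. It is an illustration only, not how systemd itself resolves ordering. The helper names (`build_graph`, `find_cycle`) are hypothetical, and the `dbus.socket` After `sysinit.target` edge is an assumption standing in for the "NetworkManager.service is After=sysinit.target" relationship stated above.

```python
from collections import defaultdict


def build_graph(orderings):
    """Turn (unit, directive, other) triples into 'starts before' edges."""
    graph = defaultdict(set)
    for unit, directive, other in orderings:
        if directive == "After":
            graph[other].add(unit)   # other must start before unit
        elif directive == "Before":
            graph[unit].add(other)   # unit must start before other
        else:
            raise ValueError(f"unknown directive: {directive}")
    return graph


def find_cycle(graph):
    """Return one ordering cycle as a list of units, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every unit starts out WHITE (unvisited)
    stack = []

    def visit(node):
        colour[node] = GREY
        stack.append(node)
        for nxt in sorted(graph.get(node, ())):
            if colour[nxt] == GREY:
                # Back edge: nxt is still on the stack, so we found a cycle.
                return stack[stack.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return None

    for node in sorted(graph):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Orderings taken from the report, plus one assumed edge (marked below).
BEFORE_SYSINIT = ("cloud-init.service", "Before", "sysinit.target")
NM_ORDERINGS = [
    ("cloud-init.service", "After", "NetworkManager.service"),
    ("NetworkManager.service", "After", "dbus.socket"),
    ("dbus.socket", "After", "sysinit.target"),  # assumption, see lead-in
]

if __name__ == "__main__":
    print(find_cycle(build_graph([BEFORE_SYSINIT] + NM_ORDERINGS)))
    print(find_cycle(build_graph(NM_ORDERINGS)))
```

With the Before=sysinit.target edge present, the search reports a cycle through all four units; with that edge removed, it finds none, which matches the reasoning for dropping the declaration in the NetworkManager case.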

Long-term want: Ideally, NetworkManager.service would support being ordered Before=sysinit.target, but that may require NetworkManager to gain the ability to plug in to dbus.service/socket/broker when dbus shows up later than NetworkManager.service. Upstream systemd-networkd made this shift to late-bind to the dbus broker, as discussed in LP: #1636912; those changes were eventually accepted for systemd-networkd.service[4][5].

But NetworkManager growing support for starting earlier in boot, before dbus.service, is probably a longer-term goal than cloud-init.service gaining the flexibility, at systemd generator timeframe, to prefer NetworkManager over networkd for certain images/environments.

[1] https://github.com/canonical/cloud-init/blob/main/systemd/cloud-init.service.tmpl#L11
[2] https://github.com/canonical/cloud-init/blob/main/systemd/cloud-init.service.tmpl#L33
[3] livecd-rootfs cloud-init.service overrides https://code.launchpad.net/~chad.smith/livecd-rootfs/+git/livecd-rootfs/+merge/439586
[4] functional changes allowing networkd to set hostname at some point after networkd start when dbus service shows up https://github.com/systemd/systemd/pull/4710
[5] networkd dropping After=dbus.service ordering https://github.com/systemd/systemd/issues/4504

ubuntu-server-builder commented 1 year ago

Launchpad user Brett Holman(holmanb) wrote on 2023-04-13T15:13:28.088306+00:00

> Mid-term need is to provide an environmental artifact or mechanism at systemd-generator timeframe to allow cloud-init.service to order After=NetworkManager.service and drop Before=sysinit.target for that use-case.

How broad are we considering this use-case? Any image that uses NetworkManager? Only some specialized NoCloud images? Something else?

This change would cause cloud-init to no longer be blocking "as much of the remaining boot as possible"[1].

Dropping Before=sysinit.target from cloud-init.service could cause services later in boot that expect cloud-init.service to be done by sysinit.target to fail. We could easily test base images; however, I don't think this would be sufficient, since any package could provide a service that is ordered After=sysinit.target. For example, any service that currently orders after sysinit.target and expects cloud-init mounts/disk setup to be complete could be broken by the proposed change.
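The concern above can be sketched as a reachability check over ordering edges: a unit is only guaranteed to start after cloud-init.service if some chain of Before=/After= declarations links them. The following is a rough Python illustration under stated assumptions, not a model of systemd's real job scheduling. `example-app.service` is a hypothetical third-party unit, the function names are made up for this sketch, and the `dbus.socket` After `sysinit.target` edge is assumed.

```python
from collections import defaultdict, deque


def starts_before_edges(orderings):
    """Turn (unit, directive, other) triples into 'starts before' edges."""
    graph = defaultdict(set)
    for unit, directive, other in orderings:
        if directive == "After":
            graph[other].add(unit)   # other must start before unit
        elif directive == "Before":
            graph[unit].add(other)   # unit must start before other
        else:
            raise ValueError(f"unknown directive: {directive}")
    return graph


def is_ordered_after(graph, later, earlier):
    """True if the declared orderings force `later` to start after `earlier`."""
    seen = {earlier}
    queue = deque([earlier])
    while queue:  # breadth-first search along 'starts before' edges
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == later:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical package-provided unit that relies on sysinit.target ordering.
APP = ("example-app.service", "After", "sysinit.target")

CURRENT = [
    ("cloud-init.service", "Before", "sysinit.target"),
    APP,
]

# Proposed NetworkManager variant: Before=sysinit.target is dropped.
PROPOSED = [
    ("cloud-init.service", "After", "NetworkManager.service"),
    ("NetworkManager.service", "After", "dbus.socket"),
    ("dbus.socket", "After", "sysinit.target"),  # assumed edge
    APP,
]

if __name__ == "__main__":
    for name, orderings in (("current", CURRENT), ("proposed", PROPOSED)):
        graph = starts_before_edges(orderings)
        guaranteed = is_ordered_after(
            graph, "example-app.service", "cloud-init.service"
        )
        print(f"{name}: ordered after cloud-init.service = {guaranteed}")
```

Under the current ordering the hypothetical unit is transitively ordered after cloud-init.service; under the proposed ordering that guarantee disappears unless the unit declares After=cloud-init.service itself.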

[1] https://cloudinit.readthedocs.io/en/latest/explanation/boot.html#network