Open ubuntu-server-builder opened 1 year ago
Launchpad user Chris Patterson(cjp256) wrote on 2022-01-18T17:19:31.963574+00:00
Launchpad attachments: Ubuntu 20.04 nic swap logs
Launchpad user Chris Patterson(cjp256) wrote on 2022-01-18T17:19:53.411383+00:00
Launchpad attachments: Ubuntu 18.04 nic swap logs
Launchpad user James Falcon(falcojr) wrote on 2022-01-19T22:52:55.104750+00:00
Thanks for the thorough bug report. I have confirmed the 20.04 behavior and the root cause.
Launchpad user James Falcon(falcojr) wrote on 2022-01-21T17:22:08.679657+00:00
Adding netplan here as cloud-init is generating the netplan config correctly before network comes up.
@TheRealFalcon feel free to assign this to me
@cjp256 any ideas on how to resolve this one? At first blush, this looks like a netplan issue. I'm not sure how we would solve this in cloud-init that wouldn't just be a workaround.
@holmanb The only option I think will work is dropping set-name usage for the Azure datasource in the netplan config. The NICs should be ordered fine during system enumeration (i.e. eth0 is primary until we force it to swap due to config).
My concern is potential for side effects where set-name may be important. I don't know of a situation, but hard to say there isn't one. Maybe we can add an option to the Azure datasource to toggle the behavior and change the default only for new distro versions to minimize risk?
This bug was originally filed in Launchpad as LP: #1958280
Launchpad details
Launchpad user Chris Patterson(cjp256) wrote on 2022-01-18T17:19:31.963574+00:00
We can reliably reproduce a case where a network configuration change on an Ubuntu 20.04 VM results in systemd-networkd hanging on "pending" interfaces. The interfaces are pending because the interface names from the current boot conflict with those found in /etc/netplan/50-cloud-init.yaml from the previous boot.
Specifically, the netplan generator applies the previous configuration's names prior to running cloud-init local. We'll see something like:
systemd-udevd[228]: eth0: Failed to process device, ignoring: File exists
In one scenario, the data source is able to fetch updated network configuration, and cloud-init updates the config & udev rules just fine. However, networking stays offline ("pending") indefinitely. It can be forced to resolve by executing:
sudo udevadm trigger --attr-match=subsystem=net
Example: Create a VM on Azure with two NICs, re-order them, then restart.
az vm create --name test-x1 --image Canonical:0001-com-ubuntu-server-focal:20_04-lts:latest --nics test-nic-01 test-nic-02
az vm deallocate --name test-x1
az vm nic set --vm-name test-x1 --nics test-nic-02 test-nic-01
az vm start --name test-x1
After doing that, I am unable to log in via the serial console for 20 minutes, until cloud-init times out. In this case, Azure is trying to report ready but cannot because system networking never came up. If we remove /lib/systemd/system/cloud-init-local.service.d/50-azure-clear-persistent-obj-pkl.conf, cloud-init no longer hangs the boot, but networking still fails to initialize for the guest.
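To check whether a guest is stuck in this state before applying the udevadm workaround mentioned above, something like the following rough sketch may help. The function names pending_links and retrigger_if_pending are hypothetical, and the sketch assumes the usual networkctl list column order (IDX, LINK, TYPE, OPERATIONAL, SETUP), which could differ between systemd versions.

```shell
#!/bin/sh
# Hypothetical helper: read `networkctl list --no-legend` output on stdin and
# print the names of links whose SETUP column (assumed 5th field) is "pending".
pending_links() {
    awk '$5 == "pending" { print $2 }'
}

# Hypothetical helper: if any link is still pending, re-trigger udev for
# network devices. This is the manual workaround from the report, not a fix.
retrigger_if_pending() {
    pending=$(networkctl list --no-legend | pending_links)
    if [ -n "$pending" ]; then
        echo "pending links: $pending"
        sudo udevadm trigger --attr-match=subsystem=net
    fi
}
```

Calling retrigger_if_pending from the serial console (or a debug unit) only papers over the problem for the current boot; the stale names in /etc/netplan/50-cloud-init.yaml are still applied on the next early boot.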
The behavior for 18.04 is a bit different. On 18.04, the renaming of the interfaces succeeds at early boot, which instead results in the Azure data source failing the local phase because the fallback_interface is no longer the primary NIC (the secondary NIC, eth1, was renamed to eth0 to match the previous boot's config).
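Both behaviors come from the same mismatch: the names pinned in the previous boot's netplan file no longer line up with the MAC addresses the kernel enumerated on this boot. A minimal sketch for spotting that mismatch is below. check_name_conflicts is a hypothetical helper, not part of cloud-init or netplan, and it assumes the cloud-init style layout in which each ethernet stanza lists macaddress: before set-name:. It is a line-oriented scan, not a real YAML parser.

```shell
#!/bin/sh
# Hypothetical helper: compare the MAC that a netplan file pins to each
# set-name against the MAC currently bound to that name in sysfs.
# Usage: check_name_conflicts NETPLAN_FILE [SYSFS_NET_DIR]
check_name_conflicts() {
    netplan_file=$1
    sysfs_net=${2:-/sys/class/net}
    # Emit "mac name" pairs; assumes macaddress: precedes set-name: per stanza.
    awk '
        $1 == "macaddress:" { mac = tolower($2); gsub(/[^0-9a-f:]/, "", mac) }
        $1 == "set-name:" && mac != "" { print mac, $2; mac = "" }
    ' "$netplan_file" |
    while read -r mac name; do
        addr_file="$sysfs_net/$name/address"
        # Skip names that do not exist on this boot.
        [ -r "$addr_file" ] || continue
        current=$(tr 'A-Z' 'a-z' < "$addr_file")
        if [ "$current" != "$mac" ]; then
            echo "conflict: $name is $current, netplan expects $mac"
        fi
    done
}
```

For example, check_name_conflicts /etc/netplan/50-cloud-init.yaml run after swapping the NICs would be expected to report a conflict for any name whose current MAC differs from the one recorded on the previous boot.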