Open ubuntu-server-builder opened 1 year ago
Launchpad user Chad Smith(chad.smith) wrote on 2023-02-10T05:40:34.121432+00:00
Thanks for filing this bug and helping make cloud-init better. Let's see if we can get to the root of the problem.
This may involve us asking you to run cloud-init collect-logs and attach the resulting tar file.
Please check instance-data-sensitive.json inside that tar file before attaching it, because it could contain sensitive information if you provided passwords or user credentials in user-data on the affected VM.
At a minimum, I think we need to see the output of journalctl -b 0 -o short-precise and the full cloud-init.log (both of which are grabbed by cloud-init collect-logs anyway).
Generally, I don't think the default behavior of the OpenStack datasource should be for cloud-init to actively rewrite or re-apply network config across reboots. It should stay inert unless the datasource IMDS (instance metadata service) either changes the instance-id in meta-data to a new UUID (telling cloud-init it needs to reconfigure the world) or OpenStack was configured to re-render network config on every boot.
So we might have a bug where LinuxNetworking.apply_network_config_names runs more often than it should across normal system reboots, even when DataSourceOpenStack hasn't told cloud-init to re-render and re-apply new networking config due to a BOOT_NEW_INSTANCE event.
I would have expected cloud-init to exit and do nothing with network renames across normal reboots due to these checks https://github.com/canonical/cloud-init/blob/main/cloudinit/stages.py#L905-L916
I think it will help to see the full cloud-init.log here to work out what really happened with all the PER_BOOT, PER_INSTANCE_REBOOT, datasource cache validation, instance-id and event checks, so we can better determine why cloud-init thinks it should be touching anything related to network renames across subsequent boots.
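To make the expectation above concrete, here is a minimal sketch of the decision being described. It is not cloud-init's actual code: the `BootEvent` enum and `should_touch_network` function are hypothetical names used only for illustration, and the real checks live in cloudinit/stages.py.

```python
from enum import Enum


class BootEvent(Enum):
    """Hypothetical stand-in for the events a datasource may react to."""
    BOOT_NEW_INSTANCE = "boot-new-instance"
    BOOT = "boot"


def should_touch_network(cached_instance_id, current_instance_id, allowed_events):
    """Return True only when re-rendering (and renaming) would be expected."""
    if cached_instance_id != current_instance_id:
        # A new instance-id means "reconfigure the world".
        return True
    # Same instance: only act if per-boot updates were explicitly enabled.
    return BootEvent.BOOT in allowed_events
```

Under this model, a plain reboot with an unchanged instance-id and only BOOT_NEW_INSTANCE enabled returns False, which is the case where no rename should be attempted.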
I'll set this to 'incomplete' status above, but please set it back to 'new' status when you get a chance to attach logs.
Launchpad user Patrik Lundin(eest) wrote on 2023-02-20T16:56:35.981152+00:00
Attached is the result of running "cloud-init collect-logs" at the point where the netplan file has been modified to state "eth0", "netplan apply" has been run (replacing "ens3" with "eth0" at runtime), the machine has been rebooted and then ends up with an unconfigured "ens3" interface instead of the expected "eth0". Launchpad attachments: cloud-init.tar.gz
Launchpad user Chad Smith(chad.smith) wrote on 2023-03-01T19:47:00.311114+00:00
Thank you very much for the logs, Patrik.
I can see logs confirming what you suggested: renames are applied on every reboot, regardless of whether the datasource and network config have actually been detected and applied. We shouldn't see the "applying net names" logs when "No network config applied. Neither a new instance nor datasource network update allowed" is logged. I agree that this bug is undesirable behavior; cloud-init should remain inert on renames because sysadmins could have gone in and changed the static /etc/netplan/*yaml to represent something other than cloud-init's original config.
That said, editing /etc/netplan/50-cloud-init.yaml is also a recipe for problems in the future if OpenStack changes the instance-id presented to this node (the product_uuid for this VM, read via /sys/class/dmi/id/product_uuid). When that happens, cloud-init will recrawl the OpenStack IMDS endpoints at 169.254.169.254 and rewrite all network and system configuration, blowing away changes to the 50-cloud-init.yaml file.
$ egrep -i 'applying net|Cloud-init v.|netplan' YOUR_LOGS
2023-02-20 16:19:42,270 - util.py[DEBUG]: Cloud-init v. 22.3.4-0ubuntu1~22.04.1 running 'init-local' at Mon, 20 Feb 2023 16:19:42 +0000. Up 6.51 seconds.
2023-02-20 16:19:43,780 - stages.py[DEBUG]: applying net config names for {'version': 1, 'config': [{'mtu': 1500, 'type': 'physical', 'accept-ra': True, 'subnets': [{'type': 'dhcp4'}, {'type': 'dhcp6'}], 'mac_address': 'fa:16:3e:19:ed:75', 'name': 'ens3'}, {'type': 'nameserver', 'address': '89.32.32.32'}, {'type': 'nameserver', 'address': '2001:6b0:89::32:32:32'}]}
2023-02-20 16:19:43,786 - stages.py[INFO]: Applying network configuration from ds bringup=False: {'version': 1, 'config': [{'mtu': 1500, 'type': 'physical', 'accept-ra': True, 'subnets': [{'type': 'dhcp4'}, {'type': 'dhcp6'}], 'mac_address': 'fa:16:3e:19:ed:75', 'name': 'ens3'}, {'type': 'nameserver', 'address': '89.32.32.32'}, {'type': 'nameserver', 'address': '2001:6b0:89::32:32:32'}]}
2023-02-20 16:19:43,788 - init.py[DEBUG]: Selected renderer 'netplan' from priority list: ['netplan', 'eni', 'sysconfig']
2023-02-20 16:19:43,791 - subp.py[DEBUG]: Running command ['netplan', 'info'] with allowed return codes [0] (shell=False, capture=True)
2023-02-20 16:19:43,974 - util.py[DEBUG]: Writing to /etc/netplan/50-cloud-init.yaml - wb: [644] 555 bytes
2023-02-20 16:19:43,975 - subp.py[DEBUG]: Running command ['netplan', 'generate'] with allowed return codes [0] (shell=False, capture=True)
2023-02-20 16:19:46,359 - util.py[DEBUG]: Cloud-init v. 22.3.4-0ubuntu1~22.04.1 running 'init' at Mon, 20 Feb 2023 16:19:46 +0000. Up 10.60 seconds.
2023-02-20 16:19:46,540 - stages.py[DEBUG]: applying net config names for {'version': 1, 'config': [{'mtu': 1500, 'type': 'physical', 'accept-ra': True, 'subnets': [{'type': 'dhcp4'}, {'type': 'dhcp6'}], 'mac_address': 'fa:16:3e:19:ed:75', 'name': 'ens3'}, {'type': 'nameserver', 'address': '89.32.32.32'}, {'type': 'nameserver', 'address': '2001:6b0:89::32:32:32'}]}
2023-02-20 16:19:51,905 - util.py[DEBUG]: Cloud-init v. 22.3.4-0ubuntu1~22.04.1 running 'modules:config' at Mon, 20 Feb 2023 16:19:51 +0000. Up 16.09 seconds.
2023-02-20 16:19:53,116 - util.py[DEBUG]: Cloud-init v. 22.3.4-0ubuntu1~22.04.1 running 'modules:final' at Mon, 20 Feb 2023 16:19:53 +0000. Up 17.31 seconds.
2023-02-20 16:19:53,331 - util.py[DEBUG]: Cloud-init v. 22.3.4-0ubuntu1~22.04.1 finished at Mon, 20 Feb 2023 16:19:53 +0000. Datasource DataSourceOpenStackLocal [net,ver=2]. Up 17.58 seconds
2023-02-20 16:23:51,279 - util.py[DEBUG]: Cloud-init v. 22.3.4-0ubuntu1~22.04.1 running 'init-local' at Mon, 20 Feb 2023 16:23:51 +0000. Up 6.86 seconds.
2023-02-20 16:23:51,357 - stages.py[DEBUG]: applying net config names for {'version': 1, 'config': [{'mtu': 1500, 'type': 'physical', 'accept-ra': True, 'subnets': [{'type': 'dhcp4'}, {'type': 'dhcp6'}], 'mac_address': 'fa:16:3e:19:ed:75', 'name': 'ens3'}, {'type': 'nameserver', 'address': '89.32.32.32'}, {'type': 'nameserver', 'address': '2001:6b0:89::32:32:32'}]}
2023-02-20 16:23:51,937 - util.py[DEBUG]: Cloud-init v. 22.3.4-0ubuntu1~22.04.1 running 'init' at Mon, 20 Feb 2023 16:23:51 +0000. Up 7.52 seconds.
2023-02-20 16:23:51,993 - stages.py[DEBUG]: applying net config names for {'version': 1, 'config': [{'mtu': 1500, 'type': 'physical', 'accept-ra': True, 'subnets': [{'type': 'dhcp4'}, {'type': 'dhcp6'}], 'mac_address': 'fa:16:3e:19:ed:75', 'name': 'ens3'}, {'type': 'nameserver', 'address': '89.32.32.32'}, {'type': 'nameserver', 'address': '2001:6b0:89::32:32:32'}]}
2023-02-20 16:23:53,393 - util.py[DEBUG]: Cloud-init v. 22.3.4-0ubuntu1~22.04.1 running 'modules:config' at Mon, 20 Feb 2023 16:23:53 +0000. Up 8.95 seconds.
2023-02-20 16:23:53,814 - util.py[DEBUG]: Cloud-init v. 22.3.4-0ubuntu1~22.04.1 running 'modules:final' at Mon, 20 Feb 2023 16:23:53 +0000. Up 9.38 seconds.
2023-02-20 16:23:53,877 - util.py[DEBUG]: Cloud-init v. 22.3.4-0ubuntu1~22.04.1 finished at Mon, 20 Feb 2023 16:23:53 +0000. Datasource DataSourceOpenStackLocal [net,ver=2]. Up 9.47 seconds
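For anyone wanting to check their own cloud-init.log for the same pattern, the following is a rough sketch of what the grep above is looking for. It splits the log into boots at each 'init-local' marker and flags boots where "applying net config names" appears although nothing was written under /etc/netplan/. The function names are hypothetical, and treating "no netplan write" as "no network config applied" is a heuristic assumed here, not something cloud-init defines.

```python
import re

# Substrings taken from the log excerpt above.
BOOT_START = re.compile(r"Cloud-init v\. \S+ running 'init-local'")
RENAME = "applying net config names"
RENDER = "Writing to /etc/netplan/"


def summarize_boots(lines):
    """Group log lines by boot and count rename attempts and netplan renders."""
    boots = []
    for line in lines:
        if BOOT_START.search(line):
            # The first 23 characters are the "YYYY-MM-DD HH:MM:SS,mmm" timestamp.
            boots.append({"start": line[:23], "renames": 0, "renders": 0})
        if not boots:
            continue  # ignore anything before the first 'init-local' marker
        if RENAME in line:
            boots[-1]["renames"] += 1
        if RENDER in line:
            boots[-1]["renders"] += 1
    return boots


def suspicious_boots(lines):
    """Boots where names were applied although no config was rendered."""
    return [b for b in summarize_boots(lines) if b["renames"] and not b["renders"]]
```

Passing an open file handle for /var/log/cloud-init.log to `suspicious_boots` should list the affected boots; on the excerpt above it would flag the boot starting at 16:23:51 but not the one at 16:19:42.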
Launchpad user Patrik Lundin(eest) wrote on 2023-03-02T09:41:10.931703+00:00
Hello,
Thanks for the follow-up. If editing "50-cloud-init.yaml" is not a good idea, what is the appropriate way to deal with a rename then? Even if we remove "50-cloud-init.yaml" and create another file, if the "instance-id" changes (is this likely?) will this not just result in "50-cloud-init.yaml" being recreated, now leading to having two conflicting netplan files instead?
If the correct thing is "feed cloud-init this information from the start" then I am not sure how to properly do that: as I stated initially it is documented that you are not allowed to configure network stuff via user-data (https://cloudinit.readthedocs.io/en/22.4.2/topics/network-config.html)
This bug was originally filed in Launchpad as LP: #2006106
Launchpad user Patrik Lundin(eest) wrote on 2023-02-06T08:32:36.504329+00:00
After creating an Ubuntu 22.04 instance in OpenStack the following netplan file is generated:
With the matching links:
I was then trying to rename the interface from "ens3" to "eth0", updating the file like so:
Applying the config works, the interface is renamed without dropping my SSH connection:
So far so good, but now I reboot the machine, and it will not come back online:
Logging in via a locally connected console I can see the following:
So for some reason the interface comes up as "ens3" again. It also has no address configuration assigned, which is why I cannot reach it. If I then run a manual "netplan apply" I can get it online again:
Now logged in over SSH again checking the dmesg log for renames the following can be seen:
So the network name has been flapping back and forth between "ens3" and "eth0".
After digging around I think this is what happens:
Looking at /var/log/cloud-init.log the following message is seen:
I had a hard time understanding how cloud-init knew about the previous "ens3" name, but I now think it was persisted in obj.pkl during the first boot after installation and is picked up on subsequent boots. From that same log:
Taking a look in the file:
From what I can tell this "name" is picked up in the openstack helper at https://github.com/canonical/cloud-init/blob/483f79cb3b94c8c7d176e748892a040c71132cb3/cloudinit/sources/helpers/openstack.py#L715
So... the question then is, how should this work? Right now it seems cloud-init is helping me with a rename even if I have asked the netplan file to set another name than the machine had at initial install.
One thing that occurred to me is that maybe I am expected to feed cloud-init user-data so it knows from the start that I want the interface called "eth0", but https://cloudinit.readthedocs.io/en/22.4.2/topics/network-config.html states "User-data cannot change an instance’s network configuration." so it seems this is not expected behaviour.
For now I guess the simplest workaround is to just disable the network management parts as mentioned in the generated netplan file, this works:
Now the machine comes up by itself, and there are fewer renames happening:
It feels strange to have to disable the network management parts... What would be the correct way to deal with this situation?
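For reference, the workaround mentioned above (disabling cloud-init's network management) can be sketched as follows. This assumes the standard drop-in that the header comment of the generated 50-cloud-init.yaml points to, namely /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg containing "network: {config: disabled}". The helper name and its `cfg_dir` parameter are hypothetical, and the function must run as root to write to the default location.

```python
from pathlib import Path

# Content of the drop-in, as described in the generated netplan file's header.
DISABLE_SNIPPET = "network: {config: disabled}\n"


def disable_cloud_init_networking(cfg_dir="/etc/cloud/cloud.cfg.d"):
    """Write the drop-in that tells cloud-init to leave networking alone."""
    target = Path(cfg_dir) / "99-disable-network-config.cfg"
    target.write_text(DISABLE_SNIPPET)
    return target
```

Whether this also stops the rename step on every cloud-init version is not confirmed in this thread; the report above only states that it worked on the affected 22.04 instance.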