canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

No longer ignores unmatched ethernet devices #5245

Open plattrap opened 4 months ago

plattrap commented 4 months ago

Bug report

Using the same cloud-init configuration on four different servers, I have match records for all of their network cards in a single autoinstall|network block. With version 23.3.3-0ubuntu1~22.04.1 this worked as expected, with no errors and the network adapters correctly configured. With version 24.1.3-0ubuntu0~22.04.1 the installation fails with a "Not all expected physical devices present" RuntimeError.

Also, the boot process continues rather than halting at this error, leaving the system in an intermediate state.

Is there a way to optionally disable this new check (for example all/any/disable)? Or do I need to update the config file during the early-commands portion of autoinstall?

Steps to reproduce the problem

Create a network block with an Ethernet device that is present, and one that is not present.
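
The reproduction can be illustrated with a minimal sketch of the kind of comparison that fails. This is not cloud-init's implementation of `wait_for_physdevs`. It only assumes a netplan-style (version 2) network config in which each `ethernets` entry may carry a `match.macaddress`, and it compares those addresses with a set of MACs known to be present. The interface names, MAC values, and helper names are all hypothetical.

```python
def expected_macs(network_config):
    """Collect the MACs named by match.macaddress in each ethernets entry."""
    macs = set()
    for entry in network_config.get("ethernets", {}).values():
        mac = entry.get("match", {}).get("macaddress")
        if mac:
            macs.add(mac.lower())
    return macs


def missing_macs(network_config, present):
    """Return the expected MACs that are not among the present ones."""
    return expected_macs(network_config) - {m.lower() for m in present}


# Hypothetical config: one adapter exists on this machine, one does not.
config = {
    "version": 2,
    "ethernets": {
        "nic-present": {"match": {"macaddress": "00:11:22:33:44:55"}, "dhcp4": True},
        "nic-absent": {"match": {"macaddress": "00:11:22:33:44:66"}, "dhcp4": True},
    },
}
present_on_host = {"00:11:22:33:44:55"}

print(sorted(missing_macs(config, present_on_host)))
```

On a Linux host, the set of present addresses could be read from `/sys/class/net/*/address` instead of being hard-coded.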

Environment details

cloud-init logs

Could not access machine because network setup was incomplete.

holmanb commented 4 months ago

Thanks for filing this issue @plattrap!

> single autoinstall|network block.

FYI, you're actually using subiquity, but you're reporting a bug against cloud-init. The error does originate in cloud-init, so this might be caused by a change in cloud-init, but it's hard to tell without more information. Please provide the logs from after a failed install so that we can help debug it.

> Create a network block with an Ethernet device that is present, and one that is not present.

This check existed in 23.3.3 as well, so while this might produce the same error message, I'm not convinced that this is a reproducer. Are you sure that this wouldn't trigger the same warning on 23.3.3?

> Or do I need to update the config file during the early-commands portion of autoinstall?

I'm not sure about that one - early-commands is a subiquity concept, not a cloud-init one.

plattrap commented 4 months ago

Logs from a successful install using the Ubuntu Server ISO from today. The only change is downgrading the cloud-init version: cloud-init-23.1.2-0ubuntu0~22.04.1.tar.gz.zip

Same ISO, but with no change to cloud-init; it failed to configure the network with the above error: cloud-init-24.1.3-0ubuntu1~22.04.1.tar.gz.zip

Sorry, I could not find a more recent version in the Ubuntu package repository. Also, GitHub would not upload a tar.gz, so I wrapped it in a zip.

May 02 11:31:03.873123 localhost.localdomain cloud-init[808]: Cloud-init v. 24.1.3-0ubuntu1~22.04.1 running 'init-local' at Thu, 02 May 2024 11:31:03 +0000. Up 3.86 seconds.
May 02 11:31:04.394243 localhost.localdomain cloud-init[808]: 2024-05-02 11:31:04,394 - networking.py[WARNING]: Not all expected physical devices present: {'3c:ec:ef:d0:11:1a', '3c:ec:ef:d0:12:20', '3c:ec:ef:d0:12:21', '3c:ec:ef:d0:49:55', '3c:ec:ef:d0:11:1b', '3c:ec:ef:d0:49:54'}
May 02 11:31:04.394243 localhost.localdomain cloud-init[808]: 2024-05-02 11:31:04,394 - util.py[WARNING]: failed stage init-local
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]: failed run of stage init-local
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]: ------------------------------------------------------------
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]: Traceback (most recent call last):
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 394, in main_init
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:     init.fetch(existing=existing)
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:   File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 501, in fetch
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:     return self._get_data_source(existing=existing)
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:   File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 366, in _get_data_source
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:     (ds, dsname) = sources.find_source(
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:   File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 1039, in find_source
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:     raise DataSourceNotFoundException(msg)
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]: cloudinit.sources.DataSourceNotFoundException: Did not find any data source, searched classes: ()
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]: During handling of the above exception, another exception occurred:
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]: Traceback (most recent call last):
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 781, in status_wrapper
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:     ret = functor(name, args)
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 415, in main_init
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:     init.apply_network_config(bring_up=bring_up_interfaces)
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:   File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 1044, in apply_network_config
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:     self.distro.networking.wait_for_physdevs(netcfg)
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:   File "/usr/lib/python3/dist-packages/cloudinit/distros/networking.py", line 169, in wait_for_physdevs
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]:     raise RuntimeError(msg)
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]: RuntimeError: Not all expected physical devices present: {'3c:ec:ef:d0:11:1a', '3c:ec:ef:d0:12:20', '3c:ec:ef:d0:12:21', '3c:ec:ef:d0:49:55', '3c:ec:ef:d0:11:1b', '3c:ec:ef:d0:49:54'}
May 02 11:31:04.402345 localhost.localdomain cloud-init[808]: ------------------------------------------------------------

blackboxsw commented 3 months ago

Thank you for filing this bug and attaching logs for both 23.1 and 24.1 to compare against. Looking through those logs, I see one stark difference between the 23.1 and 24.1 environments. In the 23.1 logs, the environment has a config file which actually disables networking for cloud-init:

2024-05-02 10:54:58,516 - util.py[DEBUG]: Read 28 bytes from /etc/cloud/cloud.cfg.d/subiquity-disable-cloudinit-networking.cfg

This results in cloud-init not trying to set up and block on network configuration:

2024-05-02 10:54:58,488 - stages.py[DEBUG]: network config disabled by system_cfg

But that file (/etc/cloud/cloud.cfg.d/subiquity-disable-cloudinit-networking.cfg) doesn't appear to exist in the 24.1 boot, which appears to be directly related to the network config hangs you are seeing in the cloud-init logs on 24.1. Are there any other conditions, such as different versions of subiquity being used in these two environments, that result in a slightly different network config being provided to cloud-init in the target system?
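
To make the comparison between the two log bundles repeatable, here is a rough sketch that scans log text for the two messages quoted in this thread. The marker strings are taken from the logs above; the function name and the returned dictionary shape are assumptions made for illustration, and the second sample line is shortened to two of the reported addresses.

```python
import re

DISABLED_MARKER = "network config disabled by system_cfg"
MISSING_MARKER = "Not all expected physical devices present"
# Six colon-separated hex pairs, e.g. 3c:ec:ef:d0:11:1a
MAC_RE = re.compile(r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}", re.IGNORECASE)


def summarize_log(text):
    """Report whether networking was disabled and which MACs were reported missing."""
    disabled = False
    missing = set()
    for line in text.splitlines():
        if DISABLED_MARKER in line:
            disabled = True
        if MISSING_MARKER in line:
            missing.update(m.lower() for m in MAC_RE.findall(line))
    return {"networking_disabled": disabled, "missing_macs": sorted(missing)}


working = (
    "2024-05-02 10:54:58,488 - stages.py[DEBUG]: "
    "network config disabled by system_cfg"
)
broken = (
    "2024-05-02 11:31:04,394 - networking.py[WARNING]: "
    "Not all expected physical devices present: "
    "{'3c:ec:ef:d0:11:1a', '3c:ec:ef:d0:12:20'}"
)

print(summarize_log(working))
print(summarize_log(broken))
```

Running this over the full cloud-init.log from each install should show at a glance which boot had cloud-init networking disabled and which one blocked on missing devices.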

Running sudo grep 'Subiquity server revision' /var/log/installer/subiquity-server-info.log will tell us if there is a difference there as well. It looks like subiquity snap version 5741 is in both the working and broken logs.

As far as I can tell, subiquity determines whether or not to disable networking with the /etc/cloud/cloud.cfg.d/subiquity-disable-cloudinit-networking.cfg file based on cloud-init's feature flag NETPLAN_CONFIG_ROOT_READONLY = True. Given that these features in /usr/lib/python3/dist-packages/cloudinit/features.py should both be set to True in both 23.4 and 24.1, I don't understand why these separate install attempts behave differently with regard to telling cloud-init to set up networking versus disabling networking functionality.
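
One way to confirm the flag value on each installed system is a small sketch like the following. It assumes only that `cloudinit.features` is importable by the system Python, as the path above suggests. The helper name is hypothetical, and it returns None when the module or attribute is absent rather than raising.

```python
import importlib


def read_feature_flag(name, module="cloudinit.features"):
    """Return the named attribute from the module, or None if either is unavailable."""
    try:
        mod = importlib.import_module(module)
    except ImportError:
        return None
    return getattr(mod, name, None)


flag = read_feature_flag("NETPLAN_CONFIG_ROOT_READONLY")
print("NETPLAN_CONFIG_ROOT_READONLY =", flag)
```

Comparing the printed value from the 23.x and 24.1 installs would show whether the flag really is the same in both environments.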

holmanb commented 3 months ago

@blackboxsw I see you have a linked subiquity commit. I'm guessing that this is not a cloud-init bug based on that change, is that correct? If so, please set the appropriate tag and close this issue.
