canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

Gentoo no longer applying network config #3999

Open ubuntu-server-builder opened 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #1981912

Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = None
date_created = 2022-07-17T02:08:52.262346+00:00
date_fix_committed = None
date_fix_released = None
id = 1981912
importance = undecided
is_complete = False
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1981912
milestone = None
owner = robtongue
owner_name = Rob Tongue
private = False
status = confirmed
submitter = robtongue
submitter_name = Rob Tongue
tags = []
duplicates = []

Launchpad user Rob Tongue(robtongue) wrote on 2022-07-17T02:08:52.262346+00:00

It seems that cloud-init has evolved past the earlier work that got Gentoo functioning on first boot. It is missing a renderer that can configure the network on the booted machine; in this case that would be "openrc".

I am not good enough with Python to help, but the original Gentoo script in cloudinit/distros/gentoo.py looks like it has most of what is needed to configure the network; it needs to be adapted into a network renderer.

Putting this bug here to hopefully get some traction on this.

stages.py[ERROR]: Unable to render networking. Network config is likely broken: No available network renderers found. Searched through list: ['eni', 'sysconfig', 'netplan', 'network-manager', 'freebsd', 'netbsd', 'openbsd', 'networkd']

ubuntu-server-builder commented 1 year ago

Launchpad user Brett Holman(holmanb) wrote on 2022-07-19T15:26:12.089830+00:00

Thanks for reporting! Unfortunately our Gentoo test coverage is not very comprehensive.

I think this was likely broken in 81299de5fe3b6e491a965a6ebef66c6b8bf2c037.

That commit removed _write_network_config(), which changed the behavior. Before that commit, all distros implemented _write_network_config(); in Gentoo it raised NotImplementedError (the default behavior from the base class). That exception was then caught in cloudinit/distros/__init__.py:Distro.apply_network_config(), which caused the following call path:

apply_network_config() -> _apply_network_from_network_config() -> apply_network() -> which would have called Gentoo's networking code.

Currently _write_network_state() calls renderers.select(), which raises RendererNotFoundError and causes the error you mentioned.
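
To make the failure mode concrete, here is a minimal sketch of priority-based renderer selection. It is not cloud-init's actual code: the `select_renderer` function, the `CHECKS` table, and the shortened priority list are hypothetical stand-ins, and only the exception name and the wording of the error message are taken from this thread. It shows how a system with none of the listed renderers (for example one that only has OpenRC) ends up with the error reported above.

```python
from typing import Callable, Dict, List


class RendererNotFoundError(RuntimeError):
    """Raised when no candidate renderer is usable on this system."""


def select_renderer(
    priority: List[str], available: Dict[str, Callable[[], bool]]
) -> str:
    """Return the first renderer in `priority` whose availability check passes."""
    for name in priority:
        check = available.get(name)
        if check is not None and check():
            return name
    raise RendererNotFoundError(
        "No available network renderers found. "
        f"Searched through list: {priority}"
    )


# Hypothetical availability checks: none of them pass on an OpenRC-only system.
PRIORITY = ["eni", "sysconfig", "netplan", "network-manager", "networkd"]
CHECKS = {name: (lambda: False) for name in PRIORITY}

if __name__ == "__main__":
    try:
        select_renderer(PRIORITY, CHECKS)
    except RendererNotFoundError as err:
        print(err)
```

Since "openrc" is not in the searched list at all, no availability check can succeed on Gentoo unless the distro is routed around renderer selection or a new renderer is added.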

Unfortunately this commit can't just be reverted in upstream. I'll share a proposed patch and if someone can try it that would be really helpful.

ubuntu-server-builder commented 1 year ago

Launchpad user Brett Holman(holmanb) wrote on 2022-07-19T15:47:07.478603+00:00

This is ugly, but if someone tests it (which will confirm/deny my analysis above) then we can try to push this fix (or a less ugly equivalent) into upstream.

The thrown NotImplementedError will be caught in apply_network_config() which should put gentoo back on the fallback path it was on before.
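
The fallback described here can be illustrated with a toy model. The classes below are simplified stand-ins, not the real cloud-init classes, and the string return values exist only to make the chosen path visible; the method names follow the ones mentioned in this thread. The assumption being illustrated is that a distro without a renderer raises NotImplementedError from the renderer path, and the base class catches it and calls the legacy distro-specific code instead.

```python
class Distro:
    """Simplified stand-in for the base distro class (not the real one)."""

    def __init__(self) -> None:
        self.calls = []

    def _write_network_state(self, network_state) -> None:
        # Renderer-based path; works only when a renderer is available.
        self.calls.append("renderer")

    def apply_network(self, network_state) -> None:
        # Legacy distro-specific path; not provided by the base class.
        raise NotImplementedError

    def apply_network_config(self, network_state) -> str:
        try:
            self._write_network_state(network_state)
            return "renderer"
        except NotImplementedError:
            # Fall back to the legacy distro-specific networking code.
            self.apply_network(network_state)
            return "legacy"


class Gentoo(Distro):
    def _write_network_state(self, network_state) -> None:
        # No OpenRC renderer exists, so opt out of the renderer path.
        raise NotImplementedError("no OpenRC renderer")

    def apply_network(self, network_state) -> None:
        self.calls.append("legacy")


if __name__ == "__main__":
    print(Gentoo().apply_network_config({}))
```

If the analysis above is right, restoring this shape is what would put Gentoo back on the code path it used before the commit.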

ubuntu-server-builder commented 1 year ago

Launchpad user Brett Holman(holmanb) wrote on 2022-07-19T16:02:42.512230+00:00

diff --git a/cloudinit/distros/__init__.py b/cloudinit/distros/__init__.py
index e27a3f93..1e69709d 100644
--- a/cloudinit/distros/__init__.py
+++ b/cloudinit/distros/__init__.py
@@ -81,6 +81,7 @@ class Distro(persistence.CloudInitPickleMixin, metaclass=abc.ABCMeta):
     renderer_configs: Mapping[str, Mapping[str, Any]] = {}
     _preferred_ntp_clients = None
     networking_cls: Type[Networking] = LinuxNetworking

ubuntu-server-builder commented 1 year ago

Launchpad user Rob Tongue(robtongue) wrote on 2022-07-20T02:25:09.660702+00:00

This got it further. It created the configuration in /etc/conf.d/net.eth0, but it was non-working. It didn't like the mac_eth0="None" that got thrown in there. I do know the configuration that is being fed to cloud-init has the proper mac, so it has to be an error in the code.
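
One plausible way a literal "None" can land in a shell-style config file is a Python None being formatted straight into a string. Whether that is what happens in the Gentoo code has not been verified here; the sketch below only shows the general pitfall and a guard against it. The `render_openrc_iface` helper and its settings dictionary are hypothetical and are not part of cloud-init.

```python
from typing import Dict, Optional


def render_openrc_iface(name: str, settings: Dict[str, Optional[str]]) -> str:
    """Render shell-style assignments, skipping keys whose value is None."""
    lines = []
    for key, value in settings.items():
        if value is None:
            # Without this guard, the f-string below would emit the text None.
            continue
        lines.append(f'{key}_{name}="{value}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A missing MAC is dropped instead of being written as mac_eth0="None".
    print(render_openrc_iface("eth0", {"config": "dhcp", "mac": None}), end="")
```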

NucleaPeon commented 9 months ago

I am confirming this issue.

I am working on a tool to build cloud-init images for Gentoo and I can build both MBR and EFI ones. MBR works perfectly fine, EFI has broken cloud-init networking.

The first error I get from stages.py is something like "Unable to render networking ... No available network renderers found".

Then I get multiple errors like "Calling 'None' failed ... request error ... HTTPConnectionPool ... Max retries exceeded with url /2009-04-04/meta-data/instance-id", and it takes a while before reaching the login prompt.

Please let me know if there's any information I can provide for you. If you want to reproduce my methodology, check out my repo https://github.com/NucleaPeon/gentooimgr and follow the EFI portions of the readme.

I will try applying your patch and report back.

NucleaPeon commented 9 months ago

I applied the patch but as expected, it still encounters the issues I mentioned above.

I did notice that it didn't detect my dhcpcd install when I looked at the logs:

2024-02-02 19:32:16,709 - dhcp.py[WARNING]: DHCP client not found: dhcpcd
localhost ~ # eix dhcpcd
* acct-group/dhcpcd
     Available versions:  0-r2
     Description:         System group: dhcpcd

* acct-user/dhcpcd
     Available versions:  0-r2
     Description:         user for dhcpcd client

[I] net-misc/dhcpcd
     Available versions:  9.5.1 10.0.3 10.0.5-r1 ~10.0.6 ~10.0.6-r1 **9999*l {debug +embedded ipv6 privsep +udev}
     Installed versions:  10.0.5-r1(07:41:04 AM 02/02/2024)(embedded ipv6 udev -debug -privsep)
     Homepage:            https://github.com/NetworkConfiguration/dhcpcd/ https://roy.marples.name/projects/dhcpcd/
     Description:         A fully featured, yet light weight RFC2131 compliant DHCP client

cloud-init-output.log cloud-init.log

I installed dhclient and removed dhcpcd, but it still fails to make a connection or recognize it's installed, so I switched back to dhcpcd. If I add dhcpcd to the boot runlevel, it puts more timeout messages into the logs and takes longer to reach login than when it's at the default runlevel.
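
A quick way to check whether a DHCP client binary is visible on the PATH of the Python process is a lookup with `shutil.which`. This is only a diagnostic sketch: how cloud-init itself searches for DHCP clients may differ, and the `find_dhcp_clients` helper and its default list of client names are assumptions made for illustration.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_dhcp_clients(
    names: Iterable[str] = ("dhcpcd", "dhclient", "udhcpc"),
    path: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Map each client name to its resolved executable path, or None."""
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in find_dhcp_clients().items():
        print(f"{name}: {location or 'not found on PATH'}")
```

Running this in the same environment as the cloud-init service (same user, same PATH) would show whether the installed dhcpcd is simply not visible there, which is one possible explanation for the warning even though the package is installed.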

NucleaPeon commented 9 months ago

I installed networkmanager (-modemmanager -wext -wifi -bluetooth) and added it to runlevel default. I also removed dhcpcd and dhcpd services. It gives me an exception in the log file:

2024-02-02 22:31:19,714 - log.py[DEPRECATED]: DataSourceDigitalOcean is deprecated in 23.2 and scheduled to be removed in 28.2. Deprecated in favour of DataSourceConfigDrive.
2024-02-02 22:31:21,051 - DataSourceGCE.py[WARNING]: Did not find a fallback interface on gce.
2024-02-02 22:31:21,102 - DataSourceVMware.py[ERROR]: failed to find a valid data access method
2024-02-02 22:31:21,167 - networking.py[WARNING]: Not all expected physical devices present: {'00:00:00:00'}
2024-02-02 22:31:21,168 - util.py[WARNING]: failed stage init-local
failed run of stage init-local
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/cloudinit/cmd/main.py", line 394, in main_init
    init.fetch(existing=existing)
  File "/usr/lib/python3.11/site-packages/cloudinit/stages.py", line 493, in fetch
    return self._get_data_source(existing=existing)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/cloudinit/stages.py", line 360, in _get_data_source
    (ds, dsname) = sources.find_source(
                   ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/cloudinit/sources/__init__.py", line 1028, in find_source
    raise DataSourceNotFoundException(msg)
cloudinit.sources.DataSourceNotFoundException: Did not find any data source, searched classes: (DataSourceNoCloud, DataSourceConfigDrive, DataSourceLXD, DataSourceOpenNebula, DataSourceDigitalOcean, DataSourceAzure, DataSourceOVF, DataSourceMAAS, DataSourceGCELocal, DataSourceOpenStackLocal, DataSourceAliYunLocal, DataSourceVultr, DataSourceEc2Local, DataSourceCloudSigma, DataSourceSmartOS, DataSourceScaleway, DataSourceHetzner, DataSourceIBMCloud, DataSourceOracle, DataSourceRbxCloud, DataSourceUpCloudLocal, DataSourceVMware, DataSourceNWCS, DataSourceAkamaiLocal)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/cloudinit/cmd/main.py", line 781, in status_wrapper
    ret = functor(name, args)
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/cloudinit/cmd/main.py", line 415, in main_init
    init.apply_network_config(bring_up=bring_up_interfaces)
  File "/usr/lib/python3.11/site-packages/cloudinit/stages.py", line 1032, in apply_network_config
    self.distro.networking.wait_for_physdevs(netcfg)
  File "/usr/lib/python3.11/site-packages/cloudinit/distros/networking.py", line 169, in wait_for_physdevs
    raise RuntimeError(msg)
RuntimeError: Not all expected physical devices present: {'00:00:00:00'}
------------------------------------------------------------

cloud-init-output.log
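
The RuntimeError above says that a device with the address '00:00:00:00' was expected but not found. That value has only four octets, so it is not a well-formed Ethernet MAC address. The sketch below compares a set of expected addresses with those the kernel exposes under /sys/class/net and flags malformed ones. It is a standalone diagnostic, not cloud-init's own check; the helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, Set

# Six colon-separated hex octets, as used for Ethernet MAC addresses.
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def present_macs(sysfs_net: Path = Path("/sys/class/net")) -> Set[str]:
    """Collect the hardware addresses of all interfaces listed in sysfs."""
    macs = set()
    for address_file in sysfs_net.glob("*/address"):
        try:
            macs.add(address_file.read_text().strip().lower())
        except OSError:
            continue  # Interface vanished or address unreadable; skip it.
    return macs


def missing_macs(expected: Iterable[str], present: Set[str]) -> Set[str]:
    """Return the expected addresses that no present interface has."""
    return {mac.lower() for mac in expected} - present


def is_wellformed_mac(mac: str) -> bool:
    """True if `mac` looks like a six-octet Ethernet address."""
    return MAC_RE.match(mac.lower()) is not None


if __name__ == "__main__":
    expected = ["00:00:00:00"]  # Value taken from the log above.
    print("missing:", missing_macs(expected, present_macs()))
    print("malformed:", [m for m in expected if not is_wellformed_mac(m)])
```

A malformed expected address can never match a real interface, so the wait for physical devices would always fail regardless of which network service is installed.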

NucleaPeon commented 9 months ago

The stages.py[ERROR]: Unable to render networking. Network config is likely broken: No available network renderers found. Searched through list: ['eni', 'sysconfig', 'netplan', 'network-manager', 'freebsd', 'netbsd', 'openbsd', 'networkd'] may be a red herring, as it occurs in my logs on a working MBR gentoo cloud-init image.

See attached log. cloud-init-gentoo-mbr-output.log

cloud-gentoo-mbr-init.log