canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

package_update fails on IPv6 only host: Could not resolve host: mirrors.rockylinux.org #4601

Open minfrin opened 10 months ago

minfrin commented 10 months ago

Bug report

When an IPv6-only host boots with cloud-init under libvirt, the package_update step fails, as shown in the logs below.

After the machine finishes booting, and you log in, "dnf update" works without issue.

It looks like the package_update is being run before the network is ready.

Steps to reproduce the problem

Pass a cloud-config file containing the following to a libvirt-deployed VM on an IPv6-only network:

package_update: true
package_upgrade: true

Environment details

cloud-init logs

Rocky Linux 9 - BaseOS                          0.0  B/s |   0  B     00:00    
Errors during downloading metadata for repository 'baseos':
  - Curl error (6): Couldn't resolve host name for https://mirrors.rockylinux.org/mirrorlist?arch=x86_64&repo=BaseOS-9 [Could not resolve host: mirrors.rockylinux.org]
Error: Failed to download metadata for repo 'baseos': Cannot prepare internal mirrorlist: Curl error (6): Couldn't resolve host name for https://mirrors.rockylinux.org/mirrorlist?arch=x86_64&repo=BaseOS-9 [Could not resolve host: mirrors.rockylinux.org]
2023-11-09 09:53:42,717 - util.py[WARNING]: Package update failed
Rocky Linux 9 - BaseOS                          0.0  B/s |   0  B     00:00    
Errors during downloading metadata for repository 'baseos':
  - Curl error (6): Couldn't resolve host name for https://mirrors.rockylinux.org/mirrorlist?arch=x86_64&repo=BaseOS-9&countme=1 [Could not resolve host: mirrors.rockylinux.org]
  - Curl error (6): Couldn't resolve host name for https://mirrors.rockylinux.org/mirrorlist?arch=x86_64&repo=BaseOS-9 [Could not resolve host: mirrors.rockylinux.org]
Error: Failed to download metadata for repo 'baseos': Cannot prepare internal mirrorlist: Curl error (6): Couldn't resolve host name for https://mirrors.rockylinux.org/mirrorlist?arch=x86_64&repo=BaseOS-9 [Could not resolve host: mirrors.rockylinux.org]
2023-11-09 09:53:43,229 - util.py[WARNING]: Package upgrade failed
Rocky Linux 9 - BaseOS                          0.0  B/s |   0  B     00:00    
Errors during downloading metadata for repository 'baseos':
  - Curl error (6): Couldn't resolve host name for https://mirrors.rockylinux.org/mirrorlist?arch=x86_64&repo=BaseOS-9 [Could not resolve host: mirrors.rockylinux.org]
Error: Failed to download metadata for repo 'baseos': Cannot prepare internal mirrorlist: Curl error (6): Couldn't resolve host name for https://mirrors.rockylinux.org/mirrorlist?arch=x86_64&repo=BaseOS-9 [Could not resolve host: mirrors.rockylinux.org]
2023-11-09 09:53:43,581 - util.py[WARNING]: Failed to install packages: ['tcpdump', 'nmap']
2023-11-09 09:53:43,581 - cc_package_update_upgrade_install.py[WARNING]: 3 failed with exceptions, re-raising the last one
2023-11-09 09:53:43,582 - util.py[WARNING]: Running module package-update-upgrade-install (<module 'cloudinit.config.cc_package_update_upgrade_install' from '/usr/lib/python3.9/site-packages/cloudinit/config/cc_package_update_upgrade_install.py'>) failed

minfrin commented 10 months ago

Looks like cloud-init, or libvirt that is running cloudinit, has decided on an ipv4 only networking config:

2023-11-09 09:52:39,596 - stages.py[DEBUG]: applying net config names for {'ethernets': {'eth0': {'dhcp4': True, 'set-name': 'eth0', 'match': {'macaddress': '52:54:00:c0:b3:1c'}}}, 'version': 2}
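
To make the point about the logged config concrete, here is a minimal sketch that parses the dict from the debug line above and lists which DHCP families each interface enables. The helper name `dhcp_families` is hypothetical and not part of cloud-init; only the dict literal comes from the log.

```python
import ast

# Net config dict copied verbatim from the stages.py debug line above.
LOGGED = (
    "{'ethernets': {'eth0': {'dhcp4': True, 'set-name': 'eth0', "
    "'match': {'macaddress': '52:54:00:c0:b3:1c'}}}, 'version': 2}"
)


def dhcp_families(net_config):
    """Map each ethernet interface to the DHCP families it enables.

    Hypothetical helper for illustration; not a cloud-init API.
    """
    result = {}
    for name, iface in net_config.get("ethernets", {}).items():
        result[name] = [key for key in ("dhcp4", "dhcp6") if iface.get(key) is True]
    return result


config = ast.literal_eval(LOGGED)
print(dhcp_families(config))  # prints {'eth0': ['dhcp4']}
```

No `dhcp6` key is present, so on an IPv6-only network this interface would never be asked to obtain an address.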

Later on, we fail here:

2023-11-09 09:53:42,211 - helpers.py[DEBUG]: Running update-sources using lock (<FileLock using file '/var/lib/cloud/instances/nocloud/sem/update_sources'>)
2023-11-09 09:53:42,211 - rhel.py[DEBUG]: Using DNF for package management
2023-11-09 09:53:42,211 - subp.py[DEBUG]: Running command ['dnf', '-y', 'makecache'] with allowed return codes [0] (shell=False, capture=False)
2023-11-09 09:53:42,717 - util.py[WARNING]: Package update failed
2023-11-09 09:53:42,718 - util.py[DEBUG]: Package update failed
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/cloudinit/config/cc_package_update_upgrade_install.py", line 87, in handle
    cloud.distro.update_package_sources()
  File "/usr/lib/python3.9/site-packages/cloudinit/distros/rhel.py", line 196, in update_package_sources
    self._runner.run(
  File "/usr/lib/python3.9/site-packages/cloudinit/helpers.py", line 185, in run
    results = functor(*args)
  File "/usr/lib/python3.9/site-packages/cloudinit/distros/rhel.py", line 193, in package_command
    subp.subp(cmd, capture=False)
  File "/usr/lib/python3.9/site-packages/cloudinit/subp.py", line 332, in subp
    raise ProcessExecutionError(
cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
Command: ['dnf', '-y', 'makecache']
Exit code: 1
Reason: -
Stdout: -
Stderr: -

It looks like cloud-init hasn't correctly detected if the network is ready, and has run ahead anyway.

This step then fails, and the whole box deploy fails as a result.
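
One way to check the "network not ready" theory from inside the guest is to test whether the mirror host resolves to any IPv6 address at the moment of failure. The sketch below is only a diagnostic, not something cloud-init does itself; the function name `resolve_ipv6` and the injectable `resolver` parameter are assumptions made for illustration and testing.

```python
import socket


def resolve_ipv6(host, resolver=socket.getaddrinfo):
    """Return the sorted IPv6 addresses a host resolves to, or [] on failure.

    Hypothetical diagnostic helper; `resolver` is injectable so the
    function can be exercised without network access.
    """
    try:
        infos = resolver(host, 443, family=socket.AF_INET6, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Same class of failure as curl error (6): the name did not resolve.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_ipv6("mirrors.rockylinux.org")` early in boot and again after login would show whether resolution only starts working later. An empty list at both points would suggest the resolver itself is unreachable rather than a timing problem.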

dermotbradley commented 10 months ago

The network configuration looks like "fallback" network configuration.

Which DataSource are you using? (I assume NoCloud). Is valid DataSource-specific network configuration being passed to cloud-init? (i.e. for NoCloud a "cidata" ISO/FAT containing network-config enabling IPv6)
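
For reference, a minimal sketch of what such a network-config might contain, generated from Python. It reuses the interface name and MAC address from the fallback config logged earlier in this thread and assumes the network hands out addresses via DHCPv6; a network that relies on router advertisements alone may need different settings. The output is JSON, on the assumption that cloud-init parses network-config as YAML and therefore accepts it. The function name `ipv6_network_config` is hypothetical.

```python
import json


def ipv6_network_config(mac, name="eth0"):
    """Build a version 2 network config that requests DHCPv6 on one NIC.

    Hypothetical helper for illustration; not a cloud-init API.
    """
    return {
        "version": 2,
        "ethernets": {
            name: {
                "match": {"macaddress": mac},
                "set-name": name,
                "dhcp4": False,
                "dhcp6": True,
            }
        },
    }


# MAC address taken from the fallback config in the log earlier in this thread.
config = ipv6_network_config("52:54:00:c0:b3:1c")
# Text intended for the network-config file on the NoCloud seed image.
print(json.dumps(config, indent=2))
```

The only substantive difference from the logged fallback config is `dhcp6: true` in place of `dhcp4: true`.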

minfrin commented 10 months ago

I've asked the libvirt people to clarify here:

https://gitlab.com/libvirt/libvirt/-/issues/562#note_1647922180

TheRealFalcon commented 9 months ago

cloud-init's fallback config (the config used when no network config is provided) is currently dhcp4 only. We would like to add dhcp6, but that currently introduces some breaking behavior.

ani-sinha commented 9 months ago

> cloud-init's fallback config (the config used when no network config is provided) is currently dhcp4 only. We would like to add dhcp6, but that currently introduces some breaking behavior.

Are you sure? :-)

holmanb commented 9 months ago

> Looks like cloud-init, or libvirt that is running cloudinit, has decided on an ipv4 only networking config:

This has changed in the default configuration: as of commit 0264e9691, DHCP with IPv6 should now be supported by default.

This change is expected to be released in upstream 24.1. If you would like to test it before then, you can build an RPM from the tip of our main branch:

git clone https://github.com/canonical/cloud-init.git
cd cloud-init
./packages/brpm

Please let us know whether this resolves your issue.