canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

networking comes up before hostname is set #3088

Open ubuntu-server-builder opened 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #1739516

Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = None
date_created = 2017-12-21T02:05:24.786441+00:00
date_fix_committed = None
date_fix_released = None
id = 1739516
importance = medium
is_complete = False
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1739516
milestone = None
owner = mwhudson
owner_name = Michael Hudson-Doyle
private = False
status = confirmed
submitter = mwhudson
submitter_name = Michael Hudson-Doyle
tags = []
duplicates = []

Launchpad user Michael Hudson-Doyle(mwhudson) wrote on 2017-12-21T02:05:24.786441+00:00

When I boot, with libvirt, a disk image that was installed with subiquity and has the workaround for bug 1737630 applied (i.e. networkd starts automatically), I cannot ping the VM by hostname from the host.

I think this is because the networking has come up before the hostname is set, so the hostname is not sent along with the DHCP request to libvirt's dnsmasq and so that dnsmasq cannot answer lookups for the hostname. If I run "netplan apply" on the vm, enough things are apparently restarted that DHCP happens again and I can ping the vm by hostname from the host.
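One way to check this diagnosis from the host side is to look at what dnsmasq recorded for the lease. In dnsmasq's plain-text lease file each line holds the expiry time, MAC address, IP address, hostname and client ID, with `*` in the hostname field when the client sent none. The sketch below assumes that format; the helper name `lease_hostname` and the sample lines are hypothetical, and libvirt's managed dnsmasq may keep its lease state in a different file and format.

```python
from typing import Optional


def lease_hostname(line: str) -> Optional[str]:
    """Return the hostname recorded in one dnsmasq lease line.

    Returns None when dnsmasq stored "*", i.e. the DHCP client did not
    send a hostname with its request.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"not a lease line: {line!r}")
    hostname = fields[3]
    return None if hostname == "*" else hostname


# Hypothetical sample lines, not taken from the report.
with_name = "1513822000 52:54:00:12:34:56 192.168.122.50 myvm 01:52:54:00:12:34:56"
without_name = "1513822000 52:54:00:12:34:57 192.168.122.51 * 01:52:54:00:12:34:57"

print(lease_hostname(with_name))
print(lease_hostname(without_name))
```

If the lease for the VM shows no hostname until "netplan apply" is run, that would be consistent with the DHCP request going out before the hostname was set.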

I'm not completely sure I have diagnosed this correctly and certainly don't know how to fix it.

ubuntu-server-builder commented 1 year ago

Launchpad user Scott Moser(smoser) wrote on 2017-12-21T18:12:27.601017+00:00

It is true that the hostname is not set before networking comes up. I would like to fix this, but there are a couple of things to consider:

a.) network datasources: currently there are some datasources that run only after networking comes up. As it is right now, it is "too late" to read the hostname from the network metadata service and then update the system hostname before dhcp would run.

b.) systemd-networkd's dhcp client seems to actually be listening for hostname getting set and updating its lease information on that event. we saw this in azure when we were removing the old 'bounce the network' code that served the purpose of publishing the

c.) relying on the guest to populate dns information via dhcp is kind of garbage anyway. as a "cloud" solution anyway.

d.) cloud-init allows setting hostname in user-data (in addition to meta-data). the user-data provided by the user could be in a '#include' url, which might not be available until all networking is up. Thus, even if we moved network datasources to pull their information 'pre-network' (the way that the digital ocean md service does) we can't consume all the user-data at that point.

'd' might be a reasonable limitation. The other things are achievable.

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Hudson-Doyle(mwhudson) wrote on 2017-12-21T22:18:28.617626+00:00

For a and d, sure: if finding out what the hostname needs to be involves having the network up, there's nothing that can be done to avoid this.

For c, yes, this is kind of garbage. Utah depends on this though :/ Maybe I can get it to edit the libvirt network config to map the MAC address to a particular IP address instead, that would definitely be less fragile...

And finally for b, it would make sense that a hostname change triggers a refresh of the DHCP lease but I see nothing in the code to do this and my experiments don't seem to indicate it happening either.

ubuntu-server-builder commented 1 year ago

Launchpad user Birger Schmidt(bs-ubo) wrote on 2018-06-19T17:52:24.008293+00:00

I just stumbled over this bug as well.

Reading all the cases (a,b,c...) I do not see the downside in just setting the hostname in the init-local stage as well.

This can be done as an additional step only if the info is already there (i.e. mounted via iso). To check that would not take long and neither would setting the hostname take long.

Please consider adding this functionality and in case you decide against it please tell us what you think the downside of this would be.

As a side note: A similar request can be solved at the same time. See here https://bugs.launchpad.net/cloud-init/+bug/1643688.

ubuntu-server-builder commented 1 year ago

Launchpad user Jesse R(scronkfinkle) wrote on 2022-07-14T14:32:30.747415+00:00

I am also running into this issue. We run DNSMasq and build out our cloud-init images with terraform. We're getting some pretty nasty networking issues because when we roll out any new batches of machines, they all request an IP with the hostname "Ubuntu", and then set their hostname afterwards.

Noticing the age of this ticket, has a better workaround for this kind of behavior been implemented that I missed? It's a pretty big blocker for us, and it seems reasonable to just be able to set the hostname in the local stage.

ubuntu-server-builder commented 1 year ago

Launchpad user Chad Smith(chad.smith) wrote on 2022-07-18T16:05:29.369848+00:00

@Jesse thanks for the bump and notes on this bug. Since this bug was filed we have added a related feature which allows init-local based datasources to set the hostname before the network is brought online[1]. From my recollection of the feature, it requires the datasource's meta-data.local-hostname[2] (not user-data.hostname) to provide the "local-hostname" config.

If you get a chance would you be able to:

  1. provide the steps used in terraform to reproduce this issue
  2. attach the tar.gz produced by "sudo cloud-init collect-logs -u". Note that this collect-logs run will include user-data, so please double check that there is no sensitive information (passwords/credentials) in the user-data/meta-data provided during launch.

Thank you, the attached logs will help confirm suspicions on why this feature isn't quite enough for terraform type deployments.

References:

[1] https://github.com/canonical/cloud-init/commit/133ad2cb327ad17b7b81319fac8f9f14577c04df
[2] https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/__init__.py#L754
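For a local NoCloud seed (for example an ISO attached at boot), the distinction above means the hostname would have to appear as "local-hostname" in the meta-data file rather than as "hostname" in user-data. A minimal Python sketch that renders such a meta-data file follows; the function name `render_meta_data` and the sample values are hypothetical, and whether this alone is enough for a given datasource is an assumption to verify against the collected logs.

```python
import json
import re

# A single DNS label: letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen.
_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def render_meta_data(instance_id: str, local_hostname: str) -> str:
    """Render a NoCloud meta-data document that carries local-hostname."""
    if not _LABEL.match(local_hostname):
        raise ValueError(f"not a valid hostname label: {local_hostname!r}")
    # JSON strings are valid YAML double-quoted scalars, so json.dumps
    # gives safe quoting without needing a YAML library.
    return (
        f"instance-id: {json.dumps(instance_id)}\n"
        f"local-hostname: {json.dumps(local_hostname)}\n"
    )


# Hypothetical values for illustration.
print(render_meta_data("iid-attis-001", "attis"), end="")
```

The point of the sketch is only where the hostname lives: it is in meta-data, which a local datasource can read before networking is configured.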

ubuntu-server-builder commented 1 year ago

Launchpad user Jesse R(scronkfinkle) wrote on 2022-07-19T15:23:43.099916+00:00

@Chad thanks for writing back! Attached is the collect-logs output.

For terraform, we're using a provider to hook into our proxmox infrastructure. Under the hood, proxmox is calling QEMU to manage the virtual machines. I installed qemu-guest-agent to the cloud-init image using virt-customize from the libguestfs-tools package.

On first boot, the hostname is successfully set, but it doesn't appear to be fast enough before networking is brought up.

To build an identical image to the one I'm using:

Download the cloud-init image

wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img

use virt-customize to install qemu-guest-agent

sudo virt-customize -a focal-server-cloudimg-amd64.img --install qemu-guest-agent

From there, we upload it to proxmox and have it clone the image for VMs. I would imagine that if one used regular QEMU or another provider with terraform the behavior would be the same.

Here's the output of terraform apply

module.greeks["attis"].proxmox_vm_qemu.basic_admin: Refreshing state... [id=aramis5/qemu/111]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.greeks["attis"].proxmox_vm_qemu.basic_admin will be created
  + resource "proxmox_vm_qemu" "basic_admin" {
      + additional_wait           = 0
      + agent                     = 1
      + automatic_reboot          = true
      + balloon                   = 0
      + bios                      = "seabios"
      + boot                      = "c"
      + bootdisk                  = "scsi0"
      + ciuser                    = "awx"
      + clone                     = "ubuntu-2004-cloudinit-terraform"
      + clone_wait                = 0
      + cores                     = 4
      + cpu                       = "host"
      + default_ipv4_address      = (known after apply)
      + define_connection_info    = true
      + force_create              = false
      + full_clone                = true
      + guest_agent_ready_timeout = 100
      + hotplug                   = "network,disk,usb"
      + id                        = (known after apply)
      + ipconfig0                 = "ip=dhcp"
      + kvm                       = true
      + memory                    = 8192
      + name                      = "attis"
      + nameserver                = (known after apply)
      + numa                      = false
      + onboot                    = false
      + oncreate                  = true
      + os_type                   = "cloud-init"
      + preprovision              = true
      + reboot_required           = (known after apply)
      + scsihw                    = "virtio-scsi-pci"
      + searchdomain              = (known after apply)
      + sockets                   = 1
      + ssh_host                  = (known after apply)
      + ssh_port                  = (known after apply)
      + sshkeys                   = "<trimmed>"
      + tablet                    = true
      + target_node               = "aramis5"
      + unused_disk               = (known after apply)
      + vcpus                     = 0
      + vlan                      = -1
      + vmid                      = (known after apply)

      + disk {
          + backup       = 0
          + cache        = "none"
          + discard      = "on"
          + file         = (known after apply)
          + format       = (known after apply)
          + iothread     = 0
          + mbps         = 0
          + mbps_rd      = 0
          + mbps_rd_max  = 0
          + mbps_wr      = 0
          + mbps_wr_max  = 0
          + media        = (known after apply)
          + replicate    = 0
          + size         = "32G"
          + slot         = 0
          + ssd          = 0
          + storage      = "ceph-external"
          + storage_type = (known after apply)
          + type         = "scsi"
          + volume       = (known after apply)
        }

      + network {
          + bridge    = "vmbr0"
          + firewall  = false
          + link_down = false
          + macaddr   = (known after apply)
          + model     = "virtio"
          + queues    = (known after apply)
          + rate      = (known after apply)
          + tag       = -1
        }
    }

attis is the desired hostname that we want for this particular machine.

Launchpad attachments: cloud-init-sanitized.tar.gz

ubuntu-server-builder commented 1 year ago

Launchpad user Jesse R(scronkfinkle) wrote on 2022-08-23T16:09:15.137518+00:00

I wanted to give an update to this with a fix for anyone else that runs into my particular issue. The first problem was that using virt-customize to install qemu-guest-agent was setting /etc/machine-id. This caused dnsmasq to assign the same CLID to each VM. I assume that means it thought all the VMs were the same machine, requesting an IP on different interfaces. The way to fix that was to truncate the file after installation with

sudo virt-customize -a focal-server-cloudimg-amd64.img --truncate /etc/machine-id

With that sorted out, I was also able to use nocloud to set the hostname properly on boot. I used the method of setting the SMBIOS serial. In terraform I was able to specify this as QEMU args like so

args = "-smbios type=1,serial=ds=nocloud-net;h=${var.name}"

where var.name was the hostname.
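The serial string above follows the NoCloud convention of "ds=nocloud[-net]" followed by semicolon-separated key=value pairs, where "h" is the short key for local-hostname. A small Python sketch of building that argument string outside terraform is shown here; the function name `nocloud_smbios_args` is hypothetical, and the validation is an added precaution rather than something the original setup does.

```python
import re

# A single DNS label: letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen.
_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def nocloud_smbios_args(hostname: str, datasource: str = "nocloud-net") -> str:
    """Build QEMU -smbios arguments that pass a hostname to NoCloud.

    The hostname is validated first because ';' and ',' are separators
    in the serial string and in QEMU's option syntax respectively.
    """
    if not _LABEL.match(hostname):
        raise ValueError(f"not a valid hostname label: {hostname!r}")
    return f"-smbios type=1,serial=ds={datasource};h={hostname}"


print(nocloud_smbios_args("attis"))
```

When a string like this is passed through a shell, it needs quoting, since an unquoted ';' would end the command.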