canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

sr0 not available at generator timeframe causes cloud-init.target not run #3897

Open ubuntu-server-builder opened 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #1940791

Launchpad details
affected_projects = ['cloud-images']
assignee = None
assignee_name = None
date_closed = None
date_created = 2021-08-23T03:02:20.779310+00:00
date_fix_committed = None
date_fix_released = None
id = 1940791
importance = undecided
is_complete = False
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1940791
milestone = None
owner = esj
owner_name = Éric St-Jean
private = False
status = triaged
submitter = achasen
submitter_name = Adam Chasen
tags = []
duplicates = [1961832]

Launchpad user Adam Chasen(achasen) wrote on 2021-08-23T03:02:20.779310+00:00

Focal image cloud-init generator reports: 'cloud-init is enabled but no datasource found, disabling'

Looks to be related to ds-identify not finding the CD-ROM drive on first run (and caching that result). Not sure why /dev/sr0 would not be available early enough.

cat /run/cloud-init/ds-identify.log
...
ISO9660_DEVS=
...
No ds found [mode=search, notfound=disabled]. Disabled cloud-init [1]
[up 1.20s] returning 1
root@ubuntu:~# /usr/lib/cloud-init/ds-identify --force
[up 200.71s] ds-identify --force
...
ISO9660_DEVS=/dev/sr0=cidata
...
Found single datasource: NoCloud
[up 200.79s] returning 0
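
To compare the boot-time run with the forced rerun, it can help to pull out only the lines that decide the outcome. A minimal sketch, assuming the log uses the line shapes quoted in this report; `ds_identify_summary` is a hypothetical helper name, not something shipped with cloud-init:

```shell
# Hypothetical helper (not part of cloud-init): print the ISO9660 device
# list and the verdict lines from a ds-identify log read on stdin.
ds_identify_summary() {
    grep -E '^(ISO9660_DEVS=|No ds found|Found single datasource|\[up [0-9.]+s\] returning)'
}

# Usage on an affected guest:
#   ds_identify_summary < /run/cloud-init/ds-identify.log
```

An empty `ISO9660_DEVS=` next to `No ds found` is the pattern that matches this bug.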

Booting https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64-disk-kvm.img as of Aug 22, 2021 in KVM (created with virt-install and libvirt) along with cloud-config ISO

$ cat /tmp/cloud
#cloud-config
hostname: proxy1
$ cloud-localds /tmp/test.iso /tmp/cloud
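
cloud-init only treats user-data as cloud-config when its first line is `#cloud-config`, so it may be worth checking the seed input before building the ISO. A minimal sketch; `is_cloud_config` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if the first line of the given file is
# exactly "#cloud-config".
is_cloud_config() {
    [ "$(head -n 1 "$1")" = "#cloud-config" ]
}

# Usage with the file from this report:
#   is_cloud_config /tmp/cloud && cloud-localds /tmp/test.iso /tmp/cloud
```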

cloud-init.target is never reached and the network doesn't come up (default behavior for cloud-init is eth0 DHCP). If I manually run systemctl start cloud-init.target, I get what I expected, but by then it is "too late" and I also have to kick systemd-networkd.

cloud-init starts up as expected with the same environment when using Bionic (https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img)

The focal image never touches cloud-init.target. Note that there is no reverse dependency in focal.

root@ubuntu:~# systemctl list-dependencies --reverse cloud-init.target
cloud-init.target

Both images have default target of "graphical.target"

There is mention of a "generator" and "detection" in the cloud-init docs. https://cloudinit.readthedocs.io/en/latest/topics/boot.html

The generator appears to be what is adding the "wants" of cloud-init.target to multi-user.target, from /lib/systemd/system-generators/cloud-init-generator:

    local target_name="multi-user.target" gen_d="$early_d"
    local link_path="$gen_d/${target_name}.wants/${CLOUD_TARGET_NAME}"

Bionic:
root@proxy1:~# systemctl get-default
graphical.target
root@proxy1:~#
UNIT LOAD ACTIVE SUB DESCRIPTION
basic.target loaded active active Basic System
cloud-config.target loaded active active Cloud-config availability
cloud-init.target loaded active active Cloud-init target
...
root@proxy1:~# systemctl list-dependencies --reverse cloud-init.target
cloud-init.target
● └─multi-user.target
●   └─graphical.target
root@proxy1:/etc/systemd/system# cat /run/cloud-init/cloud-init-generator.log
/lib/systemd/system-generators/cloud-init-generator normal=/run/systemd/generator early=/run/systemd/generator.early late=/run/systemd/generator.late
kernel command line (/proc/cmdline): BOOT_IMAGE=/boot/vmlinuz-4.15.0-154-generic root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0
kernel_cmdline found unset
etc_file found unset
default found enabled
checking for datasource
ds-identify rc=0
ds-identify _RET=found
enabled via /run/systemd/generator.early/multi-user.target.wants/cloud-init.target -> /lib/systemd/system/cloud-init.target

Focal:
root@ubuntu:~# systemctl get-default
graphical.target
root@ubuntu:~# systemctl list-units --type=target --all
  UNIT LOAD ACTIVE SUB >
  basic.target loaded active activ>
  blockdev@dev-disk-by\x2dlabel-cloudimg\x2drootfs.target loaded inactive dead >
  blockdev@dev-disk-by\x2dlabel-UEFI.target loaded inactive dead >
  blockdev@dev-loop0.target loaded inactive dead >
  blockdev@dev-loop1.target loaded inactive dead >
  blockdev@dev-loop2.target loaded inactive dead >
  blockdev@dev-vda15.target loaded inactive dead >
  cloud-config.target loaded inactive dead >
  cloud-init.target loaded inactive dead >
root@ubuntu:~# systemctl list-unit-files
...
cloud-config.service enabled enabled
cloud-final.service enabled enabled
cloud-init-local.service enabled enabled
cloud-init.service enabled enabled
...
root@ubuntu:~# systemctl list-dependencies --reverse cloud-init.target
cloud-init.target
root@ubuntu:~# systemctl list-dependencies cloud-init.target
cloud-init.target
● ├─cloud-config.service
● ├─cloud-final.service
● ├─cloud-init-local.service
● └─cloud-init.service

root@ubuntu:~# cat /run/cloud-init/cloud-init-generator.log
/usr/lib/systemd/system-generators/cloud-init-generator normal=/run/systemd/generator early=/run/systemd/generator.early late=/run/systemd/generator.late
kernel command line (/proc/cmdline): BOOT_IMAGE=/boot/vmlinuz-5.4.0-1045-kvm root=PARTUUID=14530a28-f129-4b51-a64e-c64075fae7c7 ro console=tty1 console=ttyS0 panic=-1
kernel_cmdline found unset
etc_file found unset
default found enabled
checking for datasource
ds-identify rc=1
ds-identify _RET=notfound
cloud-init is enabled but no datasource found, disabling
already disabled: no change needed [no /run/systemd/generator.early/multi-user.target.wants/cloud-init.target]
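
The difference between the two generator logs comes down to whether the wants link exists. A minimal sketch of a direct check, assuming the link path printed in the logs; `cloud_init_generator_link` is a hypothetical helper name:

```shell
# Hypothetical helper: report whether the generator created the wants link
# that pulls cloud-init.target into multi-user.target.
# $1 is the generator output directory; it defaults to the "early"
# directory named in the generator log.
cloud_init_generator_link() {
    gen_d="${1:-/run/systemd/generator.early}"
    link_path="$gen_d/multi-user.target.wants/cloud-init.target"
    if [ -L "$link_path" ]; then
        echo "enabled via $link_path -> $(readlink "$link_path")"
    else
        echo "disabled: no $link_path"
        return 1
    fi
}
```

Run with no argument on the guest, it should succeed on the Bionic image and fail on the affected Focal boot.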

Additional Resources: Possibly same issue https://bugzilla.redhat.com/show_bug.cgi?id=1820540

ubuntu-server-builder commented 1 year ago

Launchpad user John Chittum(jchittum) wrote on 2021-09-01T16:19:16.414551+00:00

Could you provide exact reproduction steps with virt-install and libvirt? I am attempting to reproduce locally with the setups we normally use for testing, and am unable to:

  1. downloaded image from https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64-disk-kvm.img
  2. created a simple cloud-init yaml file:

#cloud-config
password:
chpasswd: { expire: False }
ssh_pwauth: True
ssh_import_id: jchittum
sudo: ALL=(ALL) NOPASSWD:ALL

  3. using cloud-localds from cloud-image-utils, made an ISO of the cloud-config:

cloud-localds cloud_init_with_pass.iso cloud-init.yaml

  4. used qemu to test the image:

qemu-system-x86_64 \
  -cpu host -machine type=q35,accel=kvm -m 2048 \
  -nographic \
  -snapshot \
  -netdev id=net00,type=user,hostfwd=tcp::2222-:22 \
  -device virtio-net-pci,netdev=net00 \
  -drive if=virtio,format=qcow2,file=focal-server-cloudimg-amd64-disk-kvm.img \
  -drive if=virtio,format=raw,file=cloud_init_with_pass.iso

This qemu command sets the accel to kvm, and I had no issues. I'm guessing that the drive setup is very different, though.

From my working knowledge of libvirt and cloud-init, you do need to mount the cloud-init image in a specific place, and I don't think there would be an issue, generally, with the kvm image not getting sr0 up fast enough. qemu is mounting to the same place in that command.

Could you provide the libvirt XML definition and exact reproduction steps for us to dig a little deeper?

ubuntu-server-builder commented 1 year ago

Launchpad user Chad Smith(chad.smith) wrote on 2021-09-01T17:51:09.861503+00:00

I also haven't been able to reproduce on focal. It makes me think that there is a potential systemd unit ordering cycle on the image/config that represented this issue?

on focal I see the reverse deps on latest daily images:

root@dev-ff:~# systemctl list-dependencies --reverse cloud-init.target
cloud-init.target
● └─multi-user.target
●   └─graphical.target
root@dev-ff:~# lsb_release -sc
focal

A guess in the dark would be to check journalctl -b 0 and look for "ordering cycle" related messages too.
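
That check can be scripted as a simple filter. A minimal sketch; `find_ordering_cycles` is a hypothetical helper name, and the match string is just the phrase quoted above:

```shell
# Hypothetical helper: filter journal text on stdin for systemd
# ordering-cycle messages (case-insensitive). Exits non-zero if none match.
find_ordering_cycles() {
    grep -i 'ordering cycle'
}

# Usage on the guest:
#   journalctl -b 0 | find_ordering_cycles || echo "no ordering cycle reported"
```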

ubuntu-server-builder commented 1 year ago

Launchpad user Adam Chasen(achasen) wrote on 2021-09-01T20:10:03.834280+00:00

able to reproduce with image created with

virt-install --connect qemu:///session \
--name cloudinit-test \
--memory 2048 \
--disk /home/achasen/tmp/focal.img,device=disk,bus=virtio \
--os-type linux \
--os-variant ubuntu20.04 \
--virt-type kvm \
--graphics none \
--network bridge=virbr0,model=virtio \
--import \
--disk /tmp/test.iso,device=cdrom,bus=sata

/run/cloud-init/cloud-init-generator.log indicated run around 0.69s:

No ds found [mode=search, notfound=disabled]. Disabled cloud-init [1]
[up 0.69s] returning 1

journalctl shows things like "Starting Network Service" before sr0 appears in the log (which makes me think sr0 is delayed). I didn't find anything in the journalctl output related to the generator.

[    1.857890] ubuntu systemd[1]: Starting Network Service...
...
[    2.364350] ubuntu kernel: ata4: SATA link down (SStatus 0 SControl 300)
[    2.364426] ubuntu kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl>
[    2.364539] ubuntu kernel: ata3: SATA link down (SStatus 0 SControl 300)
[    2.364609] ubuntu kernel: ata5: SATA link down (SStatus 0 SControl 300)
[    2.364642] ubuntu kernel: ata1.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[    2.364643] ubuntu kernel: ata1.00: applying bridge limits
[    2.364884] ubuntu kernel: ata1.00: configured for UDMA/100
[    2.365032] ubuntu kernel: scsi 0:0:0:0: CD-ROM            QEMU     QEMU DVD>
[    2.365242] ubuntu kernel: sr 0:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa>
[    2.365250] ubuntu kernel: cdrom: Uniform CD-ROM driver Revision: 3.20
[    2.379293] ubuntu kernel: sr 0:0:0:0: Attached scsi CD-ROM sr0
[    2.416795] ubuntu systemd[1]: Finished udev Wait for Complete Device Initia>
[    2.417385] ubuntu systemd[1]: Starting Device-Mapper Multipath Device Contr>
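
The ordering can also be checked numerically instead of by eye. A minimal sketch, assuming journal lines in the monotonic-timestamp form shown above (as printed by `journalctl -b 0 -o short-monotonic`) and the `[up N.NNs] returning` lines from ds-identify.log, and assuming the two clocks are close enough to compare. The function names are hypothetical:

```shell
# Print the monotonic timestamp (seconds) at which sr0 was attached,
# reading journal text on stdin.
sr0_attach_time() {
    sed -n 's/^\[ *\([0-9.]*\)].*Attached scsi CD-ROM sr0.*/\1/p' | head -n 1
}

# Print the uptime (seconds) from the last "returning" line of a
# ds-identify log read on stdin.
ds_identify_uptime() {
    sed -n 's/^\[up \([0-9.]*\)s] returning.*/\1/p' | tail -n 1
}

# Succeed when the device appeared after ds-identify finished.
# $1 = device attach time, $2 = ds-identify finish time.
device_was_late() {
    awk -v dev="$1" -v gen="$2" 'BEGIN { exit !(dev + 0 > gen + 0) }'
}
```

With the values from this comment, `device_was_late 2.379293 0.69` succeeds, which is consistent with the generator running before sr0 existed.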

Launchpad attachments: virsh dumpxml result from virt-install

ubuntu-server-builder commented 1 year ago

Launchpad user Vincent Saelzler(vincentsaelzler) wrote on 2021-09-24T19:33:11.755335+00:00

I have the same issue with the Azure/Hyper-V Image. Running on local Windows desktop, using Hyper-V as the hypervisor.

Steps to reproduce:

  1. Download and extract https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64-azure.vhd.zip. Save disk image as 20.04-cloud.vhd.

  2. Create my-seed.iso file almost exactly as described in cloud-init documentation. Only small tweak is saving as ISO instead of IMG. https://cloudinit.readthedocs.io/en/latest/topics/debugging.html

$ cat > user-data <<EOF
#cloud-config
password: passw0rd
chpasswd: { expire: False }
EOF
$ cloud-localds my-seed.iso user-data

  3. Create new VM using Hyper-V GUI
    • Virtual Hard Disk Image = 20.04-cloud.vhd
    • Virtual DVD Drive Image = my-seed.iso

=> After starting the VM, I cannot log in.

Possibly helpful note: When using the standard (non-cloud) installer, this file seems to prevent the VM from using an ISO attached to the system: /etc/cloud/cloud.cfg.d/99-installer.cfg

It saves the user details that I manually entered during the install process, and critically, explicitly sets the data source to none.

$ cat /run/cloud-init/ds-identify.log
/etc/cloud/cloud.cfg.d/99-installer.cfg set datasource_list: [None]

After deleting the file, the ISO was recognized (and PW of "passw0rd" for ubuntu user worked)

$ cat /run/cloud-init/ds-identify.log
/etc/cloud/cloud.cfg.d/90_dpkg.cfg set datasource_list: [ NoCloud, ConfigDrive, OpenNebula, DigitalOcean, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, SmartOS, Bigstep, Scaleway, AliYun, Ec2, CloudStack, Hetzner, IBMCloud, Oracle, Exoscale, RbxCloud, UpCloud, Vultr, None ]

I do not know how to get debug output from the cloud image, because I cannot log in as any user! If someone can explain how to do that, I would be happy to provide more output from the cloud image VM.

ubuntu-server-builder commented 1 year ago

Launchpad user Gauthier Jolly(gjolly) wrote on 2021-09-28T07:50:39.603788+00:00

Hi Vincent,

Thank you for your comment. What you are seeing with the Azure cloud-images is not related to the current issue.

The Azure VHDs you can find on c-i.u.c (cloud-images.ubuntu.com) are the same images we publish on Azure Cloud. Those are configured with a single cloud-init datasource (Azure) to make the image boot faster. While it is possible to boot those images locally on Hyper-V, you will end up with a VM that is not fully functional.

If you look carefully at the bug description, you will see that @achasen uses KVM images (not Azure images) that should work out of the box on KVM.

ubuntu-server-builder commented 1 year ago

Launchpad user Launchpad Janitor(janitor) wrote on 2021-11-28T04:17:19.681502+00:00

[Expired for cloud-init because there has been no activity for 60 days.]

ubuntu-server-builder commented 1 year ago

Launchpad user Launchpad Janitor(janitor) wrote on 2021-11-28T04:17:21.014769+00:00

[Expired for cloud-images because there has been no activity for 60 days.]

ubuntu-server-builder commented 1 year ago

Launchpad user Chad Smith(chad.smith) wrote on 2022-02-25T00:42:41.882156+00:00

I apologize for the expiry on this bug. It slipped through the cracks because it was set to Incomplete status, which eventually expires if not set back to New.

cloud-init is not included in your boot target because ds-identify, which runs at generator time, does not yet see /dev/sr0 with a cidata label, apparently because the kernel module is loaded later.

Cloud-init can tell you on focal that it's disabled due to the generator-time failure to find a matching datasource.

root@focal:~# cloud-init status --long
status: disabled
detail: Cloud-init disabled by cloud-init-generator

I am able to reproduce the original error with the following steps, as Adam suggested:

$ sudo virt-install --connect qemu:///session --name cloudinit-test --memory 2048 --disk /home/csmith/src/cloud-init/focal-server-cloudimg-amd64-disk-kvm.img,device=disk,bus=virtio --os-type linux --os-variant ubuntu20.04 --virt-type kvm --graphics none --network bridge=virbr0,model=virtio --import --disk "/tmp/test.iso,device=cdrom,bus=sata"

On Focal, the timestamp of /run/cloud-init/ds-identify.log, which is written when cloud-init's generator runs, is earlier than the journalctl -b 0 timestamp at which /dev/sr0 is first seen, because the kernel module is loaded later.

from journalctl:

Feb 24 21:56:28 ubuntu kernel: sr 0:0:0:0: Attached scsi CD-ROM sr0

root@focal:~# ls -ltr --full-time /dev/disk/by-label/ /run/cloud-init/ds-identify.log

Generator time 21:56:27

-rw-r--r-- 1 root root 1504 2022-02-24 21:56:27.241872017 +0000 /run/cloud-init/ds-identify.log

/dev/sr0 availability not until 1 second later

/dev/disk/by-label/:
total 0
lrwxrwxrwx 1 root root 10 2022-02-24 21:56:28.173872017 +0000 cloudimg-rootfs -> ../../vda1
lrwxrwxrwx 1 root root 9 2022-02-24 21:56:28.441872017 +0000 cidata -> ../../sr0
lrwxrwxrwx 1 root root 11 2022-02-24 21:56:28.581872017 +0000 UEFI -> ../../vda15

This needs a bit more investigation, and can probably be worked around by adding the virt-install argument --sysinfo system.serial='ds=nocloud', which will force ds-identify to detect NoCloud regardless of the presence of /dev/sr0. Since the device will be up before NoCloud.get_data is run, this will avoid the race.
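
Put together with the reproduction command, the suggested workaround might look like the sketch below. The arguments are copied from the virt-install invocation in this thread, with only the --sysinfo argument added; the wrapper name `virt_install_nocloud` and the `VIRT_INSTALL` override (handy for a dry run with `echo`) are hypothetical, and this has not been verified against an affected image here:

```shell
# Hypothetical wrapper around the reproduction command from this thread,
# with the proposed --sysinfo workaround appended.
# $1 = path to the cloud image, $2 = path to the NoCloud seed ISO.
# Set VIRT_INSTALL=echo to print the command instead of running it.
virt_install_nocloud() {
    image="$1"
    seed_iso="$2"
    "${VIRT_INSTALL:-virt-install}" --connect qemu:///session \
        --name cloudinit-test \
        --memory 2048 \
        --disk "$image,device=disk,bus=virtio" \
        --os-type linux \
        --os-variant ubuntu20.04 \
        --virt-type kvm \
        --graphics none \
        --network bridge=virbr0,model=virtio \
        --import \
        --disk "$seed_iso,device=cdrom,bus=sata" \
        --sysinfo system.serial='ds=nocloud'
}

# Dry run:
#   VIRT_INSTALL=echo virt_install_nocloud focal.img /tmp/test.iso
```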

ubuntu-server-builder commented 1 year ago

Launchpad user James Falcon(falcojr) wrote on 2022-02-25T16:31:05.953838+00:00

A duplicate bug, https://bugs.launchpad.net/bugs/1961832 , provides some additional context and consistent reproduction steps.