canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/
Other
2.92k stars 871 forks source link

[Ubuntu 22.04]cloud-init failed to complete after 10 minutes of waiting was shown during Installation via iDRAC Virtual Console #4009

Closed ubuntu-server-builder closed 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #1986781

Launchpad details
affected_projects = ['subiquity', 'casper (Ubuntu)', 'casper (Ubuntu Focal)', 'casper (Ubuntu Jammy)', 'casper (Ubuntu Kinetic)']
assignee = None
assignee_name = None
date_closed = 2022-09-16T19:09:09.209707+00:00
date_created = 2022-08-17T09:55:27.968812+00:00
date_fix_committed = None
date_fix_released = None
id = 1986781
importance = undecided
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1986781
milestone = None
owner = shubhakara
owner_name = Shubhakar Gowda P S
private = False
status = invalid
submitter = shubhakara
submitter_name = Shubhakar Gowda P S
tags = ['se-sponsor-mfo', 'sts', 'verification-done', 'verification-done-jammy']
duplicates = [1969919, 1971181, 2003152]

Launchpad user Shubhakar Gowda P S(shubhakara) wrote on 2022-08-17T09:55:27.968812+00:00

[Impact]

 * Users that install Ubuntu Server through slow    media (eg, virtual optical drive over network,    which may be common on enterprise deployments)    might hit the following subiquity startup error:

  'cloud-init failed to complete after 10 minutes of waiting'

 * (That in addition to 10 minutes of waiting themselves.)

 * This happens because casper-md5check.service is    (slowly) verifying the integrity of install media,    which blocks multi-user.target,    which blocks cloud-final.service,    which blocks cloud-init status --wait    which is used in subiquity / waiting on cloud-init).

[Fix]

 * The adopted solution (merged on lunar) is simply    not to block multi-user.target, but rather run    after it.

[Test Steps]

 For a synthetic reproducer of slowness of casper-md5check:

  boot with break=init to break into initramfs-tools    before exec() systemd.   chroot /root /bin/bash   edit /usr/lib/systemd/system/casper-md5check.service   prepend strace --inject read:delay_enter=5s to the    command in ExecStart, to introduce a 5 secs delay    to every read() syscall performed by casper-md5check.  * exit twice (chroot, initramfs shell) to resume boot.

  See comment 37 for examples.

[Regression Potential]

 * There's a cosmetic glitch in the proposed solution:    the systemd line when casper-md5check finishes    shows up on top of subiquity's menu (screenshot):

   "[ OK ] Finished casper-md5check Verify Live ISO checksums."

   Dan Bungert mentioned this is known and should be    addressed in a future change to subiquity, and is    not supposed to block the SRU for Jammy / 22.04.2.

[Original Description]

Description:

On Dell EMC PowerEdge system when Install Ubuntu 22.04 via iDRAC Virtual Console, cloud-init failed to complete after 10 minutes of waiting.

Steps to Reproduce:

  1. Login to iDRAC and Launch Virtual Console.
  2. Connect to Virtual Media and Map ubuntu 22.04 iso file using Map CD/DVD option.
  3. Try Installing Ubuntu server.
  4. "cloud-init" failed to complete after 10 minutes of waiting was shown during Installation.

Expected Results :-

Installation should be successful.

ubuntu-server-builder commented 1 year ago

Launchpad user Shubhakar Gowda P S(shubhakara) wrote on 2022-08-17T10:13:39.377607+00:00

Launchpad attachments: Attaching cloud-init Installation logs under /var/log

ubuntu-server-builder commented 1 year ago

Launchpad user James Falcon(falcojr) wrote on 2022-08-17T14:50:02.501990+00:00

I took a look at the logs, and this doesn't appear to be a problem with cloud-init specifically. Cloud-init runs multiple times during boot, and each run has its own set of systemd services that it depends on having run first. If we look at cloud-init.log, we can see that there's a time where one cloud-init instance exits, then a greater than 10 minute gap before cloud-init's final service runs:

2022-08-17 06:38:53,429 - util.py[DEBUG]: cloud-init mode 'modules' took 0.139 seconds (0.14) 2022-08-17 06:38:53,429 - handlers.py[DEBUG]: finish: modules-config: SUCCESS: running modules for config 2022-08-17 06:52:29,424 - util.py[DEBUG]: Cloud-init v. 22.1-14-g2e17a0d6-0ubuntu1~22.04.5 running 'modules:final' at Wed, 17 Aug 2022 06:52:29 +0000. Up 999.23 seconds.

This can happen if one of cloud-init's dependent services has stalled or failed. To see if another service has stalled or failed, you could try the following commands: systemd-analyze blame systemctl --failed

Does the entire installation fail, or just cloud-init?

ubuntu-server-builder commented 1 year ago

Launchpad user Shubhakar Gowda P S(shubhakara) wrote on 2022-08-22T07:21:17.471994+00:00

James Falcon,

Does the entire installation fail, or just cloud-init?

Only cloud-init will fail. Once issue occurs when hit 'cancel' we can proceed with the further Installation successfully.

ubuntu-server-builder commented 1 year ago

Launchpad user Chad Smith(chad.smith) wrote on 2022-08-24T16:26:46.172306+00:00

Thanks for the info, shubhakara.

To debug further what we need from you on the failed installed system once it boots is the output of

This will help us determine what other systemd service ordering seems to be delaying cloud-init boot stages from running.

Most of he data above gets collected by running cloud-init collect-logs which emits a tar.gz file that can be attached to this bug if you don't want to run the commands individually.

Given that you mentioned clicking "cancel" and there is a /var/log/intaller subdirectory, this looks like you are using the ubuntu server live-install (subiquity) to install this system. I'm adding subiquity project too just to keep devs aware.

Looking through /var/log/installer/subiquity-server-debug*log I can see Starting Subiquity server revision 3359 is blocking on cloud-init status --wait which alludes to the 10 minute wait you referenced in this bug.

timeout logs on cloud-init status --wait

2022-08-17 06:39:32,689 DEBUG subiquitycore.utils:77 arun_command called: ['cloud-init', 'status', '--wait'] 2022-08-17 06:39:32,861 DEBUG curtin.reporting.start.subiquity/Meta/status_GET:45 start: subiquity/Meta/status_GET: 2022-08-17 06:39:32,863 DEBUG curtin.reporting.finish.subiquity/Meta/status_GET:45 finish: subiquity/Meta/status_GET: SUCCESS: 200 {"state": "CLOUD_INIT_WAIT", "confirming_tty": "", "error": null, "cloud_init... 2022-08-17 06:39:32,863 INFO aiohttp.access:233 [17/Aug/2022:06:39:32 +0000] "GET /meta/status?cur=null HTTP/1.1" 200 419 "-" "Python/3.8 aiohttp/3.6.2" 2022-08-17 06:39:33,347 DEBUG curtin.reporting.start.subiquity/Meta/status_GET:45 start: subiquity/Meta/status_GET: 2022-08-17 06:49:32,690 DEBUG subiquity.server.server:519 waited 600.0012300014496s for cloud-init 2022-08-17 06:49:32,691 DEBUG subiquity.server.server:535 cloud-init status: '', assumed disabled

ubuntu-server-builder commented 1 year ago

Launchpad user Chad Smith(chad.smith) wrote on 2022-08-24T16:28:00.754191+00:00

Please set the bug status back to "New" when further comments have been added to make sure our team gets alerted about the updated comments so we can respond more quickly.

ubuntu-server-builder commented 1 year ago

Launchpad user Shubhakar Gowda P S(shubhakara) wrote on 2022-08-25T11:49:18.959580+00:00

Chad Smith,

Here is the attachment using 1G end to end connection without VPN. In the old attachment shared previously, virtual-media used over a VPN connection to install the OS.

ubuntu-server-builder commented 1 year ago

Launchpad user Shubhakar Gowda P S(shubhakara) wrote on 2022-08-25T13:58:26.496810+00:00

ubuntu-server-builder commented 1 year ago

Launchpad user Shubhakar Gowda P S(shubhakara) wrote on 2022-08-25T14:03:04.191000+00:00

Chad Smith,

File "cloud-init-install-fail.tar" in comment #6 contains the cloud init logs collected under /var/log. File "cloud-init.tar.gz" in comment #7 contains the cloud init logs collected using the command "cloud-init collect-logs".

ubuntu-server-builder commented 1 year ago

Launchpad user Chad Smith(chad.smith) wrote on 2022-08-26T21:30:59.368303+00:00

Thanks Shubhakar Gowda P S (shubhakara) for getting back to us.

From journal.txt in your attached logs in comment #7 is looks like the 10 minute timeout is not strictly related to cloud-init, but cloud-init waiting on snaps being seeded.

I can see a snapd log in your journalctl -b 0 output that says snapd was timed out after an inordinate amount of time waiting for NTP:

Aug 25 08:55:42.976933 ubuntu-server snapd[1642]: devicemgr.go:1927: no NTP sync after 10m0s,

I also see a whole bunch of ntpd and network.timed

The systemd sevice which performs cloud configuration cloud-config.service has a clause which makes it wait until snapd.seeded.service completes in the booting environment so cloud-init use of any snap tools/services installed:

After=snapd.seeded.service

So this 10 minute wait is due to something in either networking setup in the ephemeral/early install environment not being configured appropriately for snapd to come up. Once snapd unblocks, cloud-init can complete.

You can confirm that the problem is snap seeding on your system with the commands we listed above in comments #2 and #4:

That should point to snap seeding being the problem in this early/ephemeral install environment.

Also in your journalctl -b 0 output (in cloud-init collect-logs /var/log/journal.txt)

A red flag I'm also seeing is repeated logs in your journal that look like thrashing of time-related services: Aug 25 08:52:55.229686 ubuntu-server dbus-daemon[1630]: [system] Successfully activated service 'org.freedesktop.timedate1' Aug 25 08:52:55.229931 ubuntu-server systemd[1]: Started Time & Date Service. Aug 25 08:53:55.291021 ubuntu-server systemd[1]: systemd-timedated.service: Deactivated successfully.

Something is amiss with ntp or network configuration on this installation and it looks like it is causing issues for snapd which cause issue for cloud-init to make progress on the install of this system.

In checking /var/log/cloud-init-output.log I see that the two NICs are listed as link down: ci-info: | eno1 | False | . | . | . | f4:ee:08:19:37:49 | ci-info: | eno2 | False | . | . | . | f4:ee:08:19:37:4a |

But journalctl -b 0 showed them as "UP": Aug 25 08:45:04.600250 ubuntu-server systemd-networkd[1594]: eno2: Link UP Aug 25 08:45:04.635710 ubuntu-server systemd-networkd[1594]: eno1: Link UP

Something seems to be turning off the NICs on this device maybe?

Can you please provide:

  1. Was the a customized install Ubuntu 22.04 installer ISO?
  2. if public ISO can you provide the URL to the ISO used for install?
  3. Run the separate commands and attach that output
    • systemd-analyze blame
    • systemd-analyze show
    • systemd-analyze critical-chain
    • systemctl --failed
    • ls /etc/netplan
    • cat /etc/netplan/*
    • ip addr
    • ip route
ubuntu-server-builder commented 1 year ago

Launchpad user Michael Hudson-Doyle(mwhudson) wrote on 2022-08-29T09:09:15.022307+00:00

What's going on here is I think that the md5 check casper is doing is using all the (minimal) amount of IO bandwidth to the virtual ISO and starving other processes from making progress. We should probably make the md5 check run under ionice or just abort itself if it's slow or something.

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Hudson-Doyle(mwhudson) wrote on 2022-08-29T09:14:34.857104+00:00

Would someone be able to test a custom ISO that I can make with https://code.launchpad.net/~mwhudson/casper/+git/casper/+merge/429055 incorporated?

ubuntu-server-builder commented 1 year ago

Launchpad user Narendra K(knarendra) wrote on 2022-08-30T16:39:29.744108+00:00

Hi Michael,

Thank you for taking a look. Please provide the custom ISO. We will test and share findings.

ubuntu-server-builder commented 1 year ago

Launchpad user Narendra K(knarendra) wrote on 2022-08-30T19:46:09.376088+00:00

Please find the details:

systemd-analyze-blame-output

12min 33.112s casper-md5check.service 48.873s snapd.service 48.004s snapd.seeded.service 35.847s pollinate.service 31.663s udisks2.service 28.538s cloud-init-local.service 23.592s e2scrub_reap.service 23.281s thermald.service 22.361s dev-sr0.device 17.649s polkit.service 16.971s dev-loop9.device 16.768s networkd-dispatcher.service 16.547s snapd.hold.service 15.221s dev-loop6.device 15.003s dev-loop8.device 14.037s snapd.apparmor.service 12.682s dev-loop4.device 12.680s dev-loop5.device 12.677s dev-loop0.device 12.093s ModemManager.service 10.873s snap.lxd.activate.service 10.661s dev-loop7.device 5.310s rsyslog.service 5.231s apport.service 5.168s systemd-logind.service 4.633s secureboot-db.service 4.523s cloud-init.service 4.297s ua-timer.service 2.867s console-setup.service 2.712s systemd-resolved.service 2.689s systemd-udev-trigger.service 2.523s systemd-udevd.service 2.365s multipathd.service 2.263s systemd-user-sessions.service 1.742s keyboard-setup.service 1.590s systemd-modules-load.service 1.483s systemd-sysusers.service 1.433s snap-core20-1405.mount 1.304s snap-lxd-22923.mount 1.211s snap-snapd-15534.mount 1.198s update-notifier-download.service 950ms systemd-journal-flush.service 840ms systemd-sysctl.service 837ms setvtrgb.service 664ms systemd-random-seed.service 634ms snap-subiquity-3359.mount 606ms systemd-timesyncd.service 558ms systemd-tmpfiles-setup.service 495ms systemd-networkd.service 461ms cloud-config.service 418ms systemd-tmpfiles-setup-dev.service 357ms systemd-update-utmp.service 340ms systemd-remount-fs.service 327ms cloud-final.service 317ms sys-kernel-config.mount 315ms sys-fs-fuse-connections.mount 306ms plymouth-read-write.service 224ms plymouth-quit.service 199ms lvm2-monitor.service 181ms systemd-journald.service 117ms finalrd.service 111ms ufw.service 89ms dev-hugepages.mount 88ms dev-mqueue.mount 87ms sys-kernel-debug.mount 87ms sys-kernel-tracing.mount 83ms lvm2-pvscan@8:3.service 78ms kmod-static-nodes.service 76ms modprobe@fuse.service 76ms modprobe@drm.service 75ms modprobe@configfs.service 26ms ssh.service 20ms systemd-tmpfiles-clean.service 17ms plymouth-quit-wait.service 11ms systemd-update-utmp-runlevel.service 8ms snapd.socket 8ms systemd-networkd-wait-online.service 3ms tmp.mount 51us blk-availability.service

"systemd-analyze-critical-chain-output"

The time when unit became active or started is printed after the "@" character. The time the unit took to start is printed after the "+" character.

graphical.target @13min 17.133s -multi-user.target @13min 17.132s -casper-md5check.service @44.019s +12min 33.112s -basic.target @44.015s -sockets.target @44.014s -snap.lxd.daemon.unix.socket @2min 8.203s -sysinit.target @43.976s -cloud-init.service @39.445s +4.523s -systemd-networkd-wait-online.service @39.432s +8ms -systemd-networkd.service @38.932s +495ms -network-pre.target @38.925s -cloud-init-local.service @10.374s +28.538s -systemd-remount-fs.service @5.507s +340ms -systemd-journald.socket @5.413s --.mount @5.377s `--.slice @5.377s

"cloud-init-analyze-blame-output"

-- Boot Record 01 -- 01.47200s (init-network/config-ssh) 00.34700s (init-local/search-NoCloud) 00.09400s (modules-config/config-set-passwords) 00.04300s (init-network/config-users-groups) 00.01600s (init-network/check-cache) 00.00400s (modules-final/config-final-message) 00.00400s (init-network/consume-user-data) 00.00200s (modules-config/config-timezone) 00.00200s (init-network/config-write-files) 00.00200s (init-network/activate-datasource) 00.00100s (modules-final/config-ssh-authkey-fingerprints) 00.00100s (modules-final/config-scripts-per-once) 00.00100s (modules-final/config-keys-to-console) 00.00100s (modules-config/config-disable-ec2-metadata) 00.00100s (init-network/config-rsyslog) 00.00100s (init-network/config-ca-certs) 00.00100s (init-network/config-bootcmd) 00.00100s (init-local/check-cache) 00.00000s (modules-final/config-scripts-user) 00.00000s (modules-final/config-phone-home) 00.00000s (modules-config/config-ssh-import-id) 00.00000s (modules-config/config-runcmd) 00.00000s (init-network/setup-datasource) 00.00000s (init-network/consume-vendor-data2) 00.00000s (init-network/consume-vendor-data)

"cloud-init-analyze-show-output"

-- Boot Record 01 -- The total time elapsed since completing an event is printed after the "@" character. The time the event takes is printed after the "+" character.

Starting stage: init-local |->no cache found @00.00300s +00.00100s |->found local data from DataSourceNoCloud @00.01200s +00.34700s Finished stage: (init-local) 01.91800 seconds

Starting stage: init-network |->restored from cache with run check: DataSourceNoCloud [seed=/var/lib/cloud/seed/nocloud][dsmode=net] @04.06200s +00.01600s |->setting up datasource @04.16500s +00.00000s |->reading and applying user-data @04.35100s +00.00400s |->reading and applying vendor-data @04.35500s +00.00000s |->reading and applying vendor-data2 @04.35500s +00.00000s |->activating datasource @05.43600s +00.00200s |->config-bootcmd ran successfully @05.44200s +00.00100s |->config-write-files ran successfully @05.44300s +00.00200s |->config-ca-certs ran successfully @05.44600s +00.00100s |->config-rsyslog ran successfully @05.44700s +00.00100s |->config-users-groups ran successfully @05.44900s +00.04300s |->config-ssh ran successfully @05.49200s +01.47200s Finished stage: (init-network) 04.19000 seconds

Starting stage: modules-config |->config-ssh-import-id ran successfully @105.80900s +00.00000s |->config-set-passwords ran successfully @105.81000s +00.09400s |->config-timezone ran successfully @105.90400s +00.00200s |->config-disable-ec2-metadata ran successfully @105.90600s +00.00100s |`->config-runcmd ran successfully @105.90700s +00.00000s Finished stage: (modules-config) 00.13400 seconds

Starting stage: modules-final |->config-scripts-per-once ran successfully @761.00000s +00.00100s |->config-scripts-user ran successfully @761.00100s +00.00000s |->config-ssh-authkey-fingerprints ran successfully @761.00100s +00.00100s |->config-keys-to-console ran successfully @761.00200s +00.00100s |->config-phone-home ran successfully @761.00300s +00.00000s |->config-final-message ran successfully @761.00300s +00.00400s Finished stage: (modules-final) 00.01100 seconds

Total Time: 6.25300 seconds

1 boot records analyzed

ubuntu-server-builder commented 1 year ago

Launchpad user Narendra K(knarendra) wrote on 2022-08-31T10:33:27.032168+00:00

Launchpad attachments: Installation showing failure

ubuntu-server-builder commented 1 year ago

Launchpad user Narendra K(knarendra) wrote on 2022-08-31T10:36:46.179183+00:00

Hi Chad/Michael,

When issue occurs, the installation stops as shown in the screen attached in comment #14. If "Close" option is selected, installation completes successfully and OS completes successfully. Does choosing "Close" option and proceeding with installation have any side effects or could it be recommended as a work around till a solution is included in the installation ISO ?

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Hudson-Doyle(mwhudson) wrote on 2022-09-01T02:32:20.085064+00:00

If you could try booting the ISO from https://cdimage.ubuntu.com/ubuntu-server/daily-live/pending/ I would be very interested to hear if it is better (and if it is we can backport to 22.04).

When issue occurs, the installation stops as shown in the screen attached in comment #14. If "Close" option is selected, installation completes successfully and OS completes successfully. Does choosing "Close" option and proceeding with installation have any side effects or could it be recommended as a work around till a solution is included in the installation ISO ?

No, if the install completes successfully then there should be no deficiencies. It mostly impacts things about the installer environment, like autoinstall and ssh access to the installer.

ubuntu-server-builder commented 1 year ago

Launchpad user Narendra K(knarendra) wrote on 2022-09-01T08:08:44.374134+00:00

No, if the install completes successfully then there should be no deficiencies.

Thank you for clarifying.

It mostly impacts things about the installer environment, like autoinstall and ssh access to the installer.

Any thoughts on how users using autoinstall scenarios could work around this issue ?

ssh access to the installer

Any thoughts on how users requiring ssh could work around this issue ? Or should it confirmed first if this scenario an issue by trying to ssh when issue occurs ?

ubuntu-server-builder commented 1 year ago

Launchpad user Shubhakar Gowda P S(shubhakara) wrote on 2022-09-01T12:02:16.567295+00:00

Hi Michael,

As per your assist downloaded 64-bit PC (AMD64) server install image from "https://cdimage.ubuntu.com/ubuntu-server/daily-live/pending/" and tested it, Issue was still observed.

Please find the details:

"systemd-analyze-blame-output"

  13min 33.597s casper-md5check.service
  41.640s snapd.seeded.service
  29.360s snapd.service
  24.189s pollinate.service
  18.638s udisks2.service
  16.918s dev-sr0.device
  10.718s snapd.hold.service
  10.585s ModemManager.service
  10.085s snap.lxd.activate.service
   9.802s cloud-init-local.service
   9.375s snapd.apparmor.service
   9.059s thermald.service
   9.044s console-setup.service
   8.591s polkit.service
   7.693s e2scrub_reap.service
   7.348s systemd-logind.service
   7.220s rsyslog.service
   7.097s dev-loop8.device
   6.720s lvm2-monitor.service
   6.331s dev-loop7.device
   6.050s dev-loop6.device
   6.000s secureboot-db.service
   5.983s apport.service
   5.363s systemd-tmpfiles-setup.service
   4.766s dev-loop9.device
   4.074s dev-loop5.device
   4.073s dev-loop0.device
   4.072s dev-loop4.device
   3.766s snap-lxd-23537.mount
   3.432s multipathd.service
   3.372s snap-snapd-16292.mount
   2.970s systemd-resolved.service
   2.927s snap-core20-1611.mount
   2.376s plymouth-read-write.service
   2.234s keyboard-setup.service
   2.212s systemd-user-sessions.service
   2.040s systemd-udev-trigger.service
   1.672s finalrd.service
   1.483s systemd-sysusers.service
   1.460s cloud-init.service
   1.303s systemd-sysctl.service
   1.296s ufw.service
   1.261s systemd-journal-flush.service
   1.219s setvtrgb.service
   1.147s systemd-random-seed.service
   1.131s plymouth-quit.service
   1.103s systemd-modules-load.service
   1.032s dev-hugepages.mount
   1.030s dev-mqueue.mount
   1.029s sys-kernel-debug.mount
   1.028s sys-kernel-tracing.mount
   1.020s kmod-static-nodes.service
   1.019s systemd-journald.service
   1.014s modprobe@chromeos_pstore.service
   1.013s modprobe@configfs.service
   1.012s modprobe@drm.service
   1.010s modprobe@efi_pstore.service
   1.009s modprobe@fuse.service
   1.008s modprobe@pstore_blk.service
   1.006s modprobe@pstore_zone.service
   1.005s modprobe@ramoops.service
    801ms systemd-remount-fs.service
    712ms systemd-udevd.service
    700ms snap-subiquity-3798.mount
    558ms systemd-networkd.service
    471ms update-notifier-download.service
    378ms systemd-tmpfiles-setup-dev.service
    368ms cloud-config.service
    335ms cloud-final.service
    279ms sys-kernel-config.mount
    269ms sys-fs-fuse-connections.mount
    215ms systemd-timesyncd.service
    107ms systemd-update-utmp.service
     23ms systemd-tmpfiles-clean.service
     11ms systemd-update-utmp-runlevel.service
      9ms snapd.socket
      4ms tmp.mount
      4ms systemd-networkd-wait-online.service
      3ms plymouth-quit-wait.service
     17us blk-availability.service

"systemd-analyze-critical-chain-output"

The time when unit became active or started is printed after the "@" character. The time the unit took to start is printed after the "+" character.

graphical.target @14min 7.154s -multi-user.target @14min 7.153s -casper-md5check.service @33.554s +13min 33.597s -basic.target @33.551s -sockets.target @33.550s -snap.lxd.user-daemon.unix.socket @1min 32.661s -sysinit.target @33.513s -cloud-init.service @32.046s +1.460s -systemd-networkd-wait-online.service @32.038s +4ms -systemd-networkd.service @31.473s +558ms -network-pre.target @31.470s -cloud-init-local.service @21.655s +9.802s -systemd-remount-fs.service @4.284s +801ms -systemd-journald.socket @4.031s --.mount @3.958s `--.slice @3.958s

"cloud-init-analyze-blame-output"

-- Boot Record 01 -- 00.54400s (init-network/config-ssh) 00.43600s (init-local/search-NoCloud) 00.04200s (init-network/config-users-groups) 00.02700s (modules-config/config-set-passwords) 00.00800s (modules-final/config-final-message) 00.00500s (init-network/check-cache) 00.00100s (modules-final/config-scripts-user) 00.00100s (modules-final/config-keys-to-console) 00.00100s (init-network/consume-user-data) 00.00100s (init-network/activate-datasource) 00.00000s (modules-final/config-ssh-authkey-fingerprints) 00.00000s (modules-final/config-scripts-per-once) 00.00000s (modules-config/config-ssh-import-id) 00.00000s (init-network/setup-datasource) 00.00000s (init-network/consume-vendor-data2) 00.00000s (init-network/consume-vendor-data) 00.00000s (init-local/check-cache)

1 boot records analyzed

"cloud-init-analyze-show-output"

-- Boot Record 01 -- The total time elapsed since completing an event is printed after the "@" character. The time the event takes is printed after the "+" character.

Starting stage: init-local |->no cache found @00.00300s +00.00000s |->found local data from DataSourceNoCloud @00.01500s +00.43600s Finished stage: (init-local) 01.57200 seconds

Starting stage: init-network |->restored from cache with run check: DataSourceNoCloud [seed=/var/lib/cloud/seed/nocloud][dsmode=net] @02.94400s +00.00500s |->setting up datasource @02.98900s +00.00000s |->reading and applying user-data @02.99800s +00.00100s |->reading and applying vendor-data @02.99900s +00.00000s |->reading and applying vendor-data2 @02.99900s +00.00000s |->activating datasource @03.02400s +00.00100s |->config-users-groups ran successfully @03.05100s +00.04200s |->config-ssh ran successfully @03.09300s +00.54400s Finished stage: (init-network) 01.16100 seconds

Starting stage: modules-config |->config-ssh-import-id ran successfully @76.25500s +00.00000s |->config-set-passwords ran successfully @76.25500s +00.02700s Finished stage: (modules-config) 00.04800 seconds

Starting stage: modules-final |->config-scripts-per-once ran successfully @817.63600s +00.00000s |->config-scripts-user ran successfully @817.63600s +00.00100s |->config-ssh-authkey-fingerprints ran successfully @817.63700s +00.00000s |->config-keys-to-console ran successfully @817.63700s +00.00100s |`->config-final-message ran successfully @817.63800s +00.00800s Finished stage: (modules-final) 00.02800 seconds

Total Time: 2.80900 seconds

1 boot records analyzed

ubuntu-server-builder commented 1 year ago

Launchpad user Narendra K(knarendra) wrote on 2022-09-05T17:34:50.823637+00:00

Hi Chad,

2022-08-17 06:49:32,690 DEBUG subiquity.server.server:519 waited 600.0012300014496s for cloud-init 2022-08-17 06:49:32,691 DEBUG subiquity.server.server:535 cloud-init status: '', assumed disabled

Do the logs provide any details on why cloud-init takes more than 10 minutes to complete ? Is cloud-init waiting for casper-md5check.service to complete ? casper-md5check.service seems to be taking > 10 minutes when issue occurs.

Hi Michael,

Could subiquity be made to continue without interrupting the installation and reschedule a check to verify if cloud-init completed ? Or does it need to wait for cloud-init complete before proceeding with installation ?

Also, any thoughts on query from comment #17 ?

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Hudson-Doyle(mwhudson) wrote on 2022-09-05T21:51:30.129036+00:00

Hmm it's a shame those casper changes didn't help but not completely surprising.

I haven't verified this but my theory on what is going on is that the casper-md5 check is using all of the small amount of IO the virtual bmc feature provides and thus starving other services like cloud-init and snap seeding from making progress. Nothing is explicitly waiting for the casper-md5 check to complete -- if my theory is correct it's not something that can be fixed by tweaking unit dependencies.

I should have mentioned this before, but one workaround for this is to pass 'tomem' on the kernel command line. This makes the initrd copy the entire ISO into RAM before pivoting to the real rootfs which makes the first part of the boot pretty slow but should make all other parts including the md5 check much faster. It obviously increases RAM usage too but most "real" servers probably have much more RAM than is needed.

Other workarounds we are looking at is making a much smaller ISO that can be provided as a virtual ISO that will then download the main ISO from the network (as the network is probably much much faster than the virtual ISO). But it would be nice to fix this properly, probably by having the casper md5 check abort if it detects that it is only making very slow progress.

ubuntu-server-builder commented 1 year ago

Launchpad user Chad Smith(chad.smith) wrote on 2022-09-07T20:56:08.213790+00:00

Narendra K (knarendra),

As Michael mentioned, this really looks more like a system that is under load and not responding well to all the systemd services that are being run during the install process.

The time values I'm seeing in comment #18 from systemd-analyze are 10 times greater than what I typically see on laptops, containers, VMs and bare metal.

From the systemd-analyze blame output from comment in #18, I can confirm the reason for cloud-config.service "waiting" is the sum of the systemd snap*services setup time:

  41.640s snapd.seeded.service
  29.360s snapd.service

... 16.547s snapd.hold.service

Those values account for the minute and a half of waiting... which aligns with the attached cloud-init analyze blame output as well:

starting stage: modules-config |`->config-ssh-import-id ran successfully @76.25500s +00.00000s

Which tells us that cloud-init's modules-config stage started at 76 seconds after boot began (around the same amount as the combined snap services).

When looking as cloud-init analyze output we see the cloud-final stage doesn't start until 8817 seconds (13 min) after boot which is greater than subiquity's 10 minute timeout.

Starting stage: modules-final |`->config-scripts-per-once ran successfully @817.63600s +00.00000s

/lib/systemd/system/cloud-final.service config is only started once the following condition is met:

After=multi-user.target

And multi-user.target depends on that casper-md5check.service which is taking 13 minutes for one reason of another. We can confirm what's slowing up that final stage of cloud-init with the following command on your system: $ systemd-analyze critical-chain cloud-final.service

ubuntu-server-builder commented 1 year ago

Launchpad user Chad Smith(chad.smith) wrote on 2022-09-07T21:19:31.593901+00:00

Ultimately, each md5 file md5 comparison seems to take around 1/2 second on this system. If we are unable to determine what is slowing down the casper-md5check or an early exit in the event of painful md5checks, can we either provide a config option to disable the casper md5 check, or make that cloud-init timeout configurable to > 600 seconds[1]?

References for cloud-init timeout: https://github.com/canonical/subiquity/blob/main/subiquity/server/server.py#L523

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Hudson-Doyle(mwhudson) wrote on 2022-09-08T00:41:41.695107+00:00

You can also put fsck.mode=skip on the kernel command line to get the md5check to skip itself, I should have mentioned that too!

ubuntu-server-builder commented 1 year ago

Launchpad user Shubhakar Gowda P S(shubhakara) wrote on 2022-09-08T10:29:32.574970+00:00

Hi Michael,

Tried passing kernel work around parameter "tomem" as suggested in the comment #20, Issue was still observed in this case. And also Tried passing kernel work around parameter "fsck.mode=skip" as suggested in the comment #23, Issue was not seen in this scenario.

Also, we tried reproducing the Issue in 'Ubuntu 20.04.4' in this case Issue was not seen.

ubuntu-server-builder commented 1 year ago

Launchpad user Shubhakar Gowda P S(shubhakara) wrote on 2022-09-08T10:54:26.492325+00:00

Chad Smith,

Please find the details as requested in comment #21.

"systemd-analyze critical-chain cloud-final.service output"

The time when unit became active or started is printed after the "@" character. The time the unit took to start is printed after the "+" character.

cloud-final.service +266ms -multi-user.target @12min 38.156s -casper-md5check.service @39.106s +11min 59.049s -basic.target @39.104s -sockets.target @39.103s -snap.lxd.user-daemon.unix.socket @1min 38.160s -sysinit.target @39.068s -cloud-init.service @35.645s +3.417s -systemd-networkd-wait-online.service @35.634s +8ms -systemd-networkd.service @35.265s +365ms -network-pre.target @35.262s -cloud-init-local.service @12.178s +23.078s -systemd-remount-fs.service @6.689s +483ms -systemd-journald.socket @6.401s --.mount @6.360s `--.slice @6.360s

ubuntu-server-builder commented 1 year ago

Launchpad user Chad Smith(chad.smith) wrote on 2022-09-08T21:09:20.324471+00:00

Shubhakar Gowda P S (shubhakara)

  1. thank you for the specific critical-chain of cloud-final.service. It does confirm the main thing blocking cloud-init from completing is the casper-md5check.service on this system. Without that service delaying the multi-user.target, cloud-final,service can complete in around 50 seconds which beats that 10 minute timeout in the installer.

  2. Also, we tried reproducing the Issue in 'Ubuntu 20.04.4' in this case Issue was not seen. Do you mean that stock 20.04.4 images without any kernel command-line options did not reproduce this issue on the same system type?

I did notice in running the daily server live image for 20.04 today that in the console output it shows me the fsck.mode=skip So that is probably why this didn't hit you on the 20.04 install.

Thanks for the kernel cmdline tip Michael, I didn't know about fsck.mode=skip and seems like that would avoid this issue on platforms which are being hit with general IO/performance concerns leading to multi-user.target not being reached for an extended period of time.

Maybe no need to add a config setting for the cloud-init --wait timeout in subiquity as the other 'workaround' seems to work here.

cloud-init squad should be touching the cloud-init status --wait call https://github.com/canonical/subiquity/blob/main/subiquity/server/server.py#L521 to introspect cloud-init status a bit better once cloud-init releases support for cloud-init status --format json. In the event of a timeout, we can also make sure we touch this section to ensure a timeout results in a report of what stage cloud-init is 'stuck' and give suggested debug steps for bug filing.

ubuntu-server-builder commented 1 year ago

Launchpad user Shubhakar Gowda P S(shubhakara) wrote on 2022-09-09T11:32:23.310690+00:00

Chad Smith,

Once After Ubuntu 20.04.4 CD Image Installation got completed, by switching over to a shell using 'Ctrl-Z + F2' keyboard shortcut key we have collected the output of following details.

output "systemd-analyze critical-chain cloud-final.service"

The time when unit became active or started is printed after the "@" character. The time the unit took to start is printed after the "+" character.

cloud-final.service +475ms -cloud-config.service @22.760s +473ms -snapd.seeded.service @12.689s +10.067s -basic.target @9.526s -sockets.target @9.526s -snap.lxd.daemon.unix.socket @14.228s -sysinit.target @9.512s -cloud-init.service @8.350s +1.161s -systemd-networkd-wait-online.service @8.341s +6ms -systemd-journald.socket @1.304s -system.slice @1.300s `--.slice @1.300s

output "systemctl status casper-md5check.service"

Unit casper-md5check.service could not be found

output "/var/log/casper.log"

stdin: Invalid argument stdin: Invalid argument stdin: Invalid argument /init: line 49: can't open /dev/sda: No medium found /init: line 49: can't open /dev/sda: No medium found passwd: password expiry information changed. Using CD-ROM mount point /cdrom/ Identifying... [bdb176fb941b5200237520bff1ab61c3-2] Scanning disc for index files... Found 2 package indexes, 0 source indexes, 0 translation indexes and 1 signatures Found label 'Ubuntu-Server 20.04.4 LTS Focal Fossa - Release amd64 (20220223.1)' This disc is called: 'Ubuntu-Server 20.04.4 LTS Focal Fossa - Release amd64 (20220223.1)' Copying package lists...gpgv: Signature made Wed Feb 23 09:26:53 2022 UTC gpgv: using RSA key 843938DF228D22F7B3742BC0D94AA3F0EFE21092 gpgv: Good signature from "Ubuntu CD Image Automatic Signing Key (2012) cdimage@ubuntu.com" Reading Package Indexes... 0% Reading Package Indexes... Done Writing new source list Source list entries for this disc are: deb cdrom:[Ubuntu-Server 20.04.4 LTS Focal Fossa - Release amd64 (20220223.1)]/ focal main restricted Repeat this process for the rest of the CDs in your set.

ubuntu-server-builder commented 1 year ago

Launchpad user Narendra K(knarendra) wrote on 2022-09-16T16:54:35.729545+00:00

Hi Chad/Michael,

Any updates or feedback on findings in comment #27 ?

It seems like in Ubuntu 20.04,

a) cloud-init does not depend on multi-user.target b) casper-md5check.service is not present

Could Ubuntu 22.04 installation be modified to match with Ubuntu 20.04 ? Any thoughts ?

Hi Michael,

Could 'fsck.mode=skip' be documented in Ubuntu 22.04 release notes ?

ubuntu-server-builder commented 1 year ago

Launchpad user Brett Holman(holmanb) wrote on 2022-09-16T19:09:00.896674+00:00

Since the issue has been narrowed down to casper-md5check, which is out of scope of cloud-init, I'm marking the cloud-init side of this bug as "Invalid", since there is nothing actionable to do to in cloud-init to resolve this bug, and the plans to increase debugging transparency that Chad mentioned were already planned and scheduled.

ubuntu-server-builder commented 1 year ago

Launchpad user Narendra K(knarendra) wrote on 2022-09-21T09:51:56.971840+00:00

Hi Michael,

Any updates on the queries in comment #28 ?

It seems like in Ubuntu 20.04,

a) cloud-init does not depend on multi-user.target b) casper-md5check.service is not present

Could Ubuntu 22.04 installation be modified to match with Ubuntu 20.04 ? Any thoughts ?

ubuntu-server-builder commented 1 year ago

Launchpad user Dan Bungert(dbungert) wrote on 2022-11-01T08:56:32.150866+00:00

We propose no longer having the md5check block multi-user. I'm opening a matching MP.

ubuntu-server-builder commented 1 year ago

Launchpad user Narendra K(knarendra) wrote on 2022-11-16T15:20:30.614960+00:00

Hi Dan,

Would the patch get included in Ubuntu 22.04.2 ?

ubuntu-server-builder commented 1 year ago

Launchpad user Dan Bungert(dbungert) wrote on 2022-11-16T15:38:12.218127+00:00

Hi Narendra - yes, the plan is to have this done in time for 22.04.2. Thanks for specifically pointing this out, we need to make sure it's included in the Jammy Stable Release Updates.

ubuntu-server-builder commented 1 year ago

Launchpad user Launchpad Janitor(janitor) wrote on 2022-11-27T04:37:22.218023+00:00

This bug was fixed in the package casper - 1.476


casper (1.476) lunar; urgency=medium

ubuntu-server-builder commented 1 year ago

Launchpad user Dan Bungert(dbungert) wrote on 2022-11-29T15:15:40.263866+00:00

Thanks for moving the SRU forward Mauricio.

I see you nominated this fix for Kinetic, I think it won't help much as we probably won't be creating new Kinetic install media. Or do you have something else in mind?

ubuntu-server-builder commented 1 year ago

Launchpad user Mauricio Faria de Oliveira(mfo) wrote on 2022-11-29T16:22:55.481845+00:00

Hi Dan o/ Yes, it'd be just for documentation purposes for the SRU team's review stage. I'll mark it as Won't Fix with the rationale described, in the SRU template.

ubuntu-server-builder commented 1 year ago

Launchpad user Mauricio Faria de Oliveira(mfo) wrote on 2023-01-23T20:59:45.645722+00:00

Reproducer/Verification for SRU to Jammy, based on strace delay injection on read().

Uploading to jammy. Attaching debdiff for reference.

...

Launch a VM with the Jammy daily live server:

$ wget https://cdimage.ubuntu.com/ubuntu-server/jammy/daily-live/current/jammy-live-server-amd64.iso

$ ISO=jammy-live-server-amd64.iso
$ VM=casper-jammy
$ virt-install  --name $VM --cdrom $ISO --vcpus 2 --memory 2048 --disk none --osinfo ubuntu-stable-latest 

Press e to edit, append "break=init console=ttyS0", press ctrl-x, close window.

Open the serial console and chroot:

$ virsh console $VM
...
(initramfs) chroot /root /bin/bash

Test strace delay injection:

# time strace --trace read cat /dev/null
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\3206\2\0\0\0\0\0"..., 832) = 832
read(3, "", 131072)                     = 0
+++ exited with 0 +++

real    0m0.041s
user    0m0.009s
sys 0m0.030s

# time strace --trace read --inject read:delay_enter=5s cat /dev/null
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\3206\2\0\0\0\0\0"..., 832) = 832 (DELAYED)
read(3, "", 131072)                     = 0 (DELAYED)
+++ exited with 0 +++

real    0m10.041s
user    0m0.010s
sys 0m0.033s

Modify the casper-md5check service:

# sed -i '/^ExecStart=/ s,=,=/usr/bin/strace --inject read:delay_enter=60s ,' /usr/lib/systemd/system/casper-md5check.service

# cat /usr/lib/systemd/system/casper-md5check.service
[Unit]
Description=casper-md5check Verify Live ISO checksums

[Service]
Type=oneshot
ExecStart=/usr/bin/strace --inject read:delay_enter=60s /usr/lib/casper/casper-md5check /cdrom /cdrom/md5sum.txt
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Exit and wait; the issue happens:

# exit
(initramfs) exit

...

Ubuntu 22.04.1 ubuntu-server ttyS0

connecting...
waiting for cloud-init... /

<10 minutes later>

================================================================================
  Serial                                                              [ Help ]
================================================================================

  As the installer is running on a serial console, it has started in basic
  mode, using only the ASCII character set and black and white colours.

   ┌────────────────────────────────────────────────────────────────────────┐
   │                                                                        │
   │  cloud-init failed to complete after 10 minutes of waiting. This       │
   │  suggests a bug, which we would appreciate help understanding.  If     │
   │  you could file a bug at                                               │
   │  https://bugs.launchpad.net/subiquity/+filebug and attach the          │
   │  contents of /var/log, it would be most appreciated.                   │
   │                                                                        │
   │                         [ Switch to a shell ]                          │
   │                         [ Close             ]                          │
   │                                                                        │
   └────────────────────────────────────────────────────────────────────────┘

                      [ Continue in rich mode  > ]
                      [ Continue in basic mode > ]

...

Similarly, repeat, this time with test packages from ppa:mfo/lp1986781:

...

Open the serial console and chroot:

$ virsh console $VM
...
(initramfs) chroot /root /bin/bash

Install the test package:

# dhclient enp1s0
# wget https://launchpad.net/~mfo/+archive/ubuntu/lp1986781/+build/25512466/+files/casper_1.470.2_amd64.deb
# dpkg -i casper_*.deb

Modify the casper-md5check service:

# sed -i '/^ExecStart=/ s,=,=/usr/bin/strace --inject read:delay_enter=60s ,' /usr/lib/systemd/system/casper-md5check.service

# cat /usr/lib/systemd/system/casper-md5check.service
Description=casper-md5check Verify Live ISO checksums
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/bin/strace --inject read:delay_enter=60s /usr/lib/casper/casper-md5check /cdrom /cdrom/md5sum.txt
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Exit and wait; the issue does not happen anymore:

# exit
(initramfs) exit

...

Ubuntu 22.04.1 ubuntu-server ttyS0

connecting...
waiting for cloud-init... /

<some seconds later>

================================================================================
  Serial                                                              [ Help ]
================================================================================

  As the installer is running on a serial console, it has started in basic
  mode, using only the ASCII character set and black and white colours.

  If you are connecting from a terminal emulator such as gnome-terminal that
  supports unicode and rich colours you can switch to "rich mode" which uses
  unicode, colours and supports many languages.

  You can also connect to the installer over the network via SSH, which will
  allow use of rich mode.

                      [ Continue in rich mode  > ]
                      [ Continue in basic mode > ]
                      [ View SSH instructions    ]

Checking the casper-md5sum service is still running:

Help > Enter shell.

# systemctl status casper-md5check.service --no-pager | grep 'Active:'
     Active: activating (start) since Mon 2023-01-23 18:56:50 UTC; 3min 26s ago

And it should not be a problem, as its start timeout is not limited.

# systemctl show casper-md5check.service | grep -i timeout
TimeoutStartUSec=infinity
TimeoutStopUSec=1min 30s
TimeoutAbortUSec=1min 30s
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=terminate
TimeoutCleanUSec=infinity
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none

Quit, cleanup the VM.

Press ctrl-]

$ virsh destroy $VM
$ virsh undefine $VM
ubuntu-server-builder commented 1 year ago

Launchpad user Mauricio Faria de Oliveira(mfo) wrote on 2023-01-23T21:01:19.241010+00:00

ubuntu-server-builder commented 1 year ago

Launchpad user Mauricio Faria de Oliveira(mfo) wrote on 2023-01-23T21:03:22.617136+00:00

cosmetic glitch (see description's regression potential section). Launchpad attachments: Screenshot from 2023-01-23 13-08-33.png

ubuntu-server-builder commented 1 year ago

Launchpad user Mauricio Faria de Oliveira(mfo) wrote on 2023-01-23T21:07:35.250446+00:00

For documentation purposes,

The critical chain of systemd units/time before/after the fix, with a fake 'sleep 30' as the command in casper-md5check.service.

Before:

root@ubuntu-server:/# systemd-analyze critical-chain cloud-final.service The time when unit became active or started is printed after the "@" character. The time the unit took to start is printed after the "+" character.

cloud-final.service +492ms -multi-user.target @37.968s -casper-md5check.service @7.954s +30.010s -basic.target @7.947s -sockets.target @7.946s -snap.lxd.user-daemon.unix.socket @15.740s -sysinit.target @7.892s -cloud-init.service @5.854s +2.032s -systemd-networkd-wait-online.service @5.835s +6ms -systemd-networkd.service @5.713s +119ms -network-pre.target @5.709s -cloud-init-local.service @1.943s +3.765s -systemd-remount-fs.service @1.172s +130ms -systemd-journald.socket @988ms --.mount @942ms `--.slice @941ms

After:

root@ubuntu-server:/# systemd-analyze critical-chain cloud-final.service The time when unit became active or started is printed after the "@" character. The time the unit took to start is printed after the "+" character.

cloud-final.service +901ms -multi-user.target @21.831s -ssh.service @21.712s +38ms -basic.target @8.255s -sockets.target @8.254s -snap.lxd.user-daemon.unix.socket @17.452s -sysinit.target @8.202s -cloud-init.service @6.970s +1.225s -systemd-networkd-wait-online.service @6.958s +9ms -systemd-networkd.service @6.842s +113ms -network-pre.target @6.838s -cloud-init-local.service @1.938s +4.898s -systemd-remount-fs.service @1.160s +174ms -systemd-journald.socket @941ms -system.slice @894ms `--.slice @894ms

ubuntu-server-builder commented 1 year ago

Launchpad user Mauricio Faria de Oliveira(mfo) wrote on 2023-01-23T21:08:21.083037+00:00

Launchpad attachments: lp1986781-casper-jammy.debdiff

ubuntu-server-builder commented 1 year ago

Launchpad user Mauricio Faria de Oliveira(mfo) wrote on 2023-01-27T18:38:10.361167+00:00

Re-uploading to Jammy with changelog updates suggested by Timo.

ubuntu-server-builder commented 1 year ago

Launchpad user Mauricio Faria de Oliveira(mfo) wrote on 2023-01-27T18:38:33.809893+00:00

Launchpad attachments: lp1986781-casper-jammy-v2.debdiff

ubuntu-server-builder commented 1 year ago

Launchpad user Timo Aaltonen(tjaalton) wrote on 2023-01-28T10:54:36.959899+00:00

Hello Shubhakar, or anyone else affected,

Accepted casper into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/casper/1.470.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To properly test it you will need to obtain and boot a daily build of a Live CD for jammy. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

ubuntu-server-builder commented 1 year ago

Launchpad user Mauricio Faria de Oliveira(mfo) wrote on 2023-01-30T14:00:25.633007+00:00

Verification done for jammy-proposed.

...

Booting jammy-live-server-amd64.iso with break=init (more details in comment #37).

(initramfs) chroot /root /bin/bash root@ubuntu-server:/# dhclient enp1s0

root@ubuntu-server:/# add-apt-repository -y -p proposed

root@ubuntu-server:/# apt install -y casper/jammy-proposed

root@ubuntu-server:/# apt policy casper casper: Installed: 1.470.2 Candidate: 1.470.2 Version table: *** 1.470.2 500 500 http://security.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages 500 http://archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages 100 /var/lib/dpkg/status ... root@ubuntu-server:/# add-apt-repository -y -p proposed -r

...

root@ubuntu-server:/# sed -i '/^ExecStart=/ s,=,=/usr/bin/strace --inject read:delay_enter=60s ,' /usr/lib/systemd/system/casper-md5check.service

root@ubuntu-server:/# cat /usr/lib/systemd/system/casper-md5check.service [Unit] Description=casper-md5check Verify Live ISO checksums After=multi-user.target

[Service] Type=oneshot ExecStart=/usr/bin/strace --inject read:delay_enter=60s /usr/lib/casper/casper-md5check /cdrom /cdrom/md5sum.txt ...

root@ubuntu-server:/# exit exit (initramfs) exit ...

... [ OK ] Reached target Multi-User System. Starting casper-md5check Verify Live ISO checksums... Starting Execute cloud user/final scripts... ... [ OK ] Finished Execute cloud user/final scripts. [ OK ] Reached target Cloud-init target.

Ubuntu 22.04.1 LTS ubuntu-server ttyS0

connecting...

================================================================================ Serial [ Help ]

As the installer is running on a serial console, it has started in basic mode, using only the ASCII character set and black and white colours.

If you are connecting from a terminal emulator such as gnome-terminal that supports unicode and rich colours you can switch to "rich mode" which uses unicode, colours and supports many languages.

You can also connect to the installer over the network via SSH, which will allow use of rich mode.

                      [ Continue in rich mode  > ]
                      [ Continue in basic mode > ]
                      [ View SSH instructions    ]
ubuntu-server-builder commented 1 year ago

Launchpad user Mike Rushton(leftyfb) wrote on 2023-01-31T16:29:25.780022+00:00

"To properly test it you will need to obtain and boot a daily build of a Live CD for jammy"

If that is all that is needed then it is failing for me. If, however, we need to stop the installer, enable the proposed repository, update casper, disable the proposed repository and then continue the installer, then it looks like that has been validated above by Mauricio.

ubuntu-server-builder commented 1 year ago

Launchpad user Mauricio Faria de Oliveira(mfo) wrote on 2023-01-31T17:13:15.565522+00:00

Hi Mike,

Right; the package version in -proposed isn't in the daily build yet, so it needed more manual steps, but once this makes it to -updates (or if a daily build is built w/ -proposed enabled), then it'll be in.

@ https://cdimage.ubuntu.com/ubuntu-server/jammy/daily-live/20230131/jammy-live-server-amd64.manifest

casper 1.470.1

cheers, Mauricio

ubuntu-server-builder commented 1 year ago

Launchpad user Dan Bungert(dbungert) wrote on 2023-02-02T15:23:38.048915+00:00

Verification done for jammy-proposed.

I have purposefully done a different test procedure, based on injecting the proposed casper version and using systemd-analyze.

Above, in comments such as https://bugs.launchpad.net/cloud-init/+bug/1986781/comments/25, we can see that casper-md5check needs to complete before hitting multi-user.target.

With casper 1.470.2 in place, it looks like:

The time when unit became active or started is printed after the "@" character. The time the unit took to start is printed after the "+" character.

casper-md5check.service +13.837s └─multi-user.target @18.459s └─ssh.service @18.336s +30ms └─basic.target @7.833s └─sockets.target @7.831s └─snap.lxd.user-daemon.unix.socket @15.746s └─sysinit.target @7.778s └─cloud-init.service @6.080s +1.695s └─systemd-networkd-wait-online.service @6.061s +10ms └─systemd-networkd.service @5.950s +109ms └─network-pre.target @5.948s └─cloud-init-local.service @1.615s +4.332s └─systemd-remount-fs.service @884ms +135ms └─systemd-journald.socket @705ms └─system.slice @624ms └─-.slice @624ms

This is the expected behavior, so the SRU LGTM.

ubuntu-server-builder commented 1 year ago

Launchpad user Launchpad Janitor(janitor) wrote on 2023-02-07T11:05:44.814349+00:00

This bug was fixed in the package casper - 1.470.2


casper (1.470.2) jammy; urgency=medium

[ Dan Bungert ]

ubuntu-server-builder commented 1 year ago

Launchpad user Łukasz Zemczak(sil2100) wrote on 2023-02-07T11:05:49.907084+00:00

The verification of the Stable Release Update for casper has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.