flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0
681 stars 30 forks source link

Boot time for ISO image `3975.2.0` release has increased to ~1m30s #1514

Open AverageMarcus opened 1 month ago

AverageMarcus commented 1 month ago

Description

The latest stable release, 3975.2.0, is taking a considerably longer time to boot compared to previous releases.

E.g.

core@localhost ~ $ systemd-analyze
Startup finished in 1.787s (kernel) + 1min 31.064s (initrd) + 4.103s (userspace) = 1min 36.956s
multi-user.target reached after 4.103s in userspace.

vs. the previous release:

Startup finished in 1.309s (kernel) + 2.089s (initrd) + 4.278s (userspace) = 7.677s
multi-user.target reached after 4.254s in userspace.

This has resulted in a delay at boot of roughly 1min 30sec compared to the previous release.

Impact

When building images with image-builder the default boot wait time is 60s after which Packer will start typing out the boot command. As this happens before the OS has finished booting the flatcar-install script isn't actually run correctly and can result in the OS not being correctly set up in an environment such as VSphere (it doesn't install the vmtools for example). With image-builder this then results in it hanging at Waiting for IP... as vcenter is unable to retrieve the IP from the guest OS as no vmtools have been installed.

Environment and steps to reproduce

  1. Set-up: Flatcar 3975.2.0 on VSphere via image-builder
  2. Task: Packer stuck trying to get the IP address of the VM
  3. Action(s): a. With image-builder run: make build-node-ova-vsphere-flatcar b. The 3975.2.0 image is used to launch a new VM (https://stable.release.flatcar-linux.net/amd64-usr/3975.2.0/flatcar_production_iso_image.iso) c. The VM boots but vcenter fails to retrieve the IP address due to lack of vmtools on the guest OS
  4. Error: Waiting for IP...

Expected behavior

The OS boots in under 60 seconds. Ideally back down below 10s like the previous release.

Additional information

This issue was previously about open-vm-tools not being installed but after debugging with @tormath1 on Slack (see thread) we learnt that it was actually due to the delay in boot that was causing the problem so I updated the issue to reflect that.

tormath1 commented 1 month ago

I just booted an instance on VSphere and the open-vm-tools and friends are available:

Flatcar Container Linux by Kinvolk stable 3975.2.0 for VMware
core@localhost ~ $ systemd-sysext status
HIERARCHY EXTENSIONS         SINCE
/opt      none               -
/usr      containerd-flatcar Fri 2024-08-09 13:41:10 UTC
          docker-flatcar
          oem-vmware
core@localhost ~ $ vmtoolsd --version
VMware Tools daemon, version 12.4.0.48309 (build-23259341)

The ISO image is a generic image (without OEM tools), so that's normal if there is no sign of OEM VMWare tools.

I need to investigate on how the image-builder is building Flatcar VSphere image, I guess it boots the Flatcar ISO and from the ISO it runs the flatcar-install script (similar to OpenStack).

AverageMarcus commented 1 month ago

Here's the Packer vars that image-builder is using when building Flatcar for VSphere - https://github.com/kubernetes-sigs/image-builder/blob/main/images/capi/packer/ova/flatcar.json

AverageMarcus commented 1 month ago

After some debugging on Slack we discovered that the boot time for 3975.2.0 has slowed down quite considerably to roughly 1m30s which is longer than the 60s that image-builder is configured to wait by default. Updating the boot_wait time to 5min allowed me to workaround this problem but a 1m30s boot time isn't ideal so I'm going to update / change this issue to focus on investigating that regression.

tormath1 commented 1 month ago

Some logs on local ISO with qemu:

$ wget https://stable.release.flatcar-linux.net/amd64-usr/current/flatcar_production_iso_image.iso
$ wget https://stable.release.flatcar-linux.net/amd64-usr/current/flatcar_production_iso.sh
$ ./flatcar_production_iso.sh -- -display curses
...
Aug 09 14:25:14 localhost systemd[1]: dev-disk-by\x2dpartlabel-ROOT.device: Job dev-disk-by\x2dpartlabel-ROOT.device/start timed out.
Aug 09 14:25:14 localhost systemd[1]: Timed out waiting for device dev-disk-by\x2dpartlabel-ROOT.device - /dev/disk/by-partlabel/ROOT.
Aug 09 14:25:14 localhost systemd[1]: dev-disk-by\x2dpartlabel-ROOT.device: Job dev-disk-by\x2dpartlabel-ROOT.device/start failed with result 'timeout'.

It did not investigated further.

jepio commented 1 month ago

alpha-3913.0.0 boots ok, this issue first occurred with alpha 3941.0.0.

jepio commented 1 month ago

This is the guilty file: https://github.com/flatcar/bootengine/blob/flatcar-master/dracut/31decrypt-root/decrypt-root.service and this also affect ipxe/pxe boot, but only when no ignition file is supplied.

ader1990 commented 1 month ago

Maybe use ConditionKernelCommandLine=!flatcar.oem.id=pxe for pxe? But for ISO, I did not see any kernel cmdline that can be used to do not run the service.

ader1990 commented 1 month ago

Maybe use ConditionKernelCommandLine=!flatcar.oem.id=pxe for pxe? But for ISO, I did not see any kernel cmdline that can be used to do not run the service.

which means that if there is not flatcar.oem.id parameter, maybe we can skip it. Still, a better and sounder approach should be used.

Leaving here the cmdline in case of normal boot on qemu:

rootflags=rw mount.usrflags=ro BOOT_IMAGE=/flatcar/vmlinuz-a 
 mount.usr=/dev/mapper/usr verity.usr=PARTUUID=7130c94a-213a-4e5a-8e26-6cce9662f132
 rootflags=rw mount.usrflags=ro
 consoleblank=0 root=LABEL=ROOT console=ttyS0,115200n8 console=tty0 flatcar.oem.id=qemu flatcar.autologin 
 verity.usrhash=2d5fc89636c5bc9515e5e98dfdf370f69a0c963b25db9a883bb16a221845a3e0

For ISO:

rootflags=rw mount.usrflags=ro initrd=/flatcar/cpio.gz flatcar.autologin BOOT_IMAGE=/flatcar/vmlinuz
AverageMarcus commented 1 month ago

FYI - I encountered another issue with this release and image-builder after increasing the boot timeout. It seems like the boot command configured in image-builder is now failing because the first command is returning a warning that seems to be causing the c of the curl command to be lost.

Flatcar

chewi commented 4 weeks ago

I have reproduced this on 4054.0.0 with QEMU, so it hasn't been magically fixed since. I'll keep investigating.

tormath1 commented 4 weeks ago

Since the ISO configuration is identical to qemu, it has the same OEM ID - we can't really set condition on this. As it timeouts on getting the ROOT device (which does not exist on ISO):

Aug 09 14:25:14 localhost systemd[1]: Timed out waiting for device dev-disk-by\x2dpartlabel-ROOT.device - /dev/disk/by-partlabel/ROOT.
Aug 09 14:25:14 localhost systemd[1]: dev-disk-by\x2dpartlabel-ROOT.device: Job dev-disk-by\x2dpartlabel-ROOT.device/start failed with result 'timeout'.
$ ls /dev/disk/by-label/
CDROM

I would add a condition to the unit (https://github.com/flatcar/bootengine/blob/flatcar-master/dracut/31decrypt-root/decrypt-root.service) to check for CDROM:

ConditionKernelCommandLine=!flatcar.oem.id=pxe
ConditionPathExists=!/dev/disk/by-label/CDROM

I'm starting to think that some Equinix Metal failures (PXE boot) might be explained by this timeout.

chewi commented 4 weeks ago

I've now set this up in a way that I can very quickly test changes. Dropping that unit entirely does indeed fix the problem.

Our other Dracut modules check for this situation with this. Perhaps it would be best to do something similar here?

usr=$(cmdline_arg mount.usr)
if [[ -z "${usr}" && -f /usr.squashfs ]]; then
chewi commented 4 weeks ago

This works:

ConditionKernelCommandLine=!mount.usr
ConditionPathExists=!/usr.squashfs

But only if I remove this line:

Wants=dev-disk-by\x2dpartlabel-ROOT.device

I think that device is dynamic, so I don't know whether removing that is a good idea.

chewi commented 4 weeks ago

Changing the Wants to Requisite is enough to make this work without even adding the other conditions. I just need to check that it still activates when actually needed.

chewi commented 3 weeks ago

Bleh, it doesn't seem to work. If I drop to a shell before it switches root, I find that it hasn't decrypted the root partition, but doing it manually works. It works automatically if I change it back to Wants.

That does make sense. The by-partlabel symlink takes time to appear, so we want this to wait for that, but only if we're fairly sure it will appear eventually. I'm having a similar issue with the other boot stuff I'm looking at, so I will keep experimenting.

chewi commented 3 weeks ago

I've been doing some more thinking. I don't think we should assume that a user would not want to decrypt root when booting from the ISO or via PXE. Serving Flatcar via PXE but storing data on local encrypted disks is a key use case, right?

The point of this decrypt-root service is to make the decryption magically work without explicit configuration. Either we have to wait a reasonable amount of time for the root device to show up, or we have to make it work more reactively via a udev rule.

The question is whether this decrypted disk needs to be ready before some other boot time component. Do we do anything special with these "root" disks? What happens if it's not encrypted? If some other part of the boot process depends on this, doesn't that also need to wait for the symlink to appear?

jepio commented 3 weeks ago

There may be a usecase that uses an encrypted root disk with PXE, but then there would need to be a root= flag on the kernel command line so that we know where the root (stateful) partition is located. So in that sense there are three cases:

a) no root= argument and not (PXE or ISO): wait for PARTLABEL=ROOT and try to decrypt b) no root= argument and (PXE or ISO): don't wait for disk c) root= argument and (PXE or ISO): wait for root= disk and try to decrypt.

b) is what we want to fix here. a) works. c) is not supported (because of no root= handling) but is also not terribly interesting at this time.

The question is whether this decrypted disk needs to be ready before some other boot time component. Do we do anything special with these "root" disks? What happens if it's not encrypted? If some other part of the boot process depends on this, doesn't that also need to wait for the symlink to appear?

Decrypt-root runs from the initrd, and we need to have the decrypted root mounted at /sysroot before we can proceed with switch-root. So yes, for non-PXE, non-ISO boot we need to wait for PARTLABEL=ROOT for sure.

There is the added fun that decrypt-root.service only runs when ignition isn't provided. The first boot (ignition) has different services and ordering becacuse that's the boot that takes care of creating the encrypted root partition.

chewi commented 3 weeks ago

I wanted to allow for (c), and I had an idea, so I reworked this around a udev rule.

ACTION=="add", SUBSYSTEM=="block", ENV{ID_PART_ENTRY_NAME}=="ROOT", ENV{ID_FS_TYPE}=="crypto_LUKS", TAG+="systemd", ENV{SYSTEMD_WANTS}+="decrypt-root.service"

I also made decrypt-root.service a bit simpler. I haven't tried (a) yet, but (b) and (c) work really well. This also decrypts root even if you don't actually specify root=/dev/mapper/rootencrypted. If you do specify it, then systemd knows to wait. I'll make a PR tomorrow.

jepio commented 3 weeks ago

how does the udev rule interact with ignition? for case a) and c) the encryption would be done by ignition so, decrypt-root.service would race with ignitions actions?


From: James Le Cuirot @.> Sent: Monday, August 19, 2024 7:19:47 PM To: flatcar/Flatcar @.> Cc: Comment @.>; Subscribed @.> Subject: Re: [flatcar/Flatcar] Boot time for ISO image 3975.2.0 release has increased to ~1m30s (Issue #1514)

I wanted to allow for (c), and I had an idea, so I reworked this around a udev rule.

ACTION=="add", SUBSYSTEM=="block", ENV{ID_PART_ENTRY_NAME}=="ROOT", ENV{ID_FS_TYPE}=="crypto_LUKS", TAG+="systemd", ENV{SYSTEMD_WANTS}+="decrypt-root.service"

I also made decrypt-root.service a bit simpler. I haven't tried (a) yet, but (b) and (c) work really well. This also decrypts root even if you don't actually specify root=/dev/mapper/rootencrypted. If you do specify it, then systemd knows to wait. I'll make a PR tomorrow.

— Reply to this email directly, view it on GitHubhttps://github.com/flatcar/Flatcar/issues/1514#issuecomment-2297060358 or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABXINVWHOK3TRWQN243CU3TZSISLHBFKMF2HI4TJMJ2XIZLTSSBKK5TBNR2WLJDUOJ2WLJDOMFWWLO3UNBZGKYLEL5YGC4TUNFRWS4DBNZ2F6YLDORUXM2LUPGBKK5TBNR2WLJDUOJ2WLJDOMFWWLLTXMF2GG2C7MFRXI2LWNF2HTAVFOZQWY5LFUVUXG43VMWSG4YLNMWVXI2DSMVQWIX3UPFYGLAVFOZQWY5LFVE4DKNRYGIYTQMJSURXGC3LFVFUGC427NRQWEZLMVRZXKYTKMVRXIX3UPFYGLLCJONZXKZKDN5WW2ZLOOSTHI33QNFRXHE4CUR2HS4DFVJZGK4DPONUXI33SPGSXMYLMOVS2SMJSGM4TIOBRGQ2YFJDUPFYGLJLJONZXKZNFOZQWY5LFVIZDINJXGYYTMMBQGGBKI5DZOBS2K3DBMJSWZJLWMFWHKZNJHA2TMOBSGE4DCMVHORZGSZ3HMVZKMY3SMVQXIZI. You are receiving this email because you commented on the thread.

Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

chewi commented 3 weeks ago

Good news all round after more testing.

The rule works correctly on first boot and subsequently for (a) and (c) because of ENV{ID_FS_TYPE}=="crypto_LUKS". Even if ROOT already exists, the rule won't kick in until it's recreated as LUKS.

I also tried with flatcar.first_boot=1. This initially failed, as it decrypted ROOT, which was then wiped by Ignition. Adding After=ignition-disks.service fixed this, and now it wipes ROOT and decrypts it reliably every time.

In theory, this could be used to automatically decrypt any LUKS partitions but that's probably taking it too far.

chewi commented 3 weeks ago

Sorry, this was closed a little prematurely. I'll close it when the fix is in the stable release.

chewi commented 2 weeks ago

I've backported this to the 4012 Beta.