Open AverageMarcus opened 1 month ago
I just booted an instance on VSphere and the open-vm-tools
and friends are available:
Flatcar Container Linux by Kinvolk stable 3975.2.0 for VMware
core@localhost ~ $ systemd-sysext status
HIERARCHY EXTENSIONS SINCE
/opt none -
/usr containerd-flatcar Fri 2024-08-09 13:41:10 UTC
docker-flatcar
oem-vmware
core@localhost ~ $ vmtoolsd --version
VMware Tools daemon, version 12.4.0.48309 (build-23259341)
The ISO image is a generic image (without OEM tools), so that's normal if there is no sign of OEM VMWare tools.
I need to investigate on how the image-builder is building Flatcar VSphere image, I guess it boots the Flatcar ISO and from the ISO it runs the flatcar-install
script (similar to OpenStack).
Here's the Packer vars that image-builder is using when building Flatcar for VSphere - https://github.com/kubernetes-sigs/image-builder/blob/main/images/capi/packer/ova/flatcar.json
After some debugging on Slack we discovered that the boot time for 3975.2.0
has slowed down quite considerably to roughly 1m30s
which is longer than the 60s
that image-builder is configured to wait by default. Updating the boot_wait time to 5min allowed me to workaround this problem but a 1m30s boot time isn't ideal so I'm going to update / change this issue to focus on investigating that regression.
Some logs on local ISO with qemu:
$ wget https://stable.release.flatcar-linux.net/amd64-usr/current/flatcar_production_iso_image.iso
$ wget https://stable.release.flatcar-linux.net/amd64-usr/current/flatcar_production_iso.sh
$ ./flatcar_production_iso.sh -- -display curses
...
Aug 09 14:25:14 localhost systemd[1]: dev-disk-by\x2dpartlabel-ROOT.device: Job dev-disk-by\x2dpartlabel-ROOT.device/start timed out.
Aug 09 14:25:14 localhost systemd[1]: Timed out waiting for device dev-disk-by\x2dpartlabel-ROOT.device - /dev/disk/by-partlabel/ROOT.
Aug 09 14:25:14 localhost systemd[1]: dev-disk-by\x2dpartlabel-ROOT.device: Job dev-disk-by\x2dpartlabel-ROOT.device/start failed with result 'timeout'.
It did not investigated further.
alpha-3913.0.0 boots ok, this issue first occurred with alpha 3941.0.0.
This is the guilty file: https://github.com/flatcar/bootengine/blob/flatcar-master/dracut/31decrypt-root/decrypt-root.service and this also affect ipxe/pxe boot, but only when no ignition file is supplied.
Maybe use ConditionKernelCommandLine=!flatcar.oem.id=pxe
for pxe? But for ISO, I did not see any kernel cmdline that can be used to do not run the service.
Maybe use
ConditionKernelCommandLine=!flatcar.oem.id=pxe
for pxe? But for ISO, I did not see any kernel cmdline that can be used to do not run the service.
which means that if there is not flatcar.oem.id parameter, maybe we can skip it. Still, a better and sounder approach should be used.
Leaving here the cmdline in case of normal boot on qemu:
rootflags=rw mount.usrflags=ro BOOT_IMAGE=/flatcar/vmlinuz-a
mount.usr=/dev/mapper/usr verity.usr=PARTUUID=7130c94a-213a-4e5a-8e26-6cce9662f132
rootflags=rw mount.usrflags=ro
consoleblank=0 root=LABEL=ROOT console=ttyS0,115200n8 console=tty0 flatcar.oem.id=qemu flatcar.autologin
verity.usrhash=2d5fc89636c5bc9515e5e98dfdf370f69a0c963b25db9a883bb16a221845a3e0
For ISO:
rootflags=rw mount.usrflags=ro initrd=/flatcar/cpio.gz flatcar.autologin BOOT_IMAGE=/flatcar/vmlinuz
FYI - I encountered another issue with this release and image-builder after increasing the boot timeout. It seems like the boot command configured in image-builder is now failing because the first command is returning a warning that seems to be causing the c
of the curl
command to be lost.
I have reproduced this on 4054.0.0 with QEMU, so it hasn't been magically fixed since. I'll keep investigating.
Since the ISO configuration is identical to qemu
, it has the same OEM ID - we can't really set condition on this.
As it timeouts on getting the ROOT device (which does not exist on ISO):
Aug 09 14:25:14 localhost systemd[1]: Timed out waiting for device dev-disk-by\x2dpartlabel-ROOT.device - /dev/disk/by-partlabel/ROOT.
Aug 09 14:25:14 localhost systemd[1]: dev-disk-by\x2dpartlabel-ROOT.device: Job dev-disk-by\x2dpartlabel-ROOT.device/start failed with result 'timeout'.
$ ls /dev/disk/by-label/
CDROM
I would add a condition to the unit (https://github.com/flatcar/bootengine/blob/flatcar-master/dracut/31decrypt-root/decrypt-root.service) to check for CDROM:
ConditionKernelCommandLine=!flatcar.oem.id=pxe
ConditionPathExists=!/dev/disk/by-label/CDROM
I'm starting to think that some Equinix Metal failures (PXE boot) might be explained by this timeout.
I've now set this up in a way that I can very quickly test changes. Dropping that unit entirely does indeed fix the problem.
Our other Dracut modules check for this situation with this. Perhaps it would be best to do something similar here?
usr=$(cmdline_arg mount.usr)
if [[ -z "${usr}" && -f /usr.squashfs ]]; then
This works:
ConditionKernelCommandLine=!mount.usr
ConditionPathExists=!/usr.squashfs
But only if I remove this line:
Wants=dev-disk-by\x2dpartlabel-ROOT.device
I think that device is dynamic, so I don't know whether removing that is a good idea.
Changing the Wants
to Requisite
is enough to make this work without even adding the other conditions. I just need to check that it still activates when actually needed.
Bleh, it doesn't seem to work. If I drop to a shell before it switches root, I find that it hasn't decrypted the root partition, but doing it manually works. It works automatically if I change it back to Wants
.
That does make sense. The by-partlabel symlink takes time to appear, so we want this to wait for that, but only if we're fairly sure it will appear eventually. I'm having a similar issue with the other boot stuff I'm looking at, so I will keep experimenting.
I've been doing some more thinking. I don't think we should assume that a user would not want to decrypt root when booting from the ISO or via PXE. Serving Flatcar via PXE but storing data on local encrypted disks is a key use case, right?
The point of this decrypt-root service is to make the decryption magically work without explicit configuration. Either we have to wait a reasonable amount of time for the root device to show up, or we have to make it work more reactively via a udev rule.
The question is whether this decrypted disk needs to be ready before some other boot time component. Do we do anything special with these "root" disks? What happens if it's not encrypted? If some other part of the boot process depends on this, doesn't that also need to wait for the symlink to appear?
There may be a usecase that uses an encrypted root disk with PXE, but then there would need to be a root=
flag on the kernel command line so that we know where the root (stateful) partition is located. So in that sense there are three cases:
a) no root=
argument and not (PXE or ISO): wait for PARTLABEL=ROOT
and try to decrypt
b) no root=
argument and (PXE or ISO): don't wait for disk
c) root=
argument and (PXE or ISO): wait for root=
disk and try to decrypt.
b) is what we want to fix here. a) works. c) is not supported (because of no root=
handling) but is also not terribly interesting at this time.
The question is whether this decrypted disk needs to be ready before some other boot time component. Do we do anything special with these "root" disks? What happens if it's not encrypted? If some other part of the boot process depends on this, doesn't that also need to wait for the symlink to appear?
Decrypt-root runs from the initrd, and we need to have the decrypted root mounted at /sysroot
before we can proceed with switch-root. So yes, for non-PXE, non-ISO boot we need to wait for PARTLABEL=ROOT
for sure.
There is the added fun that decrypt-root.service
only runs when ignition isn't provided. The first boot (ignition) has different services and ordering becacuse that's the boot that takes care of creating the encrypted root partition.
I wanted to allow for (c), and I had an idea, so I reworked this around a udev rule.
ACTION=="add", SUBSYSTEM=="block", ENV{ID_PART_ENTRY_NAME}=="ROOT", ENV{ID_FS_TYPE}=="crypto_LUKS", TAG+="systemd", ENV{SYSTEMD_WANTS}+="decrypt-root.service"
I also made decrypt-root.service a bit simpler. I haven't tried (a) yet, but (b) and (c) work really well. This also decrypts root even if you don't actually specify root=/dev/mapper/rootencrypted
. If you do specify it, then systemd knows to wait. I'll make a PR tomorrow.
how does the udev rule interact with ignition? for case a) and c) the encryption would be done by ignition so, decrypt-root.service would race with ignitions actions?
From: James Le Cuirot @.>
Sent: Monday, August 19, 2024 7:19:47 PM
To: flatcar/Flatcar @.>
Cc: Comment @.>; Subscribed @.>
Subject: Re: [flatcar/Flatcar] Boot time for ISO image 3975.2.0
release has increased to ~1m30s (Issue #1514)
I wanted to allow for (c), and I had an idea, so I reworked this around a udev rule.
ACTION=="add", SUBSYSTEM=="block", ENV{ID_PART_ENTRY_NAME}=="ROOT", ENV{ID_FS_TYPE}=="crypto_LUKS", TAG+="systemd", ENV{SYSTEMD_WANTS}+="decrypt-root.service"
I also made decrypt-root.service a bit simpler. I haven't tried (a) yet, but (b) and (c) work really well. This also decrypts root even if you don't actually specify root=/dev/mapper/rootencrypted. If you do specify it, then systemd knows to wait. I'll make a PR tomorrow.
— Reply to this email directly, view it on GitHubhttps://github.com/flatcar/Flatcar/issues/1514#issuecomment-2297060358 or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABXINVWHOK3TRWQN243CU3TZSISLHBFKMF2HI4TJMJ2XIZLTSSBKK5TBNR2WLJDUOJ2WLJDOMFWWLO3UNBZGKYLEL5YGC4TUNFRWS4DBNZ2F6YLDORUXM2LUPGBKK5TBNR2WLJDUOJ2WLJDOMFWWLLTXMF2GG2C7MFRXI2LWNF2HTAVFOZQWY5LFUVUXG43VMWSG4YLNMWVXI2DSMVQWIX3UPFYGLAVFOZQWY5LFVE4DKNRYGIYTQMJSURXGC3LFVFUGC427NRQWEZLMVRZXKYTKMVRXIX3UPFYGLLCJONZXKZKDN5WW2ZLOOSTHI33QNFRXHE4CUR2HS4DFVJZGK4DPONUXI33SPGSXMYLMOVS2SMJSGM4TIOBRGQ2YFJDUPFYGLJLJONZXKZNFOZQWY5LFVIZDINJXGYYTMMBQGGBKI5DZOBS2K3DBMJSWZJLWMFWHKZNJHA2TMOBSGE4DCMVHORZGSZ3HMVZKMY3SMVQXIZI. You are receiving this email because you commented on the thread.
Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
Good news all round after more testing.
The rule works correctly on first boot and subsequently for (a) and (c) because of ENV{ID_FS_TYPE}=="crypto_LUKS"
. Even if ROOT already exists, the rule won't kick in until it's recreated as LUKS.
I also tried with flatcar.first_boot=1
. This initially failed, as it decrypted ROOT, which was then wiped by Ignition. Adding After=ignition-disks.service
fixed this, and now it wipes ROOT and decrypts it reliably every time.
In theory, this could be used to automatically decrypt any LUKS partitions but that's probably taking it too far.
Sorry, this was closed a little prematurely. I'll close it when the fix is in the stable release.
I've backported this to the 4012 Beta.
Description
The latest stable release,
3975.2.0
, is taking a considerably longer time to boot compared to previous releases.E.g.
vs. the previous release:
This has resulted in a delay at boot of roughly 1min 30sec compared to the previous release.
Impact
When building images with image-builder the default boot wait time is 60s after which Packer will start typing out the boot command. As this happens before the OS has finished booting the
flatcar-install
script isn't actually run correctly and can result in the OS not being correctly set up in an environment such as VSphere (it doesn't install the vmtools for example). With image-builder this then results in it hanging atWaiting for IP...
as vcenter is unable to retrieve the IP from the guest OS as no vmtools have been installed.Environment and steps to reproduce
3975.2.0
on VSphere viaimage-builder
make build-node-ova-vsphere-flatcar
b. The3975.2.0
image is used to launch a new VM (https://stable.release.flatcar-linux.net/amd64-usr/3975.2.0/flatcar_production_iso_image.iso
) c. The VM boots but vcenter fails to retrieve the IP address due to lack of vmtools on the guest OSWaiting for IP...
Expected behavior
The OS boots in under 60 seconds. Ideally back down below 10s like the previous release.
Additional information
This issue was previously about
open-vm-tools
not being installed but after debugging with @tormath1 on Slack (see thread) we learnt that it was actually due to the delay in boot that was causing the problem so I updated the issue to reflect that.