Closed · henrywang closed this 5 months ago
@bcrochet looks like the same thing you're hitting on c9s+x86+efi ?
@jeckersb Correct. I've seen this on baremetal and VMs on amd64. The key is that the systems are booting in UEFI mode and not in Legacy/BIOS mode.
So far I've tracked it down to bootupctl calling efibootmgr, and efibootmgr is failing.
Running manually in a container with /target/boot/{,efi} mounted, as well as /boot/{,efi}, efibootmgr outputs this:

```
bash-5.2# efibootmgr
EFI variables are not supported on this system.
```
This will be fixed over in https://github.com/coreos/bootupd/pull/610 - will test this e2e when I get a minute.
I just did `-v /sys/firmware/efi/efivars:/sys/firmware/efi/efivars` and it "worked". :)
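For context, the workaround amounts to adding one more bind mount to the install invocation. A sketch under stated assumptions (the image name is a placeholder, not a real published image), guarded so it does nothing on machines without podman or EFI:

```shell
# Sketch of the workaround above: bind-mount the host's efivars
# directory into the install container so efibootmgr inside it can
# see EFI variables. Guarded so this is a no-op where podman or EFI
# aren't available.
if command -v podman >/dev/null 2>&1 && [ -d /sys/firmware/efi/efivars ]; then
  podman run --rm --privileged --pid=host \
    -v /sys/firmware/efi/efivars:/sys/firmware/efi/efivars \
    -v /:/target \
    quay.io/example/centos-bootc:stream9 \
    bootc install to-filesystem /target || true  # placeholder image; may fail
  attempted=yes
else
  echo "skipping: podman or EFI variables not available here"
  attempted=no
fi
```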
```
[root@ip-172-31-36-160 ~]# bootc status
note: The format of this API is not yet stable
apiVersion: org.containers.bootc/v1alpha1
kind: BootcHost
metadata:
  name: host
spec:
  image:
    image: quay.io/bcrochet/centos-bootc-bcrochet:stream9-arm
    transport: registry
status:
  staged: null
  booted:
    image:
      image:
        image: quay.io/bcrochet/centos-bootc-bcrochet:stream9-arm
        transport: registry
      version: stream9.20240131.0
      timestamp: null
      imageDigest: sha256:ffec3a3705a03b9f31e782919fb7bc26a82e775e02f7c5f6b4861d617e4c47a7
    incompatible: false
    pinned: false
    ostree:
      checksum: c5d27c9d87a89df5bfe25437792749760fbf8a15d8c3fc2cac47296022612491
      deploySerial: 0
  rollback: null
  isContainer: false
```
Hmm wait you did that without the above bootupd PR and it worked? That seems...weird.
Wait, I think the reason that works is that by bind mounting in an empty directory, you're effectively making bootupd think EFI isn't enabled at all.
I have an EFI stream9 VM and that directory has a whole bunch of stuff:
```
[root@localhost ~]# ls /sys/firmware/efi/efivars/
Boot0000-8be4df61-93ca-11d2-aa0d-00e098032b8c            Key0001-8be4df61-93ca-11d2-aa0d-00e098032b8c
Boot0001-8be4df61-93ca-11d2-aa0d-00e098032b8c            Lang-8be4df61-93ca-11d2-aa0d-00e098032b8c
Boot0002-8be4df61-93ca-11d2-aa0d-00e098032b8c            LangCodes-8be4df61-93ca-11d2-aa0d-00e098032b8c
Boot0003-8be4df61-93ca-11d2-aa0d-00e098032b8c            MTC-eb704011-1402-11d3-8e77-00a0c969723b
BootCurrent-8be4df61-93ca-11d2-aa0d-00e098032b8c         MokListRT-605dab50-e046-4300-abb6-3dd810dd8b23
BootOptionSupport-8be4df61-93ca-11d2-aa0d-00e098032b8c   MokListXRT-605dab50-e046-4300-abb6-3dd810dd8b23
BootOrder-8be4df61-93ca-11d2-aa0d-00e098032b8c           OsIndicationsSupported-8be4df61-93ca-11d2-aa0d-00e098032b8c
ConIn-8be4df61-93ca-11d2-aa0d-00e098032b8c               PlatformLang-8be4df61-93ca-11d2-aa0d-00e098032b8c
ConInDev-8be4df61-93ca-11d2-aa0d-00e098032b8c            PlatformLangCodes-8be4df61-93ca-11d2-aa0d-00e098032b8c
ConOut-8be4df61-93ca-11d2-aa0d-00e098032b8c              PlatformRecovery0000-8be4df61-93ca-11d2-aa0d-00e098032b8c
ConOutDev-8be4df61-93ca-11d2-aa0d-00e098032b8c           Timeout-8be4df61-93ca-11d2-aa0d-00e098032b8c
ErrOut-8be4df61-93ca-11d2-aa0d-00e098032b8c              VMMBootOrder0000-668f4529-63d0-4bb5-b65d-6fbb9d36a44a
ErrOutDev-8be4df61-93ca-11d2-aa0d-00e098032b8c           VarErrorFlag-04b37fe8-f6ae-480b-bdd5-37d98c5e89aa
Key0000-8be4df61-93ca-11d2-aa0d-00e098032b8c
```
(no clue what any of that actually does, but it's there!)
These are firmware-level variables. On AWS there apparently are no persistent EFI variables, so in theory it should work to simply not set the `BootCurrent`
variable. But on physical systems it is commonly needed (or at least more optimal to do so, as the firmware may time out looking for the now-nonexistent previous boot entry). We also need to do this on GCP at least, which does have EFI variables.
Anyways I believe the bootupd change will fix this, but needs testing.
> Hmm wait you did that without the above bootupd PR and it worked? That seems...weird.
Yes.
> Wait, I think the reason that works is that by bind mounting in an empty directory, you're effectively making bootupd think EFI isn't enabled at all.
I think it's actually the opposite. The directory is populated on the host via an `efivarfs` mount. That mount is not present in the container.

Basically, if that dir is not populated, bootupd assumes that EFI isn't enabled. Really it's efibootmgr: I verified it returns a non-zero exit code along with the 'EFI variables are not supported' message, which causes bootupd to fail.
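To make that failure mode concrete, here is a tiny sketch (a hypothetical helper, not bootupd's actual code) of the interpretation a caller like bootupd effectively applies to efibootmgr's result:

```shell
# Interpret an efibootmgr-style result: a non-zero exit code plus the
# "EFI variables are not supported" message means efivarfs wasn't
# visible, so a caller like bootupd would conclude EFI isn't enabled.
efi_vars_visible() {
  # $1: efibootmgr exit code, $2: its combined output
  if [ "$1" -ne 0 ] && printf '%s' "$2" | grep -q 'EFI variables are not supported'; then
    echo no
  else
    echo yes
  fi
}

efi_vars_visible 2 'EFI variables are not supported on this system.'  # prints: no
efi_vars_visible 0 'BootCurrent: 0000'                                # prints: yes
```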
From an AWS page:

> **UEFI variable persistence**
>
> * For instances that were launched on or before May 10, 2022, UEFI variables are wiped on reboot or stop.
> * For instances that are launched on or after May 11, 2022, UEFI variables that are marked as non-volatile are persisted on reboot and stop/start.
> * Bare metal instances don't preserve UEFI non-volatile variables across instance stop/start operations.
> It represents firmware level variables. On AWS there apparently are no persistent EFI variables, so in theory it should work to simply not set the `BootCurrent` variable.
FWIW, I've seen this situation on both baremetal and a VM. However, I'm having trouble replicating that situation. When I do, I'll update here.
> Anyways I believe the bootupd change will fix this, but needs testing.
In the meantime, I will give a whirl to the bootupd fix.
> I think it's actually the opposite. The directory is populated on the host via an `efivarfs` mount. That mount is not present in the container.
Ahh OK sorry you may be right indeed! And the bootupd PR is probably unnecessary.
(I think what confused me just now is that I happened to be testing this out by launching a C9S machine, but our C9S images are only configured for legacy boot...)
So indeed, let's try changing things as you suggest to ensure that mount is present in the container.
> It represents firmware level variables. On AWS there apparently are no persistent EFI variables, so in theory it should work to simply not set the `BootCurrent` variable. But on physical systems it is commonly needed (or at least more optimal to do so, as the firmware may time out looking for the now nonexistent previous boot entry). We also need to do this on GCP at least which does have EFI variables.
>
> Anyways I believe the bootupd change will fix this, but needs testing.
I can help testing if you have a PR or scratch build.
Hmm, so in the GCP instance I'm testing, `/sys/firmware/efi/vars` is not a distinct `efivars` mountpoint, it's just sysfs. Will look at AWS in a second.
Ahhh, now I understand: it's apparently just aarch64 systems that use `efivars` on `/sys/firmware/efi/efivars`. It's a bit surprising that `podman run --privileged` apparently doesn't recursively mount `/sys`.
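One quick way to see this on any given machine is to check whether that path shows up as its own mount in the current mount namespace; nothing EFI-specific here, just reading `/proc/self/mounts`:

```shell
# Check whether /sys/firmware/efi/efivars is a distinct mountpoint in
# this mount namespace; on x86 VMs or inside containers it often isn't.
if grep -q ' /sys/firmware/efi/efivars ' /proc/self/mounts 2>/dev/null; then
  efivars_mounted=yes
else
  efivars_mounted=no
fi
echo "distinct efivars mount: $efivars_mounted"
```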
But anyways we can clearly work around this by doing the mount internally.
@bcrochet I see two ways to go about this.
First, we can simply document doing `-v /sys:/sys` for our install procedures; this gives us a proper full recursive mount. I lean a bit towards this path.
However, our `podman run` invocation is already getting super unwieldy, and it's nicer if we just handle as much as we can internally.
So one approach here is something like this:

* Check if `/proc/1/root/sys/fs/efi/efivars` is a mounted `efivarfs`
* If so, mount it on our own, via `mount -t efivarfs efivarfs /sys/fs/efi/efivars`
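Those two steps could be sketched roughly like this (a hypothetical helper, not actual bootc/bootupd code; it takes mount-table paths as arguments so the decision logic is separable from the privileged mount itself, and it uses the `/sys/firmware/efi/efivars` path shown earlier in the thread):

```shell
# Decide whether we need to mount efivarfs ourselves: the host (seen
# via PID 1's root) has an efivarfs mount but our namespace doesn't.
need_efivarfs_mount() {
  # $1: host mounts file (e.g. /proc/1/root/proc/self/mounts)
  # $2: our mounts file  (e.g. /proc/self/mounts)
  grep -q ' efivarfs ' "$1" 2>/dev/null && ! grep -q ' efivarfs ' "$2" 2>/dev/null
}

# When the helper succeeds, the actual fix would be (requires root):
#   mount -t efivarfs efivarfs /sys/firmware/efi/efivars
```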
The 3rd option is to dynamically mutate our own mounts, but this requires the new kernel mount APIs that I don't think are in C9S for example yet.
> @bcrochet I see two ways to go about this.
>
> First, we can simply document doing `-v /sys:/sys` for our install procedures; this gives us a proper full recursive mount. I lean a bit towards this path.
Definitely a clean path, with just a doc change.
> However, our `podman run` invocation is already getting super unwieldy, and it's nicer if we just handle as much as we can internally. So one approach here is something like this:
>
> * Check if `/proc/1/root/sys/fs/efi/efivars` is a mounted `efivarfs`
> * If so, mount it on our own, via `mount -t efivarfs efivarfs /sys/fs/efi/efivars`
And I would lean here, as you said, because the podman invocation is getting longer and longer. It becomes unclear why certain params are there, and they just become cargo-culted; if any of those are no longer necessary, they are not likely to be removed from the hive consciousness.
> The 3rd option is to dynamically mutate our own mounts, but this requires the new kernel mount APIs that I don't think are in C9S for example yet.
I'm not even going there. :)
Verified on `bootc-0.1.7` on CS9.

I got the following error when I ran `bootc install to-filesystem --replace=alongside`. I'm replacing CS9 with a CS9 container image. BTW: the ARM machine supports UEFI only.

How to reproduce:

* Build an image from `quay.io/centos-bootc/centos-bootc:stream9` with `cloud-init` installed.
* Run:

```
podman run --rm --privileged --pid=host -v /:/target --security-opt label=type:unconfined_t quay.io/xiaofwan/centos-bootc-os_replace:c96p bootc install to-filesystem --replace=alongside /target
```