containers / bootc

Boot and upgrade via container images
https://containers.github.io/bootc/
Apache License 2.0

Run bootc install failed in AWS EC2 ARM instance with error "Failed to invoke efibootmgr" #291

Closed henrywang closed 5 months ago

henrywang commented 5 months ago

I got the following error when running bootc install to-filesystem --replace=alongside. I'm replacing CS9 with a CS9 container image. Note that the ARM machine supports UEFI only.

error: boot data installation failed: installing component EFI: Updating EFI firmware variables: Clearing current EFI boot entry: Failed to invoke efibootmgr
    ERROR Installing bootloader: Task Running bootupctl to install bootloader failed: ExitStatus(unix_wait_status(256))

How to reproduce:

  1. Build a container image based on quay.io/centos-bootc/centos-bootc:stream9 with cloud-init installed.
  2. Deploy an AWS EC2 ARM instance with CS9 installed.
  3. Install podman and skopeo.
  4. Run the following command:
    podman run --rm --privileged --pid=host -v /:/target --security-opt label=type:unconfined_t quay.io/xiaofwan/centos-bootc-os_replace:c96p bootc install to-filesystem --replace=alongside /target
    Trying to pull quay.io/xiaofwan/centos-bootc-os_replace:c96p...
    Getting image source signatures
    Copying blob sha256:2ad0a0fc42196a143f32c3f12c2c6069f4d17bfe15da6cd9a8d6f218b3070ca3
    Copying blob sha256:fafcf770d62c778be6cc40ac2f0ff857e80aedd300c80de76813cc2349e5851c
    Copying blob sha256:4a5f4a6603cd1623dc680d66502b3d96e70b36ae8dabea53b8532f0d8bfa965a
    Copying blob sha256:2e4960a02a6368e170f228b4fa106e3c710985860fb5c1801686ae96c94e2e38
    Copying blob sha256:8165ff2287c15fcad90b123c2a97c9869c022b43b722a6b39c3bdcbc806d49d1
    Copying blob sha256:1be6be164bbd6705807fb07de5310646da50330bb137e6b3a5f231a33312af58
    Copying blob sha256:cc8749a253d9477f9303b71fc21c0289ae290ffe7dc718271627bf14005fc9b4
    Copying blob sha256:1ee5d7f2e08e16cff6825e6ea85b18d1ff751a7e602c73e61a2feddf4709efea
    Copying blob sha256:5071d287f7c7db296d5c28f34097e0647f7e77e57c084a24427b8b67bf9268b2
    Copying blob sha256:771e574d54474b447c6d422376e1700f4c89d83988e934a9fbd07728223a8a5d
    Copying blob sha256:6c2e22d7b9b19b57645b401b2561e54d8ce8ef9872828a701095d2eed278428c
    Copying blob sha256:5956b8f11f80d013a46bac783e9d1b57b20226fa071d416428b43b5696375c91
    Copying blob sha256:5b75cba2a0bca6859e5bb59d92ace364b1a6868749b5e525b259e88e12d85e10
    Copying blob sha256:d71aab47db7a5596be1e320bab2c08fba924ad635f31bbaa1002344114992039
    Copying blob sha256:f7011a5f5a1cde8b8987c92987cabe95e84ba8b29179b471be001c2810411765
    Copying blob sha256:4726293aa2a33ff85b98c4a71bd4fb21e6e3df24812cb409c412192c5a939115
    Copying blob sha256:ff42e03d47a841053f7a186bd824718d42ff1d406ae47e07c37e752cdf563c14
    Copying blob sha256:aa1a93e2aa70569018566df2945c27bb8ac6f0e266460c9576c362763ce9980e
    Copying blob sha256:f5d0c029d5fa4603d870b801e8641bab8922e581aebdb15733d1c442277cde9e
    Copying blob sha256:0614e40d506f2160fbb8c4904f512953b6fe8cc7f1f2c099eb7ef04b49d370f4
    Copying blob sha256:21b080004e82d1fcf3047e8151994cead0c4f3a1532c3deff7e9bfa7aa7af663
    Copying blob sha256:83abaf6d6857c6205a75e1ef1674015f5da94c88d535421752e761457c30e9f5
    Copying blob sha256:ac8e626392a1f9dc56c2619d13fe20dbeb7d35264f0f79def88b504a2c645733
    Copying blob sha256:a3a568395ae2c83d8c26a745345ba996214cf347b502dd487afa5daa4d7a34ad
    Copying blob sha256:379fef1872ce19fffc8a6f0d54cba46618ee7232270c5669869ed6de71aca569
    Copying blob sha256:2d19a128033e0a53ea4fd9306b3d3a4d8008effbe6906bea0ede2dbada2ffb5c
    Copying blob sha256:99e0a5d400870763e44fbc09991caa64fb573071b2fce8c36a3eb2448b7bc08e
    Copying blob sha256:0270fde0d4d373bfc00391c5eab11c93114016eef4aa74bb13e798fd74963457
    Copying blob sha256:ddb083605a303814a02bff4f93286e7ee2c8436959f5c9c7a629feb500d014dc
    Copying blob sha256:27bcd423590746df7d4c62994fb8b7bd57a9ec66d590f0890c08498c0d1c145b
    Copying blob sha256:9128594d5c6ead9ad197f976698827477d618a94a96cacfbcbc915f70d1e1407
    Copying blob sha256:3ee5b3a43c74526384cf82c102646bf59c97d4526ddcac452d8fdb0ee33945fb
    Copying blob sha256:6ecf34d6acab0e18b92ba33e99787cf95db08e0bfca15856242c26de47e588a4
    Copying blob sha256:cc08b56138c7f253f58e79d63b2f70b2eeb57be4c8b6ac93cf2cc765859872a0
    Copying blob sha256:ecec55d49774543868959ac7b3dcf2cc23f4f67baa7a9053587155f9865978db
    Copying blob sha256:2310bd66d182ce8cb8d550419364f5967aa0d95f7d81c1e11b15ae08467be7c6
    Copying blob sha256:38206305e735ab0de33d2281dd1e1e22b100e7292494dd47a72ee0cfe139f74d
    Copying blob sha256:2031882dacbcd48cffea51d62f8f8a426380a49430ec0b382c0d560dbb95ad5f
    Copying blob sha256:58f1b769e6dae40157549544aa8db989166eb59a92454abb1f46a62f4eddf4b1
    Copying blob sha256:b9a88c272e5bf2e9d057be600c0544733d5dbf116c975de4d96767fab8cb482d
    Copying blob sha256:3431012f8246bc23f888bc8bcf8a8b9e5b4de052e72e8cc95395b6697316dee1
    Copying blob sha256:5407712299a1bbd339edd45097a52c6d25846032ff1c2e6dbdbdf72f48ab975c
    Copying blob sha256:3fced47699a75e9cab1b45db3732f92f052f1337b26ffbf687e05e4e14523cc8
    Copying blob sha256:75ee130be62b299a0c3a3df33fb0d8e0df8323521c00601c7eb4bfdcf4a56562
    Copying blob sha256:bb28cead0c1cf940dd124072098f23398d7b75850609bf99805fb8e75fa54c72
    Copying blob sha256:a612057a628545173593e463aa3d052544d272b5189e30a999e8fd7c93641ed4
    Copying blob sha256:0f87ae811fded979967e93bb19cd98b9756e8e4df94b71a3603089c0ca178df9
    Copying blob sha256:8edb4ff9b6fd7a207a011d650410ad5e58bb2cd69e4a64b9c39ea902a688ad73
    Copying blob sha256:e33a25e042c926ccbe3ff1ad92b31cb1153fc351d8ff6424657be8207ea43653
    Copying blob sha256:525b12ec7344bee23baab3c2fd1de76ddea466fdbfd9c3a2eb01d3389c5cfe75
    Copying blob sha256:2c26116509e233504c4d86af1639ff43fb3ce68a1893884efe338bdb69209c21
    Copying blob sha256:c686bafea26cb0f72db29f817c9196d5e9a91fffa8e0c1168b0ef21ac96b3470
    Copying blob sha256:724789c163736aa86fd0243c0252b5dce1eb1b7386d8fe7228b0090aea5a36ef
    Copying blob sha256:856147274a0c1c07edc2178334263f9d39f1a8d969f43e63829903e924b692ec
    Copying blob sha256:cb397995265fe0962f057b3290efb551755477bca620be9272b2dae4fb179209
    Copying blob sha256:100193d149de721aa41f27e3632aceb9d4952d818c97469ecce9de40ef6e3f64
    Copying blob sha256:0f2b29cda75b43b647ea3ee5aefc234b1372ea68703ea58e28b3293c4668fa25
    Copying blob sha256:a6793a91d7ab33b41726f46f7ec2885191ec90202f93c4b1034ab07b8a41bb36
    Copying blob sha256:6d651fe8b357b307bc2dea52617db1231d9debd060fb4f4da16c4adfb4b7086e
    Copying blob sha256:a089c9b0be5c078190fb6435a9afa292fe1fb45d731a05bc156847890d0229e6
    Copying blob sha256:2187fdc81842dc9f1009de093df1f19923fb775820bad5203d444d2b1a0e9bfb
    Copying blob sha256:b3438f7d09b3fb457af20c680b0f7d2026920c99640879c6b0c59e6330fe460f
    Copying blob sha256:7c454d4c6e6bff807c24271f78477b226561ab694a4e8be6976faef68938b85d
    Copying blob sha256:29989df1360d4ebcdec8e3986b8157b7db5ef23fee2d3529022d8700b7049990
    Copying blob sha256:ad312c5c40ccd18a3c639cc139211f3c4284e568b69b2e748027cee057986fe0
    Copying blob sha256:bd9ddc54bea929a22b334e73e026d4136e5b73f5cc29942896c72e4ece69b13d
    Copying blob sha256:64d72b57554b16466fdd445667a09991e786eb8528c9c419b63dca83cd5ddbc1
    Copying blob sha256:79bf18632c97f347e6506a213a3807a27fe7a424a62c346b45f391646789b54a
    Copying config sha256:b5d59b7bafe76a7d317369c2ceeac8a1caa373bb57e9ad1b9b62182d99a4d568
    Writing manifest to image destination
    error: boot data installation failed: installing component EFI: Updating EFI firmware variables: Clearing current EFI boot entry: Failed to invoke efibootmgr
    ERROR Installing bootloader: Task Running bootupctl to install bootloader failed: ExitStatus(unix_wait_status(256))
    Mounting selinuxfs
    Initializing ostree layout
    Initializing sysroot
    ostree/deploy/default initialized as OSTree stateroot
    Creating initial deployment
    Installed: ostree-unverified-registry:quay.io/xiaofwan/centos-bootc-os_replace:c96p
       Digest: sha256:b0d389e9e26d9378ccdbbccc13c5632fd636fe660c00b1d4a61f259e8408af3b
    Running bootupctl to install bootloader
jeckersb commented 5 months ago

@bcrochet looks like the same thing you're hitting on c9s+x86+efi ?

bcrochet commented 5 months ago

@jeckersb Correct. I've seen this on baremetal and VMs on amd64. The key is that the systems are booting in UEFI mode and not in Legacy/BIOS mode.

bcrochet commented 5 months ago

So far I've tracked it down to bootupctl calling efibootmgr, and efibootmgr is failing.

When I run efibootmgr manually in a container with /target/boot/{,efi} mounted, as well as /boot/{,efi}, it outputs this:

bash-5.2# efibootmgr
EFI variables are not supported on this system.
cgwalters commented 5 months ago

This will be fixed over in https://github.com/coreos/bootupd/pull/610 - will test this e2e when I get a minute.

bcrochet commented 5 months ago

I just did -v /sys/firmware/efi/efivars:/sys/firmware/efi/efivars and it "worked". :)
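
For reference, here is a sketch of what the full invocation might look like with that extra bind mount added to the command from the original report. The install_alongside wrapper and the DRY_RUN switch are illustrative names only, not part of bootc or podman, and the image argument is whatever image you built.

```shell
# Hypothetical wrapper: the `podman run` command from this issue, plus a bind
# mount of the host's efivarfs so efibootmgr can see the EFI variables.
# Set DRY_RUN=1 to print the command instead of running it.
install_alongside() {
    image="$1"
    ${DRY_RUN:+echo} podman run --rm --privileged --pid=host \
        -v /:/target \
        -v /sys/firmware/efi/efivars:/sys/firmware/efi/efivars \
        --security-opt label=type:unconfined_t \
        "$image" bootc install to-filesystem --replace=alongside /target
}
```

Running `DRY_RUN=1 install_alongside quay.io/xiaofwan/centos-bootc-os_replace:c96p` prints the command without executing anything, which is a safe way to check it before doing a real install.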

[root@ip-172-31-36-160 ~]# bootc status
note: The format of this API is not yet stable
apiVersion: org.containers.bootc/v1alpha1
kind: BootcHost
metadata:
  name: host
spec:
  image:
    image: quay.io/bcrochet/centos-bootc-bcrochet:stream9-arm
    transport: registry
status:
  staged: null
  booted:
    image:
      image:
        image: quay.io/bcrochet/centos-bootc-bcrochet:stream9-arm
        transport: registry
      version: stream9.20240131.0
      timestamp: null
      imageDigest: sha256:ffec3a3705a03b9f31e782919fb7bc26a82e775e02f7c5f6b4861d617e4c47a7
    incompatible: false
    pinned: false
    ostree:
      checksum: c5d27c9d87a89df5bfe25437792749760fbf8a15d8c3fc2cac47296022612491
      deploySerial: 0
  rollback: null
  isContainer: false
cgwalters commented 5 months ago

Hmm wait you did that without the above bootupd PR and it worked? That seems...weird.

Wait, I think the reason that works is actually that you're effectively bind mounting in an empty directory, so you're making bootupd think EFI isn't enabled at all.

jeckersb commented 5 months ago

I have an EFI stream9 VM and that directory has a whole bunch of stuff:

[root@localhost ~]# ls /sys/firmware/efi/efivars/
Boot0000-8be4df61-93ca-11d2-aa0d-00e098032b8c           Key0001-8be4df61-93ca-11d2-aa0d-00e098032b8c
Boot0001-8be4df61-93ca-11d2-aa0d-00e098032b8c           Lang-8be4df61-93ca-11d2-aa0d-00e098032b8c
Boot0002-8be4df61-93ca-11d2-aa0d-00e098032b8c           LangCodes-8be4df61-93ca-11d2-aa0d-00e098032b8c
Boot0003-8be4df61-93ca-11d2-aa0d-00e098032b8c           MTC-eb704011-1402-11d3-8e77-00a0c969723b
BootCurrent-8be4df61-93ca-11d2-aa0d-00e098032b8c        MokListRT-605dab50-e046-4300-abb6-3dd810dd8b23
BootOptionSupport-8be4df61-93ca-11d2-aa0d-00e098032b8c  MokListXRT-605dab50-e046-4300-abb6-3dd810dd8b23
BootOrder-8be4df61-93ca-11d2-aa0d-00e098032b8c          OsIndicationsSupported-8be4df61-93ca-11d2-aa0d-00e098032b8c
ConIn-8be4df61-93ca-11d2-aa0d-00e098032b8c              PlatformLang-8be4df61-93ca-11d2-aa0d-00e098032b8c
ConInDev-8be4df61-93ca-11d2-aa0d-00e098032b8c           PlatformLangCodes-8be4df61-93ca-11d2-aa0d-00e098032b8c
ConOut-8be4df61-93ca-11d2-aa0d-00e098032b8c             PlatformRecovery0000-8be4df61-93ca-11d2-aa0d-00e098032b8c
ConOutDev-8be4df61-93ca-11d2-aa0d-00e098032b8c          Timeout-8be4df61-93ca-11d2-aa0d-00e098032b8c
ErrOut-8be4df61-93ca-11d2-aa0d-00e098032b8c             VMMBootOrder0000-668f4529-63d0-4bb5-b65d-6fbb9d36a44a
ErrOutDev-8be4df61-93ca-11d2-aa0d-00e098032b8c          VarErrorFlag-04b37fe8-f6ae-480b-bdd5-37d98c5e89aa
Key0000-8be4df61-93ca-11d2-aa0d-00e098032b8c

(no clue what any of that actually does, but it's there!)

cgwalters commented 5 months ago

It represents firmware level variables. On AWS there apparently are no persistent EFI variables, so in theory it should work to simply not set the BootCurrent variable. But on physical systems it is commonly needed (or at least more optimal to do so, as the firmware may time out looking for the now nonexistent previous boot entry). We also need to do this on GCP at least which does have EFI variables.

Anyways I believe the bootupd change will fix this, but needs testing.

bcrochet commented 5 months ago

> Hmm wait you did that without the above bootupd PR and it worked? That seems...weird.

Yes.

> Wait, I think the reason that works is actually that you're effectively bind mounting in an empty directory, so you're making bootupd think EFI isn't enabled at all.

I think it's actually the opposite. The directory is populated on the host via an efivarfs mount. That mount is not present in the container.

Basically, if that dir is not populated, bootupd assumes that EFI isn't enabled. Really it's efibootmgr, which I verified returns a non-zero exit code and prints the 'EFI variables are not supported' message, thus causing bootupd to fail.
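
A quick way to see which situation you are in is to check whether the directory has any entries from inside the container. This is a minimal sketch; efivars_visible is a hypothetical helper name, and the default path is the standard efivarfs mount point mentioned earlier in this thread.

```shell
# Hypothetical helper: succeed if the given directory exists and contains at
# least one entry, i.e. EFI variables are visible from this mount namespace.
efivars_visible() {
    dir="${1:-/sys/firmware/efi/efivars}"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

if efivars_visible; then
    echo "EFI variables are visible here"
else
    echo "no EFI variables visible; efibootmgr will likely fail"
fi
```

Running it both on the host and inside the install container should show the difference described above, assuming the host booted in UEFI mode.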

From an AWS page:

UEFI variable persistence

For instances that were launched on or before May 10, 2022, UEFI variables are wiped on reboot or stop.

For instances that are launched on or after May 11, 2022, UEFI variables that are marked as non-volatile are persisted on reboot and stop/start.

Bare metal instances don't preserve UEFI non-volatile variables across instance stop/start operations.

> It represents firmware level variables. On AWS there apparently are no persistent EFI variables, so in theory it should work to simply not set the BootCurrent variable.

FWIW, I've seen this situation on both baremetal and a VM. However, I'm having trouble replicating that situation. When I do, I'll update here.

> Anyways I believe the bootupd change will fix this, but needs testing.

In the meantime, I will give a whirl to the bootupd fix.

cgwalters commented 5 months ago

> I think it's actually the opposite. The directory is populated on the host via an efivarfs mount. That mount is not present in the container.

Ahh OK sorry you may be right indeed! And the bootupd PR is probably unnecessary.

(I think what confused me just now is that I happened to be testing this out by launching a C9S machine, but our C9S images are only configured for legacy boot...)

So indeed, let's try changing things as you suggest to ensure that mount is present in the container.

henrywang commented 5 months ago

> It represents firmware level variables. On AWS there apparently are no persistent EFI variables, so in theory it should work to simply not set the BootCurrent variable. But on physical systems it is commonly needed (or at least more optimal to do so, as the firmware may time out looking for the now nonexistent previous boot entry). We also need to do this on GCP at least which does have EFI variables.
>
> Anyways I believe the bootupd change will fix this, but needs testing.

I can help with testing if you have a PR or scratch build.

cgwalters commented 5 months ago

Hmm so in the GCP instance I'm testing, /sys/firmware/efi/vars is not a distinct efivars mountpoint, it's just sysfs. Will look at AWS in a second.

cgwalters commented 5 months ago

Ahhh now I understand, it's apparently just aarch64 systems that use efivars on /sys/firmware/efi/efivars. It's a bit surprising that podman run --privileged doesn't recursively mount /sys apparently.

But anyways we can clearly work around this by doing the mount internally.

cgwalters commented 5 months ago

@bcrochet I see two ways to go about this.

First, we can simply document doing -v /sys:/sys for our install procedures; this gives us a proper full recursive mount. I lean a bit towards this path.
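
As a sketch of what that documented invocation could look like, here is the command from the original report with `-v /sys:/sys` added. It is untested here; the install_with_sys name and the DRY_RUN switch are illustrative and not part of bootc or podman.

```shell
# Hypothetical wrapper for the "document -v /sys:/sys" option: bind mount the
# host's whole /sys tree (including any efivarfs submount) into the container.
# Set DRY_RUN=1 to print the command instead of running it.
install_with_sys() {
    image="$1"
    ${DRY_RUN:+echo} podman run --rm --privileged --pid=host \
        -v /:/target \
        -v /sys:/sys \
        --security-opt label=type:unconfined_t \
        "$image" bootc install to-filesystem --replace=alongside /target
}
```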

However, our podman run invocation is already getting super unwieldy, and it's nicer if we just handle as much as we can internally.

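So one approach here is something like this:

* Check if `/proc/1/root/sys/firmware/efi/efivars` is a mounted `efivarfs`

* If so, mount it on our own, via `mount -t efivarfs efivarfs /sys/firmware/efi/efivars`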

The 3rd option is to dynamically mutate our own mounts, but this requires the new kernel mount APIs that I don't think are in C9S for example yet.

bcrochet commented 5 months ago

> @bcrochet I see two ways to go about this.
>
> First, we can simply document doing -v /sys:/sys for our install procedures; this gives us a proper full recursive mount. I lean a bit towards this path.

Definitely a clean path, with just a doc change.

> However, our podman run invocation is already getting super unwieldy, and it's nicer if we just handle as much as we can internally.
>
> So one approach here is something like this:
>
> * Check if `/proc/1/root/sys/firmware/efi/efivars` is a mounted `efivarfs`
>
> * If so, mount it on our own, via `mount -t efivarfs efivarfs /sys/firmware/efi/efivars`
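
The two steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: it uses /sys/firmware/efi/efivars, the path used elsewhere in this thread; it reads the mount tables in /proc/1/mounts and /proc/self/mounts instead of inspecting /proc/1/root directly; and has_efivarfs and ensure_efivarfs are hypothetical names, not bootc functions.

```shell
# Standard efivarfs mount point (assumption, matching the rest of the thread).
EFIVARS=/sys/firmware/efi/efivars

# has_efivarfs MOUNTS_FILE: succeed if the mount table lists an efivarfs
# filesystem at $EFIVARS (field 2 is the mount point, field 3 the fs type).
has_efivarfs() {
    awk -v mp="$EFIVARS" \
        '$2 == mp && $3 == "efivarfs" { found = 1 } END { exit !found }' "$1"
}

# If PID 1 (the host's init, given --pid=host) has efivarfs mounted but our
# own mount namespace does not, mount it ourselves. Requires privileges.
ensure_efivarfs() {
    if has_efivarfs /proc/1/mounts && ! has_efivarfs /proc/self/mounts; then
        mount -t efivarfs efivarfs "$EFIVARS"
    fi
}
```

In this sketch, ensure_efivarfs would be called once inside the privileged container before the bootloader step runs; nothing is mounted when the host has no efivarfs (for example on a legacy BIOS boot).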

And I would lean toward this approach, as you said, because the podman invocation keeps getting longer and longer. It becomes unclear why certain params are there, and they just get cargo-culted. And if any of those are no longer necessary, they are not likely to be removed from the hive consciousness.

> The 3rd option is to dynamically mutate our own mounts, but this requires the new kernel mount APIs that I don't think are in C9S for example yet.

I'm not even going there. :)

henrywang commented 4 months ago

Verified on bootc-0.1.7 on CS9.