OSInside / kiwi

KIWI - Appliance Builder Next Generation
https://osinside.github.io/kiwi
GNU General Public License v3.0

Add support for grub2 with BLS #2497

Closed. WenhuaChang closed this issue 2 months ago

WenhuaChang commented 4 months ago

Kindly be aware that this issue has been created as a reminder. The work on the grub2 component is currently in progress and is considered unfinished at this time.

The Boot Loader Specification (BLS) defines a standardized format for individual boot fragments. These fragments can be stored in a shared drop-in directory, enabling a pluggable configuration system for adding and removing entries.

One significant advantage is the improved cohesion between various boot-related tasks, as boot fragments are generated and maintained separately. This approach is architecture-agnostic, allowing seamless integration with different boot loaders.

Compared with GRUB's own mechanisms for loading boot fragments (the source command, or files placed in /etc/grub.d for mkconfig hooks), BLS has the potential to gain broader acceptance in the future. In particular, developers associated with systemd often prefer the BLS format for boot loaders.
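As a rough illustration of what such a drop-in fragment looks like, the Python sketch below renders one entry using key names from the Boot Loader Specification (title, version, linux, initrd, options). The helper names, the entry token, the file naming pattern and all values are hypothetical placeholders, not output from a real system or from any tool discussed in this issue.

```python
from pathlib import PurePosixPath


def render_bls_entry(title, version, linux, initrd, options):
    """Render one BLS fragment as plain 'key value' lines."""
    fields = [
        ("title", title),
        ("version", version),
        ("linux", linux),
        ("initrd", initrd),
        ("options", options),
    ]
    return "".join(f"{key} {value}\n" for key, value in fields)


def entry_path(boot_root, entry_token, version):
    """Build <boot_root>/loader/entries/<token>-<version>.conf (assumed naming pattern)."""
    name = f"{entry_token}-{version}.conf"
    return str(PurePosixPath(boot_root) / "loader" / "entries" / name)


# Hypothetical example values, for illustration only.
text = render_bls_entry(
    title="Example Linux",
    version="6.8.0-default",
    linux="/vmlinuz-6.8.0-default",
    initrd="/initrd-6.8.0-default",
    options="root=/dev/sda2 quiet",
)
print(entry_path("/boot", "example", "6.8.0-default"))
print(text, end="")
```

Because each entry is a separate file, adding or removing a kernel only means adding or removing one fragment, which is the pluggable behaviour described above.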

Background

The BLS integration in GRUB has been developed by Fedora. For additional context, refer to the Fedora wiki page related to this topic.

It's essential to note that openSUSE does not use grubby as a unified manager for many loader configurations. Therefore, the transition from grubby to BLS may not be applicable. Instead, GRUB serves as the all-in-one loader that unifies configurations across the architectures supported by SUSE/openSUSE.

The primary motivation behind incorporating BLS into GRUB is the continual expansion of features in systemd that exclusively focus on producing BLS-compatible fragments. This change marks the initial step towards bridging the gap among different open-source projects in the future.

Locating BLS

Before proceeding, review Fedora's page on Differences from BootloaderSpec. It outlines where BLS entries are searched for when the EFI System Partition is not mounted by the Linux system: instead of the ESP, the /boot partition from which grub.cfg is loaded (i.e., the GRUB $root device) is used as the BLS device.

To enhance collaboration around BLS tooling, we've introduced an additional patch in GRUB to enable BLS discovery over the EFI System Partition (/boot/efi) in addition to the Linux system boot (/boot) partition.
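To make the two search locations concrete, here is a minimal userspace sketch in Python that collects *.conf fragments from loader/entries under each candidate mount point. It is only an illustration of the idea: GRUB performs its own discovery at boot time, and the function name and the order in which the roots are searched are assumptions, not something taken from the patch.

```python
from pathlib import Path


def find_bls_entries(roots):
    """Return all *.conf fragments under <root>/loader/entries for each root.

    Roots without a loader/entries directory are skipped. The search order
    simply follows the order of `roots` (an assumption for this sketch).
    """
    found = []
    for root in roots:
        entries_dir = Path(root) / "loader" / "entries"
        if entries_dir.is_dir():
            found.extend(sorted(entries_dir.glob("*.conf")))
    return found
```

On a running system one might call find_bls_entries(["/boot", "/boot/efi"]), typically as root since the ESP is often not world-readable.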

Getting Started

There are two ways to enable BLS testing for GRUB. One is to use grub2-switch-to-blscfg, a utility native to GRUB, to facilitate the transition. The other is to try out the actual tool, kernel-install, which is a common method for enrolling kernels and initrds into the BLS structure.

Installing the Test Package

sudo zypper ar https://download.opensuse.org/repositories/home:/michael-chang:/efi:/grub:/blscfg/standard/home:michael-chang:efi:grub:blscfg.repo
sudo zypper ref
sudo zypper dup --allow-vendor-change --from home_michael-chang_efi_grub_blscfg

grub2-switch-to-blscfg

This script checks if BLS works for your GRUB installation with a single command:

sudo grub2-switch-to-blscfg

It creates loader/entries under /boot and fills it with BLS fragments for the kernel images found in /boot. During this process, it enables BLS support in GRUB by setting GRUB_ENABLE_BLSCFG=true in /etc/default/grub and makes a backup copy of the original configuration as grub.cfg.backup.
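To verify the result of that step, one could check whether the variable ended up enabled. The Python sketch below parses the text of /etc/default/grub, treating it as shell-style KEY=value assignments where the last assignment wins. The function name is hypothetical, and the literal comparison against "true" is an assumption about how the mkconfig scripts test the variable.

```python
import shlex


def blscfg_enabled(grub_default_text):
    """Return True if the last GRUB_ENABLE_BLSCFG assignment is exactly 'true'."""
    enabled = False
    for raw in grub_default_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "GRUB_ENABLE_BLSCFG":
            # shlex strips shell quoting and trailing comments from the value.
            tokens = shlex.split(value, comments=True)
            enabled = bool(tokens) and tokens[0] == "true"
    return enabled


# Inline sample text, not read from a real system.
sample = 'GRUB_TIMEOUT=8\nGRUB_ENABLE_BLSCFG="true"\n'
print(blscfg_enabled(sample))
```

On a real system the input would come from reading /etc/default/grub.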

After ensuring everything works, reboot to see if BLS entries appear in the GRUB menu:

sudo reboot

In case of failure after enabling BLS support, you can rescue the system at GRUB runtime by loading the backup:

configfile $prefix/grub.cfg.backup

kernel-install

If grub2-switch-to-blscfg works, you can proceed to test kernel-install, which is part of the udev package. It is primarily used to register the kernel and initrd for systemd-boot but can also be utilized for GRUB if properly integrated with BLS.

sudo bash -c 'echo layout=bls >>/etc/kernel/install.conf'
sudo kernel-install --verbose --make-entry-directory=yes --entry-token=machine-id add 6.8.0-rc4-2.g6b6d2be-default /boot/vmlinuz-6.8.0-rc3-1.gae4495f-default

You can clean up the /boot/loader blsdir created by grub2-switch-to-blscfg as it won't interfere. After all, kernel-install takes over the BLS update for new kernel packages:

sudo rm -rf /boot/loader

How it works

Please note that the following information reflects the current stance and may be subject to change in the future.

Vogtinator commented 4 months ago

FTR, BLS works mostly the same for grub2 + BLS support, systemd-boot and even zipl on s390x (no grub2-emu necessary!).

So ideally this is abstracted a bit into a bootloader-independent part which is responsible for creating the BLS configuration and a bootloader specific part which is responsible for creating the BLS partition and installing the bootloader if needed.

Conan-Kudo commented 4 months ago

Most of this work is already done. When we added BLS support for sd-boot, it was already structured in such a way that we can expose it as grub2-bls, zipl-bls (for #2481), and so on.

Conan-Kudo commented 4 months ago

Note that grubby allows users to centrally and trivially manage BLS configuration regardless of backend bootloader, so it is worth shipping in openSUSE too.

aplanas commented 4 months ago

> Note that grubby allows users to centrally and trivially manage BLS configuration regardless of backend bootloader, so it is worth shipping in openSUSE too.

The mismatch is btrfs snapshots. MicroOS and TW manage snapshots differently. To manage this difference, and to synchronize the initrd, kernel and snapshot in a proper way we are using https://github.com/openSUSE/sdbootutil

Conan-Kudo commented 4 months ago

> > Note that grubby allows users to centrally and trivially manage BLS configuration regardless of backend bootloader, so it is worth shipping in openSUSE too.

> The mismatch is btrfs snapshots. MicroOS and TW manage snapshots differently. To manage this difference, and to synchronize the initrd, kernel and snapshot in a proper way we are using https://github.com/openSUSE/sdbootutil

We should try to reconcile the approaches then, because it would be beneficial to have a cross-distro way to do things.

Conan-Kudo commented 4 months ago

The current grubby is a shell script: https://src.fedoraproject.org/rpms/grubby/blob/rawhide/f/grubby-bls

aplanas commented 4 months ago

> We should try to reconcile the approaches then, because it would be beneficial to have a cross-distro way to do things.

Reconcile in what direction? If it is about the way snapshots are managed in MicroOS vs. TW, then there is not much to reconcile: it is a different design with different tradeoffs.

If it is about changing grubby to understand both distros, that is a possibility.

Conan-Kudo commented 4 months ago

Changing grubby to work with the snapshot management method and work on both Fedora and openSUSE would be ideal.

WenhuaChang commented 4 months ago

sdbootutil is a kernel-install plugin, so grubby (and other boot loader tools) should go through kernel-install in order to take advantage of it.

aplanas commented 4 months ago

One note: BLS is enabled by kiwi by default (https://github.com/OSInside/kiwi/blob/main/kiwi/bootloader/config/grub2.py#L791, from https://github.com/OSInside/kiwi/pull/1252). Maybe this needs to be configurable via some XML field.

Conan-Kudo commented 4 months ago

Why? It does nothing on systems that don't have grub2-bls support, and kiwi does very little when making disk images, preferring to have the system do most of the work. ISOs are already special and that flag doesn't mean anything because we generate a grub cfg.

aplanas commented 4 months ago

> Why? It does nothing on systems that don't have grub2-bls support

That is the issue: there is a grub2 package in openSUSE with BLS support. Kiwi detects this, but the image it generates for openSUSE does not contain the entries, so the boot menu is empty.

IMO whether BLS entries are used does not depend on whether grub supports BLS. Kiwi today detects that BLS is supported and unconditionally adds GRUB_ENABLE_BLSCFG=true to /etc/default/grub; this should be at the user's discretion (unless openSUSE only supports BLS, which is fine with me).

Conan-Kudo commented 4 months ago

> IMO whether BLS entries are used does not depend on whether grub supports BLS. Kiwi today detects that BLS is supported and unconditionally adds GRUB_ENABLE_BLSCFG=true to /etc/default/grub; this should be at the user's discretion (unless openSUSE only supports BLS, which is fine with me).

Or rather, the tooling should default to BLS once this lands in Factory. It looks like the missing piece is that kernel-install isn't configured properly. When grub2-bls lands, this should be flipped. Then everything will just work.

Vogtinator commented 4 months ago

> > IMO whether BLS entries are used does not depend on whether grub supports BLS. Kiwi today detects that BLS is supported and unconditionally adds GRUB_ENABLE_BLSCFG=true to /etc/default/grub; this should be at the user's discretion (unless openSUSE only supports BLS, which is fine with me).

> Or rather, the tooling should default to BLS once this lands in Factory. It looks like the missing piece is that kernel-install isn't configured properly. When grub2-bls lands, this should be flipped. Then everything will just work.

It won't.

GRUB supporting BLS is just one of many steps. For now we can't use it.

aplanas commented 4 months ago

> It looks like the missing piece is that kernel-install isn't configured properly.

Uhmm, it is not as simple as that. Every update (regardless of whether there is a kernel update) requires new boot entries that relate the new snapshot to the initrds, kernels and cmdline valid for it. This is done by sdbootutil and requires some changes to support grub (WIP from my side).

There is a sdbootutil-kernel-install subpackage that calls sdbootutil when a kernel is added or removed, so it can do the right thing, but AFAICT it is not used, as snapper is the one that calls sdbootutil in any case (adding or removing the kernel as a side effect).

So until all these pieces are together and openSUSE decides to go BLS-only, the current check that unconditionally adds GRUB_ENABLE_BLSCFG=true is wrong.

Conan-Kudo commented 4 months ago

And what are we supposed to do in the event it is set? We're not magically adding packages for openSUSE, the image would still be broken if the user doesn't install everything needed.

Vogtinator commented 4 months ago

I wonder why kiwi touches GRUB_ENABLE_BLSCFG at all, shouldn't it just use the distro's default?

In other words, is https://github.com/OSInside/kiwi/commit/9d8ce7eedbd2c866a09334853948fadc793d2653 actually necessary?

aplanas commented 4 months ago

> And what are we supposed to do in the event it is set? We're not magically adding packages for openSUSE, the image would still be broken if the user doesn't install everything needed.

Exactly. That is why https://github.com/OSInside/kiwi/pull/1252 is wrong: it unconditionally adds the parameter if grub supports it. That is why I suggested creating a new parameter in the XML profile, so the user can choose whether to use BLS, depending on the selected packages.

Conan-Kudo commented 4 months ago

Because GRUB has no defaults in any distro. The /etc/default/grub file is generated by whatever installs the system. Otherwise it's completely empty and GRUB is broken.

aplanas commented 4 months ago

I am a bit confused by this comment, but there are some factual errors:

> Because GRUB has no defaults in any distro

openSUSE provides a default one via the package:

> rpm -qf /etc/default/grub
grub2-2.12-11.1.x86_64

> The /etc/default/grub file is generated by whatever installs the system

As shown, no: openSUSE provides one. The install system updates it (as Kiwi does), adding the themes (totally optional) and the cmdline.

> Otherwise it's completely empty and GRUB is broken.

The unmodified one is this:

https://build.opensuse.org/package/view_file/Base:System/grub2/grub.default?expand=1

It is not empty. IIUC, updating GRUB_CMDLINE_LINUX_DEFAULT in this configuration (also done by Kiwi and YaST) is the only step required to have a working system.

Again, adding GRUB_ENABLE_BLSCFG=true unconditionally is what is breaking the image, but I do not see the relation with this last comment.

schaefi commented 4 months ago

I somewhat agree that the way kiwi handles GRUB_ENABLE_BLSCFG is not ideal. So far, all distributions for which we could find handling of GRUB_ENABLE_BLSCFG in the respective mkconfig code use grub exclusively in the BLS style. That's why it was sort of OK to set the variable based on that simple search. For openSUSE it seems BLS is added as a feature that can be enabled. If enabled, it seems that other configurations of the system are also affected:

OK, to implement grub BLS in kiwi we will implement, for grub, the BLS interface class that exists for this purpose, as we have already done for systemd-boot and zipl. This allows us to differentiate between "old" grub and BLS grub. For the "old" grub I suggest deleting the setting of GRUB_ENABLE_BLSCFG completely; it's wrong there. And for the BLS grub, users need to specify that loader via e.g. <bootloader name="grub_bls" .../> or similar.

This brings us then into the state where we can properly implement BLS grub setup.

Ideally, this BLS grub setup would then be generic across all distributions.

That is exactly where my pain starts, because I think there are already some differences between Fedora and openSUSE, as we are discussing. Ideally those are implemented in the distributions such that calling the grub tools in a generic way leads to the desired results. IIRC this is what distributors do anyway, because grub-mkconfig, as an example, differs considerably between distributions. That is not an issue as long as an appliance builder can call the tools and as long as no options like --suse-xxx are implemented, which I have also seen in the past. Something like that is a killer for us, and it would be great if it can be avoided.

This is currently my plan for moving forward on this issue, but I'm not clear about the use of the tooling

Thanks

WenhuaChang commented 4 months ago

> • grub2-switch-to-blscfg, the tool from grub to convert grub.cfg into a main config and the loader entries. From the conversation so far I'm not clear if openSUSE wants to utilize it? I guess not

It is only for testing purposes, for people who want a taste of how it works. GRUB is never meant to play the role of a BLS provider/generator; otherwise it gets in the way again.

aplanas commented 4 months ago

We are using sdbootutil as a BLS entries generator for grub. You can find an image here: https://build.opensuse.org/package/show/home:aplanas:branches:devel:microos:images/openSUSE-MicroOS

aplanas commented 4 months ago

> And for the BLS grub, users need to specify that loader via e.g. <bootloader name="grub_bls" .../> or similar.

I suggest something more in line with: <bootloader name="grub2" console="gfxterm" entry="bls" />. Ideally grub-bls is not a different bootloader; it is just grub with some patches. If all goes well, some day those patches will reach upstream, making 'name="grub_bls"' kind of legacy.

Conan-Kudo commented 3 months ago

Who is driving the BLS patchset upstream?

aplanas commented 3 months ago

The default enablement of BLS is blocking the grub2-bls change in Factory. @schaefi, would it be OK if this feature is enabled only when it is detected and <bootloader name="grub2" entry="bls" /> is set?

Another name for the field could be bls="yes" or something like that. If this approach is OK to unblock Factory, I can work on this after SUSE Labs.

schaefi commented 3 months ago

> The default enablement of BLS is blocking the grub2-bls change in Factory. @schaefi, would it be OK if this feature is enabled only when it is detected and <bootloader name="grub2" entry="bls" /> is set?

> Another name for the field could be bls="yes" or something like that. If this approach is OK to unblock Factory, I can work on this after SUSE Labs.

I'm fine with the suggested approach. I like bls as the attribute name more than entry. So if we can go for

<bootloader name="grub2" console="gfxterm" bls="true"/>

That would be great. bls is then a boolean attribute like we have it in many other places. I guess the default will be "false" to stay compatible with current descriptions.
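The rule agreed on here (write the setting only when grub supports BLS and the description asks for it, with a default of "false") can be sketched in Python as follows. This is not kiwi's implementation: both function names are hypothetical, and the snippet parses a standalone bootloader element rather than a full image description.

```python
import xml.etree.ElementTree as ET


def bls_requested(bootloader_xml):
    """Read the boolean bls attribute; a missing attribute means 'false'."""
    element = ET.fromstring(bootloader_xml)
    return element.get("bls", "false").strip().lower() == "true"


def should_enable_blscfg(bootloader_xml, grub_supports_bls):
    """Enable GRUB_ENABLE_BLSCFG only if grub supports BLS AND the user opted in."""
    return grub_supports_bls and bls_requested(bootloader_xml)


# Existing descriptions without the attribute keep their current behaviour.
print(should_enable_blscfg('<bootloader name="grub2" console="gfxterm"/>', True))
# Opting in only takes effect when BLS support was detected.
print(should_enable_blscfg('<bootloader name="grub2" console="gfxterm" bls="true"/>', True))
```

Keeping detection and opt-in as separate inputs means a BLS-capable grub2 package alone no longer changes the generated image.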

schaefi commented 3 months ago

@aplanas Thanks much for your offer to help, much appreciated

aplanas commented 3 months ago

Right, bls="true" seems the most reasonable field name.