OSInside / kiwi

KIWI - Appliance Builder Next Generation
https://osinside.github.io/kiwi
GNU General Public License v3.0
300 stars 152 forks source link

Only enable BLSCFG if the new 'bls' attribute is set #2535

Closed aplanas closed 5 months ago

aplanas commented 5 months ago

Partially fixes #2497

It is not properly tested, as I did not generate images with grub2-bls (not sure how to do that outside OBS), but I added some tests to cover this feature.

Conan-Kudo commented 5 months ago

Aside from the unit test, this is going to break existing descriptions in the wild for Fedora and CentOS, especially the ones now used for Fedora 40 (the first release that uses kiwi to build its cloud images) since it turns off bls by default.

I would prefer that we don't do that and instead preserve the old behavior if it's unset.

aplanas commented 5 months ago

The old behavior (activate bls unconditionally) is the bug. Now adding a grub2 version that adds a feature (BLS) breaks all the openSUSE images and is blocking it into factory.

Fedora can fix this with one single change: adding bls="true", openSUSE cannot fix it this without forcing the move to Factory.

Sadly I do not see a workaround here.

aplanas commented 5 months ago

Sadly I do not see a workaround here.

I want to extend and share this argument in case I am missing a better idea:

bug: If BLS feature is detected in GRUB2, KIWI enable this unconditionally

Evaluating the cost of Fix 4 is something that the SUSE / openSUSE GRUB2 maintainers needs to calculate, but seems illogical to create a subpackage for a feature that is correctly implemented in GRUB2 (opt in) because is incorrectly implemented in KIWI (opt out)

cc: @WenhuaChang

Conan-Kudo commented 5 months ago

Note that as the code works right now, bls support has to be detected for the flag to mean anything, so it breaks nobody who doesn't have the feature.

Conan-Kudo commented 5 months ago

Fedora switched to BLS by default in Fedora 28 and the distro integration tooling generally doesn't work in non-BLS mode anymore. So a non-BLS grub is not going to happen in Fedora basically ever anymore.

aplanas commented 5 months ago

@Conan-Kudo I do not think that you are understanding what it is happening:

We want the blscfg command from GRUB2, but not for all the images: we want to opt in for now for a subset of images.

Do you see the issue? (Please, do not bring grubby in this thread)

Conan-Kudo commented 5 months ago

I understand the issue just fine. This has nothing to do with grubby and everything to do with you not wanting to make a full cut-over. And the reason you don't want to has to do with the custom tooling you have in place on top of the bootloader which is not adapted fully over to BLS.

But here's the reality from the Fedora side who also uses kiwi in production on stable images:

Changing the default behavior for openSUSE to break Fedora is not acceptable. The other way around has not been generally accepted without heads-up but has been less of a problem because openSUSE Tumbleweed is a rolling release platform with no expectations for stability.

You could have also considered subpackaging the blscfg GRUB module in your grub2 package, but you threw that choice out and didn't present it as an option for consideration.

Conan-Kudo commented 5 months ago

Also, as a general rule, I'm inclined to reject this if you can't even test your changes on your own images. You don't know how to build images outside of OBS? That's critically important because OBS changes the way that kiwi is used and invokes slightly different behaviors.

aplanas commented 5 months ago

I understand the issue just fine. This has nothing to do with grubby and everything to do with you not wanting to make a full cut-over. And the reason you don't want to has to do with the custom tooling you have in place on top of the bootloader which is not adapted fully over to BLS.

You cannot force openSUSE (not me) to use unconditionally BLS. Use blscfg is an option, not a mandatory feature.

But here's the reality from the Fedora side who also uses kiwi in production on stable images:

* BLS is used unconditionally everywhere

* Non-BLS is effectively not supported

You can use bls="true" just fine.

* A change like this effectively breaks the ability to build Fedora 40 images because the bls option didn't exist before.

This fix a bug introduced in KIWI. Enabling a feature unconditionally without the option to opt out is wrong.

Changing the default behavior for openSUSE to break Fedora is not acceptable.

You missed the point spectacularly. We are not changing openSUSE, we are fixing a bug in KIWI that makes a GRUB2 feature to be forced unconditionally.

The other way around has not been generally accepted without heads-up but has been less of a problem because openSUSE Tumbleweed is a rolling release platform with no expectations for stability.

What?

You could have also considered subpackaging the blscfg GRUB module in your grub2 package, but you threw that choice out and didn't present it as an option for consideration.

Did you really read what others write?

aplanas commented 5 months ago

Also, as a general rule, I'm inclined to reject this if you can't even test your changes on your own images

Why do you think that I cannot test it? I assure you that I can test it.

You don't know how to build images outside of OBS?

....

That's critically important because OBS changes the way that kiwi is used and invokes slightly different behaviors.

....

aplanas commented 5 months ago

Is there a chance to refocus this in a constructive way?

Besides the options from https://github.com/OSInside/kiwi/pull/2535#issuecomment-2079474621 I guess that there is another one:

aplanas commented 5 months ago

(cont.)

I think that is very unfair to put more work on the GRUB2 maintainers (fixes 3 and 4), so this can be an alternative too:

Conan-Kudo commented 5 months ago

And it's fair to make us deal with it? How does that make sense?! This is only happening because you don't want to cut over everything to BLS right now. Otherwise you wouldn't care.

Anyway, screw it. Clearly my needs as the community maintainer, developer, and contributor don't matter.

Let's merge this and then I can suffer for this downstream. But note that we are actively removing SUSE-specific stuff from kiwi upstream too.

aplanas commented 5 months ago

And it's fair to make us deal with it?

No. I am proposing alternatives. I want to work in the most cost effective solution.

How does that make sense?!

It does not. I am trying to work here.

This is only happening because you don't want to cut over everything to BLS right now.

Forcing openSUSE to move NOW to BLS is something that you perceive as fair?

Otherwise you wouldn't care.

I am working on exploring more the alternatives, is this a sign that I do not care?

Anyway, screw it. Clearly my needs as the community maintainer, developer, and contributor don't matter.

....

Let's merge this and then I can suffer for this downstream. But note that we are actively removing SUSE-specific stuff from kiwi upstream too.

Can we please work on this before merging it? I do not think that this is time critical, can we give us one week to explore better the space problem?

aplanas commented 5 months ago

Oh my ...

Why has this been merged?

@Conan-Kudo can we please work in a good way? I do really want a good solution. Can we please do the work, instead merging this without a proper review?

aplanas commented 5 months ago

@Conan-Kudo can you revert this and give us another opportunity to work together?

I do think that this is what is hurting the project and the community. Instead of working we are wasting time in discussions and it is not helping in exploring the solution space. A PR is an opportunity to review the code (in this case I had none), the alternatives and the cost of them.

Can we please do the job?

Conan-Kudo commented 5 months ago

It's been merged because the code is actually fine, the implications are what suck. But I am the only kiwi maintainer not employed by SUSE, and if I blocked this, it would continue to become uncomfortable for everyone. This is not the first time we've had this problem related to grub stuff, and I expect it won't be the last.

I can suck it up and deal with it downstream in Fedora, Fedora Asahi, CentOS Stream, and CentOS Hyperscale somehow. it's going to be a bit of a mess because I have no synchronization point for Fedora.

At the end of the day, the real problem is that despite repeated attempts to encourage collaboration both upstream and downstream on grub2 work, nobody wants to. So we continue having this mess and kiwi is caught in the middle of it.

It has been 10 years, and none of the SUSE-specific Btrfs features for grub have been submitted upstream. To be fair, the same is also true for blscfg, but at least that's just a module instead of changes to multiple things in GRUB.

I'm tired of this. I'm tired of nobody wanting to work together on anything for the boot stuff. I'm tired being caught in the middle over it too. It's because of this that GRUB development upstream slowed to a crawl. Nobody who does this work for distributions wants to do anything upstream.

Conan-Kudo commented 5 months ago

I will also add a final point: I asked multiple times about rolling out BLS support in openSUSE's GRUB when we first added it to kiwi years ago with RHEL 8 support. Nobody responded to my messages at all. I wasn't even the only person asking for it.

Nobody commented then, nobody cared. And here we are again because nobody pays attention. If nothing changes, this will keep happening.

aplanas commented 5 months ago

It's been merged because the code is actually fine, the implications are what suck.

Before rushing the merge, can we work on alternatives with cheaper implications? The cost of this one is to add a field in the XML in the KIWI files controlled by Fedora, but I understand that there are more like that in the wild.

But I am the only kiwi maintainer not employed by SUSE, and if I blocked this, it would continue to become uncomfortable for everyone.

There is not reason to be uncomfortable. This PR is me working with upstream (you). This change is not for SUSE but for openSUSE (Factory).

I can suck it up and deal with it downstream in Fedora, Fedora Asahi, CentOS Stream, and CentOS Hyperscale somehow. it's going to be a bit of a mess because I have no synchronization point for Fedora.

You mean adding a patch in those packages that revert this change? Can this be, somehow, be alleviated?

If your decision is to add this patch that revert this change, I propose to add in those projects the "bls=true" in the XML that will imply that the patch in the package will be just one line: the and in the condition, keeping the change in the XML parse and the rest. This will give time to this change spread.

At the end of the day, the real problem is that despite repeated attempts to encourage collaboration both upstream and downstream on grub2 work, nobody wants to.

But I am willing! Why are you assuming otherwise? In what moment did I give you this impression? I am not forcing this PR, and I was opening other alternatives. When did I pushed the message that I do not want to collaborate?

It has been 10 years, and none of the SUSE-specific Btrfs features for grub have been submitted upstream. To be fair, the same is also true for blscfg, but at least that's just a module instead of changes to multiple things in GRUB.

I am not a grub2 maintainer but I talked about this issue long enough (https://news.opensuse.org/2023/12/20/systemd-fde/). This is a big issue, and upstream GRUB2 and the openSUSE maintainer are aware of it and they are making big efforts to fix it (check the FOSDEM talk to see the upstream position, and I guarantee you that the downstream maintainer are not happy to have more that 200 patches to take care of)

I'm tired of this. I'm tired of nobody wanting to work together on anything for the boot stuff. I'm tired being caught in the middle over it too. It's because of this that GRUB development upstream slowed to a crawl. Nobody who does this work for distributions wants to do anything upstream.

I do not have a solution neither ... but I feel you

aplanas commented 5 months ago

I will also add a final point: I asked multiple times about rolling out BLS support in openSUSE's GRUB when we first added it to kiwi years ago with RHEL 8 support. Nobody responded to my messages at all. I wasn't even the only person asking for it.

But what do you think that we are doing??!!

We are doing steps: we have a design that requires systemd-boot for FDE, we relaxed this requirement to just have BLS, we moved the BLS patches to the openSUSE GRUB2, and we changed KIWI to have this patches in Factory. Don't you see the direction??

Conan-Kudo commented 5 months ago

It's been merged because the code is actually fine, the implications are what suck.

Before rushing the merge, can we work on alternatives with cheaper implications? The cost of this one is to add a field in the XML in the KIWI files controlled by Fedora, but I understand that there are more like that in the wild.

But I am the only kiwi maintainer not employed by SUSE, and if I blocked this, it would continue to become uncomfortable for everyone.

There is not reason to be uncomfortable. This PR is me working with upstream (you). This change is not for SUSE but for openSUSE (Factory).

I can suck it up and deal with it downstream in Fedora, Fedora Asahi, CentOS Stream, and CentOS Hyperscale somehow. it's going to be a bit of a mess because I have no synchronization point for Fedora.

You mean adding a patch in those packages that revert this change? Can this be, somehow, be alleviated?

If your decision is to add this patch that revert this change, I propose to add in those projects the "bls=true" in the XML that will imply that the patch in the package will be just one line: the and in the condition, keeping the change in the XML parse and the rest. This will give time to this change spread.

Realistically, there are only two workable options for compatibility now that we've locked in kiwi 10's behaviors:

At the end of the day, the real problem is that despite repeated attempts to encourage collaboration both upstream and downstream on grub2 work, nobody wants to.

But I am willing! Why are you assuming otherwise? In what moment did I give you this impression? I am not forcing this PR, and I was opening other alternatives. When did I pushed the message that I do not want to collaborate?

It has been 10 years, and none of the SUSE-specific Btrfs features for grub have been submitted upstream. To be fair, the same is also true for blscfg, but at least that's just a module instead of changes to multiple things in GRUB.

I am not a grub2 maintainer but I talked about this issue long enough (https://news.opensuse.org/2023/12/20/systemd-fde/). This is a big issue, and upstream GRUB2 and the openSUSE maintainer are aware of it and they are making big efforts to fix it (check the FOSDEM talk to see the upstream position, and I guarantee you that the downstream maintainer are not happy to have more that 200 patches to take care of)

But I asked about this in 2019. And back then the GRUB maintainers in openSUSE ignored my requests.

I know nobody is happy about 200+ patches, but it doesn't seem to motivate anyone to do anything about it.

I'm tired of this. I'm tired of nobody wanting to work together on anything for the boot stuff. I'm tired being caught in the middle over it too. It's because of this that GRUB development upstream slowed to a crawl. Nobody who does this work for distributions wants to do anything upstream.

I do not have a solution neither ... but I feel you

If only I could get everyone into a room...

aplanas commented 5 months ago
* Unset `bls` goes to old behavior (effectively default to `bls="true"` because there's lots of private kiwi descriptions that will break otherwise), we would switch it for SUSE distributions only to default to false until grub2 with bls is propagated

Yes, that was my proposal too.

* Total revert and we add a `grub2-no-bls` bootloader choice that removes the bls setting

That is a bad idea as will make valid XMLs invalid (there is an option that is not recognized).

I propose this patch for Fedora, this will keep the old behavior but will accept the new parameter:

diff --git a/kiwi/bootloader/config/grub2.py b/kiwi/bootloader/config/grub2.py
index dcd03534f..101b28a96 100644
--- a/kiwi/bootloader/config/grub2.py
+++ b/kiwi/bootloader/config/grub2.py
@@ -795,7 +795,7 @@ class BootLoaderConfigGrub2(BootLoaderConfigBase):
                 self._get_grub2_mkconfig_tool()
             ], raise_on_error=False
         )
-        if self.bls and enable_blscfg_implemented.returncode == 0:
+        if enable_blscfg_implemented.returncode == 0:
             grub_default_entries['GRUB_ENABLE_BLSCFG'] = 'true'

         if grub_default_entries:
diff --git a/test/unit/bootloader/config/grub2_test.py b/test/unit/bootloader/config/grub2_test.py
index d81d76ec7..e9189a609 100644
--- a/test/unit/bootloader/config/grub2_test.py
+++ b/test/unit/bootloader/config/grub2_test.py
@@ -600,6 +600,7 @@ class TestBootLoaderConfigGrub2:
             'GRUB_BACKGROUND': '/boot/grub2/themes/openSUSE/background.png',
             'GRUB_CMDLINE_LINUX_DEFAULT': '"some-cmdline"',
             'GRUB_DISTRIBUTOR': '"Bob"',
+            'GRUB_ENABLE_BLSCFG': 'true',
             'GRUB_ENABLE_CRYPTODISK': 'y',
             'GRUB_GFXMODE': '800x600',
             'GRUB_SERIAL_COMMAND': '"serial --speed=38400"',
@@ -642,6 +643,7 @@ class TestBootLoaderConfigGrub2:
             call('GRUB_CMDLINE_LINUX', '"root=LABEL=some-label"'),
             call('GRUB_DISABLE_LINUX_UUID', 'true'),
             call('GRUB_DISTRIBUTOR', '"Bob"'),
+            call('GRUB_ENABLE_BLSCFG', 'true'),
             call('GRUB_ENABLE_CRYPTODISK', 'y'),
             call('GRUB_ENABLE_LINUX_LABEL', 'true'),
             call('GRUB_GFXMODE', '800x600'),
@@ -686,6 +688,7 @@ class TestBootLoaderConfigGrub2:
             call('GRUB_DISABLE_LINUX_PARTUUID', 'false'),
             call('GRUB_DISABLE_LINUX_UUID', 'true'),
             call('GRUB_DISTRIBUTOR', '"Bob"'),
+            call('GRUB_ENABLE_BLSCFG', 'true'),
             call('GRUB_ENABLE_CRYPTODISK', 'y'),
             call('GRUB_GFXMODE', '800x600'),
             call(
@@ -729,6 +732,7 @@ class TestBootLoaderConfigGrub2:
             call('GRUB_CMDLINE_LINUX_DEFAULT', '"abcd console=tty0"'),
             call('GRUB_DISABLE_LINUX_UUID', 'true'),
             call('GRUB_DISTRIBUTOR', '"Bob"'),
+            call('GRUB_ENABLE_BLSCFG', 'true'),
             call('GRUB_ENABLE_CRYPTODISK', 'y'),
             call('GRUB_ENABLE_LINUX_LABEL', 'true'),
             call('GRUB_GFXMODE', '800x600'),
WenhuaChang commented 5 months ago

At the end of the day, the real problem is that despite repeated attempts to encourage collaboration both upstream and downstream on grub2 work, nobody wants to. So we continue having this mess and kiwi is caught in the middle of it.

It has been 10 years, and none of the SUSE-specific Btrfs features for grub have been submitted upstream. To be fair, the same is also true for blscfg, but at least that's just a module instead of changes to multiple things in GRUB.

I'm tired of this. I'm tired of nobody wanting to work together on anything for the boot stuff. I'm tired being caught in the middle over it too. It's because of this that GRUB development upstream slowed to a crawl. Nobody who does this work for distributions wants to do anything upstream.

I am sorry to make you feel so. But we did before submitting a series of patches, we initially sent a foundational patch to the upstream, but it did not proceed as expected. You can view this initial communication here:

https://mail.gnu.org/archive/html/grub-devel/2020-07/msg00000.html

To clarify the issue with the upstream patch: the upstream btrfs file system uses a clear, unambiguous path system that originates from the top-level tree. This approach is preferred because it allows files to be consistently located and shared with other grub utilities without the need for specific modifications.

However, for the btrfs snapshots, particularly in systems like SUSE/openSUSE, it is necessary to use paths relative to the subvolume. This requirement arises because it is not possible to modify the grub.cfg file within a read-only (ro) snapshot to update hardcoded absolute paths to reflect their new locations within a snapshot.

Meanwhile the downside of using relative paths is evident. It introduces the need for additional constructs during grub's runtime, such as "get-default", "set-subvolume" and "mount-subvolume". These additions complicate automatic setup tools like grub-install, grub-mkconfig and grub-mkrelpath, which now require more sophisticated handling, therefore increasing the overall complexity — particularly in environments that do not use similar btrfs boot configuration introducing the change is not welcome.

To be honest, I have revisited this issue regularly but have not yet been able to devise a satisfactory solution.

WenhuaChang commented 5 months ago

Why can't we leave GRUB_ENABLE_BLSCFG unset and allow it to follow the distribution's default settings? For example, Fedora enables BLS by default if grubby is not used, while openSUSE does not enable BLS at all.

https://github.com/rhboot/grub2/blob/fedora-39/util/grub.d/10_linux.in#L203

Conan-Kudo commented 5 months ago

Because in practice, grubby has to be installed on Fedora systems or the kernel package installs things to the wrong place. That's why our kiwi descriptions ensure it: https://pagure.io/fedora-kiwi-descriptions/c/31f9f8a7437ffe6cb5a9a761dfac62949f8b07ff