batocera-linux / batocera.linux

batocera.linux
https://batocera.org

[v39 beta] Boot Option Restoration overwrites current boot config on Steam Deck #10493

Closed joerg-knitter closed 11 months ago

joerg-knitter commented 11 months ago

Batocera build version

v39 2023/11/25

Your architecture

Steam Deck (OLED)

Your Graphic Processor Unit (GPU)

Integrated (e.g. Intel HD 550; RX Vega 8)

Graphical brand

AMD

Issue description

I tried to install Batocera v38 on my new Steam Deck OLED, on the SSD alongside SteamOS and Windows. To do so, I created two partitions, one FAT32 and one EXT4, copied the FAT32 files from the image to the FAT32 partition on the SSD, and was able to start Batocera v38 successfully (apart from the rotated screen, missing drivers, etc.; but it basically works). As boot manager, I use Clover (https://github.com/ryanrudolfoba/SteamDeck-Clover-dualboot) so that I don't have to boot the corresponding EFI files manually.

Since the Steam Deck OLED is not supported (yet), I decided to switch to the unstable branch and update to v39 via the Batocera update menu (using a USB Ethernet adapter).

Unfortunately, since then, when I boot Batocera, I see the blue "Boot Option Restoration" screen, which boots v39 fine but seems to overwrite the Clover bootloader. The next time I boot the Steam Deck, the Clover boot selection is gone and Batocera starts directly again. (This can easily be fixed by selecting the Clover EFI file again, because Clover (also?) has an auto-repair function.)

Expected result

Batocera (v39) boots up without overwriting the bootloader (settings).

Reproduction steps

On Steam Deck, but maybe also on PC:

Logs and data

No response

dmanlfc commented 11 months ago

We can't cater for various third-party bootloaders. We have to cater for our own methods that can be easily managed. v39 adds Secure Boot support, hence what you're seeing. You have a workaround after an initial install. Closing as won't fix.

joerg-knitter commented 11 months ago

It's a little annoying that, after having used Batocera, I have to reboot and select another EFI file first to get the original boot menu back (otherwise Batocera boots automatically the next time the device is started)...

n2qz commented 11 months ago

This should be a one-time thing if you boot v39 via \EFI\BOOT\BOOTX64.EFI.

If you boot \EFI\batocera\grubx64.efi directly from Clover, it should bypass all of this (or \EFI\batocera\bootx64.efi if you want to retain secureboot shim loading).

Can you confirm if that resolves your concerns?
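To make the difference between those three files concrete, here is a simplified Python model of which stage runs next depending on the file the firmware (or Clover) starts. It is an illustration, not shim's actual logic: it assumes, as described in this thread, that shim started from the removable-media path \EFI\BOOT\BOOTX64.EFI hands off to the fallback loader (assuming the fallback binary sits next to it), that \EFI\batocera\bootx64.efi is the shim copy, and that grubx64.efi skips both shim and fallback. The function name `next_stage` is hypothetical.

```python
from pathlib import PureWindowsPath

# Default removable-media path on x86_64 (\EFI\BOOT\BOOTX64.EFI).
# PureWindowsPath compares case-insensitively, like FAT names on the ESP.
FALLBACK_PATH = PureWindowsPath(r"\EFI\BOOT\BOOTX64.EFI")


def next_stage(loaded_path: str) -> str:
    """Hypothetical, simplified model of what runs after `loaded_path` starts.

    Not shim's real logic; it only restates the behavior described in this
    issue and assumes the fallback binary sits next to \\EFI\\BOOT\\BOOTX64.EFI.
    """
    path = PureWindowsPath(loaded_path)
    name = path.name.lower()
    if path == FALLBACK_PATH:
        # shim loaded from the fallback path hands off to the fallback loader
        return "fallback: register the Batocera boot entry and reorder BootOrder"
    if name == "bootx64.efi":
        # e.g. \EFI\batocera\bootx64.efi: shim, without the fallback step
        return "shim: verify and start grubx64.efi from the same directory"
    if name == "grubx64.efi":
        return "grub: start Batocera directly (shim and fallback are skipped)"
    return "unknown loader"


for p in (r"\EFI\BOOT\BOOTX64.EFI", r"\EFI\batocera\bootx64.efi", r"\EFI\batocera\grubx64.efi"):
    print(p, "->", next_stage(p))
```

Only the first path reaches the fallback step, which is why pointing Clover at \EFI\batocera\ avoids the "Boot Option Restoration" screen.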

joerg-knitter commented 11 months ago

Yes, booting the \EFI\batocera\grubx64.efi does not overwrite the boot config, and Batocera also starts and works as expected from what I can see. Thanks a lot for this great hint!

Simply selecting the other EFI file directly from Clover is not possible, as far as I can see from the docs at https://github.com/ryanrudolfoba/SteamDeck-Clover-dualboot. Also, the installed Clover-Toolbox script does not allow changing default parameters such as the EFI file used. Here are some more detailed steps on how I fixed it, for anyone following this issue:

Once, the config.plist got corrupted after rebooting Batocera and ended up with a size of 0 bytes, which made the bootloader useless. I had to re-download this config file and place it in the mentioned ESP folder, but I have not been able to reproduce this since; it might not be related to Batocera v39.
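If the truncated config.plist shows up again, a quick check before rebooting can catch it. The sketch below is a hypothetical helper built on Python's standard `plistlib`; it only tells an empty, unparseable, or structurally valid file apart and knows nothing about Clover's own keys. Where the file lives on the ESP depends on the Clover script, so the path is left to the caller.

```python
import plistlib
from xml.parsers.expat import ExpatError


def check_config_plist(data: bytes) -> str:
    """Classify raw config.plist bytes as 'empty', 'corrupt', 'unexpected' or 'ok'.

    Hypothetical helper: it checks only that the file parses as a plist with
    a dictionary at the top level, not that Clover's settings are correct.
    """
    if not data:
        return "empty"  # e.g. the 0-byte file described above
    try:
        parsed = plistlib.loads(data)
    except (ExpatError, ValueError):
        # plistlib.InvalidFileException is a ValueError subclass;
        # malformed XML surfaces as an expat error.
        return "corrupt"
    return "ok" if isinstance(parsed, dict) else "unexpected"
```

Usage would be along the lines of `with open(path, "rb") as f: print(check_config_plist(f.read()))`, with `path` pointing at wherever the Clover script placed its config.plist.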

n2qz commented 11 months ago

I think I see what the install script is doing that's triggering the behavior you experienced.

I don't have a Steam Deck so I can't test the fix myself. Can you try the version of the script here, with the original config.plist file? https://github.com/n2qz/SteamDeck-Clover-dualboot/tree/bugfix/BootOrderBatocera

Direct download link for the modified script: https://raw.githubusercontent.com/n2qz/SteamDeck-Clover-dualboot/bugfix/BootOrderBatocera/install-Clover.sh

joerg-knitter commented 11 months ago

I am going to test the script in practice later, but looking at your changes, I don't think it will solve the problem: as soon as you boot Batocera with bootx64.efi, it will try to fix the bootloader, so you will always see the blue "Boot Option Restoration" screen first, followed by a reboot, and only then will Batocera boot. Maybe it will even get stuck in a loop and never boot Batocera because it always detects that the bootloader needs fixing... So booting grubx64.efi directly looks like the better solution for now. As mentioned, TPM etc. is enabled in the BIOS, so I don't see an issue here, because it boots properly.

n2qz commented 11 months ago

Please try it and let me know. My change to the installer script is intended to arrange things so that the fallback loader will see that the boot option is already present, and the anomalous behavior won't be triggered. The goal is to make it work seamlessly without the need to modify the configuration files by hand, using a setup script that will work with all Batocera versions. If the script works as I expect, I will submit it via pull request to the upstream SteamDeck-Clover-dualboot repository. Since I can't test it here without buying the same hardware, your assistance with testing it is vital.

joerg-knitter commented 11 months ago

Your fix unfortunately does not work. I completely uninstalled Clover and reinstalled it with your version. It behaves as I expected: it overwrites Clover and now only boots into Batocera (with no Clover boot menu beforehand). The only way out of this loop is to go into the BIOS settings and choose the option to manually select the EFI file. When I select the SteamOS EFI file, Clover gets auto-fixed, and on the next reboot Clover is shown again. I made a capture of all this for better understanding and uploaded it (temporarily) to https://www.youtube.com/watch?v=1P_92G3QFX4. Please ignore the rotated Batocera intro video; it is also not really real time because I removed all the black frames ;)

Changing the EFI file in the config.plist as described above works like a charm. Is there any reason not to commit this instead?

n2qz commented 11 months ago

> Your fix unfortunately does not work. I completely uninstalled Clover and reinstalled it with your version. It behaves as I expected: it overwrites Clover and now only boots into Batocera (with no Clover boot menu beforehand). The only way out of this loop is to go into the BIOS settings and choose the option to manually select the EFI file. When I select the SteamOS EFI file, Clover gets auto-fixed, and on the next reboot Clover is shown again. I made a capture of all this for better understanding and uploaded it (temporarily) to https://www.youtube.com/watch?v=1P_92G3QFX4. Please ignore the rotated Batocera intro video; it is also not really real time because I removed all the black frames ;)

Start by following the procedure documented here, instead of letting it count down: https://wiki.batocera.org/secureboot#tpm

If that doesn't help by itself, please try again with my version of the clover script.

Finally, please change the title of your youtube video, or remove it. I'm already getting grief from another team member for helping you with this at all, since it isn't a Batocera issue. You're just using it ways that aren't supported and aren't in accordance with the correct setup procedure (wiki link above). I'm happy to continue to help with this as long as my assistance doesn't contribute to negative publicity for the project. I'm going out on a limb to help you out here, and I don't want to regret that decision.

> Changing the EFI file in the config.plist as described above works like a charm. Is there any reason not to commit this instead?

It isn't my call what gets committed in the SteamDeck-Clover-dualboot repository. I'm trying to help find an optimal solution that lets that project interoperate properly with Batocera without breaking compatibility with v38 and earlier (and changing the config.plist would be a breaking change).

joerg-knitter commented 11 months ago

Thanks for your hint. I renamed the video, but it is (and always was) unlisted on YouTube and thus not searchable, so I don't see the risk of negative publicity. I am going to delete it anyway if your solution works; I'll check it later when I have more time...

In contrast, I regard it as negative publicity if a normal user (who may not even know how to enter the BIOS/boot options) gets their boot loader overwritten. It has long been a source of frustration that installing Windows overwrites a Linux boot loader. While Windows does this only once, Batocera does it on every start, and I don't understand why using grubx64.efi is not allowed, since TPM and Secure Boot are enabled and it boots without problems. Do you have more explanatory links? If it is really necessary, a more extensive explanation on the boot recovery screen would be helpful (or even essential); I am willing to help if a help text is needed (once I have understood the issue).

I also don't think this is specific to the Steam Deck; it will also happen on a "normal" PC, since the Steam Deck is nothing more than a notebook with an integrated gamepad instead of a keyboard.

BTW: I wouldn't need Clover if the Batocera bootloader had a boot menu. Even though the bootloader looks nice, my main goal is a comfortable OS selection when starting the device.

n2qz commented 11 months ago

Batocera doesn't overwrite the boot loader configuration on every start. The current beta v39 now uses a Microsoft-signed UEFI Secure Boot shim in order to enable booting on more systems, where it may be impossible or inconvenient to boot without Secure Boot. This shim -> MOKManager -> fallback scheme is developed and maintained by Red Hat, and is used by most major Linux distros.

Because Batocera is an open-source project, it's not practical for us to be a first-party player in this scheme by getting our own shim signed directly by Microsoft, so we mostly use Ubuntu's signed-shim stack for this integration. Performing this integration is challenging, so we (I) have followed the lead of other projects already doing similar things, primarily Ventoy and rEFInd.

The documentation on all the shim-related screens is on the link I sent previously. If you'd like to improve it, our documentation is wiki-based in order to enable contributions from the community.

While the same boot flow does occur on "normal" PCs, it generally doesn't cause the issue that you're experiencing there. The whole purpose of the fallback setup is to enable the user to set up (at boot time) a flow where the fallback loader flow is replaced by a direct UEFI entry to boot Batocera, bypassing that flow. The script you are using removes that entry, causing a battle between Clover and Batocera for control of the boot process.
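A toy model of that "battle" may help. It treats BootOrder as a list of hypothetical Boot#### IDs and encodes only two behaviors described in this thread: the fallback loader moves the CSV-configured entry to the front (creating it if it is missing), and the Clover install script puts Clover first and removes Batocera's entry. Function names and IDs are illustrative, not taken from shim or the Clover script.

```python
from typing import Dict, Iterable, List

# Hypothetical Boot#### IDs, for illustration only.
LABELS: Dict[str, str] = {"0000": "SteamOS", "0001": "Clover", "0002": "Batocera"}


def fallback_bump(order: List[str], entry: str) -> List[str]:
    """Fallback behavior as described here: the entry always ends up first."""
    return [entry] + [e for e in order if e != entry]


def clover_script(order: List[str], clover: str, remove: Iterable[str]) -> List[str]:
    """Clover script behavior as described here: Clover first, listed entries dropped."""
    dropped = set(remove)
    return [clover] + [e for e in order if e != clover and e not in dropped]


def boots_first(order: List[str]) -> str:
    """Without BootNext, the firmware tries BootOrder from the front."""
    return LABELS[order[0]]


if __name__ == "__main__":
    order = clover_script(["0002", "0000", "0001"], "0001", ["0002"])
    print(boots_first(order), order)      # Clover ['0001', '0000']
    order = fallback_bump(order, "0002")  # Batocera started via the fallback path
    print(boots_first(order), order)      # Batocera ['0002', '0001', '0000']
```

Each side simply undoes the other, so whichever ran last decides what boots next.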

I'd love to help you get this resolved, but that's challenging if you don't try the specific diagnostic steps I recommend and provide proper feedback on the results. I do not have either the hardware or software that you're using.

Batocera already provides a boot menu on x86_64 platform, based on the syslinux bootloader. Work is underway to eventually replace that with Grub, hopefully within the next few releases. Work on that effort has already started, and I have a pending PR to use Grub for a specific UEFI boot flow that Batocera has never supported correctly in the past (loading a 64-bit OS from a 32-bit-only UEFI firmware BIOS).

Please understand that boot-loading (especially where Secure Boot is involved) is a challenging and specialized area of knowledge in OS development. Batocera currently uses and supports about half a dozen different boot-loading schemes across a wide variety of architectures and system platforms. Boot loading is not a primary focus of Batocera, and our goal is to get the system booted so that you can play games. Handling every single corner case created in every possible dual-boot scenario is not a design goal for the project.

The SecureBoot work you're criticizing is the result of over six months of research, planning, and testing by me personally. I tested prototypes of this setup on almost a dozen different x86_64 motherboards before the first pull requests for this work were submitted. A few issues turned up in early beta testing, and those issues are now mostly resolved (or at least have documented workarounds). Those resolutions came from the cooperation of our community, working through the issue with me to understand what was happening and how to best resolve it while maintaining correct functionality.

Unfortunately, I can't make progress on your issue if you refuse to cooperate. Can you please confirm whether you have completed the documented TPM setup procedure in the wiki link I sent previously, and whether it had any effect on the unique challenges created when your clover installation script triggers this unusual behavior?

joerg-knitter commented 11 months ago

Thanks for your extensive explanation; I have only just noticed that the wiki page contains more information. Sorry for not having seen this before.

I am not uncooperative; I just did not have the time due to work and personal life. And I want to do extensive, careful testing before raising false alarms. Sorry for the delays...

In fact, I just found a free timeslot and unfortunately have to report again that, with your Clover modifications, the Clover boot menu is not shown after Batocera has been booted once. After "recovering" Clover, Batocera no longer shows the boot restoration screen when it starts, but after a reboot (or a shutdown and restart), Batocera boots by default again and the Clover menu is gone.

I am going to re-read your explanations and the wiki and try to get a better understanding. Since you mention rEFInd: if I recall correctly, Clover claims to be a fork of it.

n2qz commented 11 months ago

I just pushed some additional fixes for the script. Please use the latest version when you re-test.

https://github.com/n2qz/SteamDeck-Clover-dualboot/tree/bugfix/BootOrderBatocera

https://raw.githubusercontent.com/n2qz/SteamDeck-Clover-dualboot/bugfix/BootOrderBatocera/install-Clover.sh

joerg-knitter commented 11 months ago

Did the re-test, unfortunately still no success.

n2qz commented 11 months ago

> Did the re-test, unfortunately still no success.

Please provide output from the efibootmgr command after the test.
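When comparing several efibootmgr dumps, it helps to turn the text into structured data. Below is a minimal parser sketch for the usual efibootmgr layout (`BootCurrent:`, `BootOrder:` and `Boot####` lines, where `*` marks an active entry and a tab, if present, separates the label from the device path). The exact layout varies between efibootmgr versions, so treat it as an assumption; the sample in the test is made up and is not the output attached to this issue.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# "Boot0001* Label<TAB>device path"; the "*" marks an active entry.
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


@dataclass
class BootState:
    current: Optional[str] = None
    order: List[str] = field(default_factory=list)
    labels: Dict[str, str] = field(default_factory=dict)
    active: Dict[str, bool] = field(default_factory=dict)

    def first_label(self) -> Optional[str]:
        """Label of the entry the firmware tries first, if known."""
        if not self.order:
            return None
        return self.labels.get(self.order[0])


def parse_efibootmgr(text: str) -> BootState:
    """Parse plain efibootmgr output into a BootState (sketch)."""
    state = BootState()
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith("BootCurrent:"):
            state.current = line.split(":", 1)[1].strip().upper()
        elif line.startswith("BootOrder:"):
            value = line.split(":", 1)[1]
            state.order = [e.strip().upper() for e in value.split(",") if e.strip()]
        else:
            m = ENTRY_RE.match(line)
            if m:
                num = m.group(1).upper()
                # Keep the label only; anything after a tab is the device path.
                state.labels[num] = m.group(3).split("\t", 1)[0].strip()
                state.active[num] = m.group(2) == "*"
    return state
```

`parse_efibootmgr(open("efibootmgr.txt").read()).first_label()` would then report which entry the firmware tries first in a saved dump.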

joerg-knitter commented 11 months ago

First Batocera boot from Clover boot menu: efibootmgr.txt

After reboot (where it starts Batocera directly instead of showing the Clover boot menu), same at 3rd boot etc.: efibootmgr_2.txt

And this is what I get from the Steam Deck console (Desktop mode) after booting SteamOS again via the appropriate EFI file, with one reboot in between to ensure that the Clover menu is back: efibootmgr_3.txt

n2qz commented 11 months ago

After the final reboot into the Clover menu, it looks good. If you reboot into Batocera at that point, does it stay the same or does it change back to prioritizing Batocera?

joerg-knitter commented 11 months ago

That's exactly the point: after booting Batocera once, it always goes back to prioritizing Batocera, so after efibootmgr_3.txt, efibootmgr.txt would be the follow-up if I select Batocera from Clover.

It is a usability thing: after efibootmgr_2.txt, the "normal" user has to know to press Power+"Volume+" and select the appropriate EFI file. The same applies if Batocera is booted from an SD card; in that case, however, removing the card is enough to get Clover back. But I wanted Batocera on the SSD instead, because I configured Batocera to take nearly all data from a NAS (shared content and config) and to keep the SD card free for other purposes (Steam game installations, data transfers, etc.), while using only 2x 8 GB on the SSD for Batocera.
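Instead of the Power+"Volume+" detour, the order could in principle be restored from a running Linux system with efibootmgr's `--bootorder` (`-o`) option, which rewrites BootOrder. The sketch below only builds the command line; `restore_clover_first` is a hypothetical helper, the entry IDs are made up and must come from your own efibootmgr output, running the command requires root, and, as discussed in this thread, the next boot of Batocera through the fallback path would reorder things again.

```python
from typing import List


def restore_clover_first(boot_order: List[str], clover_id: str) -> List[str]:
    """Build (but do not run) an efibootmgr command putting clover_id first.

    boot_order and clover_id are 4-digit hex Boot#### IDs taken from your
    own efibootmgr output; the values in the example below are hypothetical.
    """
    clover_id = clover_id.upper()
    order = [e.upper() for e in boot_order]
    if clover_id not in order:
        raise ValueError(f"Boot{clover_id} is not in BootOrder")
    new_order = [clover_id] + [e for e in order if e != clover_id]
    # --bootorder (-o) replaces the whole BootOrder variable.
    return ["efibootmgr", "--bootorder", ",".join(new_order)]


print(restore_clover_first(["0002", "0001", "0000"], "0001"))
# ['efibootmgr', '--bootorder', '0001,0002,0000']
```

To actually apply it, the list could be passed to `subprocess.run(cmd, check=True)` as root. efibootmgr also offers `--bootnext` (`-n`) for a one-time override that leaves BootOrder untouched.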

n2qz commented 11 months ago

It's not clear to me from your response that you actually performed the requested test. Please boot Batocera after efibootmgr_3.txt and show me the efibootmgr output at that point.
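The question here boils down to whether BootOrder changes between two snapshots, and in particular whether its first entry does. A small, self-contained sketch that compares only the `BootOrder:` lines of two efibootmgr dumps (function names are hypothetical; the sample values in the test are made up):

```python
from typing import List


def boot_order(dump: str) -> List[str]:
    """Extract the BootOrder IDs from one efibootmgr dump (empty if absent)."""
    for line in dump.splitlines():
        if line.startswith("BootOrder:"):
            value = line.split(":", 1)[1]
            return [e.strip().upper() for e in value.split(",") if e.strip()]
    return []


def describe_change(before: str, after: str) -> str:
    """Summarize how BootOrder differs between two dumps."""
    old, new = boot_order(before), boot_order(after)
    if old == new:
        return "unchanged"
    if not old or not new:
        return "BootOrder missing in one dump"
    if old[0] != new[0]:
        return f"first entry changed: Boot{old[0]} -> Boot{new[0]}"
    return "order changed, same first entry"
```

Run against the dumps taken before and after booting Batocera, a "first entry changed" result is exactly the reprioritization being asked about.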

joerg-knitter commented 11 months ago

The efibootmgr output would be efibootmgr.txt.

n2qz commented 11 months ago

I still can't tell if you ran the test or if you're just telling me what you think the result will be. I guess I'll have to give up on trying to make progress on this.

joerg-knitter commented 11 months ago

I ran the test.

  1. Booting Batocera from the Clover menu - efibootmgr.txt.
  2. Restarting Batocera - no Clover menu shown, instead direct boot into Batocera - efibootmgr_2.txt.
  3. Restarting Batocera again - still no Clover menu shown, instead direct boot into Batocera - efibootmgr_2.txt.
  4. Shutting down Batocera, then pressing Power+"Volume+" and selecting the EFI file from the SteamOS folder - boots into SteamOS and fixes Clover.
  5. Restarting SteamOS, Clover is being shown again and allows selection, I select SteamOS - efibootmgr_3.txt.
  6. Restarting SteamOS, selecting Batocera from Clover menu = back to step 1.

n2qz commented 11 months ago

I'll make one last try to get the information I need: Please send the actual output of efibootmgr after running through the full test. I can't stress enough how important it is to provide hard data instead of verbal interpretation with this kind of remote debugging. There may (or may not) be something that will be significant in the output to me, that may not be obvious to you. Without seeing the actual data, I have no hope to proceed any further.

joerg-knitter commented 11 months ago

Your wish is my command; here is efibootmgr_4.txt after step 6: efibootmgr_4.txt

As mentioned, it's the same as efibootmgr.txt...

Here is a video (unlisted) showing steps 1-6: https://youtu.be/nLl5NW3GvrQ. The chapter marks show which step I am doing at each point in the video.

n2qz commented 11 months ago

Please do the following:

Thanks!

joerg-knitter commented 11 months ago

Here is the new video: https://youtu.be/BVZg0tT_Ktw

n2qz commented 11 months ago

Thanks, I've spent a few hours on this today and I'm still analyzing the situation. I have a few ideas but I'm not very sure of them. I may put together a version with some additional logging to gather more information, but I'm not sure yet if I'll be able to get to it today.

joerg-knitter commented 11 months ago

Take your time, I won't be able to test anyway due to job etc. in the next 20h... :(

n2qz commented 10 months ago

I spent some time on this again today, and learned a few things along the way. Now that I have a better understanding of the conditions required to trigger the behavior, I was able to reproduce it here (the location of the shim loader is significant as it behaves differently when loaded from the standard fallback filename). Here's what I know so far:

My recommendation is to file an issue report in the script's repo, referencing this discussion. I'd be happy to collaborate with them directly on resolving it, if needed.

Thanks for your help in collecting the information needed to understand the issue more deeply.

joerg-knitter commented 10 months ago

Thank you very much for your support and the time you have already invested in this topic. I am going to file an issue report asap. Thanks to you, I have a working solution (manually fixing the config.plist) and have also learned a lot (though I still lack a deeper understanding of Secure Boot), but I am afraid that there will be more requests once v39 is released (that's why I continued the discussion here...).

> it's intentional that it's always bumping the CSV-configured loader to the highest priority, even if it previously existed somewhere else in the BootOrder

On the one hand, this is understandable; on the other hand, Valve announced an upcoming official dual-boot option, so this topic might come up again later. However, a long time ago Valve also announced that SteamOS would be available for standard PCs, and as we know, it still hasn't arrived (officially)... ;)
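For readers wondering what "CSV-configured" means: shim's fallback loader looks for a `BOOT<ARCH>.CSV` file (for x86_64, `BOOTX64.CSV`) in the vendor directories under \EFI\ on the ESP to decide which loader to register. As we understand shim's fallback documentation, the file is UCS-2/UTF-16 encoded and each line lists the loader file name, the entry label, optional load options, and a comment. The parser below is a sketch under that assumption; `parse_boot_csv` is a hypothetical name, and the sample content is made up, not Batocera's actual file.

```python
from typing import List, NamedTuple


class CsvEntry(NamedTuple):
    loader: str   # loader file name, relative to the CSV's directory
    label: str    # description used for the Boot#### entry
    options: str  # optional load options (often empty)
    comment: str  # free-form comment


def parse_boot_csv(raw: bytes) -> List[CsvEntry]:
    """Parse BOOTX64.CSV-style bytes (assumed UTF-16) into entries."""
    # A BOM, if present, selects the byte order; otherwise native order is used.
    text = raw.decode("utf-16")
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split(",", 3)       # the comment may itself contain commas
        parts += [""] * (4 - len(parts))  # pad missing trailing fields
        entries.append(CsvEntry(*(p.strip() for p in parts)))
    return entries


sample = "shimx64.efi,ExampleOS,,Hypothetical boot entry\n".encode("utf-16")
print(parse_boot_csv(sample))
```

Whatever loader such a file names is the one the fallback registers and, per the quote above, moves to the front of BootOrder.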

A last question: As a simple fallback: Could an option/flag be introduced in ES or at least in a config file to turn it off - to make it work for any current or upcoming boot configuration and manager for the average user without deeper Linux knowledge?

n2qz commented 10 months ago

> A last question: As a simple fallback: Could an option/flag be introduced in ES or at least in a config file to turn it off - to make it work for any current or upcoming boot configuration and manager for the average user without deeper Linux knowledge?

Unfortunately no, that's not a viable solution.