bedrocklinux / bedrocklinux-userland

This tracks development for the things such as scripts and (defaults for) config files for Bedrock Linux
https://bedrocklinux.org
GNU General Public License v2.0
603 stars 65 forks source link

potentially dangerous when hijacking opensuse #132

Open SchrodingerZhu opened 5 years ago

SchrodingerZhu commented 5 years ago

On my openSUSE Tumbleweed, it turns out to be a very bad choice to hijack the os. The grub2 bootloader will stop functioning will an update is required (e.g. kernel is updated).

Some basic info of my device:

I can manually boot the system by modifying something like /vmlinuz** to /boot/vmlinuz** at the booting time but when I check the grub.cfg, it looks fine when dealing with root and prefix.

Moreover, in rescue system we can only chroot into /bedrock/strata/openSUSE-tumbleweed. Therefore, re-running grub-install in rescue system is not possible because the root detecting will fail. I understand that this is expected. But what I am curious about is that why grub2 will boot at the correct root and configured with correct prefix while finding files in the wrong places.

SchrodingerZhu commented 5 years ago

I checked the grub file before hijacking.

...
menuentry 'openSUSE Tumbleweed'  --class opensuse --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-87995879-7413-4a25-\
8502-368927317d43' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_gpt
        insmod btrfs
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root  87995879-7413-4a25-8502-368927317d43
        else
          search --no-floppy --fs-uuid --set=root 87995879-7413-4a25-8502-368927317d43
        fi
        echo    'Loading Linux 5.2.3-1-default ...'
        linuxefi /boot/vmlinuz-5.2.3-1-default root=UUID=87995879-7413-4a25-8502-368927317d43  ${extra_cmdline} acpi_osi=! "acpi=windows 2009" nouve\
au.modeset=0 rd.driver.blacklist=nouveau splash=silent resume=/dev/disk/by-id/nvme-KXG50ZNV1T02_TOSHIBA_X7MS10PPTP8T-part4 quiet mitigations=auto
        echo    'Loading initial ramdisk ...'
        initrdefi /boot/initrd-5.2.3-1-default
}
...

I find that /boot is somehow in the path. However, after hijacking and a update, the path become messy.

paradigm commented 5 years ago

On my openSUSE Tumbleweed, it turns out to be a very bad choice to hijack the os. The grub2 bootloader will stop functioning will an update is required (e.g. kernel is updated).

That's disconcerting. I've removed OpenSUSE from the list of known-working for the time being.

It's not immediately clear to me why OpenSUSE would treat grub2 differently from the other distros we exercise more heavily here, but I'll dig into it as soon as time permits. Provided I can reproduce the issue, I'll try to provide a better work-around than the one you're currently utilizing as soon as I have it, and ensure the next Bedrock update fixes the problem properly.

I'll also look into making the website more clear about the scope of testing done with various distros. OpenSUSE does get some testing for each release, but it doesn't get exercised as much as other distros. I also don't think I've done any testing against specifically Tumbleweed.

Moreover, in rescue system we can only chroot into /bedrock/strata/openSUSE-tumbleweed. Therefore, re-running grub-install in rescue system is not possible because the root detecting will fail. I understand that this is expected.

We have some documentation here on fixing /boot by bind mounting it into /bedrock/strata/<stratum> before chrooting in. Reworking this part of the website to be easier to find is on the to-do list; you're not the first to miss it.

But what I am curious about is that why grub2 will boot at the correct root and configured with correct prefix while finding files in the wrong places.

I don't think I'm parsing this correctly. It sounds like you're saying that grub2 works despite being misconfigured, which appears to be in conflict with your prior statement that it stops functioning.


EDIT: I was able to hijack an OpenSUSE Tumbleweed install in a VM and run grub2-mkconfig which produced a config assuming the kernel and initrd were on the root of the filesystem rather than the actual /boot location. When time permits I'll see if I can trace the code grub2-mkconfig is running to see if I can understand why it is doing this and what changes are needed to resolve the situation.

paradigm commented 5 years ago

I managed to figure out:

I believe this is why a hijacked OpenSUSE Tumbleweed install's grub2-mkconfig generates a bad config.

Further more, I figured out:

Thus, I believe Bedrock's bind mount on /boot is confusing grub2-mkrelpath when used with -r.

Further more, I found:

Thus, I suspect it's a non-standard patch shared by a number of distros to add btrfs support to GRUB.

If anyone has time/interest/capability to trace further into grub2-mkrelpath to see what's going on and potentially upstream a fix to whomever is maintaining this -r flag patch, I think that's the best route to fix this properly.

In the mean time I'm going to get the hijack installer to detect this scenario and abort the install. Maybe something like:

if grub2-mkconfig --help 2>&1 | grep -q -- "-r"; then
    pre="$(grub2-mkrelpath -r /boot)"
    mount --bind /boot /boot
    post="$(grub2-mkrelpath -r /boot)"
    umount -l /boot
    if [ "${pre}" != "${post}" ]; then
        abort "Bedrock cannot support this system.  \`grub2-mkrelpath -r /boot\` appears to be confused by \`mount --bind /boot /boot\`; continuing may misconfigure GRUB on a kernel update.  This may be because of btrfs snapshot or subvolume usage.  Consider reinstalling without this feature in play on \`/boot\` or with another filesystem.  This is a known issue with Bedrock interaction with OpenSUSE."
    fi
fi

I plan to leave this issue open until I or someone else fixes the underlying issue. Once I've got the above described sanity check in place in an update I may rename the title of this issue to something more descriptive of the new problem state.

cptpcrd commented 5 years ago

Thus, I suspect it's a non-standard patch shared by a number of distros to add btrfs support to GRUB.

If anyone has time/interest/capability to trace further into grub2-mkrelpath to see what's going on and potentially upstream a fix to whomever is maintaining this -r flag patch, I think that's the best route to fix this properly.

I found a copy of the patch here (some of the variable/config option names used appear to indicate that it is maintained by SUSE, so this is as close as I could get to upstream). The relevant code appears to be the second set of changes to this file (this starts at line 109 of the patch):

Index: grub-2.02~rc1/grub-core/osdep/linux/getroot.c
===================================================================
--- grub-2.02~rc1.orig/grub-core/osdep/linux/getroot.c
+++ grub-2.02~rc1/grub-core/osdep/linux/getroot.c
@@ -376,6 +376,7 @@ get_btrfs_fs_prefix (const char *mount_p
   return NULL;
 }

+int use_relative_path_on_btrfs = 0;

 char **
 grub_find_root_devices_from_mountinfo (const char *dir, char **relroot)
@@ -519,6 +520,12 @@ again:
    {
      ret = grub_find_root_devices_from_btrfs (dir);
      fs_prefix = get_btrfs_fs_prefix (entries[i].enc_path);
+     if (use_relative_path_on_btrfs)
+       {
+         if (fs_prefix)
+           free (fs_prefix);
+         fs_prefix = xstrdup ("/");
+       }
    }
       else if (!retry && grub_strcmp (entries[i].fstype, "autofs") == 0)
    {

(use_relative_path_on_btrfs is declared as extern and set appropriately in grub-install.c and grub-mkrelpath.c.)

I'm not at all familiar with GRUB's source code, so I don't know exactly what this does. Just posting this in case anybody does and can figure out what's going on.