bedrocklinux / bedrocklinux-userland

This tracks development of things such as scripts and (defaults for) config files for Bedrock Linux
https://bedrocklinux.org
GNU General Public License v2.0

update-grub breaks system after bedrock hijack #157

Open shadowrylander opened 4 years ago

shadowrylander commented 4 years ago

Hello! It would appear as though update-grub breaks the system after bedrock is installed and /etc/default/grub is edited (changed GRUB_SAVEDEFAULT to false); I tested the command before and after the install, and afterwards it returns the error:

mount: /new_root: wrong fs type, bad option, bad superblock on /dev/nvme0n1p5, missing codepage or helper program, or other error.
You are now being dropped into an emergency shell.
sh: can't access tty: job control turned off

I'm not sure whether I'm making a mistake in editing the grub file, or if btrfs somehow can't find the necessary files created by the update, but any help would be greatly appreciated!

paradigm commented 4 years ago

Hello!

Hi!

It would appear as though update-grub breaks the system after bedrock is installed and /etc/default/grub is edited (changed GRUB_SAVEDEFAULT to false);

This isn't enough information for me to reproduce the issue and debug it locally. I've edited /etc/default/grub and run update-grub plenty of times on Bedrock systems without issue. Can you provide step-by-step instructions which reproduce it consistently?

It would appear as though update-grub breaks the system after bedrock is installed and /etc/default/grub is edited (changed GRUB_SAVEDEFAULT to false); I tested the command before and after the install, and afterwards it returns the error:

mount: /new_root: wrong fs type, bad option, bad superblock on /dev/nvme0n1p5, missing codepage or helper program, or other error.
You are now being dropped into an emergency shell.
sh: can't access tty: job control turned off

Without context, I would interpret your second "it" here as referring to "the command", which in turn refers to update-grub. That is, that update-grub is producing the quoted error message when run. However, the contents of the error message seem more like something which would come from the initrd at boot time. I'm continuing below assuming it's from the initrd; if I'm mistaken and you did indeed mean it came from update-grub, do indicate so.

I'm not sure whether I'm making a mistake in editing the grub file, or if btrfs somehow can't find the necessary files created by the update, but any help would be greatly appreciated!

OpenSUSE patched grub2-mkrelpath to special-case btrfs handling in a way which works out-of-the-box on OpenSUSE but breaks the GRUB updating mechanism when run on Bedrock, very similar to what you're describing. Bedrock's hijack code checks for this scenario and refuses to continue if it is detected. If the issue here is specific to Bedrock and not a mistake in editing the grub file, it seems very likely to be related; maybe there are more btrfs/grub bugs out there that Bedrock is failing to check for.

A couple things to try:

shadowrylander commented 4 years ago

Hi! 😸 So basically, on Manjaro:

  1. Install Manjaro
    1.1. My partitions are as follows:
      • /dev/nvme0n1p1: Microsoft Recovery Partition
      • /dev/nvme0n1p2: EFI Boot
      • /dev/nvme0n1p3: Microsoft Partition
      • /dev/nvme0n1p4: Windows 10
      • /dev/nvme0n1p5: BTRFS Root
      • /dev/nvme0n1p6: BTRFS Home
      • /dev/nvme0n1p7: BTRFS Data Partition
      • /dev/nvme0n1p8: BTRFS Data Partition
      • /dev/nvme0n1p9: BTRFS Data Partition
  2. Reboot into Manjaro on internal SSD
  3. Install bedrock:
    cd ~/Downloads
    sudo sh bedrock-[...].sh --hijack
  4. Reboot
  5. Edit /etc/default/grub
    5.1. Change GRUB_SAVEDEFAULT from true to false and save
  6. Run update-grub
  7. Reboot
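
For repeat testing, steps 5 and 6 can be scripted. The following is a minimal sketch. It assumes GNU sed, which is what Manjaro ships, because it relies on sed -i. The name set_grub_savedefault is a hypothetical helper and is not part of GRUB or Bedrock. Per the analysis later in the thread, the particular setting appears to be incidental. What matters is that update-grub regenerates /boot/grub/grub.cfg.

```shell
#!/bin/sh
# Hypothetical helper for step 5: set GRUB_SAVEDEFAULT in the given file.
# Edits in place with GNU sed (-i); try it on a scratch copy first.
set_grub_savedefault() {
	file="$1"
	value="$2"
	# Replace the active (uncommented) assignment, quoted or not.
	sed -i "s/^GRUB_SAVEDEFAULT=.*/GRUB_SAVEDEFAULT=${value}/" "$file"
}

# Steps 5 and 6, as root:
#   set_grub_savedefault /etc/default/grub false
#   update-grub
```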

Instead of booting into Manjaro, I get:

mount: /new_root: wrong fs type, bad option, bad superblock on /dev/nvme0n1p5, missing codepage or helper program, or other error.
You are now being dropped into an emergency shell.
sh: can't access tty: job control turned off

Sorry for the confusion! I will attempt to provide my /boot/grub/grub.cfg, though I've reinstalled so many Linux distros over the past few days I'm starting to worry about the health of my internal SSD! 😹 I'm assuming I should provide you with the copy after the error, correct? Because that means I will have to trigger the problem, and therefore reinstall Manjaro, again! However, if so, I will attempt your fix in editing the grub prompt config!

Thank you kindly for the help once again!

paradigm commented 4 years ago

When I find the time I'll attempt to reproduce the issue as described. May be a bit.

My request for your grub.cfg assumed you still had it. If you've since wiped the system and lost it, and are now concerned about disk churn from reproducing it again yourself, we can wait until I've attempted to reproduce it by following your instructions. If I can reproduce it, there's no need for you to do so again.

shadowrylander commented 4 years ago

Makes sense! Thanks for understanding! And quick question: I'm assuming that, while you can install kernels using Bedrock, you can't use them, right? I ask because I'm using a Surface Book 2, and when I tried using custom Debian kernels with Manjaro, I couldn't access them.

A simple yes or no would be fine to prevent overtaking this thread, if you can answer the question here! Thanks again!

paradigm commented 4 years ago

Bedrock lets you install and use kernels from different distros. Assuming you're on GRUB, the catch is that only your GRUB stratum (typically but not necessarily the hijacked one) has package manager hooks to update the GRUB config upon installing/updating a kernel. If you install a kernel from another stratum, you have to manually prompt GRUB to update its config.

It's possible to switch which stratum provides (and maintains/updates) the bootloader by installing a bootloader in another stratum over the existing one. However, it does introduce risk of something going wrong and should be done with care. Bedrock only has one, shared /boot; do not tell a package manager to uninstall a bootloader you've overwritten as it will likely "uninstall" the new one. You can get rid of the old GRUB provider by tossing the entire stratum via brl remove, however.

I typically use Debian's kernel, as I like the lack of update churn. If it's working, I can be reasonably confident it'll continue working. However, if/when I find I want some new kernel feature (e.g. to support newly acquired hardware), I just install Arch's kernel until Debian's has caught up to my desired feature set. Works fine.

shadowrylander commented 4 years ago

As soon as I get back home, I will go over what you just said! 😹 Thanks for the info; should be useful, if the supposed update-grub problem can be fixed! 😸

paradigm commented 4 years ago

Happy to help :)

shadowrylander commented 4 years ago

And the info makes sense! I think... Is there anywhere I can find further documentation on this, i.e. different distro kernels on Bedrock?

paradigm commented 4 years ago

Ideally everything should be on bedrocklinux.org. Sadly, however, the content on there lags a bit from what I'd like, as resources towards supporting current users have been eating into the documentation budget. I'm hoping to pause feature development after my current task and do a big doc rework.

That having been said, on the specific topic of different kernels on Bedrock, there's only one more thing I can think to say:

While in general the Linux kernel is very good about being forwards and backwards compatible with userland - and thus across strata - occasionally userland software may depend on features of a specific kernel build. If some program is failing to work and you're at a complete loss for why, one of the things to test would be if installing and rebooting into its stratum's kernel fixes it. This is most prevalent with new kernel features, which means this concern is lessened if you're running a more recent kernel. However, in theory it could also happen with distro-specific kernel patches. I am at a loss for any current examples of this, though.

shadowrylander commented 4 years ago

That also makes sense; unfortunately, due to using a Surface Book 2, I'm stuck with a specific series of kernels which would make debugging difficult, as switching kernels would interrupt other important features of the machine, such as suspending, the touchscreen, cooling, etc. ¯\_(ツ)_/¯ 😭

shadowrylander commented 4 years ago

I assume the no-ETA rule applies to this? 😅 I.e., the grub fix?

paradigm commented 4 years ago

I am able to reproduce the issue. I believe I have a high level of understanding of what's going on. However, I do not have a plan for a fix at this time.

A fresh btrfs Manjaro install's grub.cfg contains

rootflags=subvol=@

which works fine. However, after a Bedrock hijack, when updated it then contains:

rootflags=subvol=@/bedrock/strata/manjarolinux

This is problematic. This change is due to

case x"$GRUB_FS" in
    xbtrfs)
    rootsubvol="`make_system_path_relative_to_its_root /`"
    rootsubvol="${rootsubvol#/}"
    if [ "x${rootsubvol}" != x ]; then
        GRUB_CMDLINE_LINUX="rootflags=subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}"
    fi;;
    xzfs)
    rpool=`${grub_probe} --device ${GRUB_DEVICE} --target=fs_label 2>/dev/null || true`
    bootfs="`make_system_path_relative_to_its_root / | sed -e "s,@$,,"`"
    LINUX_ROOT_DEVICE="ZFS=${rpool}${bootfs%/}"
    ;;
esac

at /etc/grub.d/10_linux. make_system_path_relative_to_its_root calls grub-mkrelpath. grub-mkrelpath --help indicates

Transform a system filename into GRUB one.
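
The following is a minimal sketch of how the quoted btrfs branch turns grub-mkrelpath's answer into a kernel argument. It runs the same string handling, but replaces the GRUB helper with a stub. The names MOCK_RELPATH and build_cmdline are hypothetical. The two mock values are /@ on plain Manjaro and /@/bedrock/strata/manjarolinux after the hijack. They are inferred from the grub.cfg lines quoted above, not captured from a real grub-mkrelpath run.

```shell
#!/bin/sh
# Stub for GRUB's helper: the real one calls grub-mkrelpath; this one just
# echoes a mock answer so the string handling can be run anywhere.
make_system_path_relative_to_its_root() {
	printf '%s\n' "$MOCK_RELPATH"
}

# Same logic as the xbtrfs) branch of /etc/grub.d/10_linux, wrapped in a
# function that takes the existing GRUB_CMDLINE_LINUX and prints the result.
build_cmdline() {
	GRUB_CMDLINE_LINUX="$1"
	rootsubvol="$(make_system_path_relative_to_its_root /)"
	rootsubvol="${rootsubvol#/}"   # strip one leading slash
	if [ "x${rootsubvol}" != x ]; then
		GRUB_CMDLINE_LINUX="rootflags=subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}"
	fi
	printf '%s\n' "$GRUB_CMDLINE_LINUX"
}

MOCK_RELPATH=/@                              # fresh Manjaro install
build_cmdline quiet    # prints: rootflags=subvol=@ quiet

MOCK_RELPATH=/@/bedrock/strata/manjarolinux  # after the hijack
build_cmdline quiet    # prints: rootflags=subvol=@/bedrock/strata/manjarolinux quiet
```

The string handling itself behaves identically in both cases. The difference comes entirely from what grub-mkrelpath reports for /.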

Some Bedrock-specific background to help understand what's happening here: Bedrock functions largely by controlling what files different processes see when they do file-related operations. It does so via various Linux virtual filesystem layer abstraction tools. Sometimes they see different files, sometimes they see the same file. For example, if you have both Arch and Manjaro strata, it is important that each sees different files at /etc/pacman.d/mirrorlist, as the two distros use different mirrors. In Bedrock's terminology, files like this are called local files. This is in contrast to global files like /home, where every process should see the same thing. A process from one stratum may access another stratum's local files by prefixing /bedrock/strata/<stratum> to the file path. For example, a Manjaro program may be instructed to manipulate Arch's mirrorlist via /bedrock/strata/arch/etc/pacman.d/mirrorlist.
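
The naming rule for reaching another stratum's local files can be written as a tiny function. The name cross_stratum_path is hypothetical. The function only builds the string and does not check whether the stratum or the file exists.

```shell
#!/bin/sh
# Hypothetical helper: map a stratum-local path to the path any stratum can
# use to reach it, i.e. /bedrock/strata/<stratum>/<path>.
cross_stratum_path() {
	stratum="$1"
	path="$2"
	# Drop one leading slash so exactly one separates prefix and path.
	printf '/bedrock/strata/%s/%s\n' "$stratum" "${path#/}"
}

cross_stratum_path arch /etc/pacman.d/mirrorlist
# prints: /bedrock/strata/arch/etc/pacman.d/mirrorlist
```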

Bedrock's expectation is that non-Bedrock-aware code is not aware of the local vs /bedrock/strata files and just generates things from its own point of view. With most bootloaders/filesystems that's what happens. In this specific situation, however, it's using grub-mkrelpath, and grub-mkrelpath is aware of possible virtual filesystem layer abstractions. It figures out what things would look like from GRUB's point of view at boot time, without them.

The fundamental point of confusion here is that Manjaro's grub generation script intends to indicate that the root filesystem should be the target subvolume, but on Bedrock it is indicating that Manjaro's local root filesystem is the target subvolume, which the vfs-abstraction-aware grub-mkrelpath translates away from what GRUB would see as the root filesystem.

While I typically use ext4, I know a number of other users have reported success with btrfs and zfs in the past (provided they do not create per-stratum subvolumes). It is not clear to me why they did not run into this issue.

In the immediate future, my plan is to attempt to add detection for this issue at hijack time and abort accordingly. I will likely include this in the next (non-beta) Bedrock update.
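
Whatever detection Bedrock ends up adding, a user can check for the symptom before rebooting. The following is a minimal sketch and not Bedrock's own detection logic. It defines check_grub_cfg, a hypothetical helper that reports whether a generated grub.cfg contains a rootflags=subvol= value pointing into /bedrock/strata/.

```shell
#!/bin/sh
# Hypothetical post-update-grub check.
# Returns 0 if the file looks fine, 1 if some rootflags=subvol= value
# contains /bedrock/strata/, and 2 if the file cannot be read.
check_grub_cfg() {
	cfg="${1:-/boot/grub/grub.cfg}"
	if [ ! -r "$cfg" ]; then
		echo "cannot read $cfg" >&2
		return 2
	fi
	if grep -q 'rootflags=subvol=[^[:space:]]*/bedrock/strata/' "$cfg"; then
		echo "$cfg: rootflags=subvol= points into /bedrock/strata/; fix before rebooting" >&2
		return 1
	fi
	return 0
}

# Usage, right after update-grub:
#   check_grub_cfg /boot/grub/grub.cfg && echo "grub.cfg looks OK"
```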

It is not immediately obvious to me how to fix this properly. I'll spend some time thinking about it, but I make no guarantees about when I'll have a fix, if ever. I would not recommend planning assuming a fix is on the way.

shadowrylander commented 4 years ago

Of course; that makes sense! Thanks for the information and help, and take your time! At the moment, Bedrock isn't particularly useful to my workflow, but it would be incredible to have just in case!

shadowrylander commented 4 years ago

Oh, and quick question: what if, after installing bedrock, I switch grub.cfg rootflags back to its original value? Would that temporarily solve the issue?

paradigm commented 4 years ago

I would think so, but I haven't actually tested to confirm it.

shadowrylander commented 4 years ago

I might try it; apparently snapd doesn't work properly on the newer Linux kernels, so while waiting for them to update, I might have to use bedrock to install my list of editors. 😅

paradigm commented 4 years ago

As a hack, you can probably open /etc/grub.d/10_linux and change this line:

    rootsubvol="`make_system_path_relative_to_its_root /`"

to

    rootsubvol="@"
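
The same one-line change can also be made without an editor. The following is a sketch that assumes GNU sed and uses pin_rootsubvol, a hypothetical helper. It rewrites only the rootsubvol= line that calls make_system_path_relative_to_its_root. It leaves the later rootsubvol="${rootsubvol#/}" line and the ZFS branch alone.

It deliberately writes no backup next to the file. grub-mkconfig runs the executable files it finds in /etc/grub.d, so a stray copy there could be picked up too. Keep any backup elsewhere, and run update-grub afterwards so grub.cfg is regenerated.

```shell
#!/bin/sh
# Hypothetical helper applying the 10_linux hack: replace the line
#     rootsubvol="`make_system_path_relative_to_its_root /`"
# with
#     rootsubvol="@"
# keeping its indentation. Uses GNU sed -i with no backup suffix on purpose.
pin_rootsubvol() {
	file="${1:-/etc/grub.d/10_linux}"
	sed -i 's|^\([[:space:]]*rootsubvol=\).*make_system_path_relative_to_its_root /.*$|\1"@"|' "$file"
}

# Usage, as root:
#   cp /etc/grub.d/10_linux /root/10_linux.orig   # backup outside /etc/grub.d
#   pin_rootsubvol /etc/grub.d/10_linux
#   update-grub
```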

That's not a good general solution for Bedrock, but it should suffice for your specific needs so long as /etc/grub.d/10_linux isn't changed in an update. If it is and you don't notice and you find you can't boot, boot off another device, mount the Bedrock partition, and open up <mount>/boot/grub/grub.cfg then change all instances of

rootflags=subvol=@/bedrock/strata/manjarolinux

to

rootflags=subvol=@
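
The recovery edit can also be done with sed from the rescue system. The following is a sketch that assumes GNU sed and uses fix_grub_cfg, a hypothetical helper. The stratum name defaults to manjarolinux, as in this thread. <mount> is wherever you mounted the Bedrock partition; the exact location of grub.cfg under it depends on your subvolume layout. The original file is kept as grub.cfg.bak.

```shell
#!/bin/sh
# Hypothetical helper for the recovery step: rewrite every
# rootflags=subvol=@/bedrock/strata/<stratum> back to rootflags=subvol=@.
# The stratum defaults to manjarolinux; the original is kept as <cfg>.bak.
fix_grub_cfg() {
	cfg="$1"
	stratum="${2:-manjarolinux}"
	sed -i.bak "s|rootflags=subvol=@/bedrock/strata/${stratum}|rootflags=subvol=@|g" "$cfg"
}

# Usage, from the other boot device, with the Bedrock partition mounted:
#   fix_grub_cfg "<mount>/boot/grub/grub.cfg"
```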

shadowrylander commented 4 years ago

Well that's going to be fun to remember. 🤔 I'll try it! Thanks!

shadowrylander commented 4 years ago

A third question: is there any way to prevent bedrock from messing up my grub theme? Since the paths are rewritten, grub can't find the theme text file anymore.

paradigm commented 4 years ago

Looks like that's caused by the same point of confusion around generating the rootflags=subvol= value. I similarly don't have a good fix in mind at this point in time.

shadowrylander commented 4 years ago

I'll wait for an upgrade then, before trying bedrock again! Can't wait! 😻😸