canonical / pc-gadget

The gadget snap for Personal Computers using 64bit Intel or AMD processors
GNU General Public License v3.0

grub.cfg: switch to new style kernel_status failover handling #31

Closed: anonymouse64 closed this 4 years ago

anonymouse64 commented 4 years ago

This implements the new UC20 spec for loading kernels and failover handling.

Note that this is blocked until snapd sets the appropriate kernel_status and extracts the kernel assets to the ubuntu-boot partition.

The one open question I had about this is what to do if kernel_status is not "try", "trying", or "". I implemented it such that we default to "" if kernel_status is anything else.
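
The handling described above can be modelled as a small state machine. The following is only a sketch in plain shell, not the actual grub.cfg from this PR: the function name `next_boot` is hypothetical, and the "try" and "trying" transitions are my reading of the test procedure further down (a "try" boot is attempted once, and a status left at "trying" means that boot was never confirmed), so treat them as assumptions rather than a quote from the spec.

```shell
#!/bin/sh
# Hypothetical model of the failover decision; NOT the real grub.cfg.
# Given the current kernel_status, print which kernel would be booted
# and what kernel_status would be saved back to grubenv.
next_boot() {
    case "$1" in
        try)
            # Assumption: a new kernel is staged, so boot it once and
            # record that the attempt is in progress.
            kernel="try-kernel.efi"
            next="trying"
            ;;
        trying)
            # Assumption: nothing cleared "trying", so the previous
            # attempt failed; fall back to the known-good kernel.
            kernel="kernel.efi"
            next=""
            ;;
        *)
            # "" or any unexpected value: behave as if kernel_status were "".
            kernel="kernel.efi"
            next=""
            ;;
    esac
    printf 'kernel=%s kernel_status=%s\n' "$kernel" "$next"
}
```

For example, `next_boot try` prints `kernel=try-kernel.efi kernel_status=trying`, while `next_boot bogus` prints `kernel=kernel.efi kernel_status=`, which is the "default to empty" behaviour for unexpected values.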

I also renamed grub.cfg-normal to grub.cfg-boot because this grub lives on the ubuntu-boot partition, and I think this naming scheme makes it easier to understand which grub corresponds to which partition.

How to test right now:

  1. Build a UC20 image with ubuntu-image and a local build of this gadget snap
  2. Mount the raw image built by ubuntu-image, add a new ubuntu-boot partition to it, and format that partition as an ext4 filesystem
  3. Mount the ubuntu-boot partition locally and copy the grub assets, grubx64.efi, bootx64.efi and grub.cfg into $ubuntu-boot/EFI/boot/grubx64.efi, $ubuntu-boot/EFI/boot/bootx64.efi and $ubuntu-boot/EFI/ubuntu/grub.cfg respectively.
  4. Generate a grubenv without any variables defined at $ubuntu-boot/EFI/ubuntu/grubenv to modify later at runtime
  5. Copy a working kernel asset into the ubuntu-boot partition at $ubuntu-boot/kernel.efi from the kernel snap
  6. Put a "broken" kernel in the ubuntu-boot partition at $ubuntu-boot/try-kernel.efi (I just put in a file with "broken" in it there)
  7. Unmount the partition and boot the image with qemu (or similar):
$ kvm -m 2048 -netdev user,id=mynet0,hostfwd=tcp::8022-:22,hostfwd=tcp::8090-:80 -device virtio-net-pci,netdev=mynet0 -bios /usr/share/OVMF/OVMF_CODE.ms.fd -drive if=virtio,file=uc20-try-kernel.img,format=raw -serial mon:stdio -nographic
  8. In the first ubuntu-seed grub, choose the "Continue into run mode" option
  9. Let the ubuntu-boot grub boot the default kernel for long enough to see that it is generally working
  10. Reboot, and choose the "Continue into run mode" option again
  11. Enter the grub command line and run the following to confirm that kernel_status is currently unset, then set it to "try" and reboot:
grub> load_env --file /EFI/ubuntu/grubenv kernel_status
grub> echo $kernel_status

grub> set kernel_status=try
grub> save_env kernel_status
  12. Choose the "Continue into run mode" option in the ubuntu-seed grub
  13. Observe that the ubuntu-boot grub fails to boot the broken try kernel, then reboot the machine:
alloc magic is broken at 0x5c807840: 0
Aborted. Press any key to exit.
  14. Choose the "Continue into run mode" option in the ubuntu-seed grub
  15. Enter the grub command line and echo kernel_status to see that it has been reset to "":
grub> load_env --file /EFI/ubuntu/grubenv kernel_status
grub> echo $kernel_status

grub>
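
For anyone repeating the test, the file-copying part of the setup (steps 3 to 6) could be scripted roughly as below. This is a sketch under assumptions: the function name and its three arguments are hypothetical, the ubuntu-boot partition is assumed to be already created and mounted (steps 1 and 2 are not covered), and `grub-editenv` is assumed to be installed (some distributions ship it as `grub2-editenv`). Its `create` subcommand writes an environment block with no variables set.

```shell
#!/bin/sh
# Sketch of steps 3-6: populate an already-mounted ubuntu-boot partition.
# All three arguments are hypothetical paths chosen by the tester.
populate_ubuntu_boot() {
    boot="$1"    # mount point of the ubuntu-boot partition
    assets="$2"  # directory holding grubx64.efi, bootx64.efi and grub.cfg
    kernel="$3"  # working kernel.efi taken from the kernel snap

    # Step 3: grub binaries and config
    mkdir -p "$boot/EFI/boot" "$boot/EFI/ubuntu" || return 1
    cp "$assets/grubx64.efi" "$assets/bootx64.efi" "$boot/EFI/boot/" || return 1
    cp "$assets/grub.cfg" "$boot/EFI/ubuntu/grub.cfg" || return 1

    # Step 4: grubenv with no variables defined, to be modified at runtime
    grub-editenv "$boot/EFI/ubuntu/grubenv" create || return 1

    # Step 5: the known-good kernel
    cp "$kernel" "$boot/kernel.efi" || return 1

    # Step 6: a deliberately unbootable "try" kernel
    printf 'broken\n' > "$boot/try-kernel.efi"
}
```

A call might look like `populate_ubuntu_boot /mnt/ubuntu-boot ./grub-assets ./kernel.efi`, where all three paths are placeholders.
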
anonymouse64 commented 4 years ago

@xnox done, applied your suggested changes and rebased on 20.

anonymouse64 commented 4 years ago

Snapd will be ready for this change with https://github.com/snapcore/snapd/pull/7947, so I think this PR and that one should land at the same time.

xnox commented 4 years ago

@anonymouse64 with a small tweak we can de-couple the two.

One can test for the presence of the unpacked kernel and, if there isn't one, fall back to the "just boot anything one can find on ubuntu-data" behaviour, which is what we currently have.
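
The decoupling idea can be sketched as a simple existence check. This is an illustration in plain shell rather than the grub.cfg change itself: the function name and the labels it prints are hypothetical, and the kernel location ($ubuntu-boot/kernel.efi) is taken from the test steps in the PR description.

```shell
#!/bin/sh
# Hypothetical sketch: decide which boot path to take depending on
# whether an unpacked kernel is present on the ubuntu-boot partition.
choose_boot_path() {
    boot="$1"  # mount point (or grub root) of the ubuntu-boot partition
    if [ -e "$boot/kernel.efi" ]; then
        # snapd has extracted a kernel: use the new kernel_status handling
        echo "kernel_status"
    else
        # no unpacked kernel yet: keep the existing behaviour of booting
        # whatever can be found on ubuntu-data
        echo "fallback"
    fi
}
```

GRUB's own script language has a comparable `[ -e FILE ]` test, so the same check should be expressible directly in grub.cfg.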