canonical / pc-gadget

The gadget snap for Personal Computers using 64bit Intel or AMD processors
GNU General Public License v3.0
31 stars 73 forks source link

Broken kernel.efi does not reboot automatically #36

Open anonymouse64 opened 4 years ago

anonymouse64 commented 4 years ago

If I intentionally break the kernel.efi[1] in ubuntu-boot on a UC20 system, then grub does not reboot the system and instead hangs with the following message:

double free at 0x8c6f7d60
Aborted. Press any key to exit.

Additionally, pressing a key here (at least in QEMU) will drop me to the UEFI menu, where I would then have to manually reboot.

It would be great if grub could reboot automatically in this situation, because this may happen in the wild with a bad kernel snap update (where try-kernel.efi is corrupted like this) and then, if the system were rebooted, the grub.cfg on ubuntu-boot would trigger a rollback to the previous kernel.efi as without user intervention.

[1] I ran if=/dev/zero of=kernel.efi count=1

alfonsosanchezbeato commented 3 years ago

@anonymouse64 I have taken a look at this, but couldn't reproduce exactly. If I create a kernel image with dd if=/dev/zero of=kernel.efi count=1 (512 null bytes), the error I get is

 Failed to boot both default and fallback entries.

Press any key to continue...

and then I get to the grub menu, where I need to force a reboot to move forward and get the kernel reverted. To me, the double free seems like a grub bug while doing something with kernel.efi, that would need to be solved first, and then we would probably get the "Failed to boot both..." error. It would be nice if you can share the kernel.efi that triggered the issue.

Anyway, I tried grub's fallback feature with this grub configuration: https://gist.github.com/alfonsosanchezbeato/8ad42596cd06a5986a9aea16c815f841 which seems to work. When trying the kernel.efi created by dd, it is able to boot the old kernel without interactions with the user (grub menu is not shown), and the new kernel snap is reverted as expected. I think we can have this change in snapd without backwards compatibility problems. More sophisticated things that do not apply only when updating the kernel would need a bit more of thought, as you mentioned.

Probably we should go through some cases with bad kernel images and give that some testing in the spread tests:

  1. Empty kernel.efi
  2. Corrupted kernel.efi (bad PE+ header)
  3. kernel.efi with good header but bad systemd stub
  4. kernel.efi with good systemd stub but corrupted/empty kernel section
  5. Unsigned kernel.efi

and probably some more...

anonymouse64 commented 3 years ago

@alfonsosanchezbeato thanks for testing that, the grub.cfg you posted looks good to me.

I don't have the original kernel.efi which triggered that double free anymore, I don't recall what I did to cause that, I may have just copied a junk file to the kernel.efi rather than all zeros like the dd command.

All of those cases would be good, we do have this existing spread test which could be modified to handle these additional cases I think: https://github.com/snapcore/snapd/blob/master/tests/nested/core/core20-kernel-failover/task.yaml

Though it is worth wondering whether the full test situations should live in snapd repo or somewhere else... as it seems all those other variants are more about grub's behavior than snapd's, but then again we do have grub.cfg inside snapd now so I suppose it makes sense.