canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0
4.28k stars 916 forks source link

Unable to chainboot from network with SecureBoot enabled #8770

Closed ltrager closed 3 years ago

ltrager commented 3 years ago

Required information

Issue description

Machines configured for use with MAAS always boot over the network. When an operating system is deployed MAAS provides a grub.cfg which finds the local bootloader and chainloads it. Previously this caused compatibility issues which caused the chain of trust to be broken. These issues have been fixed in Ubuntu and upstream.

While I have been able to verify SecureBoot is now working with MAAS on physical hardware and libvirt VMs it is still failing with LXD. VMs in LXD either hang or drop to the GRUB shell. We believe this may be due to LXD passing the -no-reboot option to qemu.

Steps to reproduce

  1. Install MAAS master or the soon to be released 3.0.0-rc1
  2. In /var/lib/maas/boot-resources/current or /var/snap/maas/common/maas/boot-resources/current replace bootx64.efi with the latest SHIM from shim-signed_1.47+15.4-0ubuntu2_amd64.deb and replace grubx64.efi with the latest GRUB from grub-efi-amd64-signed_1.169+2.04-1ubuntu45_amd64.deb
  3. Add an LXD VM to MAAS. If the VM is composed make sure the VM config has "security.secureboot": "true"
  4. Deploy Ubuntu 21.04(it contains the same SHIM and GRUB from above)
  5. On reboot, after Curtin has written the image to disk, the VM will either be stuck or drop to a GRUB prompt.

Information to attach

LXD qemu command:

/snap/lxd/20371/bin/qemu-system-x86_64 -S -name lxd-vm -uuid 9c7a808a-99c7-4adc-9edb-ab4f5aed288a -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-reboot -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/maas_lxd-vm/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/maas_lxd-vm/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/maas_lxd-vm/qemu.pid -D /var/snap/lxd/common/lxd/logs/maas_lxd-vm/qemu.log -chroot /var/snap/lxd/common/lxd/virtual-machines/maas_lxd-vm -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd

libvirt qemu command:

/usr/bin/qemu-system-x86_64 -name guest=maas-test-4,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-13-maas-test-4/master-key.aes -blockdev {"driver":"file","filename":"/usr/share/OVMF/OVMF_CODE_4M.ms.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"} -blockdev {"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/maas-test-4_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"} -machine pc-q35-4.0,accel=kvm,usb=off,vmport=off,smm=on,dump-guest-core=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format -cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hle=off,rtm=off -global driver=cfi.pflash01,property=secure,value=on -m 4096 -overcommit mem-lock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 79481e6b-c0a6-4e9b-8a70-85f15105bda1 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=32,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-pci-bridge,id=pci.2,bus=pci.1,addr=0x0 -device pcie-root-port,port=0x11,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x5 -device ich9-usb-ehci1,id=usb,bus=pcie.0,addr=0x1d.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pcie.0,multifunction=on,addr=0x1d -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,addr=0x1d.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,addr=0x1d.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.5,addr=0x0 -blockdev {"driver":"host_device","filename":"/dev/intel-ssd/maas-test-4-disk-2","aio":"native","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"} -device ide-hd,bus=ide.0,drive=libvirt-2-format,id=sata0-0-0,bootindex=2,write-cache=on -blockdev {"driver":"host_device","filename":"/dev/intel-ssd/maas-test-4-disk-1","aio":"native","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -device ide-hd,bus=ide.1,drive=libvirt-1-format,id=sata0-0-1,bootindex=3,write-cache=on -netdev tap,fd=34,id=hostnet0 -device e1000e,netdev=hostnet0,id=net0,mac=52:54:00:e8:52:2b,bus=pci.7,addr=0x0,bootindex=1 -netdev tap,fd=35,id=hostnet1 -device e1000e,netdev=hostnet1,id=net1,mac=52:54:00:ff:75:41,bus=pci.3,addr=0x0 -netdev tap,fd=36,id=hostnet2 -device e1000e,netdev=hostnet2,id=net2,mac=52:54:00:97:44:75,bus=pci.4,addr=0x0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5901,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 -device intel-hda,id=sound0,bus=pci.2,addr=0x2 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
stgraber commented 3 years ago

Does starting a normal LXD Ubuntu VM outside of MAAS with secureboot enabled works for you?

If so, you should look at edibootmgr and the files in /boot/efi to see what differs.

stgraber commented 3 years ago

I just validated that a hirsute system with:

root@v-latest:~# dpkg -l | grep shim-signed
ii  shim-signed                     1.47+15.4-0ubuntu2                                                   amd64        Secure Boot chain-loading bootloader (Microsoft-signed binary)
root@v-latest:~# dpkg -l | grep grub-efi
ii  grub-efi-amd64                  2.04-1ubuntu45                                                       amd64        GRand Unified Bootloader, version 2 (EFI-AMD64 version)
ii  grub-efi-amd64-bin              2.04-1ubuntu45                                                       amd64        GRand Unified Bootloader, version 2 (EFI-AMD64 modules)
ii  grub-efi-amd64-signed           1.169+2.04-1ubuntu45                                                 amd64        GRand Unified Bootloader, version 2 (EFI-AMD64 version, signed)

Boots without any issue here.

stgraber commented 3 years ago

Trying to use those binaries through MAAS here, I always get to a grub prompt, manually running configfile /grub/grub.cfg gets it going, not sure why it's not doing that automatically in the first place...

Post install, I'm seeing a hang at:

Booting local disk...
Failed to open \efi\boot\grubx64.efi - Not Found
Failed to load image \efi\boot\grubx64.efi: Not Found
start_image() returned Not Found
EFI stub: UEFI Secure Boot is enabled.

Then looking at qemu's log, we're seeing a qemu/kvm crash:

KVM internal error. Suberror: 1
emulation failure
RAX=000000007ffa0080 RBX=000000007ff82b10 RCX=000000007e900e18 RDX=00000000000088bd
RSI=000000003da5c94d RDI=000000003ffff1c4 RBP=8000000000000001 RSP=000000007ff829f8
R8 =0000000000000028 R9 =000000007f8ee9e7 R10=000000007ff64000 R11=0000000000000000
R12=0000000000000000 R13=000000007e900e18 R14=000000003da56a2c R15=000000007ff82b00
RIP=00000000000affff RFL=00010a07 [-O---PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     000000007f9ee698 00000047
IDT=     000000007f3da018 00000fff
CR0=80010033 CR2=0000000000000000 CR3=000000007fc01000 CR4=00000668
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d00
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
stgraber commented 3 years ago

Manually booting the VM works fine (ESC to enter firmware, use boot menu and select the ubuntu entry). So this suggests that something in grub's networked codepath is tickling firmware/qemu the wrong way causing a full crash of the emulator.

stgraber commented 3 years ago

@ltrager were you testing with the latest edk2? LXD uses 2021.02 which roughly matches what's in impish. You running an older edk2 may explain a difference in behavior.

ltrager commented 3 years ago

The concern about -no-reboot came about in LP:1865515. My understanding is it may be causing the chain of trust to be reset. Dimitri can explain how -no-reboot may be causing this better than I can. We have been able to verify that the latest SHIM and GRUB do properly support chainloading with the chain of trust preserved on physical hardware as well as libvirt.

To answer your question my libvirt testing was done on Ubuntu 20.10 using packages from main and updates.

ovmf - 2020.05-5ubuntu0.2 qemu - 1:5.0-5ubuntu9.8 libvirt - 6.6.0-1ubuntu3.5

stgraber commented 3 years ago

no-reboot causes qemu to exit when a full system reboot is triggered, that's it. A UEFI fallback to the next boot entry does not trigger it, otherwise we'd never get things booting :)

Given Ubuntu 21.04 as deployed by MAAS boots just fine when you ask the firmware to directly boot from the ubuntu efi boot entry, the actual secureboot setup looks fine. Though the fact that grub always dumps me to a shell requiring me to type in configfile /grub/grub.cfg to get things going suggests there may be some weirdness in MAAS or grub there.

It'd be good for you to test with ovmf 2021.02 so we're dealing with a closer comparison. You also should be testing with virtio-net and not e1000e so that the firmware deals with the same network card in both cases.

ltrager commented 3 years ago

As suggested I installed Ubuntu 21.10 Impish with ovmf 2021.02-1, qemu 1:5.2+dfsg-9ubuntu3, and libvirt 7.0.0-2ubuntu2 and created a VM using virtio net and virtio storage. MAAS is able to deploy Ubuntu with SecureBoot enabled. There are no problems chainloading while keeping the chain of trust.

Is there an easy way to test Dimitri's theory that -no-reboot is causing the issue?

stgraber commented 3 years ago

Sure if that makes you feel better...

root@athos:~# lxc start maas-vm01
root@athos:~# ps aux | grep maas-vm01
root     1752609  0.0  0.0  79892  3396 ?        Ssl  20:21   0:00 /snap/lxd/20383/bin/virtiofsd --socket-path=/var/snap/lxd/common/lxd/logs/maas-vm01/virtio-fs.config.sock -o source=/var/snap/lxd/common/lxd/virtual-machines/maas-vm01/config
lxd      1752898 86.0  0.0 9327240 94736 ?       Sl   20:21   0:01 /snap/lxd/20383/bin/qemu-system-x86_64 -S -name maas-vm01 -uuid 8efadf6a-be79-460a-9b58-7090da696005 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/maas-vm01/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/maas-vm01/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/maas-vm01/qemu.pid -D /var/snap/lxd/common/lxd/logs/maas-vm01/qemu.log -chroot /var/snap/lxd/common/lxd/virtual-machines/maas-vm01 -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
root     1752958  0.0  0.0  79892  2036 ?        Sl   20:21   0:00 /snap/lxd/20383/bin/virtiofsd --socket-path=/var/snap/lxd/common/lxd/logs/maas-vm01/virtio-fs.config.sock -o source=/var/snap/lxd/common/lxd/virtual-machines/maas-vm01/config
root     1753064  0.0  0.0   9040  2472 pts/0    S+   20:21   0:00 grep --color=auto maas-vm01
root@athos:~# lxc console maas-vm01
To detach from the console, press: <ctrl>+a q

>>Start HTTP Boot over IPv4....
  Station IP address is 172.17.16.80

  URI: http://172.17.16.10:5248/images/bootx64.efi
  File Size: 955072 Bytes
  Downloading...100%BdsDxe: loading Boot0005 "UEFI HTTPv4 (MAC:00163ED0BF0C)" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)/MAC(00163ED0BF0C,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri()
BdsDxe: starting Boot0005 "UEFI HTTPv4 (MAC:00163ED0BF0C)" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)/MAC(00163ED0BF0C,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri()
error: file `/EFI/ubuntu/x86_64-efi/command.lst' not found.
error: file `/EFI/ubuntu/x86_64-efi/fs.lst' not found.
error: file `/EFI/ubuntu/x86_64-efi/crypto.lst' not found.
error: file `/EFI/ubuntu/x86_64-efi/terminal.lst' not found.
error: file `/EFI/ubuntu/grub.cfg' not found.

                             GNU GRUB  version 2.04

   Minimal BASH-like line editing is supported. For the first word, TAB   
   lists possible command completions. Anywhere else TAB lists possible   
   device or file completions.                                            

grub> 

Booting local disk...
Failed to open \efi\boot\grubx64.efi - Not Found
Failed to load image \efi\boot\grubx64.efi: Not Found
start_image() returned Not Found
EFI stub: UEFI Secure Boot is enabled.

root@athos:~# cat /var/snap/lxd/common/lxd/logs/maas-vm01/qemu.log
KVM internal error. Suberror: 1
emulation failure
RAX=000000007ffa0080 RBX=000000007ff82b40 RCX=000000007e8f4f98 RDX=00000000000054fa
RSI=000000003da5c94d RDI=000000003ffff1c4 RBP=8000000000000001 RSP=000000007ff82a28
R8 =0000000000000028 R9 =000000007f8ee9e7 R10=000000007ff64000 R11=0000000000000000
R12=0000000000000000 R13=000000007e8f4f98 R14=000000003da56a2c R15=000000007ff82b30
RIP=00000000000affff RFL=00010a07 [-O---PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     000000007f9ee698 00000047
IDT=     000000007f3da018 00000fff
CR0=80010033 CR2=0000000000000000 CR3=000000007fc01000 CR4=00000668
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d00
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
root@athos:~# 
stgraber commented 3 years ago

So this obviously has absolutely nothing to do with no-reboot.

dannf commented 3 years ago

Using good old print debugging, I've narrowed this down to the kernel calling exit_boot_services() here.

dannf commented 3 years ago

Here's the OVMF debug output leading up to it the crash:

[Bds]Booting UEFI Misc Device
 BlockSize : 512 
 LastBlock : FFFFFF 
 Valid efi partition table header
 Valid efi partition table header
 Valid primary and Valid backup partition table
 Partition entries read block success
 Number of partition entries: 128
 start check partition entries
 End check partition entries
 Index : 0
 Start LBA : 37800
 End LBA : 465FDE
 Partition size: 42E7DF
 Start : 6F00000 End : 8CBFBC00
 Index : 13
 Start LBA : 800
 End LBA : 27FF
 Partition size: 2000
 Start : 100000 End : 4FFE00
 Index : 14
 Start LBA : 2800
 End LBA : 377FF
 Partition size: 35000
 Start : 500000 End : 6EFFE00
Prepare to Free Pool
 BlockSize : 512 
 LastBlock : 42E7DE 
 BlockSize : 512 
 LastBlock : 1FFF 
FSOpen: Open '\EFI\BOOT\BOOTX64.EFI' Success
[Bds] Expand PciRoot(0x0)/Pci(0x1,0x2)/Pci(0x0,0x0) -> PciRoot(0x0)/Pci(0x1,0x2)/Pci(0x0,0x0)/HD(15,GPT,B892B9DA-959A-4C13-9B9C-EDC1C192A820,0x2800,0x35000)/\EFI\BOOT\BOOTX64.EFI
[Security] 3rd party image[0] can be loaded after EndOfDxe: PciRoot(0x0)/Pci(0x1,0x2)/Pci(0x0,0x0)/HD(15,GPT,B892B9DA-959A-4C13-9B9C-EDC1C192A820,0x2800,0x35000)/\EFI\BOOT\BOOTX64.EFI.
DxeImageVerification: MeasureVariable (Pcr - 7, EventType - 800000E0, VariableName - db, VendorGuid - D719B2CB-3D3A-4596-A3BC-DAD00E67656F)
MeasureBootPolicyVariable - Not Found
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7CC53340
Loading driver at 0x0007CB61000 EntryPoint=0x0007CB83000 
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 7D304998
ProtectUefiImageCommon - 0x7CC53340
  - 0x000000007CB61000 - 0x00000000000D3000
[Variable]: Rewritten a preexisting variable(0x00000006) with different attributes(0x00000007) - 605DAB50-E046-4300-ABB6-3DD810DD8B23:MokListRT
[Variable]: Rewritten a preexisting variable(0x00000006) with different attributes(0x00000007) - 605DAB50-E046-4300-ABB6-3DD810DD8B23:MokListXRT
[Variable]: Rewritten a preexisting variable(0x00000006) with different attributes(0x00000007) - 605DAB50-E046-4300-ABB6-3DD810DD8B23:SbatLevelRT
InstallProtocolInterface: 605DAB50-E046-4300-ABB6-3DD810DD8B23 7CC10160
FSOpen: Open '\EFI\BOOT\fbx64.efi' Success
FSOpen: Open '\EFI\BOOT\fbx64.efi' Success
FSOpen: Open 'EFI' Success
FSOpen: Open 'ubuntu' Success
FSOpen: Open 'BOOTX64.CSV' Success
FSOpen: Open '\EFI\ubuntu\BOOTX64.CSV' Success
FSOpen: Open '\EFI\ubuntu\shimx64.efi' Success
[Security] 3rd party image[0] can be loaded after EndOfDxe: PciRoot(0x0)/Pci(0x1,0x2)/Pci(0x0,0x0)/HD(15,GPT,B892B9DA-959A-4C13-9B9C-EDC1C192A820,0x2800,0x35000)/\EFI\ubuntu\shimx64.efi.
DxeImageVerification: MeasureVariable (Pcr - 7, EventType - 800000E0, VariableName - db, VendorGuid - D719B2CB-3D3A-4596-A3BC-DAD00E67656F)
MeasureBootPolicyVariable - Not Found
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7CCCA040
Loading driver at 0x0007C9A4000 EntryPoint=0x0007C9C6000 
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 7CC4A818
ProtectUefiImageCommon - 0x7CCCA040
  - 0x000000007C9A4000 - 0x00000000000D3000
[Variable]: Rewritten a preexisting variable(0x00000006) with different attributes(0x00000007) - 605DAB50-E046-4300-ABB6-3DD810DD8B23:MokListRT
[Variable]: Rewritten a preexisting variable(0x00000006) with different attributes(0x00000007) - 605DAB50-E046-4300-ABB6-3DD810DD8B23:MokListXRT
[Variable]: Rewritten a preexisting variable(0x00000006) with different attributes(0x00000007) - 605DAB50-E046-4300-ABB6-3DD810DD8B23:SbatLevelRT
InstallProtocolInterface: 605DAB50-E046-4300-ABB6-3DD810DD8B23 7CA53160
FSOpen: Open '\EFI\ubuntu\grubx64.efi' Success
ConvertPages: range 1000000 - 43A4FFF covers multiple entries
ConvertPages: range 1000000 - 43A4FFF covers multiple entries
stgraber commented 3 years ago

Ok, so at least it's a firmware call that crashes everything, so slightly less concerning than just the kernel doing it during normal initialization.

Is there any way to get debugging on the firmware side when it handles ExitBootServices?

Given it only happens on first boot (or at least looked that way here) and that's when we setup the boot NVRAM entries, I suspect it's got something to do with ExitBootServices applying the variable restrictions (so you can't write to the firmware-only ones)?

dannf commented 3 years ago

I'm hoping to learn more tomorrow by attaching gdb to the VM.

dannf commented 3 years ago

I hooked up gdb to the VM and was able to load the ovmf symbols. I setup a breakpoint for CoreExitBootServices(), but the crash happened before that breakpoint. So, best I can tell we're crashing somewhere in the glue (thunk?) between the EFI stub and the firmware.

Since the address to the invalid instruction always seems to be the same (0xff96), I set a break there instead. I /did/ hit that breakpoint after the EFI stub's call to ExitBootServices() and before the Invalid Opcode Exception. This allowed me to collect the following stacktrace, though I don't know how to associate it with the debug symbols from the stub:

(gdb) bt
#0  0x000000000000ff96 in ?? ()
#1  0x000000003da40cbf in ?? ()
#2  0x8000000000000001 in ?? ()
#3  0x0000000000000000 in ?? ()

GRUB debug tells me it loaded the kernel at nearby/lower address:

loader/efi/linux.c:96: kernel_addr: 0x3cc55000 handover_offset: 0x190 params:
0x3ffff000

So it seems plausible that we're still executing somewhere in the stub. And the instruction before the return address in the stack 0x000000003da40cbf in memory does happen to be a call:

   0x3da40caf:  sub    $0x20,%rsp
   0x3da40cb3:  mov    (%rax),%rdx
   0x3da40cb6:  mov    %r13,%rcx
   0x3da40cb9:  call   *0xe8(%rsi) <<< This must be where things are going bad
   0x3da40cbf:  mov    %rax,%r12   <<< Because this is the return address
   0x3da40cc2:  add    $0x20,%rsp
   0x3da40cc6:  jmp    0x3da40ce1
   0x3da40cc8:  mov    0x3c(%rdx),%edx
   0x3da40ccb:  mov    %r13,%rsi
   0x3da40cce:  mov    0x80(%rdx),%edi
   0x3da40cd4:  mov    (%rax),%rdx
   0x3da40cd7:  xor    %eax,%eax
   0x3da40cd9:  call   0x3da402e0
   0x3da40cde:  mov    %rax,%r12
   0x3da40ce1:  mov    %r12,%rsi
   0x3da40ce4:  lea    0x9018(%rip),%rdi        # 0x3da49d03
   0x3da40ceb:  xor    %eax,%eax
   0x3da40ced:  call   0x3da40527
   0x3da40cf2:  movabs $0x8000000000000002,%rax

I can find this same code in the disassembly of the EFI stub:

 1debaaf:       48 83 ec 20             sub    $0x20,%rsp
 1debab3:       48 8b 10                mov    (%rax),%rdx
 1debab6:       4c 89 e9                mov    %r13,%rcx
 1debab9:       ff 96 e8 00 00 00       call   *0xe8(%rsi)
 1debabf:       49 89 c4                mov    %rax,%r12
 1debac2:       48 83 c4 20             add    $0x20,%rsp
 1debac6:       eb 19                   jmp    0x1debae1
 1debac8:       8b 52 3c                mov    0x3c(%rdx),%edx
 1debacb:       4c 89 ee                mov    %r13,%rsi
 1debace:       8b ba 80 00 00 00       mov    0x80(%rdx),%edi
 1debad4:       48 8b 10                mov    (%rax),%rdx
 1debad7:       31 c0                   xor    %eax,%eax
 1debad9:       e8 02 f6 ff ff          call   0x1deb0e0
 1debade:       49 89 c4                mov    %rax,%r12
 1debae1:       4c 89 e6                mov    %r12,%rsi
 1debae4:       48 8d 3d 18 90 00 00    lea    0x9018(%rip),%rdi        # 0x1df4b03
 1debaeb:       31 c0                   xor    %eax,%eax
 1debaed:       e8 35 f8 ff ff          call   0x1deb327
 1debaf2:       48 b8 02 00 00 00 00    movabs $0x8000000000000002,%rax

And finally, I was able to locate the same code in the disassembly of the efi-stub-helper.o object in my build tree, which has debug symbols that show this code to be in efi_exit_boot_services(). So it doesn't seem like we can confidently point the finger at OVMF just yet - though I'm not entirely sure how to get to root cause.

dannf commented 3 years ago

Here's the disassembly of efi_exit_boot_services() in it's entirety - I believe the crashing line is at 0x8fa:


00000000000007f4 <efi_exit_boot_services>:
     7f4:       f3 0f 1e fa             endbr64 
     7f8:       41 57                   push   %r15
     7fa:       31 c0                   xor    %eax,%eax
     7fc:       49 89 d7                mov    %rdx,%r15
     7ff:       41 56                   push   %r14
     801:       49 89 ce                mov    %rcx,%r14
     804:       41 55                   push   %r13
     806:       49 89 fd                mov    %rdi,%r13
     809:       48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 810 <efi_exit_boot_services+0x1c>
     810:       41 54                   push   %r12
     812:       55                      push   %rbp
     813:       53                      push   %rbx
     814:       48 89 f3                mov    %rsi,%rbx
     817:       41 50                   push   %r8
     819:       e8 00 00 00 00          call   81e <efi_exit_boot_services+0x2a>
     81e:       48 89 df                mov    %rbx,%rdi
     821:       e8 00 00 00 00          call   826 <efi_exit_boot_services+0x32>
     826:       48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 82d <efi_exit_boot_services+0x39>
     82d:       49 89 c4                mov    %rax,%r12
     830:       48 89 c6                mov    %rax,%rsi
     833:       31 c0                   xor    %eax,%eax
     835:       e8 00 00 00 00          call   83a <efi_exit_boot_services+0x46>
     83a:       4d 85 e4                test   %r12,%r12
     83d:       0f 85 25 02 00 00       jne    a68 <efi_exit_boot_services+0x274>
     843:       48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 84a <efi_exit_boot_services+0x56>
     84a:       31 c0                   xor    %eax,%eax
     84c:       e8 00 00 00 00          call   851 <efi_exit_boot_services+0x5d>
     851:       4c 89 fe                mov    %r15,%rsi
     854:       48 89 df                mov    %rbx,%rdi
     857:       41 ff d6                call   *%r14
     85a:       48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 861 <efi_exit_boot_services+0x6d>
     861:       48 89 c6                mov    %rax,%rsi
     864:       49 89 c4                mov    %rax,%r12
     867:       31 c0                   xor    %eax,%eax
     869:       e8 00 00 00 00          call   86e <efi_exit_boot_services+0x7a>
     86e:       40 8a 2d 00 00 00 00    mov    0x0(%rip),%bpl        # 875 <efi_exit_boot_services+0x81>
     875:       4d 85 e4                test   %r12,%r12
     878:       74 38                   je     8b2 <efi_exit_boot_services+0xbe>
     87a:       31 c0                   xor    %eax,%eax
     87c:       48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 883 <efi_exit_boot_services+0x8f>
     883:       e8 00 00 00 00          call   888 <efi_exit_boot_services+0x94>
     888:       40 84 ed                test   %bpl,%bpl
     88b:       48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 892 <efi_exit_boot_services+0x9e>
     892:       48 8b 03                mov    (%rbx),%rax
     895:       0f 84 af 01 00 00       je     a4a <efi_exit_boot_services+0x256>
     89b:       48 8b 52 60             mov    0x60(%rdx),%rdx
     89f:       48 83 ec 20             sub    $0x20,%rsp
     8a3:       48 8b 08                mov    (%rax),%rcx
     8a6:       ff 52 48                call   *0x48(%rdx)
     8a9:       48 83 c4 20             add    $0x20,%rsp
     8ad:       e9 a8 01 00 00          jmp    a5a <efi_exit_boot_services+0x266>
     8b2:       80 3d 00 00 00 00 00    cmpb   $0x0,0x0(%rip)        # 8b9 <efi_exit_boot_services+0xc5>
     8b9:       74 13                   je     8ce <efi_exit_boot_services+0xda>
     8bb:       48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 8c2 <efi_exit_boot_services+0xce>
     8c2:       31 c0                   xor    %eax,%eax
     8c4:       e8 00 00 00 00          call   8c9 <efi_exit_boot_services+0xd5>
     8c9:       e8 00 00 00 00          call   8ce <efi_exit_boot_services+0xda>
     8ce:       31 c0                   xor    %eax,%eax
     8d0:       48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 8d7 <efi_exit_boot_services+0xe3>
     8d7:       e8 00 00 00 00          call   8dc <efi_exit_boot_services+0xe8>
     8dc:       40 84 ed                test   %bpl,%bpl
     8df:       48 8b 43 20             mov    0x20(%rbx),%rax
     8e3:       48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 8ea <efi_exit_boot_services+0xf6>
     8ea:       74 1d                   je     909 <efi_exit_boot_services+0x115>
     8ec:       48 8b 72 60             mov    0x60(%rdx),%rsi
     8f0:       48 83 ec 20             sub    $0x20,%rsp
     8f4:       48 8b 10                mov    (%rax),%rdx
     8f7:       4c 89 e9                mov    %r13,%rcx
>     8fa:       ff 96 e8 00 00 00       call   *0xe8(%rsi)
     900:       49 89 c4                mov    %rax,%r12
     903:       48 83 c4 20             add    $0x20,%rsp
     907:       eb 19                   jmp    922 <efi_exit_boot_services+0x12e>
     909:       8b 52 3c                mov    0x3c(%rdx),%edx
     90c:       4c 89 ee                mov    %r13,%rsi
     90f:       8b ba 80 00 00 00       mov    0x80(%rdx),%edi
     915:       48 8b 10                mov    (%rax),%rdx
     918:       31 c0                   xor    %eax,%eax
     91a:       e8 00 00 00 00          call   91f <efi_exit_boot_services+0x12b>
     91f:       49 89 c4                mov    %rax,%r12
     922:       4c 89 e6                mov    %r12,%rsi
     925:       48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 92c <efi_exit_boot_services+0x138>
     92c:       31 c0                   xor    %eax,%eax
     92e:       e8 00 00 00 00          call   933 <efi_exit_boot_services+0x13f>
     933:       48 b8 02 00 00 00 00    movabs $0x8000000000000002,%rax
     93a:       00 00 80 
     93d:       49 39 c4                cmp    %rax,%r12
     940:       0f 85 22 01 00 00       jne    a68 <efi_exit_boot_services+0x274>
     946:       48 8b 53 28             mov    0x28(%rbx),%rdx
     94a:       48 8b 43 08             mov    0x8(%rbx),%rax
     94e:       40 84 ed                test   %bpl,%bpl
     951:       48 8b 12                mov    (%rdx),%rdx
     954:       48 89 10                mov    %rdx,(%rax)
     957:       4c 8b 43 10             mov    0x10(%rbx),%r8
     95b:       74 30                   je     98d <efi_exit_boot_services+0x199>
     95d:       48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 964 <efi_exit_boot_services+0x170>
     964:       48 8b 13                mov    (%rbx),%rdx
     967:       56                      push   %rsi
:
     968:       4d 89 c1                mov    %r8,%r9
     96b:       48 8b 4b 08             mov    0x8(%rbx),%rcx
     96f:       4c 8b 43 20             mov    0x20(%rbx),%r8
     973:       48 8b 40 60             mov    0x60(%rax),%rax
     977:       ff 73 18                push   0x18(%rbx)
     97a:       48 8b 12                mov    (%rdx),%rdx
     97d:       48 83 ec 20             sub    $0x20,%rsp
     981:       ff 50 38                call   *0x38(%rax)
     984:       49 89 c4                mov    %rax,%r12
     987:       48 83 c4 30             add    $0x30,%rsp
     98b:       eb 38                   jmp    9c5 <efi_exit_boot_services+0x1d1>
     98d:       4c 8b 4b 18             mov    0x18(%rbx),%r9
     991:       41 c7 40 04 00 00 00    movl   $0x0,0x4(%r8)
     998:       00 
     999:       48 8b 4b 20             mov    0x20(%rbx),%rcx
     99d:       c7 41 04 00 00 00 00    movl   $0x0,0x4(%rcx)
     9a4:       48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 9ab <efi_exit_boot_services+0x1b7>
     9ab:       48 8b 03                mov    (%rbx),%rax
     9ae:       48 8b 73 08             mov    0x8(%rbx),%rsi
     9b2:       8b 52 3c                mov    0x3c(%rdx),%edx
     9b5:       8b 7a 28                mov    0x28(%rdx),%edi
     9b8:       48 8b 10                mov    (%rax),%rdx
     9bb:       31 c0                   xor    %eax,%eax
     9bd:       e8 00 00 00 00          call   9c2 <efi_exit_boot_services+0x1ce>
     9c2:       49 89 c4                mov    %rax,%r12
     9c5:       4d 85 e4                test   %r12,%r12
     9c8:       0f 85 9a 00 00 00       jne    a68 <efi_exit_boot_services+0x274>
     9ce:       4c 89 fe                mov    %r15,%rsi
     9d1:       48 89 df                mov    %rbx,%rdi
     9d4:       41 ff d6                call   *%r14
     9d7:       49 89 c4                mov    %rax,%r12
     9da:       48 85 c0                test   %rax,%rax
     9dd:       0f 85 85 00 00 00       jne    a68 <efi_exit_boot_services+0x274>
     9e3:       31 c0                   xor    %eax,%eax
     9e5:       48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 9ec <efi_exit_boot_services+0x1f8>
     9ec:       e8 00 00 00 00          call   9f1 <efi_exit_boot_services+0x1fd>
     9f1:       40 84 ed                test   %bpl,%bpl
     9f4:       48 8b 43 20             mov    0x20(%rbx),%rax
     9f8:       48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 9ff <efi_exit_boot_services+0x20b>
     9ff:       74 1d                   je     a1e <efi_exit_boot_services+0x22a>
     a01:       48 8b 72 60             mov    0x60(%rdx),%rsi
     a05:       48 83 ec 20             sub    $0x20,%rsp
     a09:       48 8b 10                mov    (%rax),%rdx
     a0c:       4c 89 e9                mov    %r13,%rcx
     a0f:       ff 96 e8 00 00 00       call   *0xe8(%rsi)
     a15:       49 89 c4                mov    %rax,%r12
     a18:       48 83 c4 20             add    $0x20,%rsp
     a1c:       eb 19                   jmp    a37 <efi_exit_boot_services+0x243>
     a1e:       8b 52 3c                mov    0x3c(%rdx),%edx
     a21:       4c 89 ee                mov    %r13,%rsi
     a24:       8b ba 80 00 00 00       mov    0x80(%rdx),%edi
     a2a:       48 8b 10                mov    (%rax),%rdx
     a2d:       31 c0                   xor    %eax,%eax
     a2f:       e8 00 00 00 00          call   a34 <efi_exit_boot_services+0x240>
     a34:       49 89 c4                mov    %rax,%r12
     a37:       4c 89 e6                mov    %r12,%rsi
     a3a:       48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # a41 <efi_exit_boot_services+0x24d>
     a41:       31 c0                   xor    %eax,%eax
     a43:       e8 00 00 00 00          call   a48 <efi_exit_boot_services+0x254>
     a48:       eb 1e                   jmp    a68 <efi_exit_boot_services+0x274>
     a4a:       8b 52 3c                mov    0x3c(%rdx),%edx
     a4d:       48 8b 30                mov    (%rax),%rsi
     a50:       31 c0                   xor    %eax,%eax
     a52:       8b 7a 30                mov    0x30(%rdx),%edi
     a55:       e8 00 00 00 00          call   a5a <efi_exit_boot_services+0x266>
     a5a:       48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # a61 <efi_exit_boot_services+0x26d>
     a61:       31 c0                   xor    %eax,%eax
     a63:       e8 00 00 00 00          call   a68 <efi_exit_boot_services+0x274>
     a68:       5a                      pop    %rdx
     a69:       4c 89 e0                mov    %r12,%rax
     a6c:       5b                      pop    %rbx
     a6d:       5d                      pop    %rbp
     a6e:       41 5c                   pop    %r12
     a70:       41 5d                   pop    %r13
     a72:       41 5e                   pop    %r14
     a74:       41 5f                   pop    %r15
     a76:       c3                      ret    
dannf commented 3 years ago

@sforshee and I were able to make a bit more progress. We found that the crash is occuring because the ExitBootServices() callback in the Boot Services Table is NULL. It's not NULL at boot, so we added a watchpoint to see what was clearing it. Turns out it is shim (15.4-0ubuntu2):

(gdb) bt
#0  unhook_system_services () at replacements.c:49
#1  0x000000007cb8912b in load_image (BootPolicy=<optimized out>, ParentImageHandle=<optimized out>, DevicePath=<optimized out>, 
    SourceBuffer=<optimized out>, SourceSize=0, ImageHandle=0x7ef5d930) at replacements.c:59
#2  0x000000007cd00260 in ?? ()
#3  0x000000007ccf9518 in ?? ()
#4  0x000000007d303b18 in ?? ()
#5  0x000000007ccf4498 in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb) 

@sforshee suggested the following patch to shim:

diff --git a/replacements.c b/replacements.c
index 278a8e7848e4..072da428da46 100644
--- a/replacements.c
+++ b/replacements.c
@@ -33,7 +33,9 @@ get_active_systab(void)
 static typeof(systab->BootServices->LoadImage) system_load_image;
 static typeof(systab->BootServices->StartImage) system_start_image;
 static typeof(systab->BootServices->Exit) system_exit;
+#if !defined(DISABLE_EBS_PROTECTION)
 static typeof(systab->BootServices->ExitBootServices) system_exit_boot_services;
+#endif

 static EFI_HANDLE last_loaded_image;

@@ -45,7 +47,9 @@ unhook_system_services(void)

    systab->BootServices->LoadImage = system_load_image;
    systab->BootServices->StartImage = system_start_image;
+#if !defined(DISABLE_EBS_PROTECTION)
    systab->BootServices->ExitBootServices = system_exit_boot_services;
+#endif
    gBS = systab->BootServices;
 }

This does prevent shim from clearing ExitBootServices - now that watchpoint doesn't get tripped until OVMF runs InternalMemZeroMem(), which I suppose is expected. However, we still don't seem to be surviving the call to exit_boot_services. Now we're crashing in a new different way:

EFI stub: Actually calling exit_boot_services
!!!! X64 Exception Type - 0E(#PF - Page-Fault)  CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000000  I:0 R:0 U:0 W:0 P:0 PK:0 SS:0 SGX:0
RIP  - 000000000000FF91, CS  - 0000000000000038, RFLAGS - 0000000000200003
RAX  - 000000000000F055, RCX - 0000000000000000, RDX - 00000000000000EB
RBX  - 000000003DA49AEB, RSP - 000000007EF64EB8, RBP - 000000007EF64F00
RSI  - 000000000000000A, RDI - 0000000000000000
R8   - 000000008FE6ED20, R9  - 0000000000000000, R10 - 000000007DA01508
R11  - 0000000000000001, R12 - 000000003DA49D03, R13 - 0000000000000000
R14  - 000000003DA4385C, R15 - 000000007EF65270
DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
GS   - 0000000000000030, SS  - 0000000000000030
CR0  - 0000000080010033, CR2 - FFFFFFFFFFFFFFFF, CR3 - 000000007EC01000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000007E9E3000 0000000000000047, LDTR - 0000000000000000
IDTR - 000000007E163018 0000000000000FFF,   TR - 0000000000000000
FXSAVE_STATE - 000000007EF64B10
!!!! Can't find image information. !!!!
dannf commented 3 years ago

Turns out that was my fault - I had sprinkled efi_info() calls throughout, not realizing that facility goes away with ExitBootServices. Using a non-instrumented Ubuntu kernel, I can now get to a prompt in KVM mode. In QEMU emulation mode it fails w/ complains about BIOS corruption - but that may be unrelated.

sforshee commented 3 years ago

PR for shim: https://github.com/rhboot/shim/pull/378

stgraber commented 3 years ago

Closing this one on our side as we've now confirmed the issue is coming from the shim and not from anything LXD is doing (other than us using UEFI+SB by default).