coconut-svsm / svsm

COCONUT-SVSM
MIT License

Kernel panic after Error: "qemu-system-x86_64: warning: memory fault: GPA 0xa0000 size 0x1000 flags 0x8" #319

Open 99Franz opened 7 months ago

99Franz commented 7 months ago

When trying to launch the guest with QEMU and a Debian image, I get the following kernel panic:

ReadPcr - 05
Supported PCRs - Count = 00000003
GetSupportedAndActivePcrs - Count = 00000003
ReadPcr - HashAlg = 0x0004, Pcr[05], digest = 8B 58 45 22 10 3F 27 DA 18 F3 3C 06 5F 6A A0 DB 58 C7 C8 7B 
ReadPcr - HashAlg = 0x000B, Pcr[05], digest = 5B C2 8B E2 F5 25 5E DF 1B 19 EE 6A 73 A7 98 3E 2B C2 CC FF 41 C9 22 65 70 7A DF 79 E6 3E E1 80 
ReadPcr - HashAlg = 0x000C, Pcr[05], digest = 20 AD 42 97 75 27 BB 8E 14 B1 28 53 D7 89 76 34 33 5C 36 65 FB C8 2A 9F B8 9E CE 02 4C 32 BF 84 D3 1D A8 13 91 B1 CB F6 54 4F 63 1E 4B 91 C4 AF 
SupportedEventLogs - 0x00000003
  LogFormat - 0x00000001
  LogFormat - 0x00000002
CpuDxe: 5-Level Paging = 0
MpInitChangeApLoopCallback() done!
SetUefiImageMemoryAttributes - 0x000000007F2E6000 - 0x0000000000007000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2E0000 - 0x0000000000006000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2D9000 - 0x0000000000007000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2D3000 - 0x0000000000006000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2C3000 - 0x0000000000010000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2BE000 - 0x0000000000005000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2B7000 - 0x0000000000007000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2B3000 - 0x0000000000004000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2AE000 - 0x0000000000005000 (0x0000000000000008)
qemu-system-x86_64: warning: memory fault: GPA 0xa0000 size 0x1000 flags 0x8
error: kvm run failed Invalid argument
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=00000000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 00000000 00000000
CS =0000 00000000 00000000 00000000
SS =0000 00000000 00000000 00000000
DS =0000 00000000 00000000 00000000
FS =0000 00000000 00000000 00000000
GS =0000 00000000 00000000 00000000
LDT=0000 00000000 00000000 00000000
TR =0000 00000000 00000000 00000000
GDT=     0000000000000000 00000000
IDT=     0000000000000000 00000000
CR0=80010033 CR2=0000000000000000 CR3=0000000000000000 CR4=00000668
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=0000000000000000 DR7=0000000000000000
EFER=0000000000000d00
Code=<??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

We followed the steps of the install.md file. The following versions were used, which are the most recent at the time of writing this:

AMD SEV-SNP firmware: version 1.55.14
Host kernel: commit bc4de28
IGVM: commit 494aac2
QEMU: commit 896d803
Guest firmware: commit d965a1b
Guest kernel: commit bc4de28
COCONUT-SVSM (tested with debug and release builds): commit dec6072

We first thought it was the same bug as in #311. However, from what we can see, we make it past the UEFI and the bootloader and into the kernel. Do you have any ideas how to fix this?

roy-hopkins commented 7 months ago

I've spent a while trying to reproduce this, but it works for me. My output is similar to yours up to the point of the memory fault; in my case the very next line is early kernel output:

CpuDxe: 5-Level Paging = 0
MpInitChangeApLoopCallback() done!
SetUefiImageMemoryAttributes - 0x000000007F2E6000 - 0x0000000000007000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2E0000 - 0x0000000000006000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2D9000 - 0x0000000000007000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2D3000 - 0x0000000000006000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2C3000 - 0x0000000000010000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2BE000 - 0x0000000000005000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2B7000 - 0x0000000000007000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2B3000 - 0x0000000000004000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x000000007F2AE000 - 0x0000000000005000 (0x0000000000000008)
[    0.000000][    T0] Linux version 6.8.0-rc3-1-svsm+ (rhopkins@milo) (gcc (SUSE Linux) 13.2.1 20231130 [revision 741743c028dc00f27b9c8b1d5211c1f602f2fddd], GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.41.0.20230908-1) #3 SMP PREEMPT_DYNAMIC Wed Feb  7 15:57:08 GMT 2024

A couple of questions:

1) How are you launching QEMU? Are you using the launch script scripts/launch_guest.sh? If not, what is your command line?
2) Have you enabled console output for the kernel? Can you try adding this to the kernel parameters?

console=tty0 console=ttyS0 earlycon=uart8250,io,0x3f8 earlyprintk=ttyS0
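
Whether a given kernel command line already carries these parameters can be checked mechanically, which helps when the same string is reused for the bootloader and for QEMU's -append option. A minimal sketch; the function name missing_params and the sample command line are made up for illustration:

```shell
# Hypothetical helper: print every wanted parameter that is absent from a
# kernel command line. $1 is the command line, the remaining arguments are
# the parameters to look for.
missing_params() {
    cmdline=" $1 "
    shift
    for p in "$@"; do
        # Match whole words only, so console=ttyS0 is not satisfied by
        # console=ttyS0,115200n8.
        case "$cmdline" in
            *" $p "*) ;;
            *) echo "$p" ;;
        esac
    done
}

# Example with a fixed string; inside a running guest, "$(cat /proc/cmdline)"
# could be passed instead.
missing_params "ro console=tty0 quiet" \
    console=tty0 console=ttyS0 earlycon=uart8250,io,0x3f8 earlyprintk=ttyS0
```

Every parameter printed by the call still has to be added to the command line.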

99Franz commented 6 months ago

Thank you for the quick response. We did not use the launch script, but used a similar command:

#!/bin/bash

<path-to-qemu-system-x86_64> \
-name sev-snp-vm,process="sev-snp-vm" \
-enable-kvm \
-cpu EPYC-v4 \
-smp 4 \
-object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,policy=0x30000,igvm-file=<path-to-coconut-qemu.igvm> \
-machine q35,vmport=off,confidential-guest-support=sev0,memory-backend=ram1 \
-object memory-backend-memfd,id=ram1,size=8G,share=true,prealloc=false,reserve=false \
-drive file=base.qcow2,format=qcow2,if=none,id=disk0 \
-device virtio-scsi-pci,id=scsi,disable-legacy=on,iommu_platform=on \
-device scsi-hd,drive=disk0 \
-nographic \
-nodefaults \
-netdev user,id=vmnic,hostfwd=tcp:127.0.0.1:6666-:22 \
-device virtio-net-pci,disable-legacy=on,iommu_platform=true,netdev=vmnic,romfile= \
-serial mon:stdio

We fixed it by copying the kernel, the initrd and the kernel command line out of the VM and providing them directly to QEMU / the UEFI. We added the following lines to our command:

...
-kernel vmlinuz-6.8.0-svsm-guest \
-append 'BOOT_IMAGE=/boot/vmlinuz-6.8.0-svsm-guest root=UUID=<UUID> ro console=tty0 console=ttyS0,115200n8 quiet' \
-initrd initrd.img-6.8.0-svsm-guest
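
The copy step itself is not shown in this thread. One possible way to do it, assuming the libguestfs tools are installed on the host and the guest image keeps both files under /boot, is virt-copy-out. The wrapper name copy_boot_files and the DRY_RUN switch are made up for illustration:

```shell
# Hypothetical wrapper around virt-copy-out (libguestfs). Copies the kernel
# and initrd for one kernel version out of a disk image into a local directory.
# With DRY_RUN set to a non-empty value, the command is only printed.
copy_boot_files() {
    image=$1
    version=$2
    dest=$3
    ${DRY_RUN:+echo} virt-copy-out -a "$image" \
        "/boot/vmlinuz-$version" "/boot/initrd.img-$version" "$dest"
}

# Print the command that would be run for the image used above.
DRY_RUN=1
copy_boot_files base.qcow2 6.8.0-svsm-guest .
```

With DRY_RUN unset, the two files end up in the current directory, where the -kernel and -initrd options above expect them. The guest should be shut down while the image is being read.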

We suppose that the issue might be related to the interaction between GRUB and the TPM, given that manually specifying the 'kernel' and 'initrd' leads to a successful image launch, which skips GRUB. Is this assumption accurate?

deeglaze commented 4 months ago

I think this address specifically is the video memory address that Ashish Kalra is proposing disabling access to in "[PATCH v11 1/3] x86/boot: Skip video memory access in the decompressor for SEV-ES/SNP"
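
For context, GPA 0xa0000 is the first address of the legacy VGA window, which spans 0xA0000 to 0xBFFFF in the PC memory map. A minimal sketch for checking whether an address from such a warning falls into that window; the function name is made up for illustration:

```shell
# Hypothetical helper: report whether a guest physical address lies inside
# the legacy VGA window (0xA0000-0xBFFFF).
in_legacy_vga_window() {
    gpa=$(( $1 ))   # shell arithmetic accepts 0x-prefixed hex values
    if [ "$gpa" -ge $(( 0xA0000 )) ] && [ "$gpa" -le $(( 0xBFFFF )) ]; then
        echo yes
    else
        echo no
    fi
}

# Address taken from the QEMU warning in this issue.
in_legacy_vga_window 0xa0000
```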