AMDESE / AMDSEV

AMD Secure Encrypted Virtualization

Guest OS launch failed with SEV-SNP. #170

Open jwf777 opened 1 year ago

jwf777 commented 1 year ago

I followed the guide, installed all the related components, and built the SNP kernel successfully. My host is also running the SNP kernel. But whenever I launch a guest OS with SEV-SNP, it fails as shown below. Can anyone help?

./launch-qemu.sh -hda test2.qcow2 -sev-snp

32+0 records in
1+0 records out
512 bytes copied, 0.000236991 s, 2.2 MB/s
/Downloads/AMDSEV/snp-release-2023-07-10/usr/local/bin/qemu-system-x86_64 -enable-kvm -cpu EPYC-v4 -machine q35 -smp 4,maxcpus=64 -m 2048M,slots=5,maxmem=30G -no-reboot -drive if=pflash,format=raw,unit=0,file=/Downloads/AMDSEV/snp-release-2023-07-10/usr/local/share/qemu/OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=/Downloads/AMDSEV/snp-release-2023-07-10/test2.fd -netdev user,id=vmnic -device virtio-net-pci,disable-legacy=on,iommu_platform=true,netdev=vmnic,romfile= -drive file=/Downloads/AMDSEV/snp-release-2023-07-10/test2.qcow2,if=none,id=disk0,format=qcow2 -device virtio-scsi-pci,id=scsi0,disable-legacy=on,iommu_platform=true -device scsi-hd,drive=disk0 -machine memory-encryption=sev0,vmport=off -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 -nographic -monitor pty -monitor unix:monitor,server,nowait
Mapping CTRL-C to CTRL-]
Launching VM ...
/tmp/cmdline.10887
char device redirected to /dev/pts/2 (label compat_monitor0)
qemu-system-x86_64: sev_snp_launch_start: SNP_LAUNCH_START ret=-5 fw_error=15 'DF_FLUSH is required'
qemu-system-x86_64: sev_kvm_init: failed to create encryption context
qemu-system-x86_64: failed to initialize kvm: Operation not permitted

tlendacky commented 1 year ago

Sounds like you may have offlined some CPUs? Are you disabling SMT via a kernel command line parameter?

The SEV firmware doesn't know that you've done that, or when those CPUs might become active again, so it checks every CPU it knows about. If you've offlined some CPUs, the SNP support in Linux won't issue WBINVD or DF_FLUSH on the offline CPUs, which would leave the firmware reporting that a DF_FLUSH is still required.
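One way to check for the situation described above is to compare the CPU masks Linux exposes in sysfs. The sketch below is read-only: it prints the standard `possible`, `present`, `online`, and `offline` files under /sys/devices/system/cpu plus the SMT control state. The helper name `show` is made up for this example. Any CPU listed under `offline` is one that, per the explanation above, the SNP code would skip when flushing.

```shell
#!/bin/sh
# Read-only summary of the CPU masks Linux exposes in sysfs.
# "show" is a hypothetical helper for this sketch, not a system tool.

show() {
  # $1: label, $2: sysfs file to print
  if [ -r "$2" ]; then
    val=$(cat "$2")
    [ -n "$val" ] || val="(none)"   # the "offline" file is empty when no CPU is offline
  else
    val="n/a"                       # file absent on this kernel/system
  fi
  printf '%-10s %s\n' "$1:" "$val"
}

base=/sys/devices/system/cpu
show possible "$base/possible"
show present  "$base/present"
show online   "$base/online"
show offline  "$base/offline"        # CPUs the SNP flushes would skip
show smt      "$base/smt/control"    # "on", "off", "forceoff", "notsupported", ...
```

If `offline` shows anything other than `(none)`, bringing those CPUs back online (or removing whatever offlined them) would be the first thing to try.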

jwf777 commented 1 year ago

I checked the BIOS settings, and SMT control is enabled. I also checked the CPU online status (all cores are online) and the SMT active and control state in Linux; they all show that SMT is enabled:

cat /sys/devices/system/cpu/cpu*/online | sort | uniq -1

1

cat /sys/devices/system/cpu/smt/control

on

cat /sys/devices/system/cpu/smt/active

1

Is there any other reason that could cause this failure?

tlendacky commented 1 year ago

Can you provide the output of: dmesg | egrep "SEV|RMP|ccp"

jwf777 commented 1 year ago

Sure, here is the dump:

dmesg | egrep "SEV|RMP|ccp"

[ 9.023077] SEV-SNP: RMP table physical address 0x0000000035500000 - 0x0000000075afffff
[ 17.466106] ccp 0000:03:00.5: enabling device (0000 -> 0002)
[ 17.491899] ccp 0000:03:00.5: sev enabled
[ 17.516346] ccp 0000:03:00.5: psp enabled
[ 20.532462] ccp 0000:03:00.5: SEV API:1.55 build:19
[ 20.543305] ccp 0000:03:00.5: SEV-SNP API:1.55 build:19
[ 24.551437] SEV supported: 907 ASIDs
[ 24.559374] SEV-ES and SEV-SNP supported: 99 ASIDs

tlendacky commented 1 year ago

I'm not sure what could be going on here. Do you have a firmware file in /lib/firmware/amd/ that is being loaded? If so, can you move or rename it so I can see what the system's base firmware is? Although I don't see an "SEV firmware update successful" message...

Can you include the output of lscpu?

jwf777 commented 1 year ago

Could you explain a bit more about which firmware gets loaded and the move/rename step? Also, regarding the "SEV firmware update successful" message, I thought the update only needed to be done on Milan platforms; I'm on a Genoa platform.

ls /lib/firmware/amd

amd_sev_fam17h_model0xh.sbin.xz amd_sev_fam17h_model3xh.sbin.xz amd_sev_fam19h_model0xh.sbin.xz

lscpu

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 176
On-line CPU(s) list: 0-175
Vendor ID: AuthenticAMD
BIOS Vendor ID: Advanced Micro Devices, Inc.
Model name: AMD EPYC Processor
BIOS Model name: AMD EPYC Processor
Thread(s) per core: 2
Core(s) per socket: 88
Socket(s): 1
Stepping: 2
Frequency boost: enabled
CPU max MHz: 2973.0000
CPU min MHz: 400.0000
BogoMIPS: 2399.92
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d sme sev sev_es sev_snp
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 2.8 MiB (88 instances)
L1i: 2.8 MiB (88 instances)
L2: 88 MiB (88 instances)
L3: 176 MiB (11 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-175
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Srbds: Not affected
Tsx async abort: Not affected

tlendacky commented 1 year ago

This is odd: there isn't an 88-core-per-socket part for Genoa, and I would have expected more information about the part on lscpu's "Model name:" line.

Are you limiting the number of CPUs from the kernel command line? Can you post the output of /proc/cmdline?
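Checking the command line for CPU-limiting parameters can be scripted. The sketch below scans a command-line string for `nosmt`, `maxcpus=`, `nr_cpus=`, and `possible_cpus=`, which are documented Linux boot parameters that can keep CPUs from being brought online or hide them from the kernel. The list is not exhaustive, and the function name `check_cmdline` is hypothetical.

```shell
#!/bin/sh
# Flag boot parameters that can limit or offline CPUs.
# check_cmdline is a hypothetical helper; the parameter list is not exhaustive.
check_cmdline() (
  set -f                # disable globbing while splitting the command line
  found=0
  for arg in $1; do     # unquoted on purpose: split into individual parameters
    case "$arg" in
      nosmt|nosmt=*|maxcpus=*|nr_cpus=*|possible_cpus=*)
        echo "CPU-limiting parameter: $arg"
        found=1
        ;;
    esac
  done
  [ "$found" -eq 1 ] || echo "no CPU-limiting parameters found"
)

check_cmdline "$(cat /proc/cmdline 2>/dev/null)"
```

Fed the /proc/cmdline posted later in this thread, it reports no CPU-limiting parameters (`nopat` is unrelated to CPU count).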

Regarding the firmware: there is a version of the Genoa firmware available from the amd.com/sev webpage, but it hasn't been uploaded to linux-firmware yet. I just wanted to make sure you aren't using that.
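For reference, the ccp driver in recent kernels (`sev_get_firmware()` in drivers/crypto/ccp/sev-dev.c) requests firmware under three names in order: a model-specific name, a name shared by a 16-model range, and a generic `amd/sev.fw`. The sketch below reproduces that naming from /proc/cpuinfo and reports which candidates exist under /lib/firmware, including `.xz`/`.zst` compressed variants. The scheme may differ in other kernel versions, so treat the output as a hint rather than proof of what was loaded; `sev_fw_names` is a hypothetical helper.

```shell
#!/bin/sh
# Print the SEV firmware file names the ccp driver would try, in order,
# following the naming used by sev_get_firmware() in recent kernels.
# sev_fw_names is a hypothetical helper; FAMILY and MODEL are decimal,
# exactly as /proc/cpuinfo reports them.
sev_fw_names() {
  fam=$1 mod=$2
  printf 'amd/amd_sev_fam%02xh_model%02xh.sbin\n' "$fam" "$mod"                # model-specific
  printf 'amd/amd_sev_fam%02xh_model%xxh.sbin\n' "$fam" $(( (mod & 0xf0) >> 4 )) # 16-model subset
  echo 'amd/sev.fw'                                                            # generic fallback
}

family=$(awk -F': *' '/^cpu family/ {print $2; exit}' /proc/cpuinfo 2>/dev/null)
model=$(awk -F': *' '/^model[ \t]*:/ {print $2; exit}' /proc/cpuinfo 2>/dev/null)

if [ -n "$family" ] && [ -n "$model" ]; then
  echo "CPU family $family, model $model; candidates in lookup order:"
  sev_fw_names "$family" "$model" | while read -r name; do
    status="not found"
    for suffix in '' .xz .zst; do
      if [ -e "/lib/firmware/$name$suffix" ]; then
        status="found: /lib/firmware/$name$suffix"
        break
      fi
    done
    echo "  $name ($status)"
  done
fi
```

Genoa parts report family 19h with a model in the 10h-1Fh range, so the subset name they map to is `amd_sev_fam19h_model1xh.sbin`, which is not among the three files listed in /lib/firmware/amd earlier in the thread (`amd_sev_fam19h_model0xh` covers models 00h-0Fh).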

jwf777 commented 1 year ago

Here is the output of /proc/cmdline:

cat /proc/cmdline

BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.19.0-rc6-snp-host-c4daeffce56e root=UUID=9b0a4aa9-5544-4b88-bbec-136b95ef1078 ro crashkernel=256M resume=UUID=fd1ee1c0-bbea-4a15-b7b3-94c1a4058a23 biosdevname=0 net.ifnames=0 selinux=0 console=ttyS1,57600n8 ras=cec_disable nopat kvm_amd.sev=1 kvm_amd.sev_es=1 kvm_amd.sev_snp=1 mem_encrypt=on

So it looks like the firmware bundled with Linux isn't available for Genoa, and I need to download the SEV firmware from the amd.com/sev webpage and update it myself. Is that correct?

tlendacky commented 1 year ago

OK, so I can't see how you can have 88 cores without limiting them via the command line when an 88-core part doesn't exist... Something is off with this system, especially since lscpu isn't even reporting a model number. I don't think I can provide much more help at this point.

Also, why do you have "nopat" in your kernel command line parameters? I don't think it is necessarily a problem, but it probably shouldn't be specified.

As for the firmware: there's no reason to upgrade it; I don't think that will matter. You could try manually adding it to /lib/firmware/amd if you'd like.
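If you do want to try a firmware file from amd.com/sev, the sketch below copies it into place under the name the ccp driver requests, backing up any file it would overwrite. The default target name `amd_sev_fam19h_model1xh.sbin` is an assumption for a Genoa part (family 19h, models 10h-1Fh); check it against your CPU's family and model before using it. `install_sev_fw` and the downloaded file's path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy a downloaded SEV firmware blob into the firmware
# directory under the name the ccp driver requests, backing up any old file.
# ASSUMPTION: the default name targets Genoa (family 19h, models 10h-1Fh).
install_sev_fw() {
  src=$1                                   # downloaded .sbin file (path is up to you)
  dir=${2:-/lib/firmware/amd}              # firmware directory searched by the kernel
  name=${3:-amd_sev_fam19h_model1xh.sbin}  # target file name (see assumption above)
  if [ ! -f "$src" ]; then
    echo "install_sev_fw: no such file: $src" >&2
    return 1
  fi
  if [ -e "$dir/$name" ]; then
    cp -p "$dir/$name" "$dir/$name.bak" || return 1   # keep the previous blob
  fi
  install -m 0644 "$src" "$dir/$name"
}
```

Run it as root, then reboot. If the driver applies the update, dmesg should contain "SEV firmware update successful", and `dmesg | grep -E 'SEV firmware update|SEV(-SNP)? API'` shows that line together with the reported API version. If ccp is loaded from the initramfs, the initramfs may also need to be regenerated so the new file is visible at that point.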

tlendacky commented 1 year ago

Are you running the host on bare metal, or are you by chance running inside a VM and trying to launch an SNP guest from there?
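A quick way to answer that question from the machine itself: when Linux runs under a hypervisor, the CPU usually advertises the `hypervisor` flag in /proc/cpuinfo, and `systemd-detect-virt` (where installed) names the virtualization technology or prints `none`. The flags in the lscpu output posted above do not include `hypervisor`, but a hypervisor can hide that flag, so this is a heuristic rather than proof. `has_hypervisor_flag` is a hypothetical helper.

```shell
#!/bin/sh
# Heuristic bare-metal vs. VM check. has_hypervisor_flag is a hypothetical
# helper; a hypervisor can hide the flag, so a negative result is not proof.
has_hypervisor_flag() {
  # $1: cpuinfo-format file (defaults to /proc/cpuinfo)
  grep '^flags' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw hypervisor
}

if has_hypervisor_flag; then
  echo "hypervisor flag present: probably running in a VM"
else
  echo "no hypervisor flag: probably bare metal"
fi

if command -v systemd-detect-virt >/dev/null 2>&1; then
  # Prints the detected technology (e.g. "kvm") or "none" on bare metal.
  echo "systemd-detect-virt: $(systemd-detect-virt 2>/dev/null)"
fi
```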