AMDESE / AMDSEV

AMD Secure Encrypted Virtualization
VFIO-PCI not working with AMD SEV + DPDK #185

Open JETtech-Labs opened 9 months ago

JETtech-Labs commented 9 months ago

I am trying to get an AMD SEV guest VM to work with DPDK (userspace) using the vfio-pci driver with an SR-IOV network interface card. From my reading, it appears others have gotten vfio-pci passthrough to work with AMD SEV (and GPUs), but so far when I try to use DPDK inside the guest it fails at the very first message sent via mmap'ed device I/O.

My VM without AMD SEV enabled works great with vfio-pci and DPDK (so I know this is an AMD SEV issue). I am able to use the native kernel iavf driver directly with AMD SEV in the guest. So it seems the "nested" use of vfio-pci + AMD SEV may be the problem. By nested I mean that on the host machine I assign the vfio-pci driver to the NIC virtual function that I am passing to the guest VM. Then, inside the guest VM, I assign the vfio-pci driver to the PCI device. Then DPDK attempts to initialize this device and I get the errors (below).

I have tried with the IOMMU both enabled and disabled in the guest, and both fail to read from the PCIe NIC device. DPDK uses mmap to map the device's hardware address range (the shared flag is set). Is this the right way to map memory for AMD SEV guests? Does DPDK need to be modified to use some other method of I/O memory mapping? Is there something I am missing?
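For readers unfamiliar with the mapping step described above, here is a minimal sketch of a shared mmap of a device region followed by 32-bit register access. It is not DPDK's code: `map_region`, `reg_read32`, and `reg_write32` are hypothetical helper names, and the caller is assumed to supply a path to something mappable (for example a PCI resource file or, in this sketch's test, an ordinary file).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of the file at `path`, starting at
 * `offset`, as a shared read/write mapping. Returns NULL on failure. */
static volatile uint32_t *map_region(const char *path, off_t offset, size_t len)
{
    assert(path != NULL && len > 0);
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    /* MAP_SHARED makes stores visible to the backing object, which is what
     * memory-mapped device I/O relies on. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Read a 32-bit register at a byte offset that is a multiple of 4. */
static uint32_t reg_read32(volatile uint32_t *base, size_t byte_off)
{
    return base[byte_off / sizeof(uint32_t)];
}

/* Write a 32-bit register at a byte offset that is a multiple of 4. */
static void reg_write32(volatile uint32_t *base, size_t byte_off, uint32_t val)
{
    base[byte_off / sizeof(uint32_t)] = val;
}
```

The sketch only shows how the mapping is created. It says nothing about whether the resulting guest pages are treated as encrypted under SEV, which is the question raised here.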

Setup:
Host Linux Kernel: v 6.2.0
QEMU version: v 8.1.0
Guest Kernel: v 6.5.0

Below is the DPDK log:

2023/09/18 21:10:27:980 notice dpdk EAL: PCI device 0000:00:04.0 on NUMA socket -1
2023/09/18 21:10:27:980 notice dpdk EAL: probe driver: 8086:1889 net_iavf
2023/09/18 21:10:27:980 notice dpdk EAL: Set IOMMU type 1 (Type 1) failed, error 19 (No such device)
2023/09/18 21:10:27:980 notice dpdk EAL: Set IOMMU type 7 (sPAPR) failed, error 19 (No such device)
2023/09/18 21:10:27:980 notice dpdk EAL: Using IOMMU type 8 (No-IOMMU)
2023/09/18 21:10:27:980 notice dpdk EAL: Mem event callback 'vfio_mem_event_clb:(nil)' registered
2023/09/18 21:10:27:980 notice dpdk EAL: Installed memory event callback for VFIO
2023/09/18 21:10:27:980 notice dpdk EAL: VFIO reports MSI-X BAR as mappable
2023/09/18 21:10:27:980 notice dpdk EAL: PCI memory mapped at 0x7f48c0024000
2023/09/18 21:10:27:980 notice dpdk EAL: PCI memory mapped at 0x7f48c0044000
2023/09/18 21:10:27:980 notice dpdk EAL: Probe PCI driver: net_iavf (8086:1889) device: 0000:00:04.0 (socket -1)
2023/09/18 21:10:27:980 notice dpdk iavf_execute_vf_cmd(): No response or return failure (0) for cmd 1
2023/09/18 21:10:27:980 notice dpdk iavf_check_api_version(): Fail to execute command of OP_VERSION
2023/09/18 21:10:27:980 notice dpdk iavf_init_vf(): check_api version failed
2023/09/18 21:10:27:980 notice dpdk iavf_dev_init(): Init vf failed
2023/09/18 21:10:27:980 notice dpdk EAL: Mem event callback 'vfio_mem_event_clb:(nil)' unregistered
2023/09/18 21:10:27:980 notice dpdk EAL: Releasing PCI mapped resource for 0000:00:04.0

JETtech-Labs commented 9 months ago

A little more about my setup:
CPU: AMD EPYC 9124
EFI -> SEV SNP is disabled

sevctl output:
sudo ./target/debug/sevctl ok es
[ PASS ] - AMD CPU
[ PASS ] - Microcode support
[ PASS ] - Secure Memory Encryption (SME)
[ PASS ] - Secure Encrypted Virtualization (SEV)
[ PASS ] - Encrypted State (SEV-ES)
[ SKIP ] - Secure Nested Paging (SEV-SNP)
[ SKIP ] - VM Permission Levels
[ SKIP ] - Number of VMPLs
[ PASS ] - Physical address bit reduction: 6
[ PASS ] - C-bit location: 51
[ PASS ] - Number of encrypted guests supported simultaneously: 1006
[ PASS ] - Minimum ASID value for SEV-enabled, SEV-ES disabled guest: 16
[ PASS ] - SEV enabled in KVM: enabled
[ PASS ] - SEV-ES enabled in KVM: enabled
[ PASS ] - Reading /dev/sev: /dev/sev readable
[ PASS ] - Writing /dev/sev: /dev/sev writable
[ PASS ] - Page flush MSR: DISABLED
[ PASS ] - KVM supported: API version: 12
[ PASS ] - Memlock resource limit: Soft: 16851439616 | Hard: 16851439616

zvonkok commented 9 months ago

The hardware needs to be confidential-compute capable. You cannot attach just "any" HW and expect it to work. DMA is highly untrusted in a TEE environment, and hence any PCIe device will fail. The GPU and other confidential-compute capable devices use bounce buffers until we have TDISP/IDE support. This means the whole stack needs to support confidential-compute capabilities, starting from the HW, firmware, driver, and workloads. Bounce buffers are a region outside of the private memory of the VM. The HW/firmware/driver needs to have the functionality to create encrypted transfers from and to the TEE. How the keys for encryption of DMA transfers are handled is up to the vendor.
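To make the bounce-buffer idea concrete, here is a purely conceptual sketch in plain user-space C. It is not the kernel's SWIOTLB code and performs no encryption. The `bounce_pool` structure and both function names are hypothetical, and the "shared" region is ordinary memory standing in for pages that a guest would have mapped unencrypted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a region outside the VM's private memory. */
struct bounce_pool {
    unsigned char *base; /* start of the shared region */
    size_t size;         /* total bytes available */
    size_t used;         /* bytes already handed out */
};

/* Copy private data into the shared region so a device could read it.
 * Returns the shared slot, or NULL if the pool has no room left. */
static void *bounce_map_to_device(struct bounce_pool *pool,
                                  const void *priv, size_t len)
{
    assert(pool != NULL && priv != NULL);
    if (len > pool->size - pool->used)
        return NULL;
    void *slot = pool->base + pool->used;
    memcpy(slot, priv, len); /* the device only ever sees this copy */
    pool->used += len;
    return slot;
}

/* Copy data the device wrote into the shared slot back to private memory. */
static void bounce_sync_from_device(void *priv, const void *slot, size_t len)
{
    assert(priv != NULL && slot != NULL);
    memcpy(priv, slot, len);
}
```

The point of the sketch is the extra copy in each direction: the device never touches private memory directly, which is why every layer that sets up DMA has to know about the shared region.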

JETtech-Labs commented 9 months ago

@zvonkok I am not trying to have the NIC be part of the trusted environment - I am just trying to get DMA to work from userspace of the AMD SEV guest. Do you know if this is possible with the existing code? Maybe I am just missing some setting or something in my QEMU setup - or maybe SEV requires an IOMMU - in which case I may need to use the QEMU version that has "amd-iommu" as an option.

My first step is to just get AMD SEV working with PCI passthrough to userspace (even if that means unsafely bypassing the IOMMU); then I will tackle SEV-ES and maybe even SNP (once mainline kernel/QEMU support it).

tlendacky commented 9 months ago

DPDK is currently not supported under SEV. The main reason is that (almost) all userspace memory accesses are mapped as encrypted and DMA must be performed to unencrypted memory.
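As background for the encrypted/unencrypted distinction, the sketch below shows how the C-bit position reported in the sevctl output earlier in this thread (51) relates to a page-table entry. The helper names are hypothetical. The EBX field layout of CPUID leaf 0x8000001F (bits 5:0 C-bit position, bits 11:6 physical address bit reduction) follows AMD's documentation, and the functions operate on plain integers rather than on live page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sev_mem_info {
    unsigned cbit_pos;            /* position of the C-bit in a PTE */
    unsigned phys_addr_reduction; /* physical address bits lost to encryption */
};

/* Decode the EBX value returned by CPUID leaf 0x8000001F. */
static struct sev_mem_info decode_cpuid_8000001f_ebx(uint32_t ebx)
{
    struct sev_mem_info info;
    info.cbit_pos = ebx & 0x3f;
    info.phys_addr_reduction = (ebx >> 6) & 0x3f;
    return info;
}

/* True if the page-table entry has the C-bit set (mapped as encrypted). */
static bool pte_is_encrypted(uint64_t pte, unsigned cbit_pos)
{
    assert(cbit_pos < 64);
    return (pte >> cbit_pos) & 1u;
}

/* Return the entry with the C-bit cleared (mapped as unencrypted). */
static uint64_t pte_clear_cbit(uint64_t pte, unsigned cbit_pos)
{
    assert(cbit_pos < 64);
    return pte & ~(1ULL << cbit_pos);
}
```

With the values from this thread, an EBX low field of 0x1B3 decodes to C-bit position 51 and a reduction of 6 bits. A DMA-capable buffer would need its mappings to look like the output of `pte_clear_cbit`, which ordinary userspace mappings in an SEV guest do not.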

JETtech-Labs commented 9 months ago

@tlendacky Thanks for the quick reply. Are there any existing kernel tools to tell which pages are marked as encrypted (C-bit set)? The tools I am familiar with (/proc/pid/maps, /proc/pid/pagemap, /proc/kpageflags) don't seem to indicate whether the physical address has the C-bit set. What is the best way to see if the guest kernel has marked a page as encrypted?

tlendacky commented 9 months ago

> Are there any existing kernel tools to tell which pages are marked as encrypted

There aren't any tools that I'm aware of. Typically the kernel will issue a set_memory_decrypted() call to change the memory from encrypted to unencrypted, but there is nothing tracking that it is mapped that way.

It is possible that something like mmap() or related calls could be updated to generate userspace mappings with the C-bit set or cleared based on some flag or the kernel state, but nobody has looked at that yet, or at whether it is feasible/secure.
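For context on what the existing interfaces do expose, here is a minimal sketch that reads one entry from /proc/self/pagemap and decodes the fields the kernel documents: present (bit 63), swapped (bit 62), and the PFN (bits 0-54, reported as zero to unprivileged readers on recent kernels). The struct and function names are hypothetical. The documented entry layout has no field for encryption state, which is consistent with the answer above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

struct pagemap_info {
    bool present;  /* page is resident in RAM */
    bool swapped;  /* page is in swap */
    uint64_t pfn;  /* page frame number, 0 if not present or hidden */
};

/* Decode one 64-bit pagemap entry. */
static struct pagemap_info pagemap_decode(uint64_t entry)
{
    struct pagemap_info info;
    info.present = (entry >> 63) & 1;
    info.swapped = (entry >> 62) & 1;
    info.pfn = info.present ? (entry & ((1ULL << 55) - 1)) : 0;
    return info;
}

/* Look up the pagemap entry for a virtual address in this process.
 * Returns 0 on success, -1 if the entry could not be read. */
static int pagemap_lookup(const void *vaddr, struct pagemap_info *out)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0 && out != NULL);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    uint64_t entry;
    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off_t off = (off_t)((uintptr_t)vaddr / (uintptr_t)page_size)
                * (off_t)sizeof(entry);
    ssize_t n = pread(fd, &entry, sizeof(entry), off);
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return -1;
    *out = pagemap_decode(entry);
    return 0;
}
```

Calling `pagemap_lookup` on the address of a touched buffer shows whether the page is resident and, with sufficient privilege, its PFN. It cannot show whether the guest mapped the page with the C-bit set.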

JETtech-Labs commented 9 months ago

@tlendacky So the only way to know if a page is encrypted is to dump the physical memory (from the host machine?) and see that the contents look like encrypted data?

Is there a plan to add something to /sys/kernel/debug/tracing or /proc/pid/pagemap to indicate which pages are encrypted? It would be nice for the guest kernel to be able to verify that the pages it thinks are encrypted actually are.

JETtech-Labs commented 9 months ago

@tlendacky Is there a way to reach out to you directly? I have some more details and questions I would like to discuss.

Thanks

tlendacky commented 9 months ago

> @tlendacky So the only way to know if a page is encrypted is to dump the physical memory (from the host machine?) and see that the contents look like encrypted data?
>
> Is there a plan to add something to the /sys/kernel/debug/tracing or the /proc/pagemaps to indicate which page is encrypted - it would be nice for the guest kernel to be able to verify that the pages it thinks are encrypted actually are.

There are no plans that I'm aware of at the moment. Contributions are always welcome :)

Mengyuan-L commented 9 months ago

Our group has a design that allows DPDK to run on SEV machines. Perhaps we can have a private discussion also.

tlendacky commented 9 months ago

> @tlendacky Is there a way to reach out to you directly? I have some more details and questions i would like to discuss

It would be best to have the conversations on the mailing lists for all to benefit: the KVM mailing list and/or the linux-coco mailing list.

JETtech-Labs commented 9 months ago

> Our group has a design that allows DPDK to run on SEV machines. Perhaps we can have a private discussion also.

@Mengyuan-L let me know the best way to contact you

Mengyuan-L commented 9 months ago

>> Our group has a design that allows DPDK to run on SEV machines. Perhaps we can have a private discussion also.
>
> @Mengyuan-L let me know the best way to contact you

Please drop me an email to lmy AT mit DOT edu

JETtech-Labs commented 9 months ago

@Mengyuan-L

>>> Our group has a design that allows DPDK to run on SEV machines. Perhaps we can have a private discussion also.
>>
>> @Mengyuan-L let me know the best way to contact you
>
> Please drop me an email to lmy AT mit DOT edu

@Mengyuan-L email sent - thanks

tlendacky commented 9 months ago

> Our group has a design that allows DPDK to run on SEV machines. Perhaps we can have a private discussion also.

It would be great to have this contributed upstream :)

Mengyuan-L commented 9 months ago

>> Our group has a design that allows DPDK to run on SEV machines. Perhaps we can have a private discussion also.
>
> It would be great to have this contributed upstream :)

Sure. We plan to open-source our project soon.