QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

PCI passthrough not working for HVM domains #1659

Closed esheltone closed 8 years ago

esheltone commented 8 years ago

There have been multiple reports that PCI passthrough does not work for HVM domains using the qubes software:

https://groups.google.com/d/msg/qubes-users/cmPRMOkxkdA/gIV68O0-CQAJ (reporting passthrough not working via libvirt, but that passthrough still could be done using Xen xl)
https://groups.google.com/d/msg/qubes-users/ExMvykCyYiY/M3nHxweRFAAJ (confirmation by Marek that passthrough was not working on R3)
https://groups.google.com/d/msg/qubes-users/ppKj_YWqr94/l2gHv6uJAgAJ

This issue appears to have started with use of the HAL in Qubes R3. PCI passthrough continues to work fine for PV-based Qubes VMs, such as sys-net.

Marek guessed that it could be a qemu issue (see second linked post). However, in the first linked post, PCI passthrough was done to an HVM domain via 'xl' using "device_model_version = 'qemu-xen-traditional'", so this may rule out qemu as the culprit.

marmarek commented 8 years ago

I guess it's a bug that libxl__device_pci_add_xenstore is called at all:

    if (!starting)
        rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting);
    else
        rc = 0;

But indeed, for stubdomain, starting=0:

    stubdomid = libxl_get_stubdom_id(ctx, domid);
    if (stubdomid != 0) {
        libxl_device_pci pcidev_s = *pcidev;
        /* stubdomain is always running by now, even at create time */
        rc = do_pci_add(gc, stubdomid, &pcidev_s, 0);
        if ( rc )
            goto out;
    }

But the domain config isn't saved at that time. And even if it were saved, it would fail anyway, because this is about the stubdomain config, which isn't saved at all. Generally it looks like PCI passthrough handling with a stubdomain is a forgotten use case... Try this patch: https://gist.github.com/c8e080f0036fb21759976a6bf3f5c668 (I haven't even compile-tested it...)

WetwareLabs commented 8 years ago

Thanks for the patch! I tried it, and the VM can now start with multiple PCI devices assigned. However there's no video output and VNC can't be used.

I think the next step should be to target VMs created with libvirt and get them working with PCI passthrough, at least with non-GPU devices. I tried to assign the same USB controller that worked with xl. The first start attempt failed with "libxenlight failed to create new domain 'win7'", and the next attempt failed with "Requested operation not valid: PCI device 0000:00:1a.0 is in use by driver xenlight, domain 'win7'". It seems there's more debugging to be done.

marmarek commented 8 years ago

Try restarting libvirtd. There seems to be a bug where libvirt doesn't release the device (internally) after a failed VM startup.

jaspertron commented 8 years ago

Reverting this patch fixes the problem

What's the proper way to do this with qubes-builder? I put this patch

--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2214,19 +2214,9 @@
     uint32_t add_mapping)
 {
     DECLARE_DOMCTL;
-    xc_dominfo_t info;
     int ret = 0, err;
     unsigned long done = 0, nr, max_batch_sz;

-    if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 ||
-         info.domid != domid )
-    {
-        PERROR("Could not get info for domain");
-        return -EINVAL;
-    }
-    if ( !xc_core_arch_auto_translated_physmap(&info) )
-        return 0;
-
     if ( !nr_mfns )
         return 0;

in qubes-builder/qubes-src/vmm-xen/patches.qubes/ (and added it to series.conf) and then did make vmm-xen-dom0 and reinstalled (some of) the rpms it built in dom0, but the test is still failing.
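The steps described above could be scripted roughly as follows. This is a minimal sketch, not the project's documented procedure: the function name `add_xen_patch` is made up, and both the location of `series.conf` (assumed to sit at the top of the `vmm-xen` checkout) and the `patches.qubes/<name>` entry format are assumptions that should be checked against the actual checkout.

```shell
# Hypothetical helper: copy a patch into a vmm-xen checkout and register it
# in series.conf. Assumes series.conf lives at the top of the checkout and
# lists patches as "patches.qubes/<file name>", one per line.
add_xen_patch() {
    patch_file=$1      # path to the .patch file
    vmm_xen_dir=$2     # e.g. qubes-builder/qubes-src/vmm-xen
    name=$(basename "$patch_file")
    entry="patches.qubes/$name"

    mkdir -p "$vmm_xen_dir/patches.qubes" || return 1
    cp "$patch_file" "$vmm_xen_dir/patches.qubes/$name" || return 1

    # Append the entry only if it is not already listed, so that running
    # the helper twice does not apply the patch twice.
    if ! grep -qxF "$entry" "$vmm_xen_dir/series.conf" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$vmm_xen_dir/series.conf"
    fi
}
```

After something like `add_xen_patch revert.patch qubes-builder/qubes-src/vmm-xen` (the patch file name is only an example), `make vmm-xen-dom0` would be run from the `qubes-builder` directory as described above.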

WetwareLabs commented 8 years ago

Jasper: That looks like the right procedure. Are you sure you installed all the required RPMS? This is a script I use to automate Xen install:

#!/bin/sh
qvm-run -p qubes-dev 'cd qubes-builder-3.1/qubes-packages-mirror-repo/fc20/rpm; tar c xen*.rpm' | tar xv
rpm -U --force --nodeps xen-4.* xen-libs-* xen-runtime-* xen-hvm-* xen-qemu-tools-*
rpm -U --force xen-hypervisor-*

WetwareLabs commented 8 years ago

With Marmarek's newest patch above, assigning PCI devices does not show any errors in libxl and VM creation continues, but VM does not detect these devices (I've now tested this on Win7, Win8 and Arch Linux guests created via VM Manager). On kernel side there are these messages:

[127712.375328] xen_pciback: vpci: 0000:00:1a.0: assign to virtual slot 0
[127712.377265] pciback 0000:00:1a.0: registering for 102
[127712.421776] xen-pciback pci-101-0: 22 Couldn't locate PCI device (0000:00:1a.0)! perhaps already in-use?

Here 102 is the stubdom ID and 101 is the DomU. I don't understand this: should libxl really try to assign the device to BOTH virtual machines? It seems assigning to the stubdom VM succeeds, but assigning to the actual DomU does not.
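To check which domain each message refers to, the domain IDs can be pulled out of the pciback lines. This is a minimal sketch that assumes the message formats stay exactly as quoted above (`registering for <domid>` and `pci-<domid>-<n>`); the function names are made up for illustration.

```shell
# Hypothetical helpers; both read kernel log text on stdin.

# Print the domain ID that pciback registered the device for,
# taken from lines like "pciback 0000:00:1a.0: registering for 102".
pciback_registered_domid() {
    sed -n 's/.*pciback [0-9a-f:.]*: registering for \([0-9][0-9]*\).*/\1/p'
}

# Print the domain ID embedded in the failing backend name, taken from
# lines like "xen-pciback pci-101-0: 22 Couldn't locate PCI device ...".
pciback_failed_domid() {
    sed -n "s/.*xen-pciback pci-\([0-9][0-9]*\)-[0-9][0-9]*: .*Couldn't locate PCI device.*/\1/p"
}
```

Fed the three kernel lines above, the first helper prints 102 and the second prints 101, matching the stubdom and DomU IDs mentioned. The same helpers can be fed from `sudo journalctl -k`.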

I've tried the following mods:

None of these modifications have any effect on the outcome and the device will not be detected by the domU. 'dmesg' on the guest side does not reveal anything suspect. 'xl pci-list' nevertheless shows that the device should be attached:

Vdev Device
05.0 0000:00:1a.0

Marek, any idea how to proceed?

marmarek commented 8 years ago

Here 102 is the stubdom ID and 101 is the DomU. I don't understand this: should libxl really try to assign the device to BOTH virtual machines? It seems assigning to the stubdom VM succeeds, but assigning to the actual DomU does not.

This is something I'm trying to figure out. But probably yes, and probably also the reason why it's failing...

Generally, with the original patch (revert), I can get the device visible in the VM (only tried with Linux). But not always - probably because of some race condition - I guess it's about the pciback above having the device assigned twice. But with the "actual fix" I haven't got it working - also probably because of that race condition - having xc_domain_getinfo still executed, I guess, makes losing the race more probable.

I'd expect that disabling assign to stubdom VM should fix Linux DomU and disabling assign to DomU should fix Windows DomU. And this is why it should probably be assigned to both... In any case, those patches are needed too.

Also apply the add xenstore domain flag to hypervisor patch to successfully compile it on Xen 4.6.1

I have modified the fix to apply on Xen 4.6.1. It's a simplified version which only adds XEN_DOMCTL_getdomaininfo to the list in xsm_domctl, just before return xsm_default_action(XSM_DM_PRIV, current->domain, d);

jaspertron commented 8 years ago

Are you sure you installed all the required RPMS? This is a script I use to automate Xen install

Yes, I've installed all of those rpms. I rebuilt them and used your install script just to be sure, but the test still fails.

WetwareLabs commented 8 years ago

@jaspertron Try this patch (it contains only extra logging, no actual passthrough modifications) to find out where it fails (you can pastebin the logs if necessary). Use xl create (or pci-attach) with the -vvv flag and don't forget to check 'xl dmesg' as well. In case the problem is with shared RMRR, this patch also logs which devices the RMRR is shared with (then you need to PT these devices together or set pci_strictreset to false).

jaspertron commented 8 years ago

@WetwareLabs Here are the xl create logs (with your logging patch applied) from the successful test (using Qubes R2 stubdom) and failing test. They're very similar and I can't spot any interesting differences.

marmarek commented 8 years ago

In case the problem is with shared RMRR, this patch also logs which devices the RMRR is shared with

Do you know how to get this information at toolstack side, if possible at all?

@WetwareLabs Here are the xl create logs (with your logging patch applied) from the successful test (using Qubes R2 stubdom) and failing test. They're very similar and I can't spot any interesting differences.

Indeed looks the same. What about kernel messages, xl dmesg and stubdom output in those cases? Anything new there?

jaspertron commented 8 years ago

What about kernel messages

working:

[user@dom0 ~]$ sudo journalctl -kfqn0
Jul 12 21:02:50 dom0 kernel: xen-blkback: ring-ref 2047, event-channel 4, protocol 1 (x86_64-abi) 
Jul 12 21:02:50 dom0 kernel: xen-blkback: ring-ref 2046, event-channel 5, protocol 1 (x86_64-abi) 
Jul 12 21:02:51 dom0 kernel: xen_pciback: vpci: 0000:01:00.1: assign to virtual slot 0
Jul 12 21:02:51 dom0 kernel: pciback 0000:01:00.1: registering for 22
Jul 12 21:02:54 dom0 kernel: pciback 0000:01:00.1: enabling device (0000 -> 0002)
Jul 12 21:02:54 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Jul 12 21:02:54 dom0 kernel: Already setup the GSI :17
Jul 12 21:03:03 dom0 kernel: xen-pciback pci-21-0: 22 Couldn't locate PCI device (0000:01:00.1)! perhaps already in-use?
Jul 12 21:03:07 dom0 kernel: xen-blkback: backend/vbd/22/51712: prepare for reconnect
Jul 12 21:03:07 dom0 kernel: xen-blkback: backend/vbd/22/51728: prepare for reconnect
Jul 12 21:03:12 dom0 kernel: xen-blkback: ring-ref 8, event-channel 65, protocol 1 (x86_64-abi) persistent grants
Jul 12 21:03:12 dom0 kernel: xen-blkback: ring-ref 9, event-channel 66, protocol 1 (x86_64-abi) persistent grants

failing:

[user@dom0 ~]$ sudo journalctl -kfqn0
Jul 12 21:04:35 dom0 kernel: xen-blkback: ring-ref 2047, event-channel 4, protocol 1 (x86_64-abi) 
Jul 12 21:04:35 dom0 kernel: xen-blkback: ring-ref 2046, event-channel 5, protocol 1 (x86_64-abi) 
Jul 12 21:04:36 dom0 kernel: xen_pciback: vpci: 0000:01:00.1: assign to virtual slot 0
Jul 12 21:04:36 dom0 kernel: pciback 0000:01:00.1: registering for 24
Jul 12 21:04:36 dom0 kernel: xen-pciback pci-23-0: 22 Couldn't locate PCI device (0000:01:00.1)! perhaps already in-use?
Jul 12 21:04:40 dom0 kernel: xen-blkback: backend/vbd/24/51712: prepare for reconnect
Jul 12 21:04:40 dom0 kernel: xen-blkback: backend/vbd/24/51728: prepare for reconnect
Jul 12 21:04:45 dom0 kernel: xen-blkback: ring-ref 8, event-channel 65, protocol 1 (x86_64-abi) persistent grants
Jul 12 21:04:45 dom0 kernel: xen-blkback: ring-ref 9, event-channel 66, protocol 1 (x86_64-abi) persistent grants
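One way to compare two captures like these is to strip the parts that legitimately differ between runs (timestamps and domain IDs) and then diff what is left. This is a rough sketch assuming the journalctl line format shown above; `normalize_klog` is a hypothetical name, and the substitutions cover only the patterns that appear in these two logs.

```shell
# Hypothetical helper: read "journalctl -k" output on stdin and blank out
# per-run details so that two captures can be compared with diff.
normalize_klog() {
    sed -e 's/^[A-Z][a-z][a-z] [ 0-9][0-9] [0-9:]* //' \
        -e 's/registering for [0-9][0-9]*/registering for DOMID/' \
        -e 's/pci-[0-9][0-9]*-/pci-DOMID-/' \
        -e 's|backend/vbd/[0-9][0-9]*/|backend/vbd/DOMID/|' \
        -e 's/[[:space:]]*$//'
}
```

With each capture saved to a file (for example `working.log` and `failing.log`, names chosen here only for illustration), run `normalize_klog < working.log > working.norm`, do the same for the failing capture, and then `diff working.norm failing.norm`. For the two logs quoted above, the lines that remain different after normalizing are the `enabling device`, `registering gsi 17` and `Already setup the GSI` messages, which appear only in the working run.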

xl dmesg

xl create doesn't cause xl dmesg to print anything new.

and stubdom output in those cases?

working stubdom output failing stubdom output

WetwareLabs commented 8 years ago

In case the problem is with shared RMRR, this patch also logs which devices the RMRR is shared with

Do you know how to get this information at toolstack side, if possible at all?

It seems the RMRR units are stored and processed only inside the passthrough driver in Xen. Additionally, from a quick grep there seems to be nothing related to DMAR or RMRR in the AMD-specific branch, so is this feature implemented only for Intel's VT-d? Maybe the _acpi_rmrrunits table could be published via sysfs if we really wanted to, but what is it you're planning?

BTW. Did you make any progress with the race condition and device detection?

WetwareLabs commented 8 years ago

@jaspertron So how does it fail: the VM starts but the device is not detected? Which motherboard and device are you using for PT?

My earlier comment on "success" referred to passthrough on qemu-xen (non-traditional) without stubdom (because I was erroneously assuming qemu-xen supported stubdom, which it at this time doesn't). I haven't managed to passthrough on traditional branch with stubdom yet, let me be clear about this :)

jaspertron commented 8 years ago

Which motherboard and device are you using for PT?

user@dom0 ~> sudo qubes-hcl-report | head -n8
Qubes release 3.1 (R3.1)

Brand:      MSI
Model:      MS-7821
BIOS:       V10.7

Xen:        4.6.0
Kernel:     4.1.13-9

I'm using my second graphics card for passthrough:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon R9 290X]
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8

So how does it fail: the VM starts but the device is not detected?

Yes, the VM starts but doesn't see the device when I passthrough only 1 device (I've been using 01:00.1 for testing, FWIW).

If I try to passthrough both (pci = ["01:00.0", "01:00.1"]), it doesn't even start:

---snip---
libxl: error: libxl_internal.c:499:libxl__get_domain_configuration: wetware error: json config empty
libxl: error: libxl_pci.c:185:libxl__device_pci_add_xenstore: wetware pci_add_xenstore get_domain_conf failed
libxl: error: libxl_pci.c:1198:libxl__device_pci_add: do_pci_add failed -16
libxl: error: libxl_create.c:1411:domcreate_attach_pci: libxl_device_pci_add failed: -16
---snip---

jaspertron commented 8 years ago

Oh, ignore that last part. I see you ran into the same thing with multiple devices. I didn't apply this patch marmarek provided, so I guess those errors are expected.

But I can't replicate marmarek's success even with 1 device.

@marmarek does the revert patch I'm using look like the one you're using?

marmarek commented 8 years ago

@marmarek does the revert patch I'm using look like the one you're using?

Yes. But I think there is still some race condition, because it sometimes works and sometimes doesn't. With this revert it works most of the time for me (with a single device; I haven't tried multiple), but with the proper fix instead (which in theory should have exactly the same result), it mostly doesn't work...

marmarek commented 8 years ago

Maybe related: during upgrading to Xen 4.2 (between Qubes 2 and 3.0) xen-libxl-stubdom-pci-create.patch was not ported and marked as "TODO". It was never fixed.

marmarek commented 8 years ago

Automated announcement from builder-github

The package xen-4.6.3-21.fc23 has been pushed to the r3.2 testing repository for the Fedora fc23 template. To test this update, please install it with the following command:

sudo yum update --enablerepo=qubes-vm-r3.2-current-testing

Changes included in this update

marmarek commented 8 years ago

Automated announcement from builder-github

The package xen-4.6.3-21.fc24 has been pushed to the r3.2 testing repository for the Fedora fc24 template. To test this update, please install it with the following command:

sudo yum update --enablerepo=qubes-vm-r3.2-current-testing

Changes included in this update

marmarek commented 8 years ago

Automated announcement from builder-github

The package xen-4.6.3-21.fc23 has been pushed to the r3.2 testing repository for dom0. To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen-4.6.3-21.fc23 has been pushed to the r3.2 stable repository for the Fedora fc23 template. To install this update, please use the standard update command:

sudo yum update

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen-4.6.3-21.fc24 has been pushed to the r3.2 stable repository for the Fedora fc24 template. To install this update, please use the standard update command:

sudo yum update

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen-4.6.3-21.fc23 has been pushed to the r3.2 stable repository for dom0. To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen-4.6.3-22.fc21 has been pushed to the r3.1 testing repository for the Fedora fc21 template. To test this update, please install it with the following command:

sudo yum update --enablerepo=qubes-vm-r3.1-current-testing

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen-4.6.3-22.fc22 has been pushed to the r3.1 testing repository for the Fedora fc22 template. To test this update, please install it with the following command:

sudo yum update --enablerepo=qubes-vm-r3.1-current-testing

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen-4.6.3-22.fc23 has been pushed to the r3.1 testing repository for the Fedora fc23 template. To test this update, please install it with the following command:

sudo yum update --enablerepo=qubes-vm-r3.1-current-testing

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen-4.6.3-22.fc24 has been pushed to the r3.1 testing repository for the Fedora fc24 template. To test this update, please install it with the following command:

sudo yum update --enablerepo=qubes-vm-r3.1-current-testing

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen-4.6.3-22.fc20 has been pushed to the r3.1 testing repository for dom0. To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen_4.6.3-24+deb8u1 has been pushed to the r3.2 testing repository for the Debian jessie template. To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing jessie-testing, then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen_4.6.3-24+deb9u1 has been pushed to the r3.2 testing repository for the Debian stretch template. To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing stretch-testing, then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen-4.6.3-24.fc20 has been pushed to the r3.1 stable repository for dom0. To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen-4.6.3-24.fc21 has been pushed to the r3.1 stable repository for the Fedora fc21 template. To install this update, please use the standard update command:

sudo yum update

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen_2001:4.6.3-24+deb8u1 has been pushed to the r3.2 stable repository for the Debian jessie template. To install this update, please use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen-4.6.3-24.fc22 has been pushed to the r3.1 stable repository for the Fedora fc22 template. To install this update, please use the standard update command:

sudo yum update

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen-4.6.3-24.fc23 has been pushed to the r3.1 stable repository for the Fedora fc23 template. To install this update, please use the standard update command:

sudo yum update

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen_4.6.3-24+deb8u1 has been pushed to the r3.1 testing repository for the Debian jessie template. To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing jessie-testing, then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen_4.6.3-24+deb9u1 has been pushed to the r3.1 testing repository for the Debian stretch template. To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing stretch-testing, then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

marmarek commented 7 years ago

Automated announcement from builder-github

The package xen_4.6.3-24+deb7u1 has been pushed to the r3.1 testing repository for the Debian wheezy template. To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing wheezy-testing, then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

qubesos-bot commented 7 years ago

Automated announcement from builder-github

The package xen_2001:4.6.3-24+deb9u1 has been pushed to the r3.2 stable repository for the Debian stretch template. To install this update, please use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update