esheltone closed this issue 8 years ago
I guess it's a bug that libxl__device_pci_add_xenstore is called at all:

    if (!starting)
        rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting);
    else
        rc = 0;
But indeed, for stubdomain, starting=0:

    stubdomid = libxl_get_stubdom_id(ctx, domid);
    if (stubdomid != 0) {
        libxl_device_pci pcidev_s = *pcidev;
        /* stubdomain is always running by now, even at create time */
        rc = do_pci_add(gc, stubdomid, &pcidev_s, 0);
        if ( rc )
            goto out;
    }
But the domain config isn't saved at that time. And even if it were saved, it would fail anyway, because it's the stubdomain config, which isn't saved at all. Generally it looks like PCI passthrough handling with a stubdomain is a forgotten use case... Try this patch: https://gist.github.com/c8e080f0036fb21759976a6bf3f5c668 (I haven't even compile-tested it...)
Thanks for the patch! I tried it, and the VM can now start with multiple PCI devices assigned. However there's no video output and VNC can't be used.
I think the next step should be to target VMs created with libvirt and get them working with PCI passthrough, at least with non-GPU devices. I tried to assign the same USB controller that worked with xl. On the first start attempt the error message was:
"libxenlight failed to create new domain 'win7'"
and on next start attempt:
"Requested operation not valid: PCI device 0000:00:1a.0 is in use by driver xenlight, domain 'win7'"
It seems there's more debugging to be done...
Try restarting libvirtd. There seems to be a bug where libvirt doesn't release the device internally after a failed VM startup.
Reverting this patch fixes the problem
What's the proper way to do this with qubes-builder? I put this patch
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2214,19 +2214,9 @@
                                  uint32_t add_mapping)
 {
     DECLARE_DOMCTL;
-    xc_dominfo_t info;
     int ret = 0, err;
     unsigned long done = 0, nr, max_batch_sz;
 
-    if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 ||
-         info.domid != domid )
-    {
-        PERROR("Could not get info for domain");
-        return -EINVAL;
-    }
-    if ( !xc_core_arch_auto_translated_physmap(&info) )
-        return 0;
-
     if ( !nr_mfns )
         return 0;
in qubes-builder/qubes-src/vmm-xen/patches.qubes/ (and added it to series.conf), then ran make vmm-xen-dom0 and reinstalled (some of) the RPMs it built in dom0, but the test is still failing.
Jasper: That looks like the right procedure. Are you sure you installed all the required RPMs? This is a script I use to automate the Xen install:
    #!/bin/sh
    qvm-run -p qubes-dev 'cd qubes-builder-3.1/qubes-packages-mirror-repo/fc20/rpm; tar c xen*.rpm' | tar xv
    rpm -U --force --nodeps xen-4.* xen-libs-* xen-runtime-* xen-hvm-* xen-qemu-tools-*
    rpm -U --force xen-hypervisor-*
With Marmarek's newest patch above, assigning PCI devices does not show any errors in libxl and VM creation continues, but the VM does not detect these devices (I've now tested this on Win7, Win8 and Arch Linux guests created via VM Manager). On the kernel side there are these messages:
[127712.375328] xen_pciback: vpci: 0000:00:1a.0: assign to virtual slot 0
[127712.377265] pciback 0000:00:1a.0: registering for 102
[127712.421776] xen-pciback pci-101-0: 22 Couldn't locate PCI device (0000:00:1a.0)! perhaps already in-use?
Here 102 is the stubdom id and 101 is the DomU. I don't understand this: should libxl really try to assign the device to BOTH virtual machines? It seems assigning to the stubdom VM succeeds, but not to the actual domU.
I've tried the following mods:
None of these modifications have any effect on the outcome and the device will not be detected by the domU. 'dmesg' on the guest side does not reveal anything suspect. 'xl pci-list' nevertheless shows that the device should be attached:

    Vdev  Device
    05.0  0000:00:1a.0
Marek, any idea how to proceed?
Here 102 is the stubdom id and 101 is the DomU. I don't understand this: should libxl really try to assign the device to BOTH virtual machines? It seems assigning to the stubdom VM succeeds, but not to the actual domU.
This is something I'm trying to figure out. But probably yes, and probably also the reason why it's failing...
Generally, with the original patch (revert), I can get the device visible in the VM (I've only tried Linux). But not always, probably because of some race condition; I guess it's about the above pciback having the device assigned twice.
But with the "actual fix" I haven't got it working, also probably because of that race condition: having xc_domain_getinfo still executed, I guess, makes losing the race more probable.
I'd expect that disabling the assign to the stubdom VM should fix a Linux DomU, and disabling the assign to the DomU should fix a Windows DomU. And this is why it should probably be assigned to both... In any case, those patches are needed too.
Also apply the "add xenstore domain flag to hypervisor" patch to compile it successfully on Xen 4.6.1.
I have modified the fix to apply on Xen 4.6.1. It's a simplified version which only adds XEN_DOMCTL_getdomaininfo to the list in xsm_domctl, just before return xsm_default_action(XSM_DM_PRIV, current->domain, d);
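Based on that description, the change would land in Xen's dummy XSM hook roughly as below. This is only a sketch of the hunk: the surrounding case labels are recalled from Xen 4.6-era code and may differ from the actual source.

```c
/* Sketch of the described change to xsm_domctl() (Xen 4.6-era dummy XSM
 * hooks; surrounding cases shown from memory, not verified). Listing
 * XEN_DOMCTL_getdomaininfo here lets a device-model (stub)domain issue
 * that domctl against its target domain. */
switch ( cmd )
{
case XEN_DOMCTL_ioport_mapping:
case XEN_DOMCTL_memory_mapping:
case XEN_DOMCTL_bind_pt_irq:
case XEN_DOMCTL_unbind_pt_irq:
case XEN_DOMCTL_getdomaininfo:   /* the addition described above */
    return xsm_default_action(XSM_DM_PRIV, current->domain, d);
default:
    return xsm_default_action(XSM_PRIV, current->domain, d);
}
```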
Are you sure you installed all the required RPMS? This is a script I use to automate Xen install
Yes, I've installed all of those RPMs. I rebuilt them and used your install script just to be sure, but the test still fails.
@jaspertron Try this patch (it contains only extra logging, no actual passthrough modifications) to try to find out where it fails (you can pastebin the logs if necessary). Use xl create (or pci-attach) with the -vvv flag and don't forget to check 'xl dmesg' as well. In case the problem is with a shared RMRR, this patch also logs which devices the RMRR is shared with (then you need to PT those devices together, or set pci_strictreset to false).
@WetwareLabs Here are the xl create logs (with your logging patch applied) from the successful test (using Qubes R2 stubdom) and the failing test. They're very similar and I can't spot any interesting differences.
In case the problem is with shared RMRR, this patch also logs which devices the RMRR is shared with
Do you know how to get this information at toolstack side, if possible at all?
@WetwareLabs Here are the xl create logs (with your logging patch applied) from the successful test (using Qubes R2 stubdom) and failing test. They're very similar and I can't spot any interesting differences.
Indeed, looks the same. What about kernel messages, xl dmesg, and stubdom output in those cases? Anything new there?
What about kernel messages
working:
[user@dom0 ~]$ sudo journalctl -kfqn0
Jul 12 21:02:50 dom0 kernel: xen-blkback: ring-ref 2047, event-channel 4, protocol 1 (x86_64-abi)
Jul 12 21:02:50 dom0 kernel: xen-blkback: ring-ref 2046, event-channel 5, protocol 1 (x86_64-abi)
Jul 12 21:02:51 dom0 kernel: xen_pciback: vpci: 0000:01:00.1: assign to virtual slot 0
Jul 12 21:02:51 dom0 kernel: pciback 0000:01:00.1: registering for 22
Jul 12 21:02:54 dom0 kernel: pciback 0000:01:00.1: enabling device (0000 -> 0002)
Jul 12 21:02:54 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Jul 12 21:02:54 dom0 kernel: Already setup the GSI :17
Jul 12 21:03:03 dom0 kernel: xen-pciback pci-21-0: 22 Couldn't locate PCI device (0000:01:00.1)! perhaps already in-use?
Jul 12 21:03:07 dom0 kernel: xen-blkback: backend/vbd/22/51712: prepare for reconnect
Jul 12 21:03:07 dom0 kernel: xen-blkback: backend/vbd/22/51728: prepare for reconnect
Jul 12 21:03:12 dom0 kernel: xen-blkback: ring-ref 8, event-channel 65, protocol 1 (x86_64-abi) persistent grants
Jul 12 21:03:12 dom0 kernel: xen-blkback: ring-ref 9, event-channel 66, protocol 1 (x86_64-abi) persistent grants
failing:
[user@dom0 ~]$ sudo journalctl -kfqn0
Jul 12 21:04:35 dom0 kernel: xen-blkback: ring-ref 2047, event-channel 4, protocol 1 (x86_64-abi)
Jul 12 21:04:35 dom0 kernel: xen-blkback: ring-ref 2046, event-channel 5, protocol 1 (x86_64-abi)
Jul 12 21:04:36 dom0 kernel: xen_pciback: vpci: 0000:01:00.1: assign to virtual slot 0
Jul 12 21:04:36 dom0 kernel: pciback 0000:01:00.1: registering for 24
Jul 12 21:04:36 dom0 kernel: xen-pciback pci-23-0: 22 Couldn't locate PCI device (0000:01:00.1)! perhaps already in-use?
Jul 12 21:04:40 dom0 kernel: xen-blkback: backend/vbd/24/51712: prepare for reconnect
Jul 12 21:04:40 dom0 kernel: xen-blkback: backend/vbd/24/51728: prepare for reconnect
Jul 12 21:04:45 dom0 kernel: xen-blkback: ring-ref 8, event-channel 65, protocol 1 (x86_64-abi) persistent grants
Jul 12 21:04:45 dom0 kernel: xen-blkback: ring-ref 9, event-channel 66, protocol 1 (x86_64-abi) persistent grants
xl dmesg
xl create doesn't cause xl dmesg to print anything new.
and stubdom output in those cases?
In case the problem is with shared RMRR, this patch also logs which devices the RMRR is >>shared with
Do you know how to get this information at toolstack side, if possible at all?
It seems the RMRR units are stored and processed only inside the passthrough driver in Xen. Additionally, quick grepping finds nothing related to DMAR or RMRR in the AMD-specific branch, so this feature is implemented only for Intel's VT-d? Maybe the acpi_rmrr_units table could be published via sysfs if we really wanted to, but what is it you're planning?
BTW. Did you make any progress with the race condition and device detection?
@jaspertron So how does it fail: the VM starts but the device is not detected? Which motherboard and device are you using for PT?
My earlier comment on "success" referred to passthrough on qemu-xen (non-traditional) without stubdom (because I was erroneously assuming qemu-xen supported stubdom, which at this time it doesn't). I haven't managed to pass through on the traditional branch with stubdom yet, let me be clear about this :)
Which motherboard and device you're using for PT?
user@dom0 ~> sudo qubes-hcl-report | head -n8
Qubes release 3.1 (R3.1)
Brand: MSI
Model: MS-7821
BIOS: V10.7
Xen: 4.6.0
Kernel: 4.1.13-9
I'm using my second graphics card for passthrough:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon R9 290X]
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
So how it fails, VM starts but device is not detected?
Yes, the VM starts but doesn't see the device when I pass through only 1 device (I've been using 01:00.1 for testing, FWIW).
If I try to pass through both (pci = ["01:00.0", "01:00.1"]), it doesn't even start:
---snip---
libxl: error: libxl_internal.c:499:libxl__get_domain_configuration: wetware error: json config empty
libxl: error: libxl_pci.c:185:libxl__device_pci_add_xenstore: wetware pci_add_xenstore get_domain_conf failed
libxl: error: libxl_pci.c:1198:libxl__device_pci_add: do_pci_add failed -16
libxl: error: libxl_create.c:1411:domcreate_attach_pci: libxl_device_pci_add failed: -16
---snip---
Oh, ignore that last part. I see you ran into the same thing with multiple devices. I didn't apply the patch marmarek provided, so I guess those errors are expected.
But I can't replicate marmarek's success even with 1 device.
@marmarek does the revert patch I'm using look like the one you're using?
@marmarek does the revert patch I'm using look like the one you're using?
Yes. But I think there is still some race condition, because it sometimes works and sometimes doesn't. With this revert it works most of the time for me (with a single device; I haven't tried multiple), but with the proper fix instead (which in theory should have exactly the same result), it mostly doesn't work...
Maybe related: during the upgrade to Xen 4.2 (between Qubes 2 and 3.0), xen-libxl-stubdom-pci-create.patch was not ported and was marked as "TODO". It was never fixed.
Automated announcement from builder-github
The package xen-4.6.3-21.fc23 has been pushed to the r3.2 testing repository for the Fedora fc23 template. To test this update, please install it with the following command:
sudo yum update --enablerepo=qubes-vm-r3.2-current-testing
Automated announcement from builder-github
The package xen-4.6.3-21.fc24 has been pushed to the r3.2 testing repository for the Fedora fc24 template. To test this update, please install it with the following command:
sudo yum update --enablerepo=qubes-vm-r3.2-current-testing
Automated announcement from builder-github
The package xen-4.6.3-21.fc23 has been pushed to the r3.2 testing repository for dom0. To test this update, please install it with the following command:
sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing
Automated announcement from builder-github
The package xen-4.6.3-21.fc23 has been pushed to the r3.2 stable repository for the Fedora fc23 template. To install this update, please use the standard update command:
sudo yum update
Automated announcement from builder-github
The package xen-4.6.3-21.fc24 has been pushed to the r3.2 stable repository for the Fedora fc24 template. To install this update, please use the standard update command:
sudo yum update
Automated announcement from builder-github
The package xen-4.6.3-21.fc23 has been pushed to the r3.2 stable repository for dom0. To install this update, please use the standard update command:
sudo qubes-dom0-update
Or update dom0 via Qubes Manager.
Automated announcement from builder-github
The package xen-4.6.3-22.fc21 has been pushed to the r3.1 testing repository for the Fedora fc21 template. To test this update, please install it with the following command:
sudo yum update --enablerepo=qubes-vm-r3.1-current-testing
Automated announcement from builder-github
The package xen-4.6.3-22.fc22 has been pushed to the r3.1 testing repository for the Fedora fc22 template. To test this update, please install it with the following command:
sudo yum update --enablerepo=qubes-vm-r3.1-current-testing
Automated announcement from builder-github
The package xen-4.6.3-22.fc23 has been pushed to the r3.1 testing repository for the Fedora fc23 template. To test this update, please install it with the following command:
sudo yum update --enablerepo=qubes-vm-r3.1-current-testing
Automated announcement from builder-github
The package xen-4.6.3-22.fc24 has been pushed to the r3.1 testing repository for the Fedora fc24 template. To test this update, please install it with the following command:
sudo yum update --enablerepo=qubes-vm-r3.1-current-testing
Automated announcement from builder-github
The package xen-4.6.3-22.fc20 has been pushed to the r3.1 testing repository for dom0. To test this update, please install it with the following command:
sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing
Automated announcement from builder-github
The package xen_4.6.3-24+deb8u1 has been pushed to the r3.2 testing repository for the Debian jessie template. To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing jessie-testing, then use the standard update command:
sudo apt-get update && sudo apt-get dist-upgrade
Automated announcement from builder-github
The package xen_4.6.3-24+deb9u1 has been pushed to the r3.2 testing repository for the Debian stretch template. To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing stretch-testing, then use the standard update command:
sudo apt-get update && sudo apt-get dist-upgrade
Automated announcement from builder-github
The package xen-4.6.3-24.fc20 has been pushed to the r3.1 stable repository for dom0. To install this update, please use the standard update command:
sudo qubes-dom0-update
Or update dom0 via Qubes Manager.
Automated announcement from builder-github
The package xen-4.6.3-24.fc21 has been pushed to the r3.1 stable repository for the Fedora fc21 template. To install this update, please use the standard update command:
sudo yum update
Automated announcement from builder-github
The package xen_2001:4.6.3-24+deb8u1 has been pushed to the r3.2 stable repository for the Debian jessie template. To install this update, please use the standard update command:
sudo apt-get update && sudo apt-get dist-upgrade
Automated announcement from builder-github
The package xen-4.6.3-24.fc22 has been pushed to the r3.1 stable repository for the Fedora fc22 template. To install this update, please use the standard update command:
sudo yum update
Automated announcement from builder-github
The package xen-4.6.3-24.fc23 has been pushed to the r3.1 stable repository for the Fedora fc23 template. To install this update, please use the standard update command:
sudo yum update
Automated announcement from builder-github
The package xen_4.6.3-24+deb8u1 has been pushed to the r3.1 testing repository for the Debian jessie template. To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing jessie-testing, then use the standard update command:
sudo apt-get update && sudo apt-get dist-upgrade
Automated announcement from builder-github
The package xen_4.6.3-24+deb9u1 has been pushed to the r3.1 testing repository for the Debian stretch template. To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing stretch-testing, then use the standard update command:
sudo apt-get update && sudo apt-get dist-upgrade
Automated announcement from builder-github
The package xen_4.6.3-24+deb7u1 has been pushed to the r3.1 testing repository for the Debian wheezy template. To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing wheezy-testing, then use the standard update command:
sudo apt-get update && sudo apt-get dist-upgrade
Automated announcement from builder-github
The package xen_2001:4.6.3-24+deb9u1 has been pushed to the r3.2 stable repository for the Debian stretch template. To install this update, please use the standard update command:
sudo apt-get update && sudo apt-get dist-upgrade
There have been multiple reports that PCI passthrough does not work for HVM domains using the Qubes software:
https://groups.google.com/d/msg/qubes-users/cmPRMOkxkdA/gIV68O0-CQAJ (reporting passthrough not working via libvirt, but that passthrough could still be done using Xen xl)
https://groups.google.com/d/msg/qubes-users/ExMvykCyYiY/M3nHxweRFAAJ (confirmation by Marek that passthrough was not working on R3)
https://groups.google.com/d/msg/qubes-users/ppKj_YWqr94/l2gHv6uJAgAJ
This issue appears to have started with use of the HAL in Qubes R3. PCI passthrough continues to work fine for PV-based Qubes VMs, such as sys-net.
Marek guessed that it could be a qemu issue (see the second linked post). However, in the first linked post, PCI passthrough was done to an HVM domain via 'xl' using "device_model_version = 'qemu-xen-traditional'", so this may rule out qemu as the culprit.