QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

PCI passthrough multiple devices to HVM StandaloneVM only passes through a single device #4724

Closed qubes35725 closed 1 year ago

qubes35725 commented 5 years ago

Qubes OS version:

Qubes release 4.0 (R4.0) (actually 4.0.1)

Affected component(s):

HVM Standalone VM, not based on template.


Steps to reproduce the behavior:

Expected behavior:

Three devices passed through to guest VM

Actual behavior:

Only device 02:00.1 is passed through to guest VM

General notes:

I am attempting to create an HVM StandaloneVM with three PCI devices passed through to the VM: a USB controller 00:1d.0, and a VGA card and its accompanying audio device (02:00.0, 02:00.1). The setup works successfully when started via a configuration file (i.e. xl create htpc.hvm). However, when trying to integrate the guest into the Qubes framework using a Qubes StandaloneVM, only the audio device 02:00.1 shows up in the guest.
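
For reference, the two setups look roughly like this (a sketch; the exact file contents may differ slightly):

    # relevant line from the working plain-Xen config (htpc.hvm):
    pci = ['00:1d.0', '02:00.0', '02:00.1']

    # Qubes 4.0 equivalent: attach each device persistently to the VM
    qvm-pci attach --persistent htpc dom0:00_1d.0
    qvm-pci attach --persistent htpc dom0:02_00.0
    qvm-pci attach --persistent htpc dom0:02_00.1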

In /var/log/libvirt/libxl/htpc.log we have the section

    "pcidevs": [
        {
            "func": 1,
            "bus": 2,
            "vdevfn": 40,
            "rdm_policy": "relaxed"
        },
        {
            "dev": 29,
            "vdevfn": 48,
            "rdm_policy": "relaxed"
        },
        {
            "bus": 2,
            "vdevfn": 56,
            "rdm_policy": "relaxed"
        }
    ],

showing that all three PCI devices are written to the config passed to libxl (though seemingly in arbitrary order - they are listed as 00:1d.0, 02:00.0, 02:00.1 in both the GUI and qvm-pci). However, when the VM is started, the log in /var/log/xen/console/guest-htpc-dm.log shows only a single PCIFRONT section, for the first device in the libxl list, 02:00.1.
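
For what it's worth, assuming omitted fields in the libxl JSON default to 0 and vdevfn encodes the guest devfn (slot<<3 | func), the three stanzas decode to the expected host devices (my annotation):

    {"func": 1, "bus": 2, "vdevfn": 40}  ->  host 02:00.1, guest slot 5 (0x28 >> 3)
    {"dev": 29, "vdevfn": 48}            ->  host 00:1d.0 (29 = 0x1d), guest slot 6
    {"bus": 2, "vdevfn": 56}             ->  host 02:00.0, guest slot 7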

******************* PCIFRONT for device/pci/0 **********

backend at /local/domain/0/backend/pci/10/0
**************************
pcifront_watches: waiting for backend events /local/domain/0/backend/pci/10/0/state
xs_read_watch() -> device-model/9/command dm-command
dm-command: hot insert pass-through pci dev 
warning: pt_iomul not supported in stubdom 02:00.1
xs_read_watch() -> device-model/9/command dm-command
xs_read(device-model/9/command): ENOENT
pcifront_watches: backend state changed: /local/domain/0/backend/pci/10/0/state 7
pcifront_watches: writing device/pci/0/state 7
pcifront_watches: backend state changed: /local/domain/0/backend/pci/10/0/state 8
pcifront_watches: writing device/pci/0/state 4
pcifront_watches: changing state to 4
pcifront_watches: backend state changed: /local/domain/0/backend/pci/10/0/state 4
xs_read_watch() -> device-model/9/command dm-command
dm-command: hot insert pass-through pci dev 
xs_read_watch() -> device-model/9/command dm-command
xs_read(device-model/9/command): ENOENT
vga s->lfb_addr = f0000000 s->lfb_end = f1000000 
xs_daemon_open -> 11, 0x1701e8
qubes_gui/init[717]: got xorg conf, creating window
qubes_gui/init: 724

Testing PCI passthrough of each device individually was successful. Each of the three devices works as expected in the guest VM; the only remaining problem is that the three devices cannot be passed through together.

Also note that this appears to be unique to StandaloneVMs. My template-based sys-net VM has two devices passed through, an ethernet controller and a wireless adapter, both of which work properly.


Related issues:

Possibly #1659; however, my devices appear to function correctly when passed through individually.

marmarek commented 5 years ago

PCIFRONT for device/pci/0

The fact that you see only one device/pci/0 is expected - one pcifront device is responsible for the whole PCI bus and can host multiple devices. Indeed below you can see multiple devices inserted:

dm-command: hot insert pass-through pci dev

(...)

dm-command: hot insert pass-through pci dev

I guess there is a third one somewhere below. Look also into /var/log/libvirt/libxl/libxl-driver.log in dom0 for related messages.
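
For example, something along these lines should surface the relevant entries (illustrative only):

    # in dom0: look for PCI/passthrough-related messages
    grep -iE 'pci|passthrough' /var/log/libvirt/libxl/libxl-driver.log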

The setup works successfully when started via a configuration file (i.e. xl create htpc.hvm).

In this setup you probably don't use qemu sandboxing in a stubdomain, but run qemu directly in dom0. That is much less secure, but at the same time more reliable (a configuration more thoroughly tested by the Xen Project).

Additionally, the above log suggests you use the old qemu-traditional minios stubdomain. Try the new one by setting the linux-stubdom feature to 1 (or removing it, as 1 is the default value):

qvm-features VMNAME linux-stubdom 1
(or)
qvm-features -D VMNAME linux-stubdom
qubes35725 commented 5 years ago

The setup works successfully when started via a configuration file (i.e. xl create htpc.hvm).

In this setup you probably don't use qemu sandboxing in a stubdomain, but run qemu directly in dom0. That is much less secure, but at the same time more reliable (a configuration more thoroughly tested by the Xen Project).

Yes, the existing config uses the qemu-traditional device model, running in dom0. The config file carried over from my old Qubes 3.2 setup to 4.0.1. It still works fine, but since I am tinkering with the new system at the moment, I hope to finally integrate the guest properly into Qubes instead of manually launching it from the command line ;)

Additionally, the above log suggests you use the old qemu-traditional minios stubdomain. Try the new one by setting the linux-stubdom feature to 1 (or removing it, as 1 is the default value):

qvm-features VMNAME linux-stubdom 1
(or)
qvm-features -D VMNAME linux-stubdom

This solved the multiple passthrough issue - I now have devices 00:1d.0 (USB controller) and 02:00.1 (audio) passed through successfully.

The VGA adapter 02:00.0 causes the guest to fail to boot, with this error in the stubdomain log:

pci 0000:00:00.0: can't claim BAR 6 [mem 0x000c0000-0x000dffff pref]: address conflict with Reserved [mem 0x000a0000-0x000fffff]
pcifront pci-0: Could not claim resource 0000:00:00.0/6! Device offline. Try using e820_host=1 in the guest config.

Seems this could be related to #2019. Is there a way to enable e820_host on my end?

marmarek commented 5 years ago

Is there a way to enable e820_host on my end?

It should be enabled by default, but there is a way to disable it (which you might have done in the past) - check the pci-e820-host feature; like linux-stubdom, it is enabled by default.
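
A quick way to verify and force it (a sketch, with htpc standing in for the VM name):

    # list all features set on the VM; if pci-e820-host is absent,
    # the default (enabled) applies
    qvm-features htpc
    # force-enable it explicitly
    qvm-features htpc pci-e820-host 1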

qubes35725 commented 5 years ago

Is there a way to enable e820_host on my end?

It should be enabled by default, but there is a way to disable it (which you might have done in the past) - check the pci-e820-host feature; like linux-stubdom, it is enabled by default.

It's odd that it would complain about the e820_host setting, because I did not change it from the default. Just in case, I set the pci-e820-host feature to "1", though the result was the same - I still get messages telling me to enable e820_host on the stubdom console. I then tried creating a fresh VM with unmodified default features and passed my list of devices through to the guest. Again it can't claim BAR 6 and recommends enabling e820_host.

I do get several messages starting with 'e820:' when the stubdom first boots:

Linux version 4.14.68-xen-stubdom (user@build-fedora4) (gcc version 6.4.1 20170727 (Red Hat 6.4.1-1) (GCC)) #1 Tue Oct 2 03:34:17 UTC 2018
Command line: debug console=hvc0
x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
Released 0 page(s)
e820: BIOS-provided physical RAM map:
Xen: [mem 0x0000000000000000-0x000000000009ffff] usable
Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved
Xen: [mem 0x0000000000100000-0x0000000008ffffff] usable
NX (Execute Disable) protection: active
Hypervisor detected: Xen PV
tsc: Fast TSC calibration failed
tsc: Unable to calibrate against PIT
tsc: No reference (HPET/PMTIMER) available
e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
e820: remove [mem 0x000a0000-0x000fffff] usable
e820: last_pfn = 0x9000 max_arch_pfn = 0x400000000
x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WC  WP  UC  UC  
Base memory trampoline at [ffff88000009a000] 9a000 size 24576
BRK [0x01b47000, 0x01b47fff] PGTABLE
RAMDISK: [mem 0x01c00000-0x033b1fff]
Zone ranges:
  DMA32    [mem 0x0000000000001000-0x0000000008ffffff]
  Normal   empty
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000000001000-0x000000000009ffff]
  node   0: [mem 0x0000000000100000-0x0000000008ffffff]
Initmem setup node 0 [mem 0x0000000000001000-0x0000000008ffffff]
On node 0 totalpages: 36767
  DMA32 zone: 576 pages used for memmap
  DMA32 zone: 21 pages reserved
  DMA32 zone: 36767 pages, LIFO batch:7
p2m virtual area at ffffc90000000000, size is 200000
Remapped 0 page(s)
e820: [mem 0x09000000-0xffffffff] available for PCI devices

Does this mean e820_host is enabled in the stubdom? Perhaps the message to enable it is spurious (maybe pcifront does not check whether it is actually enabled before recommending it)?
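
One cross-check I can think of (assuming Xen's boot-time E820 dump is still in the console ring) is to compare the stubdom map above against the host map Xen saw:

    # in dom0: Xen prints the host E820 ("Xen-e820 RAM map") at boot
    sudo xl dmesg | grep -i -A 20 e820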

Most notably, whenever the VGA device 02:00.0 is passed through to the guest, the display window remains completely black. This differs from launching the guest from the plain hvm config file, where I can see the whole boot process on the emulated VGA, with or without the hardware VGA device attached. I configured a serial debug console and found that the guest is actually still booting... but it eventually crashes:

[   32.505164] BUG: unable to handle kernel paging request at ffffd2cdbffff070
[   32.505239] IP: [<ffffffff8a9bbddc>] copy_page_range+0x2dc/0xbb0
[   32.505308] PGD 0 [   32.505329] 
[   32.505353] Oops: 0000 [#1] SMP
[   32.505386] Modules linked in: ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid dm_mod ata_generic crc32c_intel aesni_intel aes_x86_64 xen_netfront xen_blkfront ehci_pci ehci_hcd psmouse ata_piix glue_helper lrw gf128mul i2c_piix4 ablk_helper cryptd usbcore libata usb_common scsi_mod floppy
[   32.505791] CPU: 3 PID: 1 Comm: init Not tainted 4.9.0-8-amd64 #1 Debian 4.9.130-2
[   32.505859] Hardware name: Xen HVM domU, BIOS 4.8.4 11/28/2018
[   32.505911] task: ffff92ced9459040 task.stack: ffffba4340008000
[   32.505964] RIP: 0010:[<ffffffff8a9bbddc>]  [<ffffffff8a9bbddc>] copy_page_range+0x2dc/0xbb0
[   32.506040] RSP: 0018:ffffba434000bcb0  EFLAGS: 00010246
[   32.506084] RAX: ffffd2cdbffff070 RBX: ffff92ced4809580 RCX: ffffc0003fffffff
[   32.506149] RDX: ffffc00000000fff RSI: 00005603a1b2a000 RDI: 8000000114d5e067
[   32.506212] RBP: ffffba434000bea0 R08: ffff92ced8e22628 R09: ffff92ced4ca9838
[   32.506276] R10: 00000000000000a0 R11: ffff92ced9593540 R12: ffff92ced4fb0258
[   32.506338] R13: ffff92ced4809580 R14: ffff92ced8e22630 R15: 00005603a1b27000
[   32.506402] FS:  00007f2dccc71700(0000) GS:ffff92ceded80000(0000) knlGS:0000000000000000
[   32.506467] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   32.506519] CR2: ffffd2cdbffff070 CR3: 0000000114d5a000 CR4: 0000000000160670
[   32.506585] Stack:
[   32.506607]  0000000000000001 ffff92ced4809580 ffff92ced4ca9bb8 ffffffff8a98c9c6
[   32.506699]  0000000000000246 ffff92cedeffccc0 0000000000000000 ffffba434000bd84
[   32.506789]  0000000200000000 0000000000000000 513a8f79b5659061 00000000027080c0
[   32.506878] Call Trace:
[   32.506908]  [<ffffffff8a98c9c6>] ? __alloc_pages_nodemask+0xf6/0x260
[   32.506964]  [<ffffffff8ab3a9d7>] ? __rb_insert_augmented+0x187/0x210
[   32.507020]  [<ffffffff8a9c6505>] ? anon_vma_chain_link+0x25/0x40
[   32.507078]  [<ffffffff8a8787e8>] ? copy_process.part.34+0xd58/0x1b50
[   32.507136]  [<ffffffff8a8797c3>] ? _do_fork+0xe3/0x3f0
[   32.507184]  [<ffffffff8a8797c3>] ? _do_fork+0xe3/0x3f0
[   32.507231]  [<ffffffff8aa0af14>] ? vfs_write+0x154/0x190
[   32.507279]  [<ffffffff8aa0c301>] ? SyS_write+0xa1/0xc0
[   32.507329]  [<ffffffff8a803b7d>] ? do_syscall_64+0x8d/0xf0
[   32.507381]  [<ffffffff8ae18f8e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[   32.507437] Code: ff ff ff 3f 00 c0 ff ff 48 ba ff 0f 00 00 00 c0 ff ff 48 0f 45 d1 48 83 e0 98 48 85 d0 0f 85 75 07 00 00 48 8b 84 24 90 00 00 00 <48> 8b 38 48 f7 c7 9f ff ff ff 0f 84 74 07 00 00 4c 89 fa 48 c1 
[   32.508019] RIP  [<ffffffff8a9bbddc>] copy_page_range+0x2dc/0xbb0
[   32.508019]  RSP <ffffba434000bcb0>
[   32.508019] CR2: ffffd2cdbffff070
[   32.508019] ---[ end trace 2c784e7d891a5f4a ]---
marmarek commented 5 years ago

Does this mean e820_host is enabled in the stubdom?

Looks like it is not. Probably it is enabled only for the target domain, not the stubdomain. To be honest, it shouldn't be needed at all, as technically the stubdomain doesn't need to map the BARs of the target domain's devices. It only needs access to PCI config space. But since the stubdomain uses Linux, Linux tries to do a complete PCI device setup. Anyway, looking at the kernel code, it doesn't fail the device setup if it can't map a BAR. Do you see any errors later related to this? Specifically when the device is added to qemu.

I configured a serial debug console and found that the guest is actually still booting... but it eventually crashes:

I don't see anything PCI-passthrough-related in this crash message; it may be a totally unrelated kernel bug. I'd recommend trying a different kernel version (for example from backports).
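
Assuming the guest is Debian stretch (the oops above shows a 4.9.0-8-amd64 Debian kernel), something along these lines would pull in a newer kernel from backports:

    # inside the guest: enable stretch-backports and install its kernel
    echo 'deb http://deb.debian.org/debian stretch-backports main' >> /etc/apt/sources.list
    apt update
    apt -t stretch-backports install linux-image-amd64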

qubes35725 commented 5 years ago

looking at the kernel code, it doesn't fail the device setup if it can't map a BAR. Do you see any errors later related to this? Specifically when the device is added to qemu.

No, I don't believe so. The error "can't claim BAR 6" is repeated three times in the log, once while adding each device, but I think that is because the PCI bus is being rescanned each time a device is added. For each device we get "device registered successfully". For the VGA device it is:

{"execute": "device_add", "arguments": {"driver": "xen-pci-passthrough", "id": "xen-pci-pt_0000-02-00.0", "hostaddr": "0000:00:00.00", "machine_addr": "0000:02:00.0", "permissive": true}}
[00:08.0] xen_pt_realize: Assigning real physical device 00:00.0 to devfn 0x40
[00:08.0] xen_pt_register_regions: IO region 0 registered (size=0x10000000 base_addr=0xd0000000 type: 0x4)
[00:08.0] xen_pt_register_regions: IO region 2 registered (size=0x00020000 base_addr=0xfbd20000 type: 0x4)
[00:08.0] xen_pt_register_regions: IO region 4 registered (size=0x00000100 base_addr=0x0000d000 type: 0x1)
[00:08.0] xen_pt_register_regions: Expansion ROM registered (size=0x00020000 base_addr=0x000c0000)
[00:08.0] xen_pt_config_reg_init: Offset 0x0004 mismatch! Emulated=0x0000, host=0x0107, syncing to 0x0004.
[00:08.0] xen_pt_config_reg_init: Offset 0x0010 mismatch! Emulated=0x0000, host=0xd000000c, syncing to 0xd000000c.
[00:08.0] xen_pt_config_reg_init: Offset 0x0018 mismatch! Emulated=0x0000, host=0xfbd20004, syncing to 0xfbd20004.
[00:08.0] xen_pt_config_reg_init: Offset 0x0020 mismatch! Emulated=0x0000, host=0xd001, syncing to 0xd001.
[00:08.0] xen_pt_config_reg_init: Offset 0x0030 mismatch! Emulated=0x0000, host=0xc0002, syncing to 0x0002.
[00:08.0] xen_pt_config_reg_init: Offset 0x0052 mismatch! Emulated=0x0000, host=0x0603, syncing to 0x0603.
[00:08.0] xen_pt_config_reg_init: Offset 0x00a2 mismatch! Emulated=0x0000, host=0x0080, syncing to 0x0080.
[00:08.0] xen_pt_config_reg_init: Offset 0x005c mismatch! Emulated=0x0000, host=0x8fa1, syncing to 0x8fa1.
[00:08.0] xen_pt_config_reg_init: Offset 0x006a mismatch! Emulated=0x0000, host=0x1101, syncing to 0x1101.
[00:08.0] xen_pt_pci_intx: intx=1
[00:08.0] xen_pt_realize: Real physical device 00:00.0 registered successfully

I configured a serial debug console and found that the guest is actually still booting... but it eventually crashes:

I don't see anything PCI-passthrough-related in this crash message; it may be a totally unrelated kernel bug. I'd recommend trying a different kernel version (for example from backports).

OK, I will give that a try, and perhaps a fresh install too. Regarding the black-screen problem, I wonder how qubes_gui chooses the device to display when there are multiple VGA devices in the system? I see from the logs that it receives the xorg conf after the PCI devices are probed. Maybe the screen is black because it is showing the hardware VGA rather than the emulated VGA?

marmarek commented 5 years ago

Maybe the screen is black because it is showing hardware VGA not emulated VGA?

It's not about qubes_gui, but rather the system inside. The qubes_gui in the stubdomain shows whatever is sent to the emulated VGA, but the system inside may indeed choose something else as the "primary VGA". Since you also have a serial console to that VM, it shouldn't prevent further debugging. And once you get the VM started, you can log in there and investigate.

This may also be about hvmloader and/or SeaBIOS, but I'm not sure if there is any configuration option to choose the VGA there. If you add guest_loglvl=all to the Xen command line, you should see hvmloader and SeaBIOS logs in xl dmesg (or /var/log/xen/console/hypervisor.log). Example output I get: https://gist.github.com/marmarek/e9da442ff3120cbd53f822a144cfdb04

As a side note, I would be surprised if the hardware VGA worked as "primary VGA" in this setup. At least it requires providing an option ROM, which QEMU cannot easily extract. There may be other problems too.
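
A sketch of how to enable that on a Qubes 4.0 dom0 booted via legacy GRUB (on EFI systems the Xen options live in xen.cfg instead; paths below are the Fedora-based dom0 defaults):

    # in dom0: edit /etc/default/grub and extend the Xen options, e.g.
    #   GRUB_CMDLINE_XEN_DEFAULT="... guest_loglvl=all"
    # then regenerate the GRUB config:
    sudo grub2-mkconfig -o /boot/grub2/grub.cfg
    # after a reboot, hvmloader/SeaBIOS output appears in:
    sudo xl dmesg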

github-actions[bot] commented 1 year ago

This issue is being closed because:

If anyone believes that this issue should be reopened and reassigned to an active milestone, please leave a brief comment. (For example, if a bug still affects Qubes OS 4.1, then the comment "Affects 4.1" will suffice.)