Linux PV domain crashes on Xen 4.14

marmarek commented 4 years ago

Qubes OS version Qubes 4.1

Affected component(s) or functionality Xen, Linux

Brief summary Starting a Linux PV domain results in a kernel panic. The same Linux version worked on Xen 4.13.

To Reproduce Steps to reproduce the behavior: 1. Install Qubes 4.1 2. Update to Xen 4.14 3. Try to start a Linux PV domain 4. Observe failed start and the crash message in /var/log/xen/console/guest-<vmname>.log

Expected behavior Normal start.

Actual behavior

[    3.456316] BUG: unable to handle page fault for address: ffffc9000096b818
[    3.456326] #PF: supervisor read access in kernel mode
[    3.456333] #PF: error_code(0x0000) - not-present page
[    3.456341] PGD 18829067 P4D 18829067 PUD 18828067 PMD 1355b067 PTE 0
[    3.456352] Oops: 0000 [#1] SMP NOPTI
[    3.456359] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.5-1.qubes.x86_64 #1
[    3.456374] RIP: e030:pmc_core_probe+0x138/0x190
[    3.456382] Code: 82 48 c7 c7 68 f3 24 83 e8 f5 5f 77 ff 48 8b 05 4e a6 87 01 48 c7 83 88 00 00 00 40 f3 24 83 48 63 40 4c 48 03 05 30 a6 87 01 <8b> 00 48 8b 15 2f a6 87 01 48 c7 c7 80 db 13 82 8b 4a 50 ba 01 00
[    3.456405] RSP: e02b:ffffc9000080bc28 EFLAGS: 00010286
[    3.456414] RAX: ffffc9000096b818 RBX: ffffffff827ee180 RCX: 00000000fe002000
[    3.456425] RDX: ffffffff8324f320 RSI: ffffffff8245d710 RDI: ffffffff8324f368
[    3.456435] RBP: ffffffff827ee190 R08: 0000000000000000 R09: 00000000fe001fff
[    3.456445] R10: 0000000000007ff0 R11: ffff888012d048c0 R12: 0000000000000000
[    3.456455] R13: 0000000000000000 R14: 0000000000000000 R15: 64eb6bcdd2d8f073
[    3.456478] FS:  0000000000000000(0000) GS:ffff888014000000(0000) knlGS:0000000000000000
[    3.456489] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.456497] CR2: ffffc9000096b818 CR3: 000000000260a000 CR4: 0000000000040660
[    3.456514] Call Trace:
[    3.456523]  platform_drv_probe+0x35/0x80
[    3.456531]  really_probe+0x2cd/0x400
[    3.456538]  driver_probe_device+0xe1/0x150
[    3.456545]  ? driver_allows_async_probing+0x50/0x50
[    3.456555]  bus_for_each_drv+0x8f/0xd0
[    3.456562]  __device_attach+0xe9/0x1e0
[    3.456569]  bus_probe_device+0x8e/0xa0
[    3.456576]  device_add+0x38d/0x680
[    3.456583]  platform_device_add+0x105/0x230
[    3.456594]  ? pmc_core_driver_init+0x40/0x40
[    3.456603]  do_one_initcall+0x59/0x220
[    3.456612]  do_initcalls+0x112/0x16f
[    3.456620]  kernel_init_freeable+0x170/0x22d
[    3.456630]  ? rest_init+0xaa/0xaa
[    3.456637]  kernel_init+0xa/0x106
[    3.456644]  ret_from_fork+0x22/0x30
[    3.456651] Modules linked in:
[    3.456658] CR2: ffffc9000096b818
[    3.456669] ---[ end trace 7604e0e19a308f23 ]---
[    3.456679] RIP: e030:pmc_core_probe+0x138/0x190
[    3.456686] Code: 82 48 c7 c7 68 f3 24 83 e8 f5 5f 77 ff 48 8b 05 4e a6 87 01 48 c7 83 88 00 00 00 40 f3 24 83 48 63 40 4c 48 03 05 30 a6 87 01 <8b> 00 48 8b 15 2f a6 87 01 48 c7 c7 80 db 13 82 8b 4a 50 ba 01 00
[    3.456713] RSP: e02b:ffffc9000080bc28 EFLAGS: 00010286
[    3.456721] RAX: ffffc9000096b818 RBX: ffffffff827ee180 RCX: 00000000fe002000
[    3.456764] RDX: ffffffff8324f320 RSI: ffffffff8245d710 RDI: ffffffff8324f368
[    3.456775] RBP: ffffffff827ee190 R08: 0000000000000000 R09: 00000000fe001fff
[    3.456785] R10: 0000000000007ff0 R11: ffff888012d048c0 R12: 0000000000000000
[    3.456795] R13: 0000000000000000 R14: 0000000000000000 R15: 64eb6bcdd2d8f073
[    3.456815] FS:  0000000000000000(0000) GS:ffff888014000000(0000) knlGS:0000000000000000
[    3.456828] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.456833] CR2: ffffc9000096b818 CR3: 000000000260a000 CR4: 0000000000040660
[    3.456833] Kernel panic - not syncing: Fatal exception
[    3.456833] Kernel Offset: disabled

Additional context If relevant, the CPU is Intel Core i7-8750H.

Solutions you've tried Changing kernel version doesn't help (tried 5.4.61, 5.7.10, 5.8.5). But the Linux PV stubdomain does work (it doesn't include the driver that triggers the crash).

Relevant documentation you've consulted A list of links to the Qubes documentation (or other relevant software documentation) pages you have already consulted.

Related, non-duplicate issues A list of links to other bug reports, feature requests, or tasks in the qubes-issues tracker (or "none" if you didn't find any). Do not describe any other unreported bugs, features, or tasks here.

Full Linux boot log

``` [2020-09-13 12:08:54] [ 0.000000] Linux version 5.8.5-1.qubes.x86_64 (mockbuild@) (gcc (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1), GNU ld version 2.34-4.fc32) #1 SMP Sat Aug 29 14:01:17 CEST 2020 [2020-09-13 12:08:54] [ 0.000000] Command line: root=/dev/mapper/dmroot ro nomodeset console=hvc0 rd_NO_PLYMOUTH rd.plymouth.enable=0 plymouth.enable=0 xen_scrub_pages=0 nopat [2020-09-13 12:08:54] [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' [2020-09-13 12:08:54] [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' [2020-09-13 12:08:54] [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' [2020-09-13 12:08:54] [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 [2020-09-13 12:08:54] [ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format. [2020-09-13 12:08:54] [ 0.000000] ACPI in unprivileged domain disabled [2020-09-13 12:08:54] [ 0.000000] Released 0 page(s) [2020-09-13 12:08:54] [ 0.000000] BIOS-provided physical RAM map: [2020-09-13 12:08:54] [ 0.000000] Xen: [mem 0x0000000000000000-0x000000000009ffff] usable [2020-09-13 12:08:54] [ 0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved [2020-09-13 12:08:54] [ 0.000000] Xen: [mem 0x0000000000100000-0x00000000f4efffff] usable [2020-09-13 12:08:54] [ 0.000000] x86/PAT: PAT support disabled via boot option. [2020-09-13 12:08:54] [ 0.000000] NX (Execute Disable) protection: active [2020-09-13 12:08:54] [ 0.000000] DMI not present or invalid. [2020-09-13 12:08:54] [ 0.000000] Hypervisor detected: Xen PV [2020-09-13 12:08:54] [ 0.188030] tsc: Fast TSC calibration failed [2020-09-13 12:08:54] [ 0.188035] tsc: Detected 2207.992 MHz processor [2020-09-13 12:08:54] [ 0.188058] last_pfn = 0xf4f00 max_arch_pfn = 0x400000000 [2020-09-13 12:08:54] [ 0.188059] Disabled [2020-09-13 12:08:54] [ 0.188065] x86/PAT: Configuration [0-7]: WB WT UC- UC WC WP UC UC [2020-09-13 12:08:54] [ 0.419648] Kernel/User page tables isolation: disabled on XEN PV. [2020-09-13 12:08:54] [ 1.793672] RAMDISK: [mem 0x04000000-0x051fcfff] [2020-09-13 12:08:54] [ 1.793698] NUMA turned off [2020-09-13 12:08:54] [ 1.793699] Faking a node at [mem 0x0000000000000000-0x00000000f4efffff] [2020-09-13 12:08:54] [ 1.793716] NODE_DATA(0) allocated [mem 0x18856000-0x18880fff] [2020-09-13 12:08:54] [ 1.828156] Zone ranges: [2020-09-13 12:08:54] [ 1.828158] DMA [mem 0x0000000000001000-0x0000000000ffffff] [2020-09-13 12:08:54] [ 1.828160] DMA32 [mem 0x0000000001000000-0x00000000f4efffff] [2020-09-13 12:08:54] [ 1.828162] Normal empty [2020-09-13 12:08:54] [ 1.828163] Device empty [2020-09-13 12:08:54] [ 1.828164] Movable zone start for each node [2020-09-13 12:08:54] [ 1.828168] Early memory node ranges [2020-09-13 12:08:54] [ 1.828169] node 0: [mem 0x0000000000001000-0x000000000009ffff] [2020-09-13 12:08:54] [ 1.828171] node 0: [mem 0x0000000000100000-0x00000000f4efffff] [2020-09-13 12:08:54] [ 1.828307] Zeroed struct page in unavailable ranges: 12641 pages [2020-09-13 12:08:54] [ 1.828309] Initmem setup node 0 [mem 0x0000000000001000-0x00000000f4efffff] [2020-09-13 12:08:54] [ 1.836886] p2m virtual area at (____ptrval____), size is 800000 [2020-09-13 12:08:54] [ 1.860771] Remapped 0 page(s) [2020-09-13 12:08:54] [ 1.860903] smpboot: Allowing 2 CPUs, 0 hotplug CPUs [2020-09-13 12:08:54] [ 1.860913] [mem 0xf4f00000-0xffffffff] available for PCI devices [2020-09-13 12:08:54] [ 1.860917] Booting paravirtualized kernel on Xen [2020-09-13 12:08:54] [ 1.860919] Xen version: 4.14.0 (preserve-AD) [2020-09-13 12:08:54] [ 1.860922] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns [2020-09-13 12:08:54] [ 1.867827] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1 [2020-09-13 12:08:54] [ 1.867999] percpu: Embedded 54 pages/cpu s184320 r8192 d28672 u1048576 [2020-09-13 12:08:54] [ 1.868067] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes, linear) [2020-09-13 12:08:54] [ 1.868072] Built 1 zonelists, mobility grouping on. Total pages: 987470 [2020-09-13 12:08:54] [ 1.868074] Policy zone: DMA32 [2020-09-13 12:08:54] [ 1.868076] Kernel command line: root=/dev/mapper/dmroot ro nomodeset console=hvc0 rd_NO_PLYMOUTH rd.plymouth.enable=0 plymouth.enable=0 xen_scrub_pages=0 nopat [2020-09-13 12:08:54] [ 1.868111] You have booted with nomodeset. This means your GPU drivers are DISABLED [2020-09-13 12:08:54] [ 1.868112] Any video related functionality will be severely degraded, and you may not even be able to suspend the system properly [2020-09-13 12:08:54] [ 1.868113] Unless you actually understand what nomodeset does, you should reboot without enabling it [2020-09-13 12:08:54] [ 1.868450] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear) [2020-09-13 12:08:54] [ 1.868581] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear) [2020-09-13 12:08:54] [ 1.869013] mem auto-init: stack:byref_all, heap alloc:off, heap free:off [2020-09-13 12:08:54] [ 1.881471] Memory: 265608K/4012668K available (14339K kernel code, 2479K rwdata, 5144K rodata, 2956K init, 5944K bss, 3747060K reserved, 0K cma-reserved) [2020-09-13 12:08:54] [ 1.881481] random: get_random_u64 called from kmem_cache_open+0x26/0x230 with crng_init=0 [2020-09-13 12:08:54] [ 1.881814] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1 [2020-09-13 12:08:54] [ 1.882673] ftrace: allocating 47075 entries in 184 pages [2020-09-13 12:08:54] [ 1.904269] ftrace: allocated 184 pages with 4 groups [2020-09-13 12:08:54] [ 1.904672] rcu: Hierarchical RCU implementation. [2020-09-13 12:08:54] [ 1.904674] rcu: RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=2. [2020-09-13 12:08:54] [ 1.904675] Trampoline variant of Tasks RCU enabled. [2020-09-13 12:08:54] [ 1.904676] Rude variant of Tasks RCU enabled. [2020-09-13 12:08:54] [ 1.904677] Tracing variant of Tasks RCU enabled. [2020-09-13 12:08:54] [ 1.904678] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies. [2020-09-13 12:08:54] [ 1.904679] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2 [2020-09-13 12:08:54] [ 1.911485] Using NULL legacy PIC [2020-09-13 12:08:54] [ 1.911487] NR_IRQS: 524544, nr_irqs: 48, preallocated irqs: 0 [2020-09-13 12:08:54] [ 1.911548] xen:events: Using FIFO-based ABI [2020-09-13 12:08:54] [ 1.911811] random: crng done (trusting CPU's manufacturer) [2020-09-13 12:08:54] [ 1.911858] Console: colour dummy device 80x25 [2020-09-13 12:08:54] [ 1.911946] printk: console [tty0] enabled [2020-09-13 12:08:54] [ 1.912691] printk: console [hvc0] enabled [2020-09-13 12:08:54] [ 1.912737] clocksource: xen: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns [2020-09-13 12:08:54] [ 1.912758] installing Xen timer for CPU 0 [2020-09-13 12:08:54] [ 1.912806] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1fd3b0b04b0, max_idle_ns: 440795253801 ns [2020-09-13 12:08:54] [ 1.912823] Calibrating delay loop (skipped), value calculated using timer frequency.. 4415.98 BogoMIPS (lpj=2207992) [2020-09-13 12:08:54] [ 1.912838] pid_max: default: 32768 minimum: 301 [2020-09-13 12:08:54] [ 1.912952] LSM: Security Framework initializing [2020-09-13 12:08:54] [ 1.912976] Yama: becoming mindful. [2020-09-13 12:08:54] [ 1.913053] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear) [2020-09-13 12:08:54] [ 1.913074] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear) [2020-09-13 12:08:54] [ 1.914022] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8 [2020-09-13 12:08:54] [ 1.914035] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4 [2020-09-13 12:08:54] [ 1.914049] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization [2020-09-13 12:08:54] [ 1.914063] Spectre V2 : Mitigation: Full generic retpoline [2020-09-13 12:08:54] [ 1.914070] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch [2020-09-13 12:08:54] [ 1.914084] Spectre V2 : Enabling Restricted Speculation for firmware calls [2020-09-13 12:08:54] [ 1.914094] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier [2020-09-13 12:08:54] [ 1.914106] Spectre V2 : User space: Mitigation: STIBP via seccomp and prctl [2020-09-13 12:08:54] [ 1.914117] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp [2020-09-13 12:08:54] [ 1.914131] SRBDS: Unknown: Dependent on hypervisor status [2020-09-13 12:08:54] [ 1.914139] MDS: Mitigation: Clear CPU buffers [2020-09-13 12:08:54] [ 2.010217] cpu 0 spinlock event irq 1 [2020-09-13 12:08:54] [ 2.010235] VPMU disabled by hypervisor. [2020-09-13 12:08:54] [ 2.010643] Performance Events: unsupported p6 CPU model 158 no PMU driver, software events only. [2020-09-13 12:08:54] [ 2.010766] rcu: Hierarchical SRCU implementation. [2020-09-13 12:08:54] [ 2.011341] NMI watchdog: Perf NMI watchdog permanently disabled [2020-09-13 12:08:54] [ 2.011435] smp: Bringing up secondary CPUs ... [2020-09-13 12:08:54] [ 2.011691] installing Xen timer for CPU 1 [2020-09-13 12:08:54] [ 2.011739] SMP alternatives: switching to SMP code [2020-09-13 12:08:54] [ 2.106109] cpu 1 spinlock event irq 13 [2020-09-13 12:08:54] [ 2.107108] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details. [2020-09-13 12:08:54] [ 2.107357] smp: Brought up 1 node, 2 CPUs [2020-09-13 12:08:54] [ 2.107366] smpboot: Max logical packages: 1 [2020-09-13 12:08:54] [ 2.107628] devtmpfs: initialized [2020-09-13 12:08:54] [ 2.107628] x86/mm: Memory block size: 128MB [2020-09-13 12:08:54] [ 2.108197] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns [2020-09-13 12:08:54] [ 2.108197] futex hash table entries: 512 (order: 3, 32768 bytes, linear) [2020-09-13 12:08:54] [ 2.108197] xor: automatically using best checksumming function avx [2020-09-13 12:08:54] [ 2.108197] pinctrl core: initialized pinctrl subsystem [2020-09-13 12:08:54] [ 2.127441] PM: RTC time: 165:165:165, date: 2065-165-165 [2020-09-13 12:08:54] [ 2.127484] thermal_sys: Registered thermal governor 'fair_share' [2020-09-13 12:08:54] [ 2.127485] thermal_sys: Registered thermal governor 'bang_bang' [2020-09-13 12:08:54] [ 2.127494] thermal_sys: Registered thermal governor 'step_wise' [2020-09-13 12:08:54] [ 2.127503] thermal_sys: Registered thermal governor 'user_space' [2020-09-13 12:08:54] [ 2.128868] NET: Registered protocol family 16 [2020-09-13 12:08:54] [ 2.128868] xen:grant_table: Grant tables using version 1 layout [2020-09-13 12:08:54] [ 2.128868] Grant table initialized [2020-09-13 12:08:54] [ 2.128868] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations [2020-09-13 12:08:54] [ 2.128868] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations [2020-09-13 12:08:54] [ 2.128868] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations [2020-09-13 12:08:54] [ 2.128868] audit: initializing netlink subsys (disabled) [2020-09-13 12:08:54] [ 2.128884] audit: type=2000 audit(1600013334.253:1): state=initialized audit_enabled=0 res=1 [2020-09-13 12:08:54] [ 2.130838] PCI: setting up Xen PCI frontend stub [2020-09-13 12:08:54] [ 2.293006] cryptd: max_cpu_qlen set to 1000 [2020-09-13 12:08:54] [ 2.299932] alg: No test for 842 (842-generic) [2020-09-13 12:08:54] [ 2.299932] alg: No test for 842 (842-scomp) [2020-09-13 12:08:54] [ 2.305072] raid6: skip pq benchmark and using algorithm avx2x4 [2020-09-13 12:08:54] [ 2.305072] raid6: using avx2x2 recovery algorithm [2020-09-13 12:08:54] [ 2.305072] fbcon: Taking over console [2020-09-13 12:08:54] [ 2.305072] ACPI: Interpreter disabled. [2020-09-13 12:08:54] [ 2.305072] xen:balloon: Initialising balloon driver [2020-09-13 12:08:54] [ 2.360961] iommu: Default domain type: Translated [2020-09-13 12:08:54] [ 2.361041] vgaarb: loaded [2020-09-13 12:08:54] [ 2.361844] SCSI subsystem initialized [2020-09-13 12:08:54] [ 2.361978] usbcore: registered new interface driver usbfs [2020-09-13 12:08:54] [ 2.361998] usbcore: registered new interface driver hub [2020-09-13 12:08:54] [ 2.362034] usbcore: registered new device driver usb [2020-09-13 12:08:54] [ 2.362067] pps_core: LinuxPPS API ver. 1 registered [2020-09-13 12:08:54] [ 2.362075] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti [2020-09-13 12:08:54] [ 2.362090] PTP clock support registered [2020-09-13 12:08:54] [ 2.362156] EDAC MC: Ver: 3.0.0 [2020-09-13 12:08:54] [ 2.362156] NetLabel: Initializing [2020-09-13 12:08:54] [ 2.362156] NetLabel: domain hash size = 128 [2020-09-13 12:08:54] [ 2.362156] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO [2020-09-13 12:08:54] [ 2.362156] NetLabel: unlabeled traffic allowed by default [2020-09-13 12:08:54] [ 2.362156] PCI: System does not support PCI [2020-09-13 12:08:54] [ 2.362156] clocksource: Switched to clocksource xen [2020-09-13 12:08:54] [ 2.379374] VFS: Disk quotas dquot_6.6.0 [2020-09-13 12:08:54] [ 2.379466] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [2020-09-13 12:08:54] [ 2.379573] hugetlbfs: disabling because there are no supported hugepage sizes [2020-09-13 12:08:54] [ 2.379670] pnp: PnP ACPI: disabled [2020-09-13 12:08:54] [ 2.383919] NET: Registered protocol family 2 [2020-09-13 12:08:54] [ 2.384082] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes, linear) [2020-09-13 12:08:54] [ 2.384104] TCP established hash table entries: 32768 (order: 6, 262144 bytes, linear) [2020-09-13 12:08:54] [ 2.384166] TCP bind hash table entries: 32768 (order: 7, 524288 bytes, linear) [2020-09-13 12:08:54] [ 2.384232] TCP: Hash tables configured (established 32768 bind 32768) [2020-09-13 12:08:54] [ 2.384361] UDP hash table entries: 2048 (order: 4, 65536 bytes, linear) [2020-09-13 12:08:54] [ 2.384381] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes, linear) [2020-09-13 12:08:54] [ 2.384571] NET: Registered protocol family 1 [2020-09-13 12:08:54] [ 2.384589] NET: Registered protocol family 44 [2020-09-13 12:08:54] [ 2.384600] PCI: CLS 0 bytes, default 64 [2020-09-13 12:08:54] [ 2.384639] Trying to unpack rootfs image as initramfs... [2020-09-13 12:08:54] [ 2.395202] Freeing initrd memory: 18420K [2020-09-13 12:08:54] [ 2.396079] Initialise system trusted keyrings [2020-09-13 12:08:54] [ 2.396120] Key type blacklist registered [2020-09-13 12:08:54] [ 2.396251] workingset: timestamp_bits=36 max_order=17 bucket_order=0 [2020-09-13 12:08:54] [ 2.397568] zbud: loaded [2020-09-13 12:08:54] [ 2.398081] integrity: Platform Keyring initialized [2020-09-13 12:08:54] [ 2.409194] NET: Registered protocol family 38 [2020-09-13 12:08:54] [ 2.409211] Key type asymmetric registered [2020-09-13 12:08:54] [ 2.409218] Asymmetric key parser 'x509' registered [2020-09-13 12:08:54] [ 2.409242] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245) [2020-09-13 12:08:54] [ 2.409354] io scheduler mq-deadline registered [2020-09-13 12:08:54] [ 2.409364] io scheduler kyber registered [2020-09-13 12:08:54] [ 2.409412] io scheduler bfq registered [2020-09-13 12:08:54] [ 2.409510] atomic64_test: passed for x86-64 platform with CX8 and with SSE [2020-09-13 12:08:54] [ 2.410350] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled [2020-09-13 12:08:54] [ 2.412720] Non-volatile memory driver v1.3 [2020-09-13 12:08:54] [ 2.412775] Linux agpgart interface v0.103 [2020-09-13 12:08:54] [ 2.413238] libphy: Fixed MDIO Bus: probed [2020-09-13 12:08:54] [ 2.413351] usbcore: registered new interface driver usbserial_generic [2020-09-13 12:08:54] [ 2.413364] usbserial: USB Serial support registered for generic [2020-09-13 12:08:54] [ 2.413385] i8042: PNP: No PS/2 controller found. [2020-09-13 12:08:54] [ 2.413393] i8042: Probing ports directly. [2020-09-13 12:08:55] [ 3.455635] i8042: No controller found [2020-09-13 12:08:55] [ 3.455797] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1fd3b0b04b0, max_idle_ns: 440795253801 ns [2020-09-13 12:08:55] [ 3.455863] mousedev: PS/2 mouse device common for all mice [2020-09-13 12:08:55] [ 3.456000] device-mapper: uevent: version 1.0.3 [2020-09-13 12:08:55] [ 3.456083] device-mapper: ioctl: 4.42.0-ioctl (2020-02-27) initialised: dm-devel@redhat.com [2020-09-13 12:08:55] [ 3.456181] intel_pstate: CPU model not supported [2020-09-13 12:08:55] [ 3.456211] hid: raw HID events driver (C) Jiri Kosina [2020-09-13 12:08:55] [ 3.456251] usbcore: registered new interface driver usbhid [2020-09-13 12:08:55] [ 3.456259] usbhid: USB HID core driver [2020-09-13 12:08:55] [ 3.456316] BUG: unable to handle page fault for address: ffffc9000096b818 [2020-09-13 12:08:55] [ 3.456326] #PF: supervisor read access in kernel mode [2020-09-13 12:08:55] [ 3.456333] #PF: error_code(0x0000) - not-present page [2020-09-13 12:08:55] [ 3.456341] PGD 18829067 P4D 18829067 PUD 18828067 PMD 1355b067 PTE 0 [2020-09-13 12:08:55] [ 3.456352] Oops: 0000 [#1] SMP NOPTI [2020-09-13 12:08:55] [ 3.456359] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.5-1.qubes.x86_64 #1 [2020-09-13 12:08:55] [ 3.456374] RIP: e030:pmc_core_probe+0x138/0x190 [2020-09-13 12:08:55] [ 3.456382] Code: 82 48 c7 c7 68 f3 24 83 e8 f5 5f 77 ff 48 8b 05 4e a6 87 01 48 c7 83 88 00 00 00 40 f3 24 83 48 63 40 4c 48 03 05 30 a6 87 01 <8b> 00 48 8b 15 2f a6 87 01 48 c7 c7 80 db 13 82 8b 4a 50 ba 01 00 [2020-09-13 12:08:55] [ 3.456405] RSP: e02b:ffffc9000080bc28 EFLAGS: 00010286 [2020-09-13 12:08:55] [ 3.456414] RAX: ffffc9000096b818 RBX: ffffffff827ee180 RCX: 00000000fe002000 [2020-09-13 12:08:55] [ 3.456425] RDX: ffffffff8324f320 RSI: ffffffff8245d710 RDI: ffffffff8324f368 [2020-09-13 12:08:55] [ 3.456435] RBP: ffffffff827ee190 R08: 0000000000000000 R09: 00000000fe001fff [2020-09-13 12:08:55] [ 3.456445] R10: 0000000000007ff0 R11: ffff888012d048c0 R12: 0000000000000000 [2020-09-13 12:08:55] [ 3.456455] R13: 0000000000000000 R14: 0000000000000000 R15: 64eb6bcdd2d8f073 [2020-09-13 12:08:55] [ 3.456478] FS: 0000000000000000(0000) GS:ffff888014000000(0000) knlGS:0000000000000000 [2020-09-13 12:08:55] [ 3.456489] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033 [2020-09-13 12:08:55] [ 3.456497] CR2: ffffc9000096b818 CR3: 000000000260a000 CR4: 0000000000040660 [2020-09-13 12:08:55] [ 3.456514] Call Trace: [2020-09-13 12:08:55] [ 3.456523] platform_drv_probe+0x35/0x80 [2020-09-13 12:08:55] [ 3.456531] really_probe+0x2cd/0x400 [2020-09-13 12:08:55] [ 3.456538] driver_probe_device+0xe1/0x150 [2020-09-13 12:08:55] [ 3.456545] ? driver_allows_async_probing+0x50/0x50 [2020-09-13 12:08:55] [ 3.456555] bus_for_each_drv+0x8f/0xd0 [2020-09-13 12:08:55] [ 3.456562] __device_attach+0xe9/0x1e0 [2020-09-13 12:08:55] [ 3.456569] bus_probe_device+0x8e/0xa0 [2020-09-13 12:08:55] [ 3.456576] device_add+0x38d/0x680 [2020-09-13 12:08:55] [ 3.456583] platform_device_add+0x105/0x230 [2020-09-13 12:08:55] [ 3.456594] ? pmc_core_driver_init+0x40/0x40 [2020-09-13 12:08:55] [ 3.456603] do_one_initcall+0x59/0x220 [2020-09-13 12:08:55] [ 3.456612] do_initcalls+0x112/0x16f [2020-09-13 12:08:55] [ 3.456620] kernel_init_freeable+0x170/0x22d [2020-09-13 12:08:55] [ 3.456630] ? rest_init+0xaa/0xaa [2020-09-13 12:08:55] [ 3.456637] kernel_init+0xa/0x106 [2020-09-13 12:08:55] [ 3.456644] ret_from_fork+0x22/0x30 [2020-09-13 12:08:55] [ 3.456651] Modules linked in: [2020-09-13 12:08:55] [ 3.456658] CR2: ffffc9000096b818 [2020-09-13 12:08:55] [ 3.456669] ---[ end trace 7604e0e19a308f23 ]--- [2020-09-13 12:08:55] [ 3.456679] RIP: e030:pmc_core_probe+0x138/0x190 [2020-09-13 12:08:55] [ 3.456686] Code: 82 48 c7 c7 68 f3 24 83 e8 f5 5f 77 ff 48 8b 05 4e a6 87 01 48 c7 83 88 00 00 00 40 f3 24 83 48 63 40 4c 48 03 05 30 a6 87 01 <8b> 00 48 8b 15 2f a6 87 01 48 c7 c7 80 db 13 82 8b 4a 50 ba 01 00 [2020-09-13 12:08:55] [ 3.456713] RSP: e02b:ffffc9000080bc28 EFLAGS: 00010286 [2020-09-13 12:08:55] [ 3.456721] RAX: ffffc9000096b818 RBX: ffffffff827ee180 RCX: 00000000fe002000 [2020-09-13 12:08:55] [ 3.456764] RDX: ffffffff8324f320 RSI: ffffffff8245d710 RDI: ffffffff8324f368 [2020-09-13 12:08:55] [ 3.456775] RBP: ffffffff827ee190 R08: 0000000000000000 R09: 00000000fe001fff [2020-09-13 12:08:55] [ 3.456785] R10: 0000000000007ff0 R11: ffff888012d048c0 R12: 0000000000000000 [2020-09-13 12:08:55] [ 3.456795] R13: 0000000000000000 R14: 0000000000000000 R15: 64eb6bcdd2d8f073 [2020-09-13 12:08:55] [ 3.456815] FS: 0000000000000000(0000) GS:ffff888014000000(0000) knlGS:0000000000000000 [2020-09-13 12:08:55] [ 3.456828] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033 [2020-09-13 12:08:55] [ 3.456833] CR2: ffffc9000096b818 CR3: 000000000260a000 CR4: 0000000000040660 [2020-09-13 12:08:55] [ 3.456833] Kernel panic - not syncing: Fatal exception [2020-09-13 12:08:55] [ 3.456833] Kernel Offset: disabled ```

dylangerdaly commented 4 years ago

Why is PV even required? It's my understanding PV is terrible

iamahuman commented 4 years ago

Why is PV even required? It's my understanding PV is terrible

@dylangerdaly I guess it mainly eases testing of Qubes OS without the need for nested virtualization.

unixos commented 4 years ago

Here it is working if I do the steps to reproduce

screen

marmarek commented 4 years ago

It may be related to the CPU model, since "pmc_core_probe" (the crashing function) is in "Intel PMC Core Driver".

unixos commented 4 years ago

CPU is Intel i7 4790K

m-v-b commented 4 years ago

@marmarek, I encountered the same issue a few months ago on Qubes OS 4.0, when experimenting with PV-based VMs for a personal reason. I have a workaround in my local git tree, but I never got around to publishing it, mainly due to the workaround not being sufficiently elegant. It looks like you have run into the same issue.

In summary, the issue only occurs when the PV VM has less than 4064 MiB of memory. There is a heuristic in the device driver which attempts to avoid ioremap()'ing the memory mapped registers starting at PMC_BASE_ADDR_DEFAULT (0xFE000000 == 4064 MiB) if the addresses correspond to RAM. (I had added this heuristic in the past.) However, if the VM has memory less than 4064 MiB, then the heuristic fails to work as expected, as addresses equal to and above 0xFE000000 do not show up as RAM in the memory map.

All this to say, I ended up working around this issue locally by applying a patch, which, in my opinion, takes an unacceptable approach, because with this work-around the driver "knows" whether it is operating on a Xen-based system or not, which is not ideal. Here is the patch, but this issue can be temporarily worked around by setting the VM's RAM size to 4 GiB for the time being.

Hope this helps!

commit 8fe194a4c134a9ed24701b510a84c82cbba3285a
Author: M. Vefa Bicakci <m.v.b@runbox.com>
Date:   Fri Apr 24 17:46:55 2020 +0300

    intel_pmc_core: Disable for non-dom0 Xen PV domains

    Prior to this commit, attempting to start a Xen PV domain with less than
    0xFE000000 bytes (i.e., 4064 MiB) of memory would cause an unrecoverable
    page fault as follows, which causes the domain to abruptly die:

      BUG: unable to handle page fault for address: ffffc9000062b818
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      PGD 3fbd8067 P4D 3fbd8067 PUD 3fbd7067 PMD 3ce0b067 PTE 0
      Oops: 0000 [#1] SMP NOPTI
      CPU: 0 PID: 407 Comm: systemd-udevd Tainted: G           O      5.6.7-1 #1
      RIP: e030:pmc_core_probe+0x140/0x300 [intel_pmc_core]

    Further debugging revealed that ioremap(0xFE000000, ...) works as
    expected in pmc_core_probe function, and the aforementioned error
    occurs when the driver attempts to access the memory mapping by calling
    the pmc_core_check_read_lock_bit() function.

    Given that the intel_pmc_core driver appears to be incompatible with
    non-dom0 para-virtualized (PV) Xen domains in general as well (i.e., not
    only in configurations with less than 4064 MiB of memory), this commit
    disables the driver in such configurations at run-time. Please note that
    the driver works as expected in dom0.

diff --git a/drivers/platform/x86/intel_pmc_core.c b/drivers/platform/x86/intel_pmc_core.c
index 7c8bdab078cf..3cb41381309d 100644
--- a/drivers/platform/x86/intel_pmc_core.c
+++ b/drivers/platform/x86/intel_pmc_core.c
@@ -23,6 +23,7 @@
 #include <linux/slab.h>
 #include <linux/suspend.h>
 #include <linux/uaccess.h>
+#include <xen/xen.h>

 #include <asm/cpu_device_id.h>
 #include <asm/intel-family.h>
@@ -1195,6 +1196,13 @@ static int pmc_core_probe(struct platform_device *pdev)
        const struct x86_cpu_id *cpu_id;
        u64 slp_s0_addr;

+#if IS_ENABLED(CONFIG_XEN_PV)
+       if (xen_pv_domain() && !xen_initial_domain()) {
+               dev_info(&pdev->dev, "not compatible with non-dom0 Xen PV domains\n");
+               return -ENODEV;
+       }
+#endif
+
        if (device_initialized)
                return -ENODEV;

marmarek commented 3 years ago

@m-v-b have you reported the issue somewhere? I'd like to see what it will take to fix it properly upstream, or at the very least, have your patch included.

m-v-b commented 3 years ago

@marmarek, sorry for the delay! To answer your question, I have unfortunately not reported the issue, at least yet.

In the past two days, I have worked on a (subjectively) more acceptable approach, by changing the patch so that the driver is not aware of Xen at all. The changes involve walking the system's memory map to determine whether the address 0xFE000000 is in the memory map and to determine whether the address refers to system RAM. I have tested the updated patch with pv, pvh, and hvm virtual machines with different amounts of RAM, and also on a non-virtualised host.

All this to say, I hope to publish a patch in a relevant mailing list soon (i.e., possibly next weekend). If you would like to have a copy, I can publish it here ahead of time as well.

bennykusman commented 3 years ago

@marmarek, I encountered the same issue a few months ago on Qubes OS 4.0, when experimenting with PV-based VMs for a personal reason. I have a workaround in my local git tree, but I never got around to publishing it, mainly due to the workaround not being sufficiently elegant. It looks like you have run into the same issue.

In summary, the issue only occurs when the PV VM has less than 4064 MiB of memory. There is a heuristic in the device driver which attempts to avoid ioremap()'ing the memory mapped registers starting at PMC_BASE_ADDR_DEFAULT (0xFE000000 == 4064 MiB) if the addresses correspond to RAM. (I had added this heuristic in the past.) However, if the VM has memory less than 4064 MiB, then the heuristic fails to work as expected, as addresses equal to and above 0xFE000000 do not show up as RAM in the memory map.

All this to say, I ended up working around this issue locally by applying a patch, which, in my opinion, takes an unacceptable approach, because with this work-around the driver "knows" whether it is operating on a Xen-based system or not, which is not ideal. Here is the patch, but this issue can be temporarily worked around by setting the VM's RAM size to 4 GiB for the time being.

Hope this helps!
commit 8fe194a4c134a9ed24701b510a84c82cbba3285a
Author: M. Vefa Bicakci <m.v.b@runbox.com>
Date:   Fri Apr 24 17:46:55 2020 +0300

    intel_pmc_core: Disable for non-dom0 Xen PV domains

    Prior to this commit, attempting to start a Xen PV domain with less than
    0xFE000000 bytes (i.e., 4064 MiB) of memory would cause an unrecoverable
    page fault as follows, which causes the domain to abruptly die:

      BUG: unable to handle page fault for address: ffffc9000062b818
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      PGD 3fbd8067 P4D 3fbd8067 PUD 3fbd7067 PMD 3ce0b067 PTE 0
      Oops: 0000 [#1] SMP NOPTI
      CPU: 0 PID: 407 Comm: systemd-udevd Tainted: G           O      5.6.7-1 #1
      RIP: e030:pmc_core_probe+0x140/0x300 [intel_pmc_core]

    Further debugging revealed that ioremap(0xFE000000, ...) works as
    expected in pmc_core_probe function, and the aforementioned error
    occurs when the driver attempts to access the memory mapping by calling
    the pmc_core_check_read_lock_bit() function.

    Given that the intel_pmc_core driver appears to be incompatible with
    non-dom0 para-virtualized (PV) Xen domains in general as well (i.e., not
    only in configurations with less than 4064 MiB of memory), this commit
    disables the driver in such configurations at run-time. Please note that
    the driver works as expected in dom0.

diff --git a/drivers/platform/x86/intel_pmc_core.c b/drivers/platform/x86/intel_pmc_core.c
index 7c8bdab078cf..3cb41381309d 100644
--- a/drivers/platform/x86/intel_pmc_core.c
+++ b/drivers/platform/x86/intel_pmc_core.c
@@ -23,6 +23,7 @@
 #include <linux/slab.h>
 #include <linux/suspend.h>
 #include <linux/uaccess.h>
+#include <xen/xen.h>

 #include <asm/cpu_device_id.h>
 #include <asm/intel-family.h>
@@ -1195,6 +1196,13 @@ static int pmc_core_probe(struct platform_device *pdev)
        const struct x86_cpu_id *cpu_id;
        u64 slp_s0_addr;

+#if IS_ENABLED(CONFIG_XEN_PV)
+       if (xen_pv_domain() && !xen_initial_domain()) {
+               dev_info(&pdev->dev, "not compatible with non-dom0 Xen PV domains\n");
+               return -ENODEV;
+       }
+#endif
+
        if (device_initialized)
                return -ENODEV;

do we know if this already approved in upstream ?

DemiMarie commented 3 years ago

Do we actually need to support PV domains? They are a bad idea for all sorts of security reasons.

m-v-b commented 3 years ago

@bennykusman, as far as I know, this problem has not yet been resolved upstream. Here is a link to the relevant driver code as of today in Linus Torvalds' git tree: Link to git.kernel.org

I have not had the time to publish a revised patch on a relevant mailing list for inclusion in the upstream kernel and to follow up with the code review process. I do not yet know when in the near future I will have some time to work on this; my apologies.

arno481 commented 3 years ago

I managed to work around this issue by setting the VM type to PVH (AFAIK the preferred VM mode for Xen). You can do this by adding

type = 'pvh'

to the xl .cfg file, e.g. /etc/xen/vm0.cfg

Hope this helps, Arno

DemiMarie commented 2 years ago

@m-v-b would you mind submitting a patch to our kernel tree?

xaki23 commented 2 years ago

how about just changing it from y to m in kernel config? (the driver used to be y/n, but is y/n/m these days)

and a VM should never have a reason to load the module...

m-v-b commented 2 years ago

@DemiMarie, sorry for the late reply. Of course, I can prepare a patch for the QubesOS/qubes-linux-kernel repository. May I learn how urgent this is?

In addition, would the patch quoted in https://github.com/QubesOS/qubes-issues/issues/6052#issuecomment-692360468 be sufficient? I am asking as I have another patch that solves this issue in a more elegant manner by checking the memory map instead of checking for Xen's domain 0, but as mentioned before, I never got around to publishing either patch on a kernel mailing list.

xaki23 commented 2 years ago

with the driver built as a module, it still tried to load it for whatever reason. after adding a hard blacklisting for the module in the template it worked.

from my pov the config change is a better idea than a patch because of less potential maintenance required in the future. and we are talking about an increasingly small userbase of not-officially-supported-to-begin-with hardware here. (no iommu)

DemiMarie commented 2 years ago

Config change is definitely a good idea. There is no point in even trying to load the module outside of dom0.

DemiMarie commented 2 years ago

@DemiMarie, sorry for the late reply. Of course, I can prepare a patch for the QubesOS/qubes-linux-kernel repository. May I learn how urgent this is?

In addition, would the patch quoted in #6052 (comment) be sufficient? I am asking as I have another patch that solves this issue in a more elegant manner by checking the memory map instead of checking for Xen's domain 0, but as mentioned before, I never got around to publishing either patch on a kernel mailing list.

For Qubes it should be fine, but I suspect upstream will want the more elegant version.

xaki23 commented 2 years ago

just confirmed a pv vm works with kernel-latest-qubes-vm built after the https://github.com/QubesOS/qubes-linux-kernel/pull/613 merge. test was done with 5.18.16-1.

besides kernel-latest, it also requires blacklisting the module inside the VM. i stored this as /etc/modprobe.d/pmc.conf in the template used for the pv VM:

blacklist intel_pmc_core
install intel_pmc_core /bin/false

(double-blacklist because just the blacklist entry doesn't block modules that are loaded as dependency of non-blacklisted modules. not sure if thats the case here, but it doesn't hurt to block it hard.) this is safe to deploy in the templatevm (== not just for the pv VM) because VMs never really have any use for that module anyways.

unless someone feels the need for more documentation of this workaround for unsupported setups, this issue can be closed at this point?

andrewdavidwong commented 2 years ago

Closing as resolved. If anyone believes this issue is not yet resolved, or if anyone is still affected by this issue, please leave a comment, and we'll be happy to reopen it. Thank you.

iamahuman commented 2 years ago

unless someone feels the need for more documentation of this workaround for unsupported setups,

Is PV + specific Intel CPU models considered "unsupported setups?"

this issue can be closed at this point?

There are clear action items (user documentation or a proper fix) left until the issue can be considered resolved, so I suppose we should leave it open.

xaki23 commented 2 years ago

Is PV + specific Intel CPU models considered "unsupported setups?"

systems without IOMMU (which is the only reason to use PV i am aware of) have been unsupported since qubes 4.0, which means 2018. the installer wil (or at least, should) be showing a clear warning that the system is unsupported before installation.

There are clear action items (user documentation or a proper fix) left until the issue can be considered resolved, so I suppose we should leave it open.

having the workaround documented here in the ticket for advanced users seems sufficient considering it is, again, an unsupported configuration.

iamahuman commented 2 years ago

To be clear, this does not affect device model domains. Is this correct?

marmarek commented 2 years ago

To be clear, this does not affect device model domains. Is this correct?

correct

QubesOS / qubes-issues

Linux PV domain crashes on Xen 4.14 #6052