Closed marmarek closed 2 years ago
Why is PV even required? It's my understanding PV is terrible
Why is PV even required? It's my understanding PV is terrible
@dylangerdaly I guess it mainly eases testing of Qubes OS without the need for nested virtualization.
I can reproduce it here by following the steps to reproduce.
It may be related to the CPU model, since "pmc_core_probe" (the crashing function) is in "Intel PMC Core Driver".
CPU is Intel i7 4790K
@marmarek, I encountered the same issue a few months ago on Qubes OS 4.0, when experimenting with PV-based VMs for a personal reason. I have a workaround in my local git tree, but I never got around to publishing it, mainly due to the workaround not being sufficiently elegant. It looks like you have run into the same issue.
In summary, the issue only occurs when the PV VM has less than 4064 MiB of memory. There is a heuristic in the device driver which attempts to avoid ioremap()'ing the memory-mapped registers starting at PMC_BASE_ADDR_DEFAULT (0xFE000000 == 4064 MiB) if the addresses correspond to RAM. (I had added this heuristic in the past.) However, if the VM has less than 4064 MiB of memory, then the heuristic fails to work as expected, as addresses at and above 0xFE000000 do not show up as RAM in the memory map.
All this to say, I ended up working around this issue locally by applying a patch which, in my opinion, takes an unacceptable approach: with this workaround, the driver "knows" whether it is operating on a Xen-based system, which is not ideal. Here is the patch; in the meantime, this issue can be worked around by setting the VM's RAM size to 4 GiB.
Hope this helps!
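As a side note on the numbers above: 0xFE000000 bytes is exactly 4064 MiB, which is why the threshold looks like an odd figure. A quick sketch (just illustrating the arithmetic and the heuristic's failure mode; the variable names are invented for the example):

```python
# PMC_BASE_ADDR_DEFAULT as used by the intel_pmc_core driver
PMC_BASE_ADDR_DEFAULT = 0xFE000000

# Convert bytes to MiB: 0xFE000000 / 2**20 == 4064
print(PMC_BASE_ADDR_DEFAULT // (1 << 20))  # 4064

# A PV VM with less than 4064 MiB of RAM has no memory-map entry at or
# above PMC_BASE_ADDR_DEFAULT, so the "is this address RAM?" heuristic
# never sees those addresses and cannot reject the ioremap().
vm_ram_mib = 4000  # hypothetical VM size below the threshold
print(vm_ram_mib * (1 << 20) > PMC_BASE_ADDR_DEFAULT)  # False
```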
commit 8fe194a4c134a9ed24701b510a84c82cbba3285a
Author: M. Vefa Bicakci <m.v.b@runbox.com>
Date: Fri Apr 24 17:46:55 2020 +0300
intel_pmc_core: Disable for non-dom0 Xen PV domains
Prior to this commit, attempting to start a Xen PV domain with less than
0xFE000000 bytes (i.e., 4064 MiB) of memory would cause an unrecoverable
page fault as follows, which causes the domain to abruptly die:
BUG: unable to handle page fault for address: ffffc9000062b818
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 3fbd8067 P4D 3fbd8067 PUD 3fbd7067 PMD 3ce0b067 PTE 0
Oops: 0000 [#1] SMP NOPTI
CPU: 0 PID: 407 Comm: systemd-udevd Tainted: G O 5.6.7-1 #1
RIP: e030:pmc_core_probe+0x140/0x300 [intel_pmc_core]
Further debugging revealed that ioremap(0xFE000000, ...) works as
expected in pmc_core_probe function, and the aforementioned error
occurs when the driver attempts to access the memory mapping by calling
the pmc_core_check_read_lock_bit() function.
Given that the intel_pmc_core driver appears to be incompatible with
non-dom0 para-virtualized (PV) Xen domains in general as well (i.e., not
only in configurations with less than 4064 MiB of memory), this commit
disables the driver in such configurations at run-time. Please note that
the driver works as expected in dom0.
diff --git a/drivers/platform/x86/intel_pmc_core.c b/drivers/platform/x86/intel_pmc_core.c
index 7c8bdab078cf..3cb41381309d 100644
--- a/drivers/platform/x86/intel_pmc_core.c
+++ b/drivers/platform/x86/intel_pmc_core.c
@@ -23,6 +23,7 @@
#include <linux/slab.h>
#include <linux/suspend.h>
#include <linux/uaccess.h>
+#include <xen/xen.h>
#include <asm/cpu_device_id.h>
#include <asm/intel-family.h>
@@ -1195,6 +1196,13 @@ static int pmc_core_probe(struct platform_device *pdev)
const struct x86_cpu_id *cpu_id;
u64 slp_s0_addr;
+#if IS_ENABLED(CONFIG_XEN_PV)
+ if (xen_pv_domain() && !xen_initial_domain()) {
+ dev_info(&pdev->dev, "not compatible with non-dom0 Xen PV domains\n");
+ return -ENODEV;
+ }
+#endif
+
if (device_initialized)
return -ENODEV;
@m-v-b have you reported the issue somewhere? I'd like to see what it will take to fix it properly upstream, or at the very least, have your patch included.
@marmarek, sorry for the delay! To answer your question, I have unfortunately not reported the issue, at least yet.
In the past two days, I have worked on a (subjectively) more acceptable approach, changing the patch so that the driver is not aware of Xen at all. The changes involve walking the system's memory map to determine whether the address 0xFE000000 is present in the memory map and whether it refers to system RAM. I have tested the updated patch with pv, pvh, and hvm virtual machines with different amounts of RAM, and also on a non-virtualised host.
All this to say, I hope to publish a patch in a relevant mailing list soon (i.e., possibly next weekend). If you would like to have a copy, I can publish it here ahead of time as well.
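For illustration only, here is a hypothetical userspace sketch of the heuristic described above: parse an /proc/iomem-style memory map and check whether 0xFE000000 falls inside a "System RAM" region. (The actual patch works in-kernel on the resource tree; the function name and sample maps here are invented for the example.)

```python
PMC_BASE_ADDR_DEFAULT = 0xFE000000

def addr_is_system_ram(iomem_text, addr):
    """Return True if addr falls inside a 'System RAM' range of the map.

    iomem_text uses the /proc/iomem line format, e.g.
    '00100000-3fffffff : System RAM'.
    """
    for line in iomem_text.splitlines():
        span, _, desc = line.strip().partition(" : ")
        start, _, end = span.partition("-")
        if desc.strip() == "System RAM" and int(start, 16) <= addr <= int(end, 16):
            return True
    return False

# A PV VM with < 4064 MiB of RAM: no range covers 0xFE000000, so the
# heuristic cannot tell that the address must not be ioremap()'d.
small_vm_map = "00100000-3fffffff : System RAM"
print(addr_is_system_ram(small_vm_map, PMC_BASE_ADDR_DEFAULT))  # False

# A VM with enough RAM: the address shows up as RAM and can be rejected.
big_vm_map = "00100000-ffffffff : System RAM"
print(addr_is_system_ram(big_vm_map, PMC_BASE_ADDR_DEFAULT))  # True
```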
Do we know if this has already been approved upstream?
Do we actually need to support PV domains? They are a bad idea for all sorts of security reasons.
@bennykusman, as far as I know, this problem has not yet been resolved upstream. Here is a link to the relevant driver code as of today in Linus Torvalds' git tree: Link to git.kernel.org
I have not had the time to publish a revised patch on a relevant mailing list for inclusion in the upstream kernel and to follow up with the code review process. I do not yet know when in the near future I will have some time to work on this; my apologies.
I managed to work around this issue by setting the VM type to PVH (AFAIK the preferred VM mode for Xen). You can do this by adding type = 'pvh' to the xl .cfg file, e.g. /etc/xen/vm0.cfg.
Hope this helps, Arno
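For reference, a minimal xl config fragment with the PVH type set might look like the following (the file path, VM name, and sizes are just example values, not taken from this thread):

```
# /etc/xen/vm0.cfg -- hypothetical example values
name   = "vm0"
type   = "pvh"
memory = 2048
vcpus  = 2
kernel = "/var/lib/xen/vmlinuz"
```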
@m-v-b would you mind submitting a patch to our kernel tree?
how about just changing it from y to m in kernel config? (the driver used to be y/n, but is y/n/m these days)
and a VM should never have a reason to load the module...
@DemiMarie, sorry for the late reply. Of course, I can prepare a patch for the QubesOS/qubes-linux-kernel repository. May I learn how urgent this is?
In addition, would the patch quoted in https://github.com/QubesOS/qubes-issues/issues/6052#issuecomment-692360468 be sufficient? I am asking as I have another patch that solves this issue in a more elegant manner by checking the memory map instead of checking for Xen's domain 0, but as mentioned before, I never got around to publishing either patch on a kernel mailing list.
with the driver built as a module, the system still tried to load it for whatever reason. after adding a hard blacklist for the module in the template, it worked.
from my pov the config change is a better idea than a patch, because it means less potential maintenance in the future. and we are talking about an increasingly small userbase of hardware that was not officially supported to begin with (no IOMMU).
Config change is definitely a good idea. There is no point in even trying to load the module outside of dom0.
@DemiMarie, sorry for the late reply. Of course, I can prepare a patch for the QubesOS/qubes-linux-kernel repository. May I learn how urgent this is?
In addition, would the patch quoted in #6052 (comment) be sufficient? I am asking as I have another patch that solves this issue in a more elegant manner by checking the memory map instead of checking for Xen's domain 0, but as mentioned before, I never got around to publishing either patch on a kernel mailing list.
For Qubes it should be fine, but I suspect upstream will want the more elegant version.
just confirmed a pv vm works with kernel-latest-qubes-vm built after the https://github.com/QubesOS/qubes-linux-kernel/pull/613 merge. test was done with 5.18.16-1.
besides kernel-latest, it also requires blacklisting the module inside the VM. i stored this as /etc/modprobe.d/pmc.conf in the template used for the pv VM:
blacklist intel_pmc_core
install intel_pmc_core /bin/false
(double-blacklist because the blacklist entry alone doesn't block modules that are loaded as a dependency of non-blacklisted modules. not sure if that's the case here, but it doesn't hurt to block it hard.) this is safe to deploy in the templatevm (== not just for the pv VM) because VMs never really have any use for that module anyway.
unless someone feels the need for more documentation of this workaround for unsupported setups, this issue can be closed at this point?
Closing as resolved. If anyone believes this issue is not yet resolved, or if anyone is still affected by this issue, please leave a comment, and we'll be happy to reopen it. Thank you.
unless someone feels the need for more documentation of this workaround for unsupported setups,
Is PV + specific Intel CPU models considered "unsupported setups?"
this issue can be closed at this point?
There are clear action items (user documentation or a proper fix) left until the issue can be considered resolved, so I suppose we should leave it open.
Is PV + specific Intel CPU models considered "unsupported setups?"
systems without IOMMU (which is the only reason to use PV that i am aware of) have been unsupported since qubes 4.0, which means 2018. the installer will (or at least, should) show a clear warning that the system is unsupported before installation.
There are clear action items (user documentation or a proper fix) left until the issue can be considered resolved, so I suppose we should leave it open.
having the workaround documented here in the ticket for advanced users seems sufficient considering it is, again, an unsupported configuration.
To be clear, this does not affect device model domains. Is this correct?
To be clear, this does not affect device model domains. Is this correct?
correct
Qubes OS version Qubes 4.1
Affected component(s) or functionality Xen, Linux
Brief summary Starting a Linux PV domain results in a kernel panic. The same Linux version worked on Xen 4.13.
To Reproduce Steps to reproduce the behavior:
1. Install Qubes 4.1
2. Update to Xen 4.14
3. Try to start a Linux PV domain
4. Observe the failed start and the crash message in /var/log/xen/console/guest-<vmname>.log
Expected behavior Normal start.
Actual behavior
Additional context If relevant, the CPU is Intel Core i7-8750H.
Solutions you've tried Changing kernel version doesn't help (tried 5.4.61, 5.7.10, 5.8.5). But the Linux PV stubdomain does work (it doesn't include the driver that triggers the crash).
Relevant documentation you've consulted: none listed.
Related, non-duplicate issues: none listed.