Closed rapenne-s closed 1 year ago
Any non-default settings for that qube? (memory size? some qvm-features?)
Sorry, not much to go on here. What's trap 4, because it surely isn't #OF? Any way to figure out what the faulting instruction is?
No, I also tried with a new qube created from the GUI in the qube manager with the same result. I think it's related to the CPU (Ryzen 5 5600X).
On another 4.2-RC3 with an i5-7300U this is working fine. :thinking:
What Zen uarch is the Ryzen? First thought is maybe the PKS feature
My OpenBSD qube (that is already installed) has a more explicit error
What Zen uarch is the Ryzen? First thought is maybe the PKS feature
seems to be Zen 3 https://www.amd.com/en/products/cpu/amd-ryzen-5-5600x
Huh. So it was a #GP on RDMSR, which is definitely odd
Could you explain a bit what #GP mean? I'm not familiar with those terms.
In parallel, I'll report this on OpenBSD mailing list. Could you confirm the HVM are using qemu? (when trying to stop a stuck HVM, the qubes applet provide a way to show qemu logs.. so I guess it's using qemu)
yup, qemu is running as device model (but based on the crash, it's rather irrelevant here)
This will most likely be a Xen issue, but I have to admit I'm a bit confused. %cr4.tsd
can cause RDMSR to fault, but that should only effect user mode, not supervisor. Xen can intercept the instruction, but we've not had an emulation error reported here that I'm aware of. Xen does in some cases pass #GP back as a last resort error, but I don't see why that would be the case here.
I wonder if there's a weird TSC mode set for the VM, and rdtsc is generally intercepted. When the VM is crashed, can you run xl debug-keys q
then gather xl dmesg
from dom0 please
Can you do the same with xl debug-keys s
too ?
Sorry, I did mean s. (I'm debugging from a phone in breaks while driving, not in front of my usual dev setup)
Right - so I'm an idiot so far. That's an RDMSR failing, not RDTSC.
Which MSR is being used in tsc_identify()
?
So I expect it is
def = rdmsr(MSR_PSTATEDEF(0));
blowing up in tsc_freq_msr()
.
Something that has changed recently is Xen is advertising MSR_HWCR.TSCSFREQEL
and we don't advertise pstates at all to guests under any circumstance
@andyhhp does that mean that OpenBSD is peeking at MSRs it has no business looking at, or that Xen is advertising a feature and then injecting #GP when the guest tries to use it?
Not sure. I don't have the manuals to hand right now, but I do recall being dubious about the HWCR patch in the first place
HWCR?
I'm fairly sure it was broken in Xen 4.15 by https://lore.kernel.org/xen-devel/0c8043e3-07aa-6242-19bd-07b04f574b87@suse.com/, a series committed over my objections concerning the correctness of the changes.
It appears it was to shut up Linux, which makes different and equally dubious model specific assumptions about the availability of certain MSRs.
@rapenne-s: I think the reason this was missed for so long is that OpenBSD is not used on Xen very much.
@rapenne-s: I think the reason this was missed for so long is that OpenBSD is not used on Xen very much.
does it mean it's an OpenBSD issue?
@rapenne-s: I think the reason this was missed for so long is that OpenBSD is not used on Xen very much.
does it mean it's an OpenBSD issue?
From @andyhhp (Xen core developer and x86 expert) it appears that this is a combination of a Xen bug and an OpenBSD bug. Fixing either issue should be enough for OpenBSD to boot, but obviously fixing both would be ideal.
I'm fairly sure it was broken in Xen 4.15 by https://lore.kernel.org/xen-devel/0c8043e3-07aa-6242-19bd-07b04f574b87@suse.com/, a series committed over my objections concerning the correctness of the changes.
It appears it was to shut up Linux, which makes different and equally dubious model specific assumptions about the availability of certain MSRs.
- It is buggy for Linux to declare TSCFREQSEL missing to be a firmware bug - it may legitimately be so due to levelling.
- It's buggy for Xen to advertise the bit like that - because it's not levelled and not part of the migration stream.
- It's buggy for OpenBSD to perform any model-specific checks without first checking for !hypervisor.
- And it's probably buggy for Xen to state "TSC counts at the P0 frequency" without giving the P0 frequency, but the jury is still out on this final point because there's no possible way the guest is going to get to see Pstate information.
I don't understand this part:
- It's buggy for OpenBSD to perform any model-specific checks without first checking for !hypervisor.
From what I can tell, none of the architecure PRMs/SDMs say that "an operating system should not perform model specific checks without checking !hv". There are certain MSRs that are gated behind specific CPUID bits; an OS should certainly be checking those bits before blindly assuming such MSRs exist (I just made a fix for this for HW power management on AMD last week in OpenBSD). But making the broad claim that "if CPUID_HV is set then all model specific checks should not be performed" is a bit odd.
If that bit is set, what should an OS assume? That it's running on an 80386?
Maybe I'm misunderstanding something in that claim; if the statement was intended to say:
"It's buggy for OpenBSD to read an MSR if the CPUID bit describing presence of that MSR is not set" then I'm 100% in agreement. But saying "if HV is set, then no model specific checks should be performed" is a bit odd.
Can you clarify?
I’m not @andyhhp, but my understanding is that “model-specific checks” refers to physical CPU models, not to MSRs. A hypervisor is allowed to pick and choose which features it exposes to guests, so a guest cannot assume that a hypervisor is emulating a specific CPU that actually existed. In particular, a guest cannot assume that support for feature X implies support for feature Y unless the architecture specification says so, even if every processor shipped with feature X has also had feature Y.
In this case, Xen claims to support TSCFREQSEL. OpenBSD seems to assume that if TSCFREQSEL is present, then so are P-states. However, Xen does not expose P-states, so OpenBSD’s attempt to access the P-state MSRs faults. This is an OpenBSD bug: OpenBSD must not assume that support for TSCFREQSEL implies P-state support, even if every physical processor that ever shipped with TSCFREQSEL did in fact have P-state support.
As it happens, Xen should not actually be advertising TSCFREQSEL if live migration is enabled. This doesn’t matter for Qubes OS, though, because Qubes OS does not support live migration and so the limitations of live migration are irrelevant.
I don't understand this part:
- It's buggy for OpenBSD to perform any model-specific checks without first checking for !hypervisor.
From what I can tell, none of the architecure PRMs/SDMs say that "an operating system should not perform model specific checks without checking !hv". There are certain MSRs that are gated behind specific CPUID bits; an OS should certainly be checking those bits before blindly assuming such MSRs exist (I just made a fix for this for HW power management on AMD last week in OpenBSD). But making the broad claim that "if CPUID_HV is set then all model specific checks should not be performed" is a bit odd.
If that bit is set, what should an OS assume? That it's running on an 80386?
Maybe I'm misunderstanding something in that claim; if the statement was intended to say:
"It's buggy for OpenBSD to read an MSR if the CPUID bit describing presence of that MSR is not set" then I'm 100% in agreement. But saying "if HV is set, then no model specific checks should be performed" is a bit odd.
Can you clarify?
Some MSRs are architectural; you can rely on CPUID bits declaring their presence. In the case of MSR_HWCR, you can rely on the fact that you're on AMD and Long Mode is available. Architectural MSRs are described in the APM, Vol2 System Programming.
Some MSRs, including MSR_HWCR sadly, have a mix of architectural and model specific bits. The TSC_FREQ_SEL is excluded from the list of architectural bits in APM Vol2 3.2.10 Hardware Configuration Register (HWCR), so it is model specific. i.e. the meaning of that bit is not (or might not be) the same across different AMD CPUs.
The C/Pstate control MSRs are entirely model specific.
And this is the problem. When virtualised, you get given a Family/Model/Stepping at boot, but as you migrate around, you will end up on different hardware, and the meaning of those bits do change. Also, as you migrate around, the P0 frequency will change, which is why VMs specifically don't get given that info in the first place (in the general case - if you're willing to never migrate then it's safe to pass through more hardware details).
Under virt, if you're using the Family/Model/Stepping for anything more than diagnostic information, you're in for an unwelcome surprise when migrating.
Some MSRs are architectural; you can rely on CPUID bits declaring their presence. In the case of MSR_HWCR, you can rely on the fact that you're on AMD and Long Mode is available. Architectural MSRs are described in the APM, Vol2 System Programming.
Some MSRs, including MSR_HWCR sadly, have a mix of architectural and model specific bits. The TSC_FREQ_SEL is excluded from the list of architectural bits in APM Vol2 3.2.10 Hardware Configuration Register (HWCR), so it is model specific. i.e. the meaning of that bit is not (or might not be) the same across different AMD CPUs.
The C/Pstate control MSRs are entirely model specific.
And this is the problem. When virtualised, you get given a Family/Model/Stepping at boot, but as you migrate around, you will end up on different hardware, and the meaning of those bits do change. Also, as you migrate around, the P0 frequency will change, which is why VMs specifically don't get given that info in the first place (in the general case - if you're willing to never migrate then it's safe to pass through more hardware details).
Under virt, if you're using the Family/Model/Stepping for anything more than diagnostic information, you're in for an unwelcome surprise when migrating.
Thanks Demi and Andrew for the clarifications. When viewed from the perspective of migration, then this makes a lot more sense, and I agree with the explanations given. I'll take a look where OpenBSD is making these assumptions and see if I can clean those up.
@marmarek Fix has gone into upstream Xen: https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e4ca4e261da3fdddd541c3a9842b1e9e2ad00525
I updated my dom0 and OpenBSD starts again. Thanks everyone who got involved to solve this!
How to file a helpful issue
Qubes OS release
4.2-RC3
Brief summary
The amd64boot iso for OpenBSD doesn't boot anymore since I upgraded to 4.2, the same happened for my OpenBSD HVMs that were working fine just before the upgrade.
I'd say it's not OpenBSD fault here because it was working just before the upgrade, and it's still booting fine on any other virtualization software.
7.3 is the last release to date.
Steps to reproduce
qvm-start OpenBSD --cdrom="web:/home/user/cd73.iso"
= boot the OpenBSD qube using cd73.iso downloaded in qube web (adapt the names)Expected behavior
it boots
Actual behavior
boot is failing