rapenne-s commented 1 year ago

Qubes OS release

4.2-RC3

Brief summary

The amd64 boot ISO for OpenBSD no longer boots since I upgraded to 4.2. The same happened to my OpenBSD HVMs, which were working fine just before the upgrade.

I'd say it's not OpenBSD's fault here, because it was working just before the upgrade and it still boots fine on any other virtualization software.

7.3 is the latest release to date.

Steps to reproduce

1. Download https://cdn.openbsd.org/pub/OpenBSD/7.3/amd64/cd73.iso
2. Create an HVM qube.
3. Run qvm-start OpenBSD --cdrom="web:/home/user/cd73.iso" to boot the OpenBSD qube from cd73.iso, downloaded in the qube web (adapt the names).
4. The boot fails with this error:

[image: OpenBSD_failure]

Expected behavior

it boots

Actual behavior

boot is failing

marmarek commented 1 year ago

Any non-default settings for that qube? (memory size? some qvm-features?)

andyhhp commented 1 year ago

Sorry, not much to go on here. What's trap 4, because it surely isn't #OF? Any way to figure out what the faulting instruction is?

rapenne-s commented 1 year ago

No, I also tried with a new qube created from the GUI in the qube manager with the same result. I think it's related to the CPU (Ryzen 5 5600X).

On another 4.2-RC3 with an i5-7300U this is working fine. :thinking:

andyhhp commented 1 year ago

What Zen uarch is the Ryzen? First thought is maybe the PKS feature

rapenne-s commented 1 year ago

My OpenBSD qube (that is already installed) has a more explicit error

[image: OpenBSD_fail_installed]

rapenne-s commented 1 year ago

> What Zen uarch is the Ryzen? First thought is maybe the PKS feature

It seems to be Zen 3: https://www.amd.com/en/products/cpu/amd-ryzen-5-5600x

andyhhp commented 1 year ago

Huh. So it was a #GP on RDMSR, which is definitely odd

rapenne-s commented 1 year ago

Could you explain a bit what #GP means? I'm not familiar with those terms.

andyhhp commented 1 year ago

GP is general protection fault. Error code of 0 is "misc" and a dumping ground for everything.

rapenne-s commented 1 year ago

In parallel, I'll report this to the OpenBSD mailing list. Could you confirm that HVMs use qemu? (When trying to stop a stuck HVM, the Qubes applet provides a way to show the qemu logs, so I guess it's using qemu.)

marmarek commented 1 year ago

yup, qemu is running as device model (but based on the crash, it's rather irrelevant here)

andyhhp commented 1 year ago

This will most likely be a Xen issue, but I have to admit I'm a bit confused. %cr4.tsd can cause RDMSR to fault, but that should only affect user mode, not supervisor. Xen can intercept the instruction, but we've not had an emulation error reported here that I'm aware of. Xen does in some cases pass #GP back as a last resort error, but I don't see why that would be the case here.

I wonder if there's a weird TSC mode set for the VM, and rdtsc is generally intercepted. When the VM is crashed, can you run xl debug-keys q then gather xl dmesg from dom0 please

rapenne-s commented 1 year ago

[attachment: dump.txt]

marmarek commented 1 year ago

Can you do the same with xl debug-keys s too?

andyhhp commented 1 year ago

Sorry, I did mean s. (I'm debugging from a phone in breaks while driving, not in front of my usual dev setup)

andyhhp commented 1 year ago

Right - so I'm an idiot so far. That's an RDMSR failing, not RDTSC.

Which MSR is being used in tsc_identify()?

andyhhp commented 1 year ago

So I expect it is

        def = rdmsr(MSR_PSTATEDEF(0));

blowing up in tsc_freq_msr(). Something that has changed recently is that Xen is advertising MSR_HWCR.TSCFREQSEL, and we don't advertise P-states at all to guests under any circumstance.
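To make the suspected failure mode concrete, here is a minimal user-space sketch in C. It is not OpenBSD's or Xen's actual code. It models a guest in which HWCR is readable with TscFreqSel set while the P-state MSRs are hidden. `sim_rdmsr_safe()`, `read_p0_def()` and `demo_guest_view()` are hypothetical names, and the MSR numbers and the bit position are my understanding of AMD's documentation, so check them against the manuals before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, from my reading of AMD's documentation. */
#define MSR_HWCR         0xc0010015u          /* Hardware Configuration Register */
#define HWCR_TSCFREQSEL  (1ULL << 24)         /* "TSC counts at the P0 frequency" */
#define MSR_PSTATEDEF(n) (0xc0010064u + (n))  /* P-state definition registers */

/*
 * Hypothetical stand-in for a fault-tolerant MSR read. It simulates the
 * guest view described above: HWCR is readable and has TscFreqSel set,
 * every other MSR is hidden. Returning false models the #GP that a plain
 * rdmsr() would raise.
 */
bool
sim_rdmsr_safe(uint32_t msr, uint64_t *val)
{
	assert(val != NULL);
	if (msr == MSR_HWCR) {
		*val = HWCR_TSCFREQSEL;
		return true;
	}
	return false;
}

/*
 * Try to fetch the P0 definition. Returns false instead of faulting when
 * the register is unavailable, so the caller can fall back to another way
 * of calibrating the TSC.
 */
bool
read_p0_def(uint64_t *def)
{
	uint64_t hwcr;

	assert(def != NULL);
	if (!sim_rdmsr_safe(MSR_HWCR, &hwcr) || !(hwcr & HWCR_TSCFREQSEL))
		return false;
	return sim_rdmsr_safe(MSR_PSTATEDEF(0), def);
}

/* Usage: the bit is advertised, yet the P0 definition cannot be read. */
void
demo_guest_view(void)
{
	uint64_t hwcr = 0, def = 0;

	assert(sim_rdmsr_safe(MSR_HWCR, &hwcr));
	assert(hwcr & HWCR_TSCFREQSEL);
	assert(!read_p0_def(&def));
}
```

The point of the sketch is only the shape of the mismatch: one register says "the TSC runs at the P0 frequency" while the register that would give that frequency is not there.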

DemiMarie commented 1 year ago

@andyhhp does that mean that OpenBSD is peeking at MSRs it has no business looking at, or that Xen is advertising a feature and then injecting #GP when the guest tries to use it?

andyhhp commented 1 year ago

Not sure. I don't have the manuals to hand right now, but I do recall being dubious about the HWCR patch in the first place

DemiMarie commented 1 year ago

HWCR?

andyhhp commented 1 year ago

I'm fairly sure it was broken in Xen 4.15 by https://lore.kernel.org/xen-devel/0c8043e3-07aa-6242-19bd-07b04f574b87@suse.com/, a series committed over my objections concerning the correctness of the changes.

It appears it was to shut up Linux, which makes different and equally dubious model specific assumptions about the availability of certain MSRs.

It is buggy for Linux to declare TSCFREQSEL missing to be a firmware bug - it may legitimately be so due to levelling.
It's buggy for Xen to advertise the bit like that - because it's not levelled and not part of the migration stream.
It's buggy for OpenBSD to perform any model-specific checks without first checking for !hypervisor.
And it's probably buggy for Xen to state "TSC counts at the P0 frequency" without giving the P0 frequency, but the jury is still out on this final point because there's no possible way the guest is going to get to see Pstate information.

DemiMarie commented 1 year ago

@rapenne-s: I think the reason this was missed for so long is that OpenBSD is not used on Xen very much.

rapenne-s commented 1 year ago

> @rapenne-s: I think the reason this was missed for so long is that OpenBSD is not used on Xen very much.

Does that mean it's an OpenBSD issue?

DemiMarie commented 1 year ago

> > @rapenne-s: I think the reason this was missed for so long is that OpenBSD is not used on Xen very much.

> does it mean it's an OpenBSD issue?

According to @andyhhp (Xen core developer and x86 expert), this is a combination of a Xen bug and an OpenBSD bug. Fixing either issue should be enough for OpenBSD to boot, but obviously fixing both would be ideal.

mlarkin2015 commented 1 year ago

> I'm fairly sure it was broken in Xen 4.15 by https://lore.kernel.org/xen-devel/0c8043e3-07aa-6242-19bd-07b04f574b87@suse.com/, a series committed over my objections concerning the correctness of the changes.

> It appears it was to shut up Linux, which makes different and equally dubious model specific assumptions about the availability of certain MSRs.

> It is buggy for Linux to declare TSCFREQSEL missing to be a firmware bug - it may legitimately be so due to levelling.

> It's buggy for Xen to advertise the bit like that - because it's not levelled and not part of the migration stream.

> It's buggy for OpenBSD to perform any model-specific checks without first checking for !hypervisor.

> And it's probably buggy for Xen to state "TSC counts at the P0 frequency" without giving the P0 frequency, but the jury is still out on this final point because there's no possible way the guest is going to get to see Pstate information.

I don't understand this part:

> It's buggy for OpenBSD to perform any model-specific checks without first checking for !hypervisor.

From what I can tell, none of the architecture PRMs/SDMs say that "an operating system should not perform model specific checks without checking !hv". There are certain MSRs that are gated behind specific CPUID bits; an OS should certainly be checking those bits before blindly assuming such MSRs exist (I just made a fix for this for HW power management on AMD last week in OpenBSD). But making the broad claim that "if CPUID_HV is set then all model specific checks should not be performed" is a bit odd.

If that bit is set, what should an OS assume? That it's running on an 80386?

Maybe I'm misunderstanding something in that claim; if the statement was intended to say:

"It's buggy for OpenBSD to read an MSR if the CPUID bit describing presence of that MSR is not set" then I'm 100% in agreement. But saying "if HV is set, then no model specific checks should be performed" is a bit odd.

Can you clarify?

DemiMarie commented 1 year ago

I’m not @andyhhp, but my understanding is that “model-specific checks” refers to physical CPU models, not to MSRs. A hypervisor is allowed to pick and choose which features it exposes to guests, so a guest cannot assume that a hypervisor is emulating a specific CPU that actually existed. In particular, a guest cannot assume that support for feature X implies support for feature Y unless the architecture specification says so, even if every processor shipped with feature X has also had feature Y.

In this case, Xen claims to support TSCFREQSEL. OpenBSD seems to assume that if TSCFREQSEL is present, then so are P-states. However, Xen does not expose P-states, so OpenBSD’s attempt to access the P-state MSRs faults. This is an OpenBSD bug: OpenBSD must not assume that support for TSCFREQSEL implies P-state support, even if every physical processor that ever shipped with TSCFREQSEL did in fact have P-state support.

As it happens, Xen should not actually be advertising TSCFREQSEL if live migration is enabled. This doesn’t matter for Qubes OS, though, because Qubes OS does not support live migration and so the limitations of live migration are irrelevant.
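One conservative way a guest could act on this is to skip inferences drawn from the physical CPU model whenever the CPUID hypervisor-present bit (leaf 1, ECX bit 31) is set. The sketch below is an illustration under that assumption, not the fix that went into OpenBSD or Xen. `may_probe_pstate_msrs()` and `demo_probe_policy()` are hypothetical helpers, the TscFreqSel bit position is assumed, and the register values are passed in as arguments so the code runs anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID_1_ECX_HYPERVISOR (1u << 31)   /* CPUID leaf 1, ECX bit 31 */
#define HWCR_TSCFREQSEL        (1ULL << 24) /* assumed position of TscFreqSel */

/*
 * Decide whether it is reasonable to go on and read the P-state MSRs.
 * On bare metal the sketch keeps the "TscFreqSel implies P-states"
 * inference; under a hypervisor it refuses to draw that inference.
 */
bool
may_probe_pstate_msrs(uint32_t cpuid1_ecx, uint64_t hwcr)
{
	if (cpuid1_ecx & CPUID_1_ECX_HYPERVISOR)
		return false;
	return (hwcr & HWCR_TSCFREQSEL) != 0;
}

/* Usage: same HWCR value, different answer depending on the HV bit. */
void
demo_probe_policy(void)
{
	assert(may_probe_pstate_msrs(0, HWCR_TSCFREQSEL));
	assert(!may_probe_pstate_msrs(CPUID_1_ECX_HYPERVISOR, HWCR_TSCFREQSEL));
	assert(!may_probe_pstate_msrs(0, 0));
}
```

A guest that takes the `false` branch would then have to calibrate the TSC some other way, for example against a reference timer.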

andyhhp commented 1 year ago

> I don't understand this part:

> > It's buggy for OpenBSD to perform any model-specific checks without first checking for !hypervisor.

> From what I can tell, none of the architecture PRMs/SDMs say that "an operating system should not perform model specific checks without checking !hv". There are certain MSRs that are gated behind specific CPUID bits; an OS should certainly be checking those bits before blindly assuming such MSRs exist (I just made a fix for this for HW power management on AMD last week in OpenBSD). But making the broad claim that "if CPUID_HV is set then all model specific checks should not be performed" is a bit odd.

> If that bit is set, what should an OS assume? That it's running on an 80386?

> Maybe I'm misunderstanding something in that claim; if the statement was intended to say:

> "It's buggy for OpenBSD to read an MSR if the CPUID bit describing presence of that MSR is not set" then I'm 100% in agreement. But saying "if HV is set, then no model specific checks should be performed" is a bit odd.

> Can you clarify?

Some MSRs are architectural; you can rely on CPUID bits declaring their presence. In the case of MSR_HWCR, you can rely on the fact that you're on AMD and Long Mode is available. Architectural MSRs are described in the APM, Vol2 System Programming.

Some MSRs, including MSR_HWCR sadly, have a mix of architectural and model specific bits. The TSC_FREQ_SEL is excluded from the list of architectural bits in APM Vol2 3.2.10 Hardware Configuration Register (HWCR), so it is model specific. i.e. the meaning of that bit is not (or might not be) the same across different AMD CPUs.

The C/Pstate control MSRs are entirely model specific.

And this is the problem. When virtualised, you get given a Family/Model/Stepping at boot, but as you migrate around, you will end up on different hardware, and the meaning of those bits does change. Also, as you migrate around, the P0 frequency will change, which is why VMs specifically don't get given that info in the first place (in the general case - if you're willing to never migrate then it's safe to pass through more hardware details).

Under virt, if you're using the Family/Model/Stepping for anything more than diagnostic information, you're in for an unwelcome surprise when migrating.
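As a small illustration of using Family/Model/Stepping for diagnostics only, the sketch below decodes CPUID leaf 1 EAX with AMD's rule that the extended family and model fields apply when the base family is 0xF. `struct cpu_fms`, `decode_fms()` and `demo_decode_fms()` are names made up for this example, and the sample EAX value is just an encoding of family 0x19, model 0x21, stepping 0 chosen for the demonstration, not a value taken from the reporter's machine.

```c
#include <assert.h>
#include <stdint.h>

struct cpu_fms {
	unsigned family;
	unsigned model;
	unsigned stepping;
};

/*
 * Decode CPUID leaf 1 EAX following AMD's rule: the extended family is
 * added and the extended model is prepended only when the base family
 * field is 0xF.
 */
struct cpu_fms
decode_fms(uint32_t eax)
{
	struct cpu_fms out;
	unsigned base_family = (eax >> 8) & 0xf;
	unsigned base_model = (eax >> 4) & 0xf;
	unsigned ext_family = (eax >> 20) & 0xff;
	unsigned ext_model = (eax >> 16) & 0xf;

	out.stepping = eax & 0xf;
	out.family = base_family;
	out.model = base_model;
	if (base_family == 0xf) {
		out.family += ext_family;
		out.model |= ext_model << 4;
	}
	return out;
}

/* Usage: decode an illustrative value; print or log it, nothing more. */
void
demo_decode_fms(void)
{
	struct cpu_fms fms = decode_fms(0x00a20f10u);

	assert(fms.family == 0x19);
	assert(fms.model == 0x21);
	assert(fms.stepping == 0);
}
```

In a guest that may migrate, the decoded values are fine for a boot banner or a bug report, but by the reasoning above they are a poor basis for deciding which model-specific MSRs to touch.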

mlarkin2015 commented 1 year ago

> Some MSRs are architectural; you can rely on CPUID bits declaring their presence. In the case of MSR_HWCR, you can rely on the fact that you're on AMD and Long Mode is available. Architectural MSRs are described in the APM, Vol2 System Programming.

> Some MSRs, including MSR_HWCR sadly, have a mix of architectural and model specific bits. The TSC_FREQ_SEL is excluded from the list of architectural bits in APM Vol2 3.2.10 Hardware Configuration Register (HWCR), so it is model specific. i.e. the meaning of that bit is not (or might not be) the same across different AMD CPUs.

> The C/Pstate control MSRs are entirely model specific.

> And this is the problem. When virtualised, you get given a Family/Model/Stepping at boot, but as you migrate around, you will end up on different hardware, and the meaning of those bits does change. Also, as you migrate around, the P0 frequency will change, which is why VMs specifically don't get given that info in the first place (in the general case - if you're willing to never migrate then it's safe to pass through more hardware details).

> Under virt, if you're using the Family/Model/Stepping for anything more than diagnostic information, you're in for an unwelcome surprise when migrating.

Thanks Demi and Andrew for the clarifications. When viewed from the perspective of migration, then this makes a lot more sense, and I agree with the explanations given. I'll take a look where OpenBSD is making these assumptions and see if I can clean those up.

andyhhp commented 1 year ago

@marmarek Fix has gone into upstream Xen: https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e4ca4e261da3fdddd541c3a9842b1e9e2ad00525

rapenne-s commented 1 year ago

I updated my dom0 and OpenBSD starts again. Thanks to everyone who helped solve this!

QubesOS / qubes-issues

4.2-RC3: OpenBSD 7.3 ISO doesn't boot anymore #8502
