Dasharo / dasharo-issues

The Dasharo issue tracker
https://dasharo.com/

Shutdown sometimes reboots the system on Z790 #488

Closed renehoj closed 7 months ago

renehoj commented 1 year ago

Dasharo version v0.9.0

Dasharo variant Workstation

Affected component(s) or functionality Shutdown

Brief summary Sometimes shutdown will reboot the system. I'm using Qubes 4.2.0, and there are no warnings or errors in journalctl. I have not experienced this issue with the Z690 running the same version of Qubes.

How reproducible It has happened twice now, but I don't know what triggers it.

How to reproduce Don't know how to reproduce.

Expected behavior The system would turn off, and not restart.

Actual behavior System restarts.

miczyg1 commented 1 year ago

@renehoj, please get the cbmem log from a boot where the machine restarted instead of shutting down. There should be some wake causes in there for us to analyze.
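
Something like this in dom0 should do, assuming the coreboot cbmem utility is available there (file names are illustrative):

sudo cbmem -c > cbmem-console.log       # dump the coreboot console buffer
sudo cbmem -c -1 > cbmem-lastboot.log   # same, limited to the most recent boot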

renehoj commented 1 year ago

It happened again today, here are the cbmem logs.

logs.tar.gz

miczyg1 commented 1 year ago

Hmm, the wake flag is not set, so it looks like an ordinary reboot. However, the register the OS writes to put the system to sleep is set to the value that indicates power off.

We will need to reproduce this issue using QubesOS to get a more detailed view on this matter.

renehoj commented 1 year ago

I think the system doesn't just restart; it crashes during shutdown, and it's the crash that reboots the system.

I had been ignoring the problem, because manually shutting the system down again after it rebooted wasn't a huge issue, but this weekend it left my system unable to boot because the file system was corrupted. I have looked at the system journal, and the reboot seems to happen early in the shutdown sequence, before the drives are unmounted.

Have you been able to reproduce the issue?

miczyg1 commented 1 year ago

We haven't experienced such issues yet.

renehoj commented 12 months ago

Is there any difference between the Z690 and Z790 in regards to how the watchdog timer works?

I tried disabling the watchdog in the chipset configuration and the issue seems to have gone away, but I don't have any way to trigger the crash/reset to confirm it actually solves the issue.

Is there any way the watchdog event could fire during shutdown?

There is a period (maybe 3-5 sec.) during shutdown where the system stalls, right after the "system is powering down now" message, it's always at this point the crash/reset happens, if it happens.

renehoj commented 12 months ago

I did some more testing, and disabling the watchdog doesn't solve the issue. It just allows me to stop more VMs during shutdown without rebooting the system.

Having ~15 VMs running during shutdown will still reboot the system, but manually shutting down all VMs before doing the system shutdown seems to work just fine.

miczyg1 commented 12 months ago

Is there any difference between the Z690 and Z790 in regards to how the watchdog timer works?

There isn't, at least from the firmware side.

Is there any way the watchdog event could fire during shutdown?

Rather unlikely; the watchdog is reloaded by an SMI handler, which will still run right before the shutdown.

There is a period (maybe 3-5 sec.) during shutdown where the system stalls, right after the "system is powering down now" message, it's always at this point the crash/reset happens, if it happens.

system is powering down now is not the last message displayed before the poweroff. IIRC it should be reboot: power down. Maybe a serial console would say something more?

Having ~15 VMs running during shutdown will still reboot the system, but manually shutting down all VMs before doing the system shutdown seems to work just fine.

Interesting... Maybe @marmarek would have some ideas?

renehoj commented 11 months ago

"System is powering down now" is the last message if the system crashes/resets; after that message dom0/xen will start shutting down VMs.

I guess if no one else is experiencing the same issue, it's unlikely to be caused by Dasharo.

I did upgrade from a 12th gen Z690 DDR4 to a 13th gen Z790 DDR5; if I'm the only one with the problem, it seems more likely to be an issue with Xen and the new hardware.

How do I connect a serial console? Would that require connecting a TTL-to-USB cable to the motherboard?

marmarek commented 11 months ago

after that message dom0/xen will start shutting down VMs.

VMs are normally shut down when stopping the "qubes-core" (aka "Qubes Dom0 startup setup") service.

How do I connect a serial console? Would that require connecting a TTL-to-USB cable to the motherboard?

https://docs.dasharo.com/unified/msi/development/#serial-debug

renehoj commented 11 months ago
Here is the output from the crash

[  121.025419] dm-110: detected capacity change from 41943040 to 0

(XEN) ----[ Xen-4.17.2  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff82d04022e739>] xmem_pool_free+0x229/0x2fe
(XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d0v3)
(XEN) rax: ffff830dbf077db0   rbx: ffff830dbf077d50   rcx: 00000000481a1fd0
(XEN) rdx: 0000000000000004   rsi: 0000000000000000   rdi: ffff830da4e2dfd0
(XEN) rbp: ffff831081dc0000   rsp: ffff8310797b7e08   r8:  0000000000000005
(XEN) r9:  ffff830e47424000   r10: 0000000000000000   r11: ffff8310354e30ac
(XEN) r12: ffff831081dc1868   r13: fffffffffffffff8   r14: ffff830e481acaa8
(XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 0000000000b526e0
(XEN) cr3: 0000001072c22000   cr2: 00000000481a1fe8
(XEN) fsb: 0000000000000000   gsb: ffff8881b9d80000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d04022e739> (xmem_pool_free+0x229/0x2fe):
(XEN)  85 c9 74 08 48 8b 78 18 <48> 89 79 18 48 63 fe 48 63 ca 49 89 f8 49 c1 e0
(XEN) Xen stack trace from rsp=ffff8310797b7e08:
(XEN)    ffff830e2f04b540 00000000ffffffff ffff830e481ac000 ffff82d0402c7392
(XEN)    ffff830e2f04b540 ffff82d0402cd369 ffff830e481ac000 ffff82d0402cd90e
(XEN)    ffff830e481ac000 ffff82d0402e6edb ffff830e47446000 ffff82d040205fa0
(XEN)    ffff8310797bd040 0000000000000000 0000000000000002 ffff82d0405c5d80
(XEN)    0000000000000003 ffff82d040224838 ffff82d0405bdd80 0000000000000000
(XEN)    ffffffffffffffff ffff82d0405bdd80 0000000000000000 ffff82d040224f2a
(XEN)    ffff831079530000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffff82d0402e2ef6 0000000000000000 0000000000000000
(XEN)    ffff8881064ac2b8 0000000080000007 ffff8881b9d9f300 ffff8881b9d9f300
(XEN)    0000000000000206 0000000001ffff00 0000000000000015 0000000000000001
(XEN)    000000000000000d ffffffff81f871aa 0000000000000000 0000000000000005
(XEN)    ffff8881b9d9f710 0000010000000000 ffffffff81f871a8 000000000000e033
(XEN)    0000000000000206 ffffc90042e97cd0 000000000000e02b bcb4157c93368e52
(XEN)    70b70abaa2124c0c f804e0e31be5a13b 197a535927b0d472 0000e01000000003
(XEN)    ffff831079530000 0000004039205000 0000000000b526e0 0000000000000000
(XEN)    0000000000000000 2106030300000001 9549274c040d2000
(XEN) Xen call trace:
(XEN)    [<ffff82d04022e739>] R xmem_pool_free+0x229/0x2fe
(XEN)    [<ffff82d0402c7392>] S p2m_free_logdirty+0x12/0x1c
(XEN)    [<ffff82d0402cd369>] S p2m_free_one+0x9/0x4a
(XEN)    [<ffff82d0402cd90e>] S p2m_final_teardown+0x2e/0x45
(XEN)    [<ffff82d0402e6edb>] S arch_domain_destroy+0x5b/0xa5
(XEN)    [<ffff82d040205fa0>] S domain.c#complete_domain_destroy+0x80/0x13e
(XEN)    [<ffff82d040224838>] S rcupdate.c#rcu_process_callbacks+0x118/0x29b
(XEN)    [<ffff82d040224f2a>] S softirq.c#.annobin_softirq.c+0x5a/0x91
(XEN)    [<ffff82d0402e2ef6>] S x86_64/entry.S#process_softirqs+0x6/0x20
(XEN) 
(XEN) Pagetable walk from 00000000481a1fe8:
(XEN)  L4[0x000] = 0000000000000000 ffffffffffffffff
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: 00000000481a1fe8
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
andyhhp commented 11 months ago

Ok, so there are multiple issues here.

@renehoj when posting logs, please use code tags (triple backticks), which will prevent GitHub from interpreting things like < and > as markup and omitting them when rendered. That said, there is enough information here to work with.

First, there's clearly some corruption in Xen causing it to take a fatal pagefault. It's a userspace pointer, which is wrong in this context. My gut feeling is that there's a pointer in memory which has been corrupted somehow, and 00000000481a1fe8 is incidental, rather than the cause.

For this, please can you rebuild Xen with CONFIG_DEBUG enabled (generally, in case there are assertions elsewhere which help narrow things down), and CONFIG_XMEM_POOL_POISON specifically, which performs extra integrity checking of xmem pools in order to spot things like double frees. Both options are available in Xen's menuconfig.
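
A rough sketch of that build, assuming a xen.git checkout (the two options are the ones named above):

make -C xen menuconfig      # enable CONFIG_DEBUG and CONFIG_XMEM_POOL_POISON
make -j$(nproc) build-xen   # rebuilds the hypervisor as xen/xen.gz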

Second, Xen is trying to reboot and clearly not succeeding. To diagnose that further, we'd need the full xl dmesg log from boot, and anything else that may have been on the serial at the point of crash (Again, a debug Xen might help further here).
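
For the log capture, in dom0 (file name illustrative):

xl dmesg > xl-dmesg-boot.txt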

zirblazer commented 11 months ago

Do you believe there is any possibility of a defective RAM module causing memory corruption? Please post your full system configuration, just in case you end up going down the MemTest route and it turns out to be hardware.

renehoj commented 11 months ago

@zirblazer I have tried running the memtest, and it didn't find any errors. The system is stable and I don't get random crashes; as far as I can tell, it's only shutting down QubesOS with the VMs running that triggers the reset.

@andyhhp I have updated the output in the post. I can try to build a new QubesOS ISO, but I don't have any experience with compiling Xen or using the Qubes builder, so I don't know how easy it is to build QubesOS with a custom version of Xen.

I don't think there is a problem with the reboot, I just didn't include the output from the next boot. The system reboots, and everything seems to be working.

As far as I can tell, the only way I can trigger the crash is by shutting down QubesOS with the VMs running; I have not seen any reboots from manually shutting down VMs. This is why I initially thought it was an issue with Dasharo: it looked like shutdown would randomly reboot the system.

andyhhp commented 11 months ago

I have updated the output in the post.

Thanks. That's a much more normal looking backtrace now.

@marmarek Can you advise on the best way to run a debug Xen under QubesOS? If it were just me, I'd drop a new xen.gz into /boot but I have no idea if that's the right course of action here. A casual glance at the Developer docs doesn't seem to reveal anything obvious.

@renehoj This is definitely (at least partially) some error in Xen. Even if it is memory corruption from something else, it will still require a fix in Xen to mitigate.

Can you confirm exactly what steps you are taking in order to trigger the crash? You distinguish between manually shutting down the VMs (so initiating a shutdown inside the VM?), and shutting down ones which are running (an admin operation to kill the Qube?), but it's not entirely clear.

If reboot following the crash is actually working fine, then this is unlikely to be a Dasharo issue.

marmarek commented 11 months ago

If it were just me, I'd drop a new xen.gz into /boot but I have no idea if that's the right course of actions here.

That's what I usually do... We have several patches on top, but most of them are just for the toolstack and the few for the hypervisor shouldn't matter for debugging. Just make sure you use the appropriate branch (stable-4.17) and config (https://github.com/QubesOS/qubes-vmm-xen/blob/main/config).
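
Roughly, as a sketch (the clone URL is the xenbits mirror; the /boot file name just has to match your grub entry):

git clone -b stable-4.17 https://xenbits.xen.org/git-http/xen.git
cd xen
cp /path/to/qubes-vmm-xen/config xen/.config   # the config linked above
make -C xen olddefconfig                       # accept defaults for new options
make build-xen
sudo cp xen/xen.gz /boot/xen-debug.gz          # then point grub at it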

renehoj commented 11 months ago

@andyhhp To trigger the crash, I leave the VMs running and shut down QubesOS from the XFCE menu. I don't know of any other way to trigger the crash.

When I say manually shutting down the VMs, I mean running qvm-shutdown --all --wait, so when I shut down the system no VMs are running.

I'm just one person taking mental notes of when the system crashes; I can't say for sure qvm-shutdown is safe to use, but I have only seen the crash happen during system shutdown with the VMs running.

Tonight I will try and compile stable-4.17 with the QubesOS config and debugging enabled.

andyhhp commented 11 months ago

To trigger the crash, I leave the VMs running and shut down QubesOS from the XFCE menu.

Ah ok. So when you explicitly shut the VMs down first, then shut QubesOS down, it works fine. When you try to shut QubesOS down with VMs still running, we hit the crash.

It looks like there's something else happening in the shutdown case which doesn't work nicely in parallel with VM shutdown.

gwd commented 11 months ago

The p2m_free_logdirty may indicate that the domain in question was "saved", rather than shut down. @marmarek, will Qubes try to save currently-running domains when shutting down? @renehoj, can you try saving the domains rather than shutting them down, to see if that triggers anything? Not sure of the best way to do that within Qubes.

marmarek commented 11 months ago

@marmarek, will Qubes try to save currently-running domains when shutting down?

No.

andyhhp commented 11 months ago

The p2m_free_logdirty may indicate

It's "S" which means it may be stack rubble in a release build, including a token entry finding nothing to do. I'd suggest waiting for the log from a debug build which should give us this accurately.

renehoj commented 11 months ago

@andyhhp I tried to compile xen with debug, but the output looks very similar to the previous crash.

I cloned 4.17, copied the config marmarek linked to xen/xen and renamed it to .config, enabled CONFIG_DEBUG and CONFIG_XMEM_POOL_POISON, ran make build-xen, copied the xen.gz to /boot in dom0, and renamed it to xen-4.17.2.gz.

Is there anything else I need to do?

(XEN) ----[ Xen-4.17.2  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d04023a355>] credit2.c#csched2_free_domdata+0x65/0xd0
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor (d0v0)
(XEN) rax: ffff830e12f69fa8   rbx: ffff830e12f69f40   rcx: ffff83107a06ac70
(XEN) rdx: 00000000bf22efc8   rsi: 000000000000001f   rdi: 0000000000000001
(XEN) rbp: ffff82d040588038   rsp: ffff8310828b7cc0   r8:  0000000000000000
(XEN) r9:  ffff83107a06a200   r10: ffff83108288d340   r11: ffff830fbff45000
(XEN) r12: ffff83107a06ac20   r13: 0000000000000286   r14: 0000000000000000
(XEN) r15: ffff83107a3fec00   cr0: 0000000080050033   cr4: 0000000000b526e0
(XEN) cr3: 000000091e5de000   cr2: 00000000bf22efc8
(XEN) fsb: 00007977b0ff96c0   gsb: ffff8881b9c00000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d04023a355> (credit2.c#csched2_free_domdata+0x65/0xd0):
(XEN)  48 8d 43 68 48 89 51 08 <48> 89 0a 48 89 43 68 48 89 43 70 f0 41 81 24 24
(XEN) Xen stack trace from rsp=ffff8310828b7cc0:
(XEN)    ffff830e12e60920 0000000000000000 0000000000000000 ffff83107a06e8f0
(XEN)    ffff82d040245f7e ffff830e12f69f40 ffff83107a06ab10 ffff830e12e3af60
(XEN)    ffff83107a06acd0 ffff830fbff45000 ffff830e12f69f40 ffff830e12e3af60
(XEN)    ffff830fbff45000 ffff83107a06e8f0 ffff8310a73e4010 ffff83107a06e8f0
(XEN)    ffff830fbff45000 00007977eb7f3010 ffff82d04043d5f0 ffff8310828b7e10
(XEN)    0000000000000000 ffff82d040235c86 ffff830fbff45000 ffff8310828b7ec8
(XEN)    00007977eb7f3010 0000000000000000 ffff82d0402069e6 ffff830fbff45000
(XEN)    ffff8310828b7ec8 ffff82d040232c9b ffff83107a01cb80 ffff83107a01cb80
(XEN)    ffff82d04020c44e ffff831000000001 ffff83107a04b001 0000000000000001
(XEN)    ffff83107a04b000 ffff83107a04b068 ffff82d040588280 ffff82d040588280
(XEN)    ffff82d0405880e0 0000000000000206 0000001500000002 7aaa37872bc7000e
(XEN)    00007977a002a8e0 ffffffffffffff78 0000000000000000 00007977b0ff9638
(XEN)    00007977d85566a0 7aaa37872bc71200 00007977a002a8e0 7aaa37872bc71200
(XEN)    0000000000000000 ffffffffffffff78 0000000000000000 0000797774008000
(XEN)    00007977b0ff8280 000000000000000e 00007977a002a8e0 00007977eaf8d3d3
(XEN)    ffff8310828b7ef8 ffff83107a04b000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000024 ffff82d0402dfc92 ffff83107a04b000
(XEN)    0000000000000000 0000000000000000 ffff82d0402012a7 0000000000000000
(XEN)    ffff88811c6bb700 00007977b0ff8110 0000000000305000 ffff88811c6bb700
(XEN)    0000000000000022 0000000000000282 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d04023a355>] R credit2.c#csched2_free_domdata+0x65/0xd0
(XEN)    [<ffff82d040245f7e>] S sched_move_domain+0x1ee/0x760
(XEN)    [<ffff82d040235c86>] S cpupool_move_domain+0x36/0x70
(XEN)    [<ffff82d0402069e6>] S domain_kill+0x96/0x110
(XEN)    [<ffff82d040232c9b>] S do_domctl+0x123b/0x18e0
(XEN)    [<ffff82d04020c44e>] S event_fifo.c#evtchn_fifo_set_pending+0x34e/0x560
(XEN)    [<ffff82d0402dfc92>] S pv_hypercall+0x3e2/0x410
(XEN)    [<ffff82d0402012a7>] S lstar_enter+0x137/0x140
(XEN) 
(XEN) Pagetable walk from 00000000bf22efc8:
(XEN)  L4[0x000] = 0000000000000000 ffffffffffffffff
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: 00000000bf22efc8
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
andyhhp commented 11 months ago
(XEN) ----[ Xen-4.17.2  x86_64  debug=n  Not tainted ]----

That's still a release build of Xen, so I suspect you didn't actually boot the hypervisor you built. Either edit grub.cfg, or use the interactive prompt at boot to change the path to the hypervisor.
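
Illustrative grub fragment (the real file location and entry differ per install):

# in the Xen menu entry of grub.cfg, change the hypervisor path, e.g.
#   multiboot2 /xen-4.17.2.gz ...
# to
#   multiboot2 /xen-debug.gz ...
# or press 'e' at the boot menu and edit the multiboot2 line for one boot.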

That said,

(XEN) RIP:    e008:[<ffff82d04023a355>] credit2.c#csched2_free_domdata+0x65/0xd0
...
(XEN) Pagetable walk from 00000000bf22efc8:

is a very different area of code, but with similar symptoms that still look like xmem pool corruption. I suspect that if you repeat this a few times, you'll find different RIPs each time.

renehoj commented 11 months ago

@andyhhp The version I have compiled can't shut down any VMs without crashing; shutting down QubesOS or a single VM will result in the following crash.

Don't know if it's relevant, but I'm running 4 cpupools with credit2 as the scheduler.

(XEN) arch/x86/hvm/hvm.c:1658:d8v0 All CPUs offline -- powering off.
(XEN) Assertion 'svc->rqd == c2rqd(sched_unit_master(unit))' failed at common/sched/credit2.c:1159
(XEN) ----[ Xen-4.17.3-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82d04023a700>] credit2.c#csched2_unit_remove+0xe3/0xe7
(XEN) RFLAGS: 0000000000010083   CONTEXT: hypervisor (d0v1)
(XEN) rax: ffff83107a06ac90   rbx: ffff830bd4124ef0   rcx: 00000000000072f5
(XEN) rdx: 0000004039bc5000   rsi: ffff830bd4124ef0   rdi: ffff830bd41240f0
(XEN) rbp: ffff82d0405a9288   rsp: ffff831082887cc8   r8:  0000000000000000
(XEN) r9:  ffff83107a06aaa0   r10: ffff82e000000000   r11: 4000000000000000
(XEN) r12: ffff83107a06ac90   r13: ffff830bd41240f0   r14: ffff82d0405c15e0
(XEN) r15: ffff82d0405a9288   cr0: 0000000080050033   cr4: 0000000000b526e0
(XEN) cr3: 00000008d5938000   cr2: ffff8881399d4a78
(XEN) fsb: 00007f36e57fa6c0   gsb: ffff8881b9c80000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d04023a700> (credit2.c#csched2_unit_remove+0xe3/0xe7):
(XEN)  bb fe ff e9 5f ff ff ff <0f> 0b 0f 0b 41 54 55 53 48 8b 47 08 48 8b 50 20
(XEN) Xen stack trace from rsp=ffff831082887cc8:
(XEN)    ffff830bd4124ef0 ffff830bf8b95000 ffff83107a0694b0 ffff83107a06e8c0
(XEN)    ffff82d0405c15e0 ffff82d040246adb 0000000100000000 ffff830bf8a7cf60
(XEN)    ffff82d000000010 ffff83107a06aad0 ffff82d0405a9288 ffff830bd40fef20
(XEN)    ffff830bf8a7cdb0 ffff83107a06e8c0 ffff830bf8b95000 ffff82d04045d5e8
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffff82d040234cf7
(XEN)    ffff830bf8b95000 ffff83107a06e8c0 ffff82d040236025 ffff830bf8b95000
(XEN)    0000000000000000 00007f37211f3010 ffff82d040206513 ffff830bf8b95000
(XEN)    ffff831082887ec8 00007f37211f3010 ffff82d040232b12 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000700000001 ffff82c000000000
(XEN)    ffff83107a0531c0 ffff82d0402276ba ffff83107a01c068 0000000000000286
(XEN)    0000001a7571dfb0 0000001500000002 9ae1a81629050008 00007f36c40274e0
(XEN)    ffffffffffffff78 0000000000000002 00007f36e57fa638 00007f37107746a0
(XEN)    9ae1a8162905f100 00007f36c40274e0 9ae1a8162905f100 0000000000000002
(XEN)    ffffffffffffff78 0000000000000002 00007f36b80f8470 00007f36e57f9280
(XEN)    0000000000000008 00007f36c40274e0 00007f37209ad3d3 ffff831082887ef8
(XEN)    ffff83107a01c000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000024 ffff82d0402dc71b ffff83107a01c000 0000000000000000
(XEN)    0000000000000000 ffff82d0402012b7 0000000000000000 ffff888165f37200
(XEN)    00007f36e57f9110 0000000000305000 ffff888165f37200 0000000000000022
(XEN)    0000000000000282 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d04023a700>] R credit2.c#csched2_unit_remove+0xe3/0xe7
(XEN)    [<ffff82d040246adb>] S sched_move_domain+0x2f3/0x5b1
(XEN)    [<ffff82d040234cf7>] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b
(XEN)    [<ffff82d040236025>] S cpupool_move_domain+0x24/0x35
(XEN)    [<ffff82d040206513>] S domain_kill+0xa5/0x116
(XEN)    [<ffff82d040232b12>] S do_domctl+0xe5f/0x1951
(XEN)    [<ffff82d0402276ba>] S timer.c#timer_lock+0x69/0x143
(XEN)    [<ffff82d0402dc71b>] S pv_hypercall+0x44e/0x4a9
(XEN)    [<ffff82d0402012b7>] S lstar_enter+0x137/0x140
(XEN) 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion 'svc->rqd == c2rqd(sched_unit_master(unit))' failed at common/sched/credit2.c:1159
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
andyhhp commented 11 months ago

Well - that is an improvement... at least it's consistent now.

And yes, CPUPools almost certainly are a relevant factor here. Can you describe your CPUPool setup and other scheduling settings (e.g. smt, sched_gran), and which VMs are in which CPUPools?

One observation - it does seem fairly absurd that we're moving vcpus between CPUPools as part of destroying the VM, but we really do move it from whichever CPUPool it's in back to pool 0.

andyhhp commented 11 months ago

Ah - one further question. This is a Z790, with an AlderLake CPU, is it not? ADL is the first of the hybrid CPUs which mix P and E cores and, in particular (which might be relevant here), have different hyperthreading-ness.

renehoj commented 11 months ago

@andyhhp

It is a 13th gen RaptorLake, but it has the same P and E cores as AlderLake.

I have smt=on, dom0_max_vcpus=4, dom0_vcpus_pin, sched-gran=cpu. I tried using sched-gran=core, but it doesn't seem possible for asymmetric cpus.

CPUPools:

  Pool-0: core 0-3
  Browsers: core 4-9
  PCores: core 10-15
  ECores: core 16-31

I have a Python script running in dom0 that listens for QubesOS admin events; when the adminAPI says a VM has started, the script executes xl cpupool-migrate to move the VM to the pool where it belongs. This is mostly to control what is running on the P and E cores.
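
The moving part boils down to this per started VM (pool names are from the list above; the qube name is just an example):

xl cpupool-list                               # shows the pools and their CPUs
xl cpupool-migrate my-browser-qube Browsers   # what the script runs on VM start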

andyhhp commented 11 months ago

Wow, a lot to unpack there. That's quite a setup.

First, there is a list_del() hidden behind the ASSERT() which is now reliably failing, and that could plausibly be our source of memory corruption. Either way, let's investigate the reliable issue first.

AFAICT, you've got P cores (ID 0 thru 15, so 8c/16t), and E cores (ID 16 thru 31, so only 8c total), with SMT active and "cpu" (thread) scheduling. So Xen should be fairly oblivious and just be treating them as 24 independent things.

Can you provide the full xl dmesg, including activating CONFIG_DEBUG_TRACE and initialising debugtrace_send_to_console = true; near the top of xen/common/debugtrace.c. That should cause relevant messages to come out in sync with other printk()s.
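
Something along these lines before rebuilding (a sketch; the exact initializer line in debugtrace.c may differ):

echo "CONFIG_DEBUG_TRACE=y" >> xen/.config
make -C xen olddefconfig
$EDITOR xen/common/debugtrace.c   # set: debugtrace_send_to_console = true;
make build-xen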

Are all cpupools running credit2? Can you experiment using credit across the board? You'll need sched=credit on Xen's command line to change cpupool0. Credit(1) is less concerned about CPU topology compared to credit2, and this may help narrow down if it's something specific to Credit2, or CPUPools.
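
For the experiment, roughly (an illustrative sketch; the pool file follows the xlcpupool.cfg format):

# 1. append sched=credit to the multiboot2 /xen... line in grub.cfg and reboot
# 2. recreate the other pools with the credit scheduler, e.g.:
cat > browsers.cfg <<'EOF'
name = "Browsers"
sched = "credit"
cpus = "4-9"
EOF
xl cpupool-create browsers.cfg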

As to the admin events, xl can start the VM in a non-default CPUPool. When we've sorted this crash out, it is probably worth seeing if libvirt has libxl's cpupool controls exposed in a nice way, because that would avoid needing to play with the VM after the fact.

jgross1 commented 11 months ago

There is definitely a flaw in the move domain design, at least with credit2. I guess in your case credit2 is using multiple runqueues in at least one cpupool.

I have a rough idea how to fix it, but this might take some time. In the meantime I'd like you to verify my suspicion by posting the output of "xl dmesg" after having set up the cpupools.

To get your system back to a working state you might want to try adding "credit2_runqueue=all" to the Xen boot parameters (assuming my analysis is correct). Please do so only after gathering the xl dmesg data I've asked for.
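
That is, something like this in the grub config (illustrative):

# append to the multiboot2 /xen... line in grub.cfg:
#   multiboot2 /xen-4.17.2.gz ... credit2_runqueue=all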

renehoj commented 11 months ago

@andyhhp

AFAICT, you've got P cores (ID 0 thru 15, so 8c/16t), and E cores (ID 16 thru 31, so only 8c total), with SMT active and "cpu" (thread) scheduling. So Xen should be fairly oblivious and just be treating them as 24 independent things.

It's a 13900K with 24 cores (8P + 16E) and 32 threads (16P + 16E).

Are all cpupools running credit2? Can you experiment using credit across the board? You'll need sched=credit on Xen's command line to change cpupool0. Credit(1) is less concerned about CPU topology compared to credit2, and this may help narrow down if it's something specific to Credit2, or CPUPools.

All the pools are credit2. Before QubesOS upgraded to Xen 4.17 it was possible to run Pool-0 as credit2, and the other pools as credit. That stopped working in 4.17, and I changed all the pools to credit2.

I will try and change them to credit when I get home.

As to the admin events, xl can start the VM in a non-default CPUPool. When we've sorted this crash out, it is probably worth seeing if libvirt has libxl's cpupool controls exposed in a nice way, because that would avoid needing to play with the VM after the fact.

I originally used cpuset in xen.xml to pin the CPUs; using the adminAPI with Python was just a lot easier to manage when creating new VMs. Switching back to using xen.xml with jinja wouldn't be an issue.

@jgross1

I have a rough idea how to fix it, but this might take some time. In the meantime I'd like you to verify my suspicion by posting the output of "xl dmesg" after having set up the cpupools.

Can you provide the full xl dmesg, including activating CONFIG_DEBUG_TRACE and initialising debugtrace_send_to_console = true; near the top of xen/common/debugtrace.c. That should cause relevant messages to come out in sync with other printk()s.

I'm at work right now, so I can't post the xl dmesg, but I will do it when I get home.

renehoj commented 11 months ago

@andyhhp @jgross1

Here is the full dmesg: dmesg.txt

I tried switching the pools to credit(1), and shutting down VMs no longer crashes the system.

renehoj commented 11 months ago

While I was running credit2, I had this crash:

(XEN) common/grant_table.c:1882:d7v0 Expanding d7 grant table from 21 to 22 frames
234 cpupool_unassign_cpu(pool=0,cpu=4)
235 cpupool_unassign_cpu(pool=0,cpu=4)
(XEN) Removing cpu 4 from runqueue 0
236 cpupool_unassign_cpu ret=0
237 cpupool_unassign_cpu(pool=0,cpu=5)
238 cpupool_unassign_cpu(pool=0,cpu=5)
(XEN) Removing cpu 5 from runqueue 0
(XEN) Assertion '!memchr_inv(b->ptr.buffer + MIN_BLOCK_SIZE, POISON_BYTE, (b->size & BLOCK_SIZE_MASK) - MIN_BLOCK_SIZE)' failed at common/xmalloc_tlsf.c:246
(XEN) ----[ Xen-4.17.3-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff82d04022ecbd>] xmem_pool_free+0x25f/0x3da
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
(XEN) rax: ffff830ae76f9fe4   rbx: ffff830ae76f9f70   rcx: ffff8310828ae004
(XEN) rdx: ffff830ae76f9ff0   rsi: 00000000000000aa   rdi: ffff830ae76f9fe4
(XEN) rbp: ffff830ae76f9fa0   rsp: ffff83107a3b7d30   r8:  ffff83107a06efd0
(XEN) r9:  ffff83107a06eb90   r10: 0000000000000001   r11: 0000000000000001
(XEN) r12: ffff8310828ae000   r13: ffff8310828af868   r14: 0000000000000005
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 0000000000f526e0
(XEN) cr3: 000000004365b000   cr2: 00007a36c2407000
(XEN) fsb: 00007a36c22ba380   gsb: ffff8881b9d00000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d04022ecbd> (xmem_pool_free+0x25f/0x3da):
(XEN)  85 c0 0f 84 6e ff ff ff <0f> 0b 48 8b 6b f0 8b 45 08 83 e0 f8 48 83 f8 7f
(XEN) Xen stack trace from rsp=ffff83107a3b7d30:
(XEN)    0000000000000000 ffff830ae76f9f70 0000000000000000 ffff82d0405c1e60
(XEN)    0000000000000005 0000000000000000 ffff82d04022f2c5 ffff830ae76f9f70
(XEN)    0000000000000005 0000000000000000 ffff82d0405c1e60 0000000000000005
(XEN)    ffff82d0402495b3 0000000000000020 0000000000000000 ffff82d0402499bd
(XEN)    ffff831082832190 ffff830ae76f9f70 0000000540246672 ffff82d0405a9288
(XEN)    0000000000000246 ffff83107a06eb40 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffff83107a06e8c0 0000000000000000 0000000000000005
(XEN)    ffff831082832230 ffff82d0405a8100 ffff82d040236181 ffff83107a06e8c0
(XEN)    ffff82d04045d5e8 0000004039e11000 0000000000000002 ffff82d0405c1e60
(XEN)    ffff83107a068000 ffff82d040236212 ffff83107a06ef80 ffff831079ff8000
(XEN)    ffff82d040206304 ffff831079ff81b8 ffff82d04045d598 ffff82d040227266
(XEN)    ffff83107a3b91c0 ffff82d0405a81d0 ffff82d040227515 0000000000000002
(XEN)    0000000000007fff ffff82d0405ae680 ffff82d0405a81d0 ffff82d0402e3f3f
(XEN)    ffff82d0402e3ed7 ffff831079ff8000 ffff83107a3b7ef8 ffff831082897000
(XEN)    0000000000000002 0000000000000000 ffff8881200bd300 00007fff275eabf0
(XEN)    0000000000305000 ffff8881200bd300 0000000000000005 0000000000000286
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffffffff81e2346a 0000000000000000 0000000000000000 deadbeefdeadf00d
(XEN)    0000010000000000 ffffffff81e2346a 000000000000e033 0000000000000286
(XEN)    ffffc90047283da8 000000000000e02b af4fdbe2be8abeef 06c385a6470ebeef
(XEN) Xen call trace:
(XEN)    [<ffff82d04022ecbd>] R xmem_pool_free+0x25f/0x3da
(XEN)    [<ffff82d04022f2c5>] S xfree+0x1b0/0x1b9
(XEN)    [<ffff82d0402495b3>] S free_cpu_rm_data+0x4a/0x4f
(XEN)    [<ffff82d0402499bd>] S schedule_cpu_rm+0x405/0x407
(XEN)    [<ffff82d040236181>] S cpupool.c#cpupool_unassign_cpu_finish+0x142/0x193
(XEN)    [<ffff82d040236212>] S cpupool.c#cpupool_unassign_cpu_helper+0x40/0x60
(XEN)    [<ffff82d040206304>] S domain.c#continue_hypercall_tasklet_handler+0x42/0xad
(XEN)    [<ffff82d040227266>] S tasklet.c#do_tasklet_work+0x76/0xa5
(XEN)    [<ffff82d040227515>] S do_tasklet+0x51/0x81
(XEN)    [<ffff82d0402e3f3f>] S domain.c#idle_loop+0x68/0xd3
(XEN)    [<ffff82d0402e3ed7>] S domain.c#idle_loop+0/0xd3
(XEN) 
(XEN) debugtrace_dump() global buffer starting
(XEN) wrap: 0
(XEN) debugtrace_dump() global buffer finished
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) Assertion '!memchr_inv(b->ptr.buffer + MIN_BLOCK_SIZE, POISON_BYTE, (b->size & BLOCK_SIZE_MASK) - MIN_BLOCK_SIZE)' failed at co****************************************
(XEN) 
(XEN) Reboot in five seconds...
(XEN) ----[ Xen-4.17.3-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<0000000045f1e11f>] 0000000045f1e11f
(XEN) RFLAGS: 0000000000010206   CONTEXT: hypervisor
(XEN) rax: 0000000000000033   rbx: 0000000000000000   rcx: 00000000465fe000
(XEN) rdx: 0000000000000033   rsi: ffff8310828b7a90   rdi: 0000000000000001
(XEN) rbp: ffff8310828b7950   rsp: ffff8310828b7910   r8:  0000000000000001
(XEN) r9:  ffff8310828b7990   r10: 0000000000000002   r11: 0000000000000000
(XEN) r12: 0000000000000033   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 0000000000f526e0
(XEN) cr3: 000000107a3f5000   cr2: 00000000465fe004
(XEN) fsb: 0000000000000000   gsb: ffff8881b9c00000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <0000000045f1e11f> (0000000045f1e11f):
(XEN)  2d 00 00 48 85 c9 74 5e <8b> 71 04 8b 01 89 f3 81 e3 ff ff ff 0f 39 c3 73
(XEN) Xen stack trace from rsp=ffff8310828b7910:
(XEN)    ffff83107a033900 ffff83107a033900 ffff83107a208100 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffff8310828b7a90 ffff8310828b7990
(XEN)    ffff8310828b7b50 0000000045f1e6a9 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffff8310828b7b70
(XEN)    6573655220455844 326d657473795374 547465736552203a 646c6f4320657079
(XEN)    44206c6c6143202c 31203d2068747065 ffff8310000a0d2e 0000000045f2d7c0
(XEN)    ffff8310828b7a10 0000000045f2b901 ffff8310828b7a10 0000000045f2b3c5
(XEN)    0000000000000001 ffff83107a06ece0 0000000000000000 0000000000000007
(XEN)    ffff8310828b7b50 0000000045f2bd90 ffff82d04023db38 ffff83107a039d30
(XEN)    0000001514dd64e4 ffff83107a06e950 0000000000000046 ffff831082896000
(XEN)    ffff83107a06e980 ffff82d0402e6747 ffff82d0402e778c 00000000000000b3
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000007 000000107a3f5000 0000000000000000
(XEN)    ffff8310828b7c70 0000000045f1e999 0000000000000001 000000000311100a
(XEN)    0000000045f20a84 0000000000000001 0000000000000000 ffff831079fcc730
(XEN) Xen call trace:
(XEN)    [<0000000045f1e11f>] R 0000000045f1e11f
(XEN)    [<ffff83107a033900>] S ffff83107a033900
(XEN)    [<ffff82d04023db38>] S credit2.c#runq_tickle+0x657/0x6a5
(XEN)    [<ffff82d0402e6747>] S sync_local_execstate+0x5/0x6
(XEN)    [<ffff82d0402e778c>] S domain_page.c#mapcache_current_vcpu+0xfa/0x134
(XEN)    [<ffff82d040281d1f>] S efi_reset_system+0x46/0x6a
(XEN)    [<ffff82d040281cf0>] S efi_reset_system+0x17/0x6a
(XEN)    [<ffff82d040308add>] S machine_restart+0x165/0x287
(XEN)    [<ffff82d040308c0a>] S apic_wait_icr_idle+0/0x35
(XEN)    [<ffff82d0402267f8>] S smp_call_function_interrupt+0xc2/0xc6
(XEN)    [<ffff82d040309485>] S call_function_interrupt+0x26/0x37
(XEN)    [<ffff82d0402f30be>] S do_IRQ+0x12b/0x5f9
(XEN)    [<ffff82d0402018dc>] S common_interrupt+0x12c/0x140
(XEN)    [<ffff82d040248ab6>] S core.c#schedule+0x176/0x2a8
(XEN)    [<ffff82d040248ab6>] S core.c#schedule+0x176/0x2a8
(XEN)    [<ffff82d040226202>] S softirq.c#__do_softirq+0x8f/0xb9
(XEN)    [<ffff82d04022628c>] S do_softirq+0xf/0x10
(XEN)    [<ffff82d0402e3f54>] S domain.c#idle_loop+0x7d/0xd3
(XEN)    [<ffff82d0402e3ed7>] S domain.c#idle_loop+0/0xd3
(XEN) 
(XEN) Pagetable walk from 00000000465fe004:
(XEN)  L4[0x000] = 000000107a3f4063 ffffffffffffffff
(XEN)  L3[0x001] = 0000000043654063 ffffffffffffffff
(XEN)  L2[0x032] = 00000000463ff063 ffffffffffffffff
(XEN)  L1[0x1fe] = 0000000000000000 ffffffffffffffff
(XEN) debugtrace_dump() global buffer starting
(XEN) wrap: 0
(XEN) debugtrace_dump() global buffer finished
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 00000000465fe004
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
andyhhp commented 11 months ago

@renehoj FYI

(XEN) Assertion '!memchr_inv(b->ptr.buffer + MIN_BLOCK_SIZE, POISON_BYTE, (b->size & BLOCK_SIZE_MASK) - MIN_BLOCK_SIZE)' failed at common/xmalloc_tlsf.c:246

was the thing I was expecting to see with CONFIG_XMEM_POOL_POISON active, and it does look pretty clear now that there's a double-free in the Credit2 logic.

As for:

Before QubesOS upgraded to Xen 4.17 it was possible to run Pool-0 as credit2, and the other pools as credit. That stopped working in 4.17, and I changed all the pools to credit2.

That was not an intentional change.

Juergen did ask if you could use credit2 pools with credit2_runqueue=all on the Xen command line, and see if that works around the crash too.

renehoj commented 11 months ago

@jgross1 I tried to do the credit2_runqueue=all test this morning, but I messed up the command; I will do it when I get home.

@andyhhp Here is the crash from mixing credit1 and credit2 pools; Pool-0 is credit2, all other pools are credit1.

156 cpupool move_domain(dom=1)->pool=1
(XEN) Xen BUG at common/sched/credit.c:1051
(XEN) ----[ Xen-4.17.3-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff82d040237cfd>] credit.c#csched_free_udata+0x12/0x14
(XEN) RFLAGS: 0000000000010206   CONTEXT: hypervisor (d0v3)
(XEN) rax: ffff82d040237ceb   rbx: 0000000000000012   rcx: 0000000000000011
(XEN) rdx: ffff83108283ffff   rsi: ffff8310a2951c90   rdi: ffff8310a2951c90
(XEN) rbp: 0000000000000000   rsp: ffff83108283fc98   r8:  ffff8310648925b0
(XEN) r9:  ffff83108283fbe8   r10: 0000000000000001   r11: 0000000000000001
(XEN) r12: ffff8310a2951ba0   r13: ffff83107a06aa10   r14: 0000000000000000
(XEN) r15: ffff82d0405a9288   cr0: 0000000080050033   cr4: 0000000000b526e0
(XEN) cr3: 00000009229f0000   cr2: 00007788dfc05240
(XEN) fsb: 00007788df6e3380   gsb: ffff8881b9d80000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d040237cfd> (credit.c#csched_free_udata+0x12/0x14):
(XEN)  75 06 e8 19 74 ff ff c3 <0f> 0b f3 0f 1e fa 53 48 8b 5f 18 48 85 db 74 2b
(XEN) Xen stack trace from rsp=ffff83108283fc98:
(XEN)    ffff82d0402474e8 0000001100000001 ffff8310a2951ad0 ffff82d0405a9288
(XEN)    ffff83107a06e980 ffff8310a2951ba0 ffff8310a2951ad0 ffff8310648922a0
(XEN)    ffff83107a06aa10 ffff831000aff000 ffff83107a06aa10 0000000000000000
(XEN)    ffff82d04045d5d8 ffff82d0405ae680 ffff82d040235303 ffff83108283fe20
(XEN)    00000000fffffffe ffff82d040236ec3 ffff831000aff000 0000000000000000
(XEN)    00007788dfcb3010 ffff83108283ffff 0000000000000000 ffff82d04045d5d8
(XEN)    ffff82d040234763 0000000000000000 0000000000000001 8c00000000000001
(XEN)    00000000c0000102 ffff82e01255e7c0 8c00000000000002 0000000000000000
(XEN)    ffff82e000000001 8c00000000000001 0000000000000000 0000000000b526e0
(XEN)    0000000000000206 0000000000000004 ffff82e01255e7c0 000000000092af3e
(XEN)    0000000000000004 8000000000000000 0000000000000001 0000000000000004
(XEN)    ffff82d000000004 ffff83107a068000 ffff82e0126aa280 0000000000000000
(XEN)    0000001500000012 0000000100000006 0000000100000000 00007ffc941ea13c
(XEN)    0000000000431042 000000000043d990 000000000043d9b0 00007788df93c434
(XEN)    00007ffc941ea140 0000000000000013 00000000016caa70 5f4a1994b24d3100
(XEN)    00000000016c9870 ffffffffffffff80 0000000000000000 00007ffc941ebf40
(XEN)    000000000043d980 ffff83108283fef8 0000000000000023 ffff831079ff4000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffff82d0402dd00f
(XEN)    ffff831079ff4000 0000000000000000 0000000000000000 ffff82d0402012b7
(XEN)    0000000000000000 ffff888116f4cf00 00007ffc941ea000 0000000000305000
(XEN) Xen call trace:
(XEN)    [<ffff82d040237cfd>] R credit.c#csched_free_udata+0x12/0x14
(XEN)    [<ffff82d0402474e8>] S sched_move_domain+0x595/0x5b1
(XEN)    [<ffff82d040235303>] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b
(XEN)    [<ffff82d040236ec3>] S cpupool_do_sysctl+0x725/0x760
(XEN)    [<ffff82d040234763>] S do_sysctl+0x827/0x1269
(XEN)    [<ffff82d0402dd00f>] S pv_hypercall+0x3a2/0x4a9
(XEN)    [<ffff82d0402012b7>] S lstar_enter+0x137/0x140
(XEN) 
(XEN) debugtrace_dump() global buffer starting
(XEN) wrap: 0
(XEN) debugtrace_dump() global buffer finished
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) Xen BUG at common/sched/credit.c:1051
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
(XEN) ----[ Xen-4.17.3-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<0000000045f1e11f>] 0000000045f1e11f
(XEN) RFLAGS: 0000000000010206   CONTEXT: hypervisor (d0v0)
(XEN) rax: 0000000000000033   rbx: 0000000000000000   rcx: 00000000465fe000
(XEN) rdx: 0000000000000033   rsi: ffff8310828b7950   rdi: 0000000000000001
(XEN) rbp: ffff8310828b7810   rsp: ffff8310828b77d0   r8:  0000000000000001
(XEN) r9:  ffff8310828b7850   r10: 0000000000000002   r11: 0000000000000000
(XEN) r12: 0000000000000033   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 0000000000b526e0
(XEN) cr3: 000000107a3f5000   cr2: 00000000465fe004
(XEN) fsb: 0000738087bbb040   gsb: ffff8881b9c00000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <0000000045f1e11f> (0000000045f1e11f):
(XEN)  2d 00 00 48 85 c9 74 5e <8b> 71 04 8b 01 89 f3 81 e3 ff ff ff 0f 39 c3 73
(XEN) Xen stack trace from rsp=ffff8310828b77d0:
(XEN)    ffff8310828b7a80 00000000f2013024 0000000000008000 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffff8310828b7950 ffff8310828b7850
(XEN)    ffff8310828b7a10 0000000045f1e6a9 0000000000000000 0000000000000000
(XEN)    ffff8310828b7fff 0000000000000000 ffff82d04020170b ffff8310828b7a30
(XEN)    6573655220455844 326d657473795374 547465736552203a 646c6f4320657079
(XEN)    44206c6c6143202c 31203d2068747065 00000000000a0d2e 0000000045f2d7c0
(XEN)    ffff8310828b78d0 0000000045f2b901 ffff8310828b78d0 0000000045f2b3c5
(XEN)    ffff82d0402e76b6 000000000000009d 0000000000000000 0000000000000007
(XEN)    ffff8310828b7a10 0000000045f2bd90 000000fc828adae0 0000000000000046
(XEN)    00000000000000fc ffff82d040368800 ffff82d0402e76b6 00000000000000c3
(XEN)    ffff82d0402e76b6 00000000000000c5 0000000000000046 0000000000000086
(XEN)    000100ffff00ffff ff02010100010101 00005fae0fefb050 00005fae0feed9a0
(XEN)    000100ffff00ffff ff02010100010101 00005fae0fefb050 00005fae0feed9a0
(XEN)    0000001a00000003 00005fae0feedc70 626211e2e59dd232 626211e2e59dd232
(XEN)    62623dcc6d8cc762 62626262626261e3 9999999999999999 9999999999999999
(XEN)    2020202020202020 2020202020202020 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000007 000000107a3f5000 0000000000000000
(XEN)    ffff8310828b7b30 0000000045f1e999 0000000000000001 000000290311100a
(XEN)    0000000045f20a84 0000000000000001 0000000000000000 00000008b8ad60db
(XEN) Xen call trace:
(XEN)    [<0000000045f1e11f>] R 0000000045f1e11f
(XEN)    [<ffff8310828b7a80>] S ffff8310828b7a80
(XEN)    [<ffff82d04020170b>] S x86_64/entry.S#restore_all_xen+0xb/0xb0
(XEN)    [<ffff82d0402e76b6>] S domain_page.c#mapcache_current_vcpu+0x24/0x134
(XEN)    [<ffff82d0402e76b6>] S domain_page.c#mapcache_current_vcpu+0x24/0x134
(XEN)    [<ffff82d0402e76b6>] S domain_page.c#mapcache_current_vcpu+0x24/0x134
(XEN)    [<ffff82d040281d1f>] S efi_reset_system+0x46/0x6a
(XEN)    [<ffff82d040281cf0>] S efi_reset_system+0x17/0x6a
(XEN)    [<ffff82d040308add>] S machine_restart+0x165/0x287
(XEN)    [<ffff82d040308c0a>] S apic_wait_icr_idle+0/0x35
(XEN)    [<ffff82d0402267f8>] S smp_call_function_interrupt+0xc2/0xc6
(XEN)    [<ffff82d040309485>] S call_function_interrupt+0x26/0x37
(XEN)    [<ffff82d0402f30be>] S do_IRQ+0x12b/0x5f9
(XEN)    [<ffff82d0402f30be>] S do_IRQ+0x12b/0x5f9
(XEN)    [<ffff82d0402018dc>] S common_interrupt+0x12c/0x140
(XEN)    [<ffff82d0402f53ab>] S mm.c#pv_l1tf_check_pte+0x11/0x91
(XEN)    [<ffff82d0402f839e>] S mm.c#_get_page_type+0xb68/0x1681
(XEN)    [<ffff82d0402f964d>] S get_page_type_preemptible+0x21/0x24
(XEN)    [<ffff82d0402fed5f>] S do_mmuext_op+0x42f/0x1677
(XEN)    [<ffff82d0402dc7c5>] S pv_do_multicall_call+0xe0/0x524
(XEN)    [<ffff82d0402e9f59>] S arch_do_multicall_call+0x24/0x3e
(XEN)    [<ffff82d04021f613>] S do_multicall+0xcd/0x45a
(XEN)    [<ffff82d0402dcd94>] S pv_hypercall+0x127/0x4a9
(XEN)    [<ffff82d0402012b7>] S lstar_enter+0x137/0x140
(XEN) 
(XEN) Pagetable walk from 00000000465fe004:
(XEN)  L4[0x000] = 000000107a3f4063 ffffffffffffffff
(XEN)  L3[0x001] = 0000000043654063 ffffffffffffffff
(XEN)  L2[0x032] = 00000000463ff063 ffffffffffffffff
(XEN)  L1[0x1fe] = 0000000000000000 ffffffffffffffff
(XEN) debugtrace_dump() global buffer starting
(XEN) wrap: 0
(XEN) debugtrace_dump() global buffer finished
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 00000000465fe004
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
jgross1 commented 11 months ago

I have tested my hypothesis by using "credit2_runqueue=core" on my test system, with 2 cpupools of 2 cores each. Trying to move a domain from one cpupool into the other shows exactly the crash in credit2.

Working on a patch now.

jgross1 commented 11 months ago

This Xen patch did fix it for me. patch.txt

renehoj commented 11 months ago

@jgross1 @andyhhp

The patch also works on my system, thanks for all the help with solving this issue.

andyhhp commented 11 months ago

@renehoj Thanks for testing it. We'll get this fixed in Xen (but it will have to wait a bit because we're in code freeze for the 4.18 release right now), but in the meantime QubesOS probably wants to take it as a local patch.

However, while we're here and have some firmware developers on hand (hello @miczyg1),

(XEN) RIP:    e008:[<0000000045f1e11f>] 0000000045f1e11f

This is a problem we've come across before and never gotten to the bottom of. That's something in the EFI reset system call, falling over an unmapped pointer.

It's almost certainly a consequence of Xen not doing something that Linux/Windows does as standard. Do you have any suggestions for how to investigate further?

miczyg1 commented 11 months ago

@renehoj Thanks for testing it. We'll get this fixed in Xen (but it will have to wait a bit because we're in code freeze for the 4.18 release right now), but in the meantime QubesOS probably wants to take it as a local patch.

However, while we're here and have some firmware developers on hand (hello @miczyg1),

(XEN) RIP:    e008:[<0000000045f1e11f>] 0000000045f1e11f

This is a problem we've come across before and never gotten to the bottom of. That's something in the EFI reset system call, falling over an unmapped pointer.

It's almost certainly a consequence of Xen not doing something that Linux/Windows does as standard. Do you have any suggestions for how to investigate further?

@andyhhp "falling over an unmapped pointer", you say... The EFI reset system call is a Runtime Service call, which can do more than just a simple I/O write to 0xCF9. In EDK2 there is a ResetSystemLib, which provides the implementation for the standard reset routine. I wouldn't be surprised if some boards implement custom things in there (hopefully not, but I saw my Dell OptiPlex being reset via an SMI handler according to the FADT on stock firmware). That being said, if the reset routine is somewhat complicated and the BIOS developer forgets to perform EfiConvertPointer on the pointers used by runtime code during the virtual address change event, then maybe "falling over an unmapped pointer" is the result.

I didn't dig into what Linux/Windows uses, but what if you just used the reset definition from the FADT?

Is the problem common for a specific BIOS vendor or a handful of platforms?

pietrushnic commented 11 months ago

@miczyg1 IIUC this log is from @renehoj, who uses Dasharo, so the BIOS vendor is 3mdeb.

andyhhp commented 11 months ago

Sorry that I wasn't clear. We (Xen) have seen this on many systems, but in this instance it's on a Dasharo system. It is almost certainly an issue caused by Xen not doing something that other OSes do by default.

Xen's exception handlers are still in place while UEFI-RS are active, which is why we get the second backtrace in https://github.com/Dasharo/dasharo-issues/issues/488#issuecomment-1769977777

It's mov 0x4(%rcx),%esi when %rcx has the value 00000000465fe000, hence the resulting %cr2 value of 00000000465fe004.

I suppose my first question is whether there are any Dasharo debug symbols which can be used to figure out which function/line the instruction at 0000000045f1e11f is, and what data it's trying to access at 00000000465fe000 so we can try and figure out why it has ended up unmapped.

[EDIT] From the full log in https://github.com/Dasharo/dasharo-issues/issues/488#issuecomment-1770684776 the data resides in:

(XEN) EFI RAM map:
...
(XEN)  [0000000046451000, 00000000503fffff] (reserved)

and

(XEN) EFI memory map:
...
(XEN)  00000464a5000-00000503fffff type=0 attr=000000000000000f

which is not identified as runtime services code/data.

miczyg1 commented 11 months ago

@andyhhp Well, there are ELF object files after the build finishes, which can be processed with objdump to interleave assembly and source to find it.

I have tried to look for any mov something(%rcx),%esi occurrences around modules with RS Reset but to no avail.

miczyg1 commented 11 months ago

Sorry that I wasn't clear. We (Xen) have seen this on many systems, but in this instance it's on a Dasharo system. It is almost certainly an issue caused by Xen not doing something that other OSes do by default.

Xen's exception handlers are still in place while UEFI-RS are active, which is why we get the second backtrace in #488 (comment)

It's mov 0x4(%rcx),%esi when %rcx has the value 00000000465fe000, hence the resulting %cr2 value of 00000000465fe004.

I suppose my first question is whether there are any Dasharo debug symbols which can be used to figure out which function/line the instruction at 0000000045f1e11f is, and what data it's trying to access at 00000000465fe000 so we can try and figure out why it has ended up unmapped.

[EDIT] From the full log in #488 (comment) the data resides in:

(XEN) EFI RAM map:
...
(XEN)  [0000000046451000, 00000000503fffff] (reserved)

and

(XEN) EFI memory map:
...
(XEN)  00000464a5000-00000503fffff type=0 attr=000000000000000f

which is not identified as runtime services code/data.

Oh wow. I just checked what is under 0x465fe000 in the logs provided here: https://github.com/Dasharo/dasharo-issues/issues/488#issuecomment-1718127831:

[DEBUG] CONSOLE 3. 0x465fe000 0x00100000

So basically the UEFI serial port debug library tries to write debug information to coreboot's console buffer... We use this to dump boot logs from the whole boot process (not only coreboot but also edk2). What it checks for is this struct:

struct cbmem_console {
  UINT32    size;
  UINT32    cursor;
  UINT8     body[0];
} __attribute__ ((packed));

Offset 4 would be the current cursor position, i.e. where to store the next piece of debug information in the 1 MB buffer (minus 8 bytes for size and cursor) allocated for this purpose.

So I see two ways out of this:

  1. When switching to runtime during EBS, mark the console buffer as RuntimeData, OR
  2. Suppress writing to the console buffer when at runtime.

I'm rather opting for the 2nd solution, because we don't really need debug information from runtime. What we care about is everything up to EBS only.

tl;dr

@andyhhp not a Xen issue this time :) definitely our fault

miczyg1 commented 11 months ago

https://github.com/Dasharo/edk2/pull/99 this should solve it. I only need to test it (and install Qubes first :) )

andyhhp commented 11 months ago

@renehoj Are you happy to provide a name/email for a Reported-by and/or Tested-by: tags in the Xen fix, or would you prefer that we just link to this issue?

renehoj commented 11 months ago

@andyhhp I would prefer you just link to this issue.

@miczyg1 I can test it for you, if there is a ROM I can download.

andyhhp commented 11 months ago

@andyhhp I would prefer you just link to this issue.

Sure.

miczyg1 commented 7 months ago

@renehoj the v0.9.1 ROM contains the fix. Could you please confirm whether the issue is still present?

renehoj commented 7 months ago

@miczyg1 Yes, I haven't had any crashes with v0.9.1; as far as I can tell, the issue is solved.