churchers / vm-bhyve

Shell based, minimal dependency bhyve manager
BSD 2-Clause "Simplified" License

Debian Kernel Panic #360

Open: teisho opened this issue 4 years ago

teisho commented 4 years ago

I have a problem with kernel panics when installing a Debian VM. My system: FreeBSD 12.1-STABLE, 2x AMD Opteron Processor 6380, 256 GB RAM. Any ideas?


root@nazgul:~ # vm install -f testvm debian-10.3.0-amd64-netinst.iso
Starting testvm
  * found guest in /vms/testvm
  * booting...
[    0.004628] divide error: 0000 [#1] SMP NOPTI
[    0.007933] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-8-amd64 #1 Debian 4.19.98-1
[    0.008013] Hardware name:  BHYVE, BIOS 1.00 03/14/2014
[    0.008013] RIP: 0010:init_amd+0x388/0x730
[    0.008013] Code: 4b 23 08 80 3b 16 0f 87 3a fe ff ff 0f b7 8b d0 00 00 00 31 d2 0f b7 b3 de 00 00 00
89 93 de 00 00 00 e9 0e fe ff ff f0 80 4b 60 40 e9 f3 fc
[    0.008013] RSP: 0000:ffffffff89403e00 EFLAGS: 00010246
[    0.008013] RAX: 0000000000000000 RBX: ffffffff8952c320 RCX: 0000000000000000
[    0.008013] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff89403db0
[    0.008013] RBP: ffffffff89403e0c R08: ffffffff89403db8 R09: ffffffff89403dbc
[    0.008013] R10: ffff96a29edf0140 R11: ffff96a29edf0140 R12: 0000000000000102
[    0.008013] R13: 000000000000f010 R14: 0000000000000000 R15: 0000000000000000
[    0.008013] FS:  0000000000000000(0000) GS:ffff96a29f200000(0000) knlGS:0000000000000000
[    0.008013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.008013] CR2: ffff96a28c001000 CR3: 000000000b80a000 CR4: 00000000000406b0
[    0.008013] Call Trace:
[    0.008013]  ? get_cpu_cap+0x227/0x410
[    0.008013]  identify_cpu+0x2af/0x580
[    0.008013]  identify_boot_cpu+0xc/0x74
[    0.008013]  check_bugs+0x28/0x933
[    0.008013]  ? __slab_alloc+0x29/0x30
[    0.008013]  ? kmem_cache_alloc+0x1b8/0x1c0
[    0.008013]  start_kernel+0x4e8/0x52c
[    0.008013]  secondary_startup_64+0xa4/0xb0
[    0.008013] Modules linked in:
[    0.008017] ---[ end trace a032866fc6b427bc ]---
[    0.010371] RIP: 0010:init_amd+0x388/0x730
[    0.012016] Code: 4b 23 08 80 3b 16 0f 87 3a fe ff ff 0f b7 8b d0 00 00 00 31 d2 0f b7 b3 de 00 00 00
89 93 de 00 00 00 e9 0e fe ff ff f0 80 4b 60 40 e9 f3 fc
[    0.016016] RSP: 0000:ffffffff89403e00 EFLAGS: 00010246
[    0.020016] RAX: 0000000000000000 RBX: ffffffff8952c320 RCX: 0000000000000000
[    0.024016] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff89403db0
[    0.028015] RBP: ffffffff89403e0c R08: ffffffff89403db8 R09: ffffffff89403dbc
[    0.032015] R10: ffff96a29edf0140 R11: ffff96a29edf0140 R12: 0000000000000102
[    0.036015] R13: 000000000000f010 R14: 0000000000000000 R15: 0000000000000000
[    0.040017] FS:  0000000000000000(0000) GS:ffff96a29f200000(0000) knlGS:0000000000000000
[    0.044016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.047002] CR2: ffff96a28c001000 CR3: 000000000b80a000 CR4: 00000000000406b0
[    0.048017] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.051623] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
ghost commented 4 years ago

Hi, can you share testvm.conf?

alexmaloteaux commented 4 years ago

Hi, I have exactly the same issue with 2x AMD Opteron 6378. Any workaround?

[    0.028868] divide error: 0000 [#1] SMP NOPTI
[    0.030276] Modules linked in:
[    0.031243] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-76-generic #86-Ubuntu
[    0.032000] Hardware name: BHYVE, BIOS 1.00 03/14/2014
[    0.032000] RIP: 0010:init_amd+0x227/0x700
[    0.032000] RSP: 0000:ffffffff9c203dc0 EFLAGS: 00010246
[    0.032000] RAX: 0000000000000000 RBX: ffffffff9c45c1e0 RCX: 0000000000000000
[    0.032000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9c203d70
[    0.032000] RBP: ffffffff9c203e28 R08: ffffffff9c203d78 R09: ffffffff9c203d7c
[    0.032000] R10: ffff92f39ea18400 R11: 0000000000026f20 R12: 0000000000000102
[    0.032000] R13: 000000000000f010 R14: ffffffff9c203dd4 R15: 0000000000000000
[    0.032000] FS:  0000000000000000(0000) GS:ffff92f39fc00000(0000) knlGS:0000000000000000
[    0.032000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.032000] CR2: ffff92f39658e000 CR3: 0000000015e0a000 CR4: 00000000000406b0
[    0.032000] Call Trace:
[    0.032000]  identify_cpu+0x2c2/0x570
[    0.032000]  identify_boot_cpu+0x10/0x7a
[    0.032000]  check_bugs+0x2a/0x992
[    0.032000]  ? kmem_cache_alloc+0x1a2/0x1b0
[    0.032000]  ? delayacct_init+0x52/0x70
[    0.032000]  start_kernel+0x4b8/0x4fd
[    0.032000]  x86_64_start_reservations+0x24/0x26
[    0.032000]  x86_64_start_kernel+0x74/0x77
[    0.032000]  secondary_startup_64+0xa5/0xb0
[    0.032000] Code: 01 76 31 f0 80 4b 23 08 80 3b 16 77 27 0f b7 8b d0 00 00 00 31 d2 0f b7 b3 de 00 00 00 89 c8 f7 35 df 78 3e 01 31 d2 89 c1 89 f0 f1 66 89 93 de 00 00 00 65 44 8b 25 b8 89 1c 65 44 89 e7 44
[    0.032000] RIP: init_amd+0x227/0x700 RSP: ffffffff9c203dc0
[    0.032011] ---[ end trace 6579d43931fea3d2 ]---
[    0.036003] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.038162] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!

alexmaloteaux commented 4 years ago

After a bit of reverse engineering, I found that the failing function is located here: https://elixir.bootlin.com/linux/v4.19.10/source/arch/x86/kernel/cpu/amd.c#L305

```c
static void legacy_fixup_core_id(struct cpuinfo_x86 *c)
{
    u32 cus_per_node;

    if (c->x86 >= 0x17)
        return;

    cus_per_node = c->x86_max_cores / nodes_per_socket;
    c->cpu_core_id %= cus_per_node;
}
```

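So it seems that nodes_per_socket is bigger than x86_max_cores. The unsigned integer division then yields cus_per_node = 0, and the modulo by zero raises the divide error seen in the traces above.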

alexmaloteaux commented 4 years ago

OK, on FreeBSD 12+ it is possible to bypass this panic by setting `cpu_sockets=1` in the VM's configuration.

zacharysandberg commented 4 years ago

@alexmaloteaux Could you please PM me your PayPal email? I've been trying to solve this for more than 8 months, ever since upgrading from FreeBSD 11.3 to 12.1. After that upgrade, none of my Windows or Linux VMs using grub2-bhyve or bhyve's UEFI would work (bhyveload still worked, however). I hit dead end after dead end trying to find a workaround until this weekend, when I found your solution. Thank you, sir!

alexmaloteaux commented 4 years ago

:) Thanks for the offer, I appreciate it, but there's no need at all. Glad to help. Send it to a good charity if you really want :)

alexmaloteaux commented 4 years ago

BTW, if you have 2 physical CPUs, then you have to set cpu_sockets=2 with at least 2 cores and 4 CPUs. Otherwise, for whatever reason, only 1 core is used at all times on the guest. At least that is the case on 2x AMD 6378.

zacharysandberg commented 4 years ago

My box has 2x AMD 6380s, so it should give similar results to yours. I assume this lack of CPU provisioning flexibility is a bug and not a feature, right? Thanks again.

alexmaloteaux commented 4 years ago

Try 2 CPUs / 2 cores / 1 socket. Launch stress-ng with 2 CPU workers and check with top/htop whether both are being used on the guest. If not, then you have to use 1 CPU / 1 core / 1 socket, or 4 CPUs / 2 cores / 2 sockets, ... or recompile the guest kernel with the function that panics commented out, as it is not needed at all, and retest.

I have a coreboot-based board, so that scheduling issue may be specific to my BIOS.

schmitmd commented 3 years ago

Could this be related to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247636 ?

alexmaloteaux commented 3 years ago

I don't think so. This bug is related to pre-17h families like 10h/15h.