dolohow / uksm

Ultra Kernel Samepage Merging

UKSM crashing host and guest with KVM on 5.15 #76

Open sempervictus opened 2 years ago

sempervictus commented 2 years ago

While building a service stack in Vagrant today, we observed a number of crashes, spread over several boots of both host and guest, related to memory access errors. Guest and host run similar kernels, both with UKSM enabled. The first problem was an NX fault from inside the VM, after a reboot of the host (grsec locks down the offending UID, so further debugging was not possible):

[   96.881125] WARNING: CPU: 3 PID: 227 at kvm_mmu_notifier_change_pte+0x290/0x2e0 [kvm]
...
[   96.881218] RIP: 0010:[<ffffffff851e3c50>] kvm_mmu_notifier_change_pte+0x290/0x2e0 [kvm]
[   96.881226] Code: 44 0f b6 4c 24 30 85 c0 75 2b 48 8b 44 24 18 48 83 80 80 76 ff ff 01 45 84 c9 74 96 48 8b 44 24 18 c6 80 50 55 ff ff 00 eb 88 <0f> 0b e9 b0 fd ff ff 0f 0b eb 87 31 c9 31 d2 be 00 03 00 00 44 88
[   96.881241] RSP: 0000:ffffc9000069fc10 EFLAGS: 00010246
[   96.881242] RAX: 0000000000000000 RBX: 000078cad98ec000 RCX: 8000000252eec005
[   96.881243] RDX: 000078cad98ec000 RSI: ffff88814059fb40 RDI: ffffc90027877ab0
[   96.881244] RBP: ffffc9000069fcb8 R08: 0000000000000000 R09: ffffffffffffffff
[   96.881245] R10: 0000000000000080 R11: 0000000000000002 R12: 8000000252eec005
[   96.881245] R13: 000078cad98ec000 R14: 8000000000000000 R15: 0000000000000000
[   96.881248] RSI: mm_struct+0x0/0x428
[   96.881253] RDI: kvm_dev_ioctl+0xab/0x770 [kvm]
[   96.881256] RBP: copy_process+0x524/0x2ba0
[   96.881258] FS:  0000000000000000(0000) GS:ffff88904d400000(0000) knlGS:0000000000000000
[   96.881259] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   96.881260] CR2: 000055677c380000 CR3: 000000000221c005 CR4: 00000000003626f0 shadow CR4: 00000000003626f0
[   96.881261] Stack:
[   96.881262]  ffff88904d41d080 ffff88904d41e840 ffff88904d41e840 ffffc90027877ab0
[   96.881263]  ffffffff818892cd 8000000252eec005 0000000000000000 ffff88904ec00000
[   96.881265]  0000000300000003 ffffffff810693a0 8000000000000000 0000000100e75000
[   96.881266] Call Trace:
[   96.881267]  <TASK>
[   96.881268]  [<ffffffff818892cd>] ? cpumask_next+0x1d/0x30
[   96.881272]  [<ffffffff810693a0>] ? switch_mm+0x20/0x20
[   96.881274]  [<ffffffff852038b7>] ? kvm_arch_mmu_notifier_invalidate_range+0x17/0x40 [kvm]
[   96.881284]  [<ffffffff8129c91c>] __mmu_notifier_change_pte+0x5c/0xa0
[   96.881288]  [<ffffffff812a43e9>] cmp_and_merge_page+0x1d19/0x27d0
[   96.881290]  [<ffffffff812a59d8>] scan_vma_one_page+0xb38/0x1530
[   96.881291]  [<ffffffff8129de59>] ? try_down_read_slot_mmap_sem+0x79/0x1d0
[   96.881292]  [<ffffffff812a6503>] uksm_do_scan+0x133/0x27a0
[   96.881293]  [<ffffffff8111f3e0>] ? init_timer_key+0x40/0x40
[   96.881296]  [<ffffffff812a8b70>] ? uksm_do_scan+0x27a0/0x27a0
[   96.881297]  [<ffffffff812a8cfb>] uksm_scan_thread+0x18b/0x1d0
[   96.881298]  [<ffffffff810b1123>] kthread+0x1a3/0x1d0
[   96.881300]  [<ffffffff810b0f80>] ? kthread_create_worker_on_cpu+0x1d0/0x1d0
[   96.881302]  [<ffffffff810040ec>] ret_from_fork+0x2c/0x40
[   96.881305]  </TASK>
[   96.881306] ---[ end trace 8b2de732648f84fb ]---
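
For anyone triaging a trace like the one above, here is a minimal sketch of a helper that keeps only the frames the unwinder considers reliable (those without a leading `?`). The names `FRAME_RE` and `reliable_frames` are hypothetical and not part of any kernel tooling. The pattern assumes the `[<address>] symbol+0xoff/0xlen [module]` layout seen in this log.

```python
import re

# Matches frames shaped like the ones in the log above, e.g.
#   [   96.881284]  [<ffffffff8129c91c>] __mmu_notifier_change_pte+0x5c/0xa0
# A leading "?" marks a frame the unwinder could not verify.
# Note: the "RIP:" line has the same shape, so it is matched as well
# if it is included in the input.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def reliable_frames(log_text):
    """Return (symbol, module) pairs for frames not marked with '?'."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None or match.group("guess"):
            continue  # not a frame line, or a speculative frame
        frames.append((match.group("sym"), match.group("mod")))
    return frames


if __name__ == "__main__":
    sample = """\
[   96.881274]  [<ffffffff852038b7>] ? kvm_arch_mmu_notifier_invalidate_range+0x17/0x40 [kvm]
[   96.881284]  [<ffffffff8129c91c>] __mmu_notifier_change_pte+0x5c/0xa0
[   96.881288]  [<ffffffff812a43e9>] cmp_and_merge_page+0x1d19/0x27d0
[   96.881290]  [<ffffffff812a59d8>] scan_vma_one_page+0xb38/0x1530
[   96.881292]  [<ffffffff812a6503>] uksm_do_scan+0x133/0x27a0
[   96.881297]  [<ffffffff812a8cfb>] uksm_scan_thread+0x18b/0x1d0
"""
    for sym, mod in reliable_frames(sample):
        print(sym, f"[{mod}]" if mod else "")
```

Applied to the call trace here, the reliable frames run from `uksm_scan_thread` through `uksm_do_scan`, `scan_vma_one_page` and `cmp_and_merge_page` into `__mmu_notifier_change_pte`, which is the path that ends in the KVM warning.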

Disabling UKSM in either the host or the guest seems to avoid whatever is causing the crashes. We have been using UKSM (atop grsec and on our linux-hardened branch) for ages and have never seen anything quite like this before.
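
As a workaround sketch for others hitting this, UKSM can be toggled at runtime. The snippet below assumes the control file lives at `/sys/kernel/mm/uksm/run` and accepts `0`/`1`; that path and the helper names `set_uksm` and `uksm_enabled` are assumptions for illustration, so check your patched kernel's sysfs layout first. Writing to sysfs needs root.

```python
from pathlib import Path

# Assumed location of the UKSM run switch; verify on your own kernel.
UKSM_RUN = Path("/sys/kernel/mm/uksm/run")


def set_uksm(enabled, run_file=UKSM_RUN):
    """Write '1' to enable or '0' to disable scanning via the run file."""
    Path(run_file).write_text("1" if enabled else "0")


def uksm_enabled(run_file=UKSM_RUN):
    """Return True if the run file currently reads '1'."""
    return Path(run_file).read_text().strip() == "1"


if __name__ == "__main__":
    # Disable UKSM on this machine (host or guest), then report the state.
    set_uksm(False)
    print("UKSM enabled:", uksm_enabled())
```

Running this in either the host or the guest mirrors the workaround described above; it does not address the underlying bug.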