Mic92 / vmsh

Shell into a virtualized linux, with your own tools
https://vmsh.thalheim.io
MIT License

Kernel oops (unhandled page fault) on Ubuntu kernel #338

Open Kazurin-775 opened 2 years ago

Kazurin-775 commented 2 years ago

On a VM booted from the Ubuntu 20.04 LTS cloud image, an unhandled page fault occurs in the guest kernel when the vmsh kernel library is unloaded from the guest address space:

[   39.862903] BUG: unable to handle page fault for address: ffffffff800012e2
[   39.863866] #PF: supervisor instruction fetch in kernel mode
[   39.864637] #PF: error_code(0x0010) - not-present page
[   39.865365] PGD 7eb80e067 P4D 7eb80e067 PUD 7eb80f063 PMD 0
[   39.866143] Oops: 0010 [#1] SMP PTI
...
[   39.881668] Call Trace:
[   39.882094]  ? process_one_work+0x1eb/0x3b0
[   39.882723]  ? worker_thread+0x4d/0x400
[   39.883314]  ? kthread+0x104/0x140
[   39.883845]  ? process_one_work+0x3b0/0x3b0
[   39.884475]  ? kthread_park+0x90/0x90
[   39.885046]  ? ret_from_fork+0x35/0x40
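
For readers unfamiliar with the #PF lines above, here is a minimal sketch that decodes the error code using the architectural x86 page-fault bit layout. The struct and function names are made up for illustration and are not part of vmsh.

```rust
/// Decoded x86 page-fault error code (architectural bit layout).
/// The type and function names here are hypothetical, for illustration only.
#[derive(Debug, PartialEq)]
struct PageFaultInfo {
    present: bool,           // bit 0: false = not-present page, true = protection violation
    write: bool,             // bit 1: false = read access, true = write access
    user: bool,              // bit 2: false = supervisor mode, true = user mode
    reserved_bit: bool,      // bit 3: reserved bit set in a paging entry
    instruction_fetch: bool, // bit 4: fault happened on an instruction fetch
}

fn decode_pf_error_code(code: u64) -> PageFaultInfo {
    PageFaultInfo {
        present: code & (1 << 0) != 0,
        write: code & (1 << 1) != 0,
        user: code & (1 << 2) != 0,
        reserved_bit: code & (1 << 3) != 0,
        instruction_fetch: code & (1 << 4) != 0,
    }
}

fn main() {
    // error_code(0x0010) from the oops above
    println!("{:?}", decode_pf_error_code(0x0010));
}
```

For 0x0010 only the instruction-fetch bit is set, which matches "supervisor instruction fetch in kernel mode" and "not-present page" in the log.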

The fault address 0xffffffff800012e2 points into libstage1.so's code. The disassembly around that address reads as follows:

    12d3:       48 8d 3d 73 a4 3e 00    lea    0x3ea473(%rip),%rdi        # 3eb74d <_fini+0x3e9979>
    12da:       31 c0                   xor    %eax,%eax
    12dc:       ff 15 a6 bc 3e 00       callq  *0x3ebca6(%rip)        # 3ecf88 <_printk>
--> 12e2:       48 83 c4 58             add    $0x58,%rsp
    12e6:       5b                      pop    %rbx
    12e7:       41 5c                   pop    %r12
    12e9:       41 5d                   pop    %r13
    12eb:       41 5e                   pop    %r14
    12ed:       41 5f                   pop    %r15
    12ef:       5d                      pop    %rbp
    12f0:       c3                      retq

This corresponds to the function tail after the following statement: https://github.com/Mic92/vmsh/blob/cfbb612d5f5615b194fbacd9b4a32e9816eac3b4/src/stage1/src/lib.rs#L587-L588
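
The arithmetic behind matching the fault address to the disassembly can be sketched as below. The load base 0xffffffff80000000 is an assumption inferred from the fault address and the offset 12e2 lining up; it is not taken from the vmsh source. The sketch also checks that 12e2 is the return address of the 6-byte callq at 12dc, i.e. the first instruction fetched after _printk returns.

```rust
// Assumption: libstage1.so is mapped at this base (inferred, not confirmed).
const ASSUMED_LOAD_BASE: u64 = 0xffff_ffff_8000_0000;
// Offset and encoded length of the `callq *0x3ebca6(%rip)` instruction
// (bytes ff 15 a6 bc 3e 00) taken from the disassembly.
const CALL_ADDR: u64 = 0x12dc;
const CALL_LEN: u64 = 6;

/// Offset of a faulting address relative to the library's load base.
fn library_offset(fault_addr: u64, load_base: u64) -> Option<u64> {
    fault_addr.checked_sub(load_base)
}

/// Address of the instruction executed right after a call returns.
fn return_address(call_addr: u64, call_len: u64) -> u64 {
    call_addr + call_len
}

fn main() {
    let fault_addr: u64 = 0xffff_ffff_8000_12e2;
    let offset = library_offset(fault_addr, ASSUMED_LOAD_BASE)
        .expect("fault address lies below the assumed load base");
    println!("offset into libstage1.so: {:#x}", offset);
    println!(
        "return address of the _printk call: {:#x}",
        return_address(CALL_ADDR, CALL_LEN)
    );
}
```

Both values come out as 0x12e2, so the faulting fetch is the return from _printk into a page that is no longer mapped.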

It seems that the vmsh kernel library is unmapped before the stage1 kernel worker has run to completion, which looks like a bug.
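
To illustrate the ordering I would expect, here is a user-space analogy only; it is not how vmsh is implemented, and all names in it are hypothetical. A flag stands in for the mapping, a thread stands in for the stage1 worker, and the controller waits for the worker to exit before "unmapping".

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Returns whether the (simulated) mapping was still in place when the
/// worker made its final access. Hypothetical model, not vmsh code.
fn unmap_after_worker_exit() -> bool {
    // `true` models "library is mapped in the guest".
    let mapped = Arc::new(AtomicBool::new(true));
    let worker_view = Arc::clone(&mapped);

    let worker = thread::spawn(move || {
        // Simulate a slow final step, such as a print that takes a while.
        thread::sleep(Duration::from_millis(50));
        // Last access of the worker to the mapped region.
        worker_view.load(Ordering::SeqCst)
    });

    // Wait until the worker has fully exited, then unmap.
    let was_mapped = worker.join().expect("worker panicked");
    mapped.store(false, Ordering::SeqCst);
    was_mapped
}

fn main() {
    println!(
        "mapping present at worker's last access: {}",
        unmap_after_worker_exit()
    );
}
```

Note that joining covers the worker's exit path as well. A "finished" flag set by the worker itself would probably not be enough in the kernel case, since the instructions after setting the flag still live in the library.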


Commands to reproduce the error:

qemu-system-x86_64 --nographic -m 32G --machine 'q35,accel=kvm' --hda './focal-server-cloudimg-amd64.img'

cargo run attach --stage2-path /tmp/vmsh -f ../linux/nixos.ext4 `pidof qemu-system-x86_64`

# Use Ctrl-C to terminate vmsh when it says "stage1 driver started"

Logs: kernel-oops.log, vmsh.log

Kazurin-775 commented 2 years ago

Sorry, I forgot an important point: to trigger the oops, the kernel has to actually print the message "stage1: finished" to the console, e.g. by raising the console log level:

echo 7 | sudo tee /proc/sys/kernel/printk

Even with this setting, the bug cannot be reproduced on the kernel shipped with vmsh.
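
As background on the printk setting above: the first field of /proc/sys/kernel/printk is the console log level, and a message reaches the console only if its level is numerically lower than that value. A minimal sketch of that rule follows; the sample file contents and the assumption that "stage1: finished" is logged at info level (6) are mine and not confirmed by the vmsh source.

```rust
/// Extract the console log level (first field) from the contents of
/// /proc/sys/kernel/printk. Helper names are hypothetical.
fn console_loglevel(printk_sysctl: &str) -> Option<u8> {
    printk_sysctl.split_whitespace().next()?.parse().ok()
}

/// A message is printed to the console when its level is numerically
/// lower than the console log level.
fn reaches_console(msg_level: u8, console_loglevel: u8) -> bool {
    msg_level < console_loglevel
}

fn main() {
    const KERN_INFO: u8 = 6; // assumed level of the "stage1: finished" message

    // Hypothetical file contents before and after `echo 7 | sudo tee ...`.
    for sample in ["4\t4\t1\t7", "7\t4\t1\t7"] {
        let level = console_loglevel(sample).expect("malformed printk sysctl");
        println!(
            "console_loglevel={} -> info message on console: {}",
            level,
            reaches_console(KERN_INFO, level)
        );
    }
}
```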