KSPP / linux

Linux kernel source tree (Kernel Self Protection Project)
https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project
Other
82 stars 5 forks source link

Failure with zero call-used registers #192

Closed bwendling closed 1 year ago

bwendling commented 2 years ago

During the Clang implementation of -fzero-call-used-regs, @nathanchance came across an issue where a Clang-built kernel failed to boot. It was seen as a issue, but is probably more pernicious than we first thought.

Related to #84.

Steps to replicate (by @nathanchance).

We uses Fedora for ease of installation via the serial console. @nathanchance wrote a Python script to drive QEMU specifically for testing. The latest version can be found here:

https://github.com/nathanchance/env/raw/235104e8740fcc8833caa284ebc549416dd6eb7b/python/cbl_vmm.py

With that, the setup command can be run to initially configure the machine.

cbl_vmm.py setup -i "https://download.fedoraproject.org/pub/fedora/linux/releases/36/Server/x86_64/iso/Fedora-Server-netinst-x86_64-36-1.5.iso" -n fedora

This downloads the .iso file in to a vm folder in the same folder as cbl_vmm.py. Once it's finished downloading, QEMU should fire up, printing the full command it uses.

Next, you should see grub. Hit 'e' to add console=ttyS0 to the linuxefi line after quiet then hit Ctrl-X to boot.

After a bunch of text flies across the screen, you should be prompted to either use VNC or text mode for configuration. I usually use text mode because it is a little simpler. The console might look a bit garbled, hitting r then Enter a few times should clear things up enough to see. In text mode, I customized the time zone and user creation sections but leave the installation source, software selection, and network configuration. Don't touch the root account. Instead create a user account that's the same as yours on the host and give it a password of some sort (i.e. don't just leave it blank).

The installation destination has to be run through but leave everything on the default. After the installation process finishes, the machine should reboot.

The script adds a forwarding rule for the virtual machine's SSH port (22) to the host (8022) so you should be able to log into the machine like:

ssh -p 8022 localhost

Once you are in, you'll need to do three things to make booting self compiled kernels through QEMU possible (as opposed to generating a kernel package and installing it within the machine).

  1. Copy the contents of /proc/cmdline somewhere you can use later, aside from the BOOT_IMAGE variable.

  2. Generate an initramfs and copy it out of the machine.

    Within the VM:

    sudo bash -c "rm -f /tmp/initramfs.img && dracut --no-kernel /tmp/initramfs.img && chown $USER:$USER /tmp/initramfs.img"

    Outside of the VM:

    scp -P 8022 localhost:/tmp/initramfs.img /tmp
  3. Get a list of the currently running kernel modules (it is easier to generate outside of the VM):

ssh -p 8022 localhost lsmod >/tmp/lsmod.txt

Once those are done, exit QEMU and go into your kernel source.

First, we'll use Fedora's kernel configuration as a base:

curl -LSso .config https://src.fedoraproject.org/rpms/kernel/raw/rawhide/f/kernel-x86_64-fedora.config

Next, we're going to build all the modules that we need to boot the VM into the kernel image with the 'localyesconfig' target (I have not found a clean way to generate a Fedora initramfs with modules) then run 'menuconfig' to turn on CONFIG_ZERO_CALL_USED_REGS then finally build the bzImage. The 'localyesconfig' step might prompt you for some values, I usually just hit <Enter> to choose the defaults.

make -skj"$(nproc)" LLVM=1 LSMOD=/tmp/lsmod.txt olddefconfig localyesconfig menuconfig bzImage

Once the image is done, you should be able to fire up the VM with the pieces we generated earlier (I usually do this in a different terminal/tmux tab):

cbl_vmm.py run -C "<cmdline_string_from_earlier>" -i /tmp/initramfs.img -k <path_to_kernel_src> -n fedora

That hangs for me with CONFIG_ZERO_CALL_USED_REGS=y. If it is disabled, the machine fully boots.

The script supports -g, which will pass -s -S to QEMU for easier debugging with GDB, you'll just need to make sure to enable CONFIG_GDB_SCRIPTS during the menuconfig stage and add the scripts_gdb target to your make command (or just use all in lieu of bzImage).

bwendling commented 2 years ago

If we take arch/x86/kernel/probe_roms.c, the call to find_oprom() takes one argument. In the cases where it's called, the parameter is the same as one from the function that's calling it. As such, Clang seems to think it can omit saving the %rdi register, because they're supposedly going to remain the same when the function exits. But we're zeroing it out regardless.

My current hypothesis:

The function that's clearing the %edi register is static. I think that Clang looks at it and says to itself, "Hmm..Ya know, I can create pretty much any calling convention I want to with this thing, because no one will even know. MUWAHAHAHAHA!" And so even though the %edi register is caller-saved, it doesn't get saved because Clang determines either that it won't matter afterwards, or the value will be the original %edi value upon exit. If it were properly stored, then it would have been restored before return and the zeroing stuff wouldn't have been emitted.

bwendling commented 2 years ago

This may be an issue with specifying the proper constraints for ASM blocks that are used. It would also explain why removing the attribute from a seemingly insignificant function can cause the failure. Here's a patch that appears to work:

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 89df6c6617f5..0937cb110e99 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -415,7 +415,7 @@ int paravirt_disable_iospace(void);
 #define PVOP_CALL_CLOBBERS     PVOP_VCALL_CLOBBERS, "=a" (__eax)

 /* void functions are still allowed [re]ax for scratch */
-#define PVOP_VCALLEE_CLOBBERS  "=a" (__eax)
+#define PVOP_VCALLEE_CLOBBERS  "=a" (__eax), "=D" (__edi)
 #define PVOP_CALLEE_CLOBBERS   PVOP_VCALLEE_CLOBBERS

 #define EXTRA_CLOBBERS  , "r8", "r9", "r10", "r11"
nathanchance commented 2 years ago

That diff works for me.

Tested-by: Nathan Chancellor <nathan@kernel.org>

bwendling commented 2 years ago

@nathanchance Thank you. sigh I didn't get the correct root cause...

bwendling commented 2 years ago

@nathanchance I cleaned up the patch. Could you run it through another test please? I'll submit it once you're okay with it.

commit 28e6f7f0bc0b8622398dabd37d4c44a231e47522 (HEAD -> linux/upstream)
Author: Bill Wendling <morbo@google.com>
Date:   Fri Sep 2 12:11:59 2022 -0700

    x86/paravirt: add extra clobbers with ZERO_CALL_USED_REGS enabled

    The ZERO_CALL_USED_REGS feature may zero out caller-saved registers
    before returning.

    In spurious_kernel_fault(), the "pte_offset_kernel()" call results in
    this assembly code:

    .Ltmp151:
            #APP
            # ALT: oldnstr
    .Ltmp152:
    .Ltmp153:
    .Ltmp154:
            .section        .discard.retpoline_safe,"",@progbits
            .quad   .Ltmp154
            .text

            callq   *pv_ops+536(%rip)

    .Ltmp155:
            .section        .parainstructions,"a",@progbits
            .p2align        3, 0x0
            .quad   .Ltmp153
            .byte   67
            .byte   .Ltmp155-.Ltmp153
            .short  1
            .text
    .Ltmp156:
            # ALT: padding
            .zero   (-(((.Ltmp157-.Ltmp158)-(.Ltmp156-.Ltmp152))>0))*((.Ltmp157-.Ltmp158)-(.Ltmp156-.Ltmp152)),144
    .Ltmp159:
            .section        .altinstructions,"a",@progbits
    .Ltmp160:
            .long   .Ltmp152-.Ltmp160
    .Ltmp161:
            .long   .Ltmp158-.Ltmp161
            .short  33040
            .byte   .Ltmp159-.Ltmp152
            .byte   .Ltmp157-.Ltmp158
            .text

            .section        .altinstr_replacement,"ax",@progbits
            # ALT: replacement 1
    .Ltmp158:
            movq    %rdi, %rax
    .Ltmp157:
            .text
            #NO_APP
    .Ltmp162:
            testb   $-128, %dil

    The "testb" here is using %dil, but the %rdi register was cleared before
    returning from "callq *pv_ops+536(%rip)". Adding the proper constraints
    results in the use of a different register:

            movq    %r11, %rdi

            # Similar to above.

            testb   $-128, %r11b

    Link: https://github.com/KSPP/linux/issues/192
    Signed-off-by: Bill Wendling <morbo@google.com>

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index f04157456a49..f79316538ddc 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -414,8 +414,17 @@ int paravirt_disable_iospace(void);
                                "=c" (__ecx)
 #define PVOP_CALL_CLOBBERS     PVOP_VCALL_CLOBBERS, "=a" (__eax)

-/* void functions are still allowed [re]ax for scratch */
+/*
+ * void functions are still allowed [re]ax for scratch.
+ *
+ * The ZERO_CALL_USED REGS feature may end up zeroing out callee-saved
+ * registers. Make sure we model this with the appropriate clobbers.
+ */
+#ifndef CONFIG_ZERO_CALL_USED_REGS
 #define PVOP_VCALLEE_CLOBBERS  "=a" (__eax)
+#else
+#define PVOP_VCALLEE_CLOBBERS  "=a" (__eax), PVOP_VCALL_CLOBBERS
+#endif
 #define PVOP_CALLEE_CLOBBERS   PVOP_VCALLEE_CLOBBERS

 #define EXTRA_CLOBBERS  , "r8", "r9", "r10", "r11"
nathanchance commented 2 years ago

Could you run it through another test please?

Sure thing, it still works :)

I would swap the branches of the ifndef and change it to ifdef like below, I think that is a little easier to read.

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 89df6c6617f5..bc2e1b67319d 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -414,8 +414,17 @@ int paravirt_disable_iospace(void);
                                "=c" (__ecx)
 #define PVOP_CALL_CLOBBERS     PVOP_VCALL_CLOBBERS, "=a" (__eax)

-/* void functions are still allowed [re]ax for scratch */
+/*
+ * void functions are still allowed [re]ax for scratch.
+ *
+ * The ZERO_CALL_USED REGS feature may end up zeroing out callee-saved
+ * registers. Make sure we model this with the appropriate clobbers.
+ */
+#ifdef CONFIG_ZERO_CALL_USED_REGS
+#define PVOP_VCALLEE_CLOBBERS  "=a" (__eax), PVOP_VCALL_CLOBBERS
+#else
 #define PVOP_VCALLEE_CLOBBERS  "=a" (__eax)
+#endif
 #define PVOP_CALLEE_CLOBBERS   PVOP_VCALLEE_CLOBBERS

 #define EXTRA_CLOBBERS  , "r8", "r9", "r10", "r11"

With that:

Reported-and-tested-by: Nathan Chancellor <nathan@kernel.org>

bwendling commented 2 years ago

Done. Thank you!

bwendling commented 2 years ago

Sent in a patch for review. I expect some pushback, because in the case where the trivial alternative is chosen it will have a superfluous move of RDI into another register. I don't know of quite how to remove that, since it's a direct consequence of using asm blocks instead of allowing the compiler to do its thing. In the case of alternative blocks that's a valid use, but it does result in the superfluous instructions. (note it's only if the value is used after the block. If it's dead, then it doesn't emit anything.)

https://lkml.org/lkml/2022/9/2/1392

nickdesaulniers commented 2 years ago

Was this just fixed with: https://reviews.llvm.org/D133946 https://github.com/llvm/llvm-project/issues/57692?

nathanchance commented 2 years ago

No, I can still reproduce this issue even with that patch.

bwendling commented 2 years ago

Approved here: https://lore.kernel.org/lkml/fa6df43b-8a1a-8ad1-0236-94d2a0b588fa@suse.com/

bwendling commented 1 year ago

In upstream: https://lore.kernel.org/r/20220902213750.1124421-3-morbo@google.com