ClangBuiltLinux / linux

Linux kernel source tree
Other
241 stars 14 forks source link

CFI failure in `blake2s_update()` #1567

Closed nathanchance closed 2 years ago

nathanchance commented 2 years ago

Initially reported by CI on android-mainline: https://github.com/ClangBuiltLinux/continuous-integration2/runs/4824153007?check_suite_focus=true

$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 distclean defconfig

$ scripts/config -d LTO_NONE -e LTO_CLANG_FULL -e CFI_CLANG

$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig Image.gz

$ boot-qemu.sh -a arm64 -k . -t 30s
...
[    0.000000] Linux version 5.16.0-rc8-00056-g9f9eff85a008 (nathan@archlinux-ax161) (ClangBuiltLinux clang version 14.0.0 (https://github.com/llvm/llvm-project 1441ffe6a6da90e9c4800914eb0005b1087aa5d2), LLD 14.0.0) #1 SMP PREEMPT Sat Jan
 15 19:17:16 UTC 2022
...
[    0.000000] Kernel panic - not syncing: CFI failure (target: blake2s_compress+0x0/0x144c)
[    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc8-00056-g9f9eff85a008 #1
[    0.000000] Hardware name: linux,dummy-virt (DT)
[    0.000000] Call trace:
[    0.000000]  dump_backtrace+0x0/0x8
[    0.000000]  dump_stack_lvl+0x7c/0xb0
[    0.000000]  panic+0x174/0x444
[    0.000000]  __cfi_check_fail+0x50/0x54
[    0.000000]  __cfi_slowpath_diag+0x1a0/0x240
[    0.000000]  blake2s_update+0x13c/0x168
[    0.000000]  _extract_entropy+0x158/0x400
[    0.000000]  crng_initialize_primary+0x4c/0xc0
[    0.000000]  rand_initialize+0x14/0x50
[    0.000000]  start_kernel+0x2fc/0x6ac
[    0.000000]  __primary_switched+0xc0/0x7f3c
[    0.000000] ---[ end Kernel panic - not syncing: CFI failure (target: blake2s_compress+0x0/0x144c) ]---

This happens after commit 9f9eff85a008 ("random: use BLAKE2s instead of SHA1 in extraction") (and I assume its prerequisite, commit 6048fdcc5f26 ("lib/crypto: blake2s: include as built-in"), has something to do with this as well).

It does not happen with ThinLTO, which is peculiar (and explains how our CI did not see it in the regular mainline builds, as we test CFI with ThinLTO).

blake2s_update() is a wrapper for __blake2s_update(), which calls blake2s_compress() as the fourth parameter, which should be of type blake2s_compress_t.

lib/crypto/blake2s.c:

void blake2s_update(struct blake2s_state *state, const u8 *in, size_t inlen)
{
    __blake2s_update(state, in, inlen, blake2s_compress);
}
EXPORT_SYMBOL(blake2s_update);

lib/crypto/blake2s-generic.c

void blake2s_compress(struct blake2s_state *state, const u8 *block,
              size_t nblocks, const u32 inc)
              __weak __alias(blake2s_compress_generic);

void blake2s_compress_generic(struct blake2s_state *state, const u8 *block,
                  size_t nblocks, const u32 inc)
{
...

include/crypto/internal/blake2s.h:

void blake2s_compress_generic(struct blake2s_state *state, const u8 *block,
                  size_t nblocks, const u32 inc);

void blake2s_compress(struct blake2s_state *state, const u8 *block,
              size_t nblocks, const u32 inc);

...

typedef void (*blake2s_compress_t)(struct blake2s_state *state,
                   const u8 *block, size_t nblocks, u32 inc);

...

static inline void __blake2s_update(struct blake2s_state *state,
                    const u8 *in, size_t inlen,
                    blake2s_compress_t compress)
{
    const size_t fill = BLAKE2S_BLOCK_SIZE - state->buflen;

    if (unlikely(!inlen))
        return;
    if (inlen > fill) {
        memcpy(state->buf + state->buflen, in, fill);
        (*compress)(state, state->buf, 1, BLAKE2S_BLOCK_SIZE);
        state->buflen = 0;
        in += fill;
        inlen -= fill;
    }
    if (inlen > BLAKE2S_BLOCK_SIZE) {
        const size_t nblocks = DIV_ROUND_UP(inlen, BLAKE2S_BLOCK_SIZE);
        /* Hash one less (full) block than strictly possible */
        (*compress)(state, in, nblocks - 1, BLAKE2S_BLOCK_SIZE);
        in += BLAKE2S_BLOCK_SIZE * (nblocks - 1);
        inlen -= BLAKE2S_BLOCK_SIZE * (nblocks - 1);
    }
    memcpy(state->buf + state->buflen, in, inlen);
    state->buflen += inlen;
}

I tried changing the fourth parameter of blake2s_compress_t from u32 inc to const u32 inc but that did not make a difference:

diff --git a/include/crypto/internal/blake2s.h b/include/crypto/internal/blake2s.h
index d39cfa0d333e..bb7438b9cc10 100644
--- a/include/crypto/internal/blake2s.h
+++ b/include/crypto/internal/blake2s.h
@@ -25,7 +25,7 @@ static inline void blake2s_set_lastblock(struct blake2s_state *state)
 }

 typedef void (*blake2s_compress_t)(struct blake2s_state *state,
-                                  const u8 *block, size_t nblocks, u32 inc);
+                                  const u8 *block, size_t nblocks, const u32 inc);

 /* Helper functions for BLAKE2s shared by the library and shash APIs */

Could it be something going wrong with the __weak __alias(blake2s_compress_generic) part of blake2s_compress()?

cc @samitolvanen

samitolvanen commented 2 years ago

LLVM has some known issues with weak aliases and CFI, which we have in the past worked around by changing the code to use a weak function instead.

nathanchance commented 2 years ago

Something like this appears to work for me. This is more consistent with the rest of the kernel, I do not see any other uses of __weak __alias(...).

diff --git a/lib/crypto/blake2s-generic.c b/lib/crypto/blake2s-generic.c
index 75ccb3e633e6..6ced67378d1d 100644
--- a/lib/crypto/blake2s-generic.c
+++ b/lib/crypto/blake2s-generic.c
@@ -37,9 +37,11 @@ static inline void blake2s_increment_counter(struct blake2s_state *state,
        state->t[1] += (state->t[0] < inc);
 }

-void blake2s_compress(struct blake2s_state *state, const u8 *block,
-                     size_t nblocks, const u32 inc)
-                     __weak __alias(blake2s_compress_generic);
+void __weak blake2s_compress(struct blake2s_state *state, const u8 *block,
+                            size_t nblocks, const u32 inc)
+{
+       return blake2s_compress_generic(state, block, nblocks, inc);
+}

 void blake2s_compress_generic(struct blake2s_state *state, const u8 *block,
                              size_t nblocks, const u32 inc)
nathanchance commented 2 years ago

Pending patch: https://git.kernel.org/nathan/c/9fc1d6db9fa21034b927fdd5d16f1c887fdad2a6

@samitolvanen Do you have any reference to the issues with weak aliases and CFI? I see #538 (and its upstream report https://github.com/llvm/llvm-project/issues/41887) but that was with Thin LTO and it seems like https://github.com/llvm/llvm-project/commit/ea314fd476166405905103a29dd8578ff6589160 was the fix for that (and the LLVM issue notes that full LTO was fine). I am just worried upstream is going to respond with "fix the compiler".

zx2c4 commented 2 years ago

Alternative approach: https://lore.kernel.org/linux-crypto/20220119135450.564115-1-Jason@zx2c4.com/

Seems like we'd be much better off just fixing Clang's broken CFI though?

samitolvanen commented 2 years ago

Seems like we'd be much better off just fixing Clang's broken CFI though?

I agree, but I suspect that might take a while. Is this issue trivially reproducible in a stand-alone program? If so, Nathan, would you mind filing a bug with upstream LLVM?

zx2c4 commented 2 years ago

I agree, but I suspect that might take a while.

If you'd like to have an impact on this, please reply to my email on LKML:

https://lore.kernel.org/lkml/CAHmME9ow7TxCaYYayRn9rdJJSdQ48tWQgdrW00g7mHaWvVJ+Zw@mail.gmail.com/

nathanchance commented 2 years ago

I agree, but I suspect that might take a while. Is this issue trivially reproducible in a stand-alone program? If so, Nathan, would you mind filing a bug with upstream LLVM?

I tried extracting the problematic functions and creating a small reproducer on Wednesday morning but I did not have any luck (there was no CFI failure). I am working through other things in my backlog today, I will try and double back around to this on Monday.

zx2c4 commented 2 years ago

Dunno how to reproduce standalone either, but here's a place to start:

alias.c

int a(void *v) __attribute__((weak)) __attribute__((__alias__("b")));
int b(void *v)
{
        return 8 + (unsigned long)v;
}

call.c:

int call(int(*f)(void*), void *v)
{
        return f(v);
}

main.c

int call(int(*f)(void*), void *v);
int a(void *);

int main(int argc, char *argv[])
{
        return call(a, argv);
}
nickdesaulniers commented 2 years ago

kernel v2 patch

zx2c4 commented 2 years ago

@nathanchance Any luck making a minimal non-kernel reproducer for this? Once my patch goes upstream, it may sometime after becoming annoying to revert, or hard to remember where to revert to, making this bug harder to fix properly eventually in llvm. So I'd recommend we get that worked out sooner rather than later.

I futzed a bit with the code in https://github.com/ClangBuiltLinux/linux/issues/1567#issuecomment-1018884115 to no avail. I tried this using -flto=full -fvisibility={?} -fuse-ld=lld -fsanitize=cfi, with various options for {?}. I also tried a slight variation on this to more closely mirror what's happening in the kernel code:

main.c

#include "call.h"
int a(void *);

int main(int argc, char *argv[])
{
        return call(a, argv);
}

call.h

static inline int call(int(*f)(void*), void *v)
{
        return f(v);
}

alias.c

int a(void *v) __attribute__((weak)) __attribute__((__alias__("b")));
int b(void *v)
{
        return 8 + (unsigned long)v;
}

But this too was not the magic bullet.

Any suggestions?

nathanchance commented 2 years ago

I have not been able to come up with anything yet. I tried taking the kernel code almost verbatim (minus the actual content of functions) and that did not work.

I did notice that with Jason's example, there is no symbol for the weak alias in the symbol table but there is with the kernel:

$ clang -flto -fsanitize=cfi -fsanitize-cfi-cross-dso -fno-sanitize-cfi-canonical-jump-tables -fno-sanitize-trap=cfi -fno-sanitize-blacklist -fuse-ld=lld -fvisibility=hidden -o test main.c alias.c

$ llvm-readelf -s test &| rg " [a|b](.cfi_jt)*\$"
  4519: 0000000000057034    24 FUNC    LOCAL  DEFAULT    14 b
  4525: 000000000005704c     4 FUNC    LOCAL  DEFAULT    14 b.cfi_jt

$ llvm-readelf -s $CBL_SRC/linux/vmlinux &| rg " blake2s_compress"
753424: ffff80000980a310     8 FUNC    LOCAL  DEFAULT     2 blake2s_compress_generic.cfi_jt
801095: ffff8000087b0bec  5196 FUNC    LOCAL  HIDDEN      2 blake2s_compress
801098: ffff8000087b0bec  5196 FUNC    LOCAL  HIDDEN      2 blake2s_compress_generic

I'll have to see if I can get the reproducer to have that same behavior, as that will likely trigger the bug.

zx2c4 commented 2 years ago

minus the actual content of functions

I wonder if this is related somehow. If the functions are too small or only called from one place, shouldn't LTO in theory attempt to inline it? On the other hand, I'd expect the presence of an indirect call to prevent the inlining?

nathanchance commented 2 years ago

I wonder if this is related somehow.

It likely is.

We do limit the amount of inlining that happens across translation units with LTO in the main Makefile: https://git.kernel.org/linus/22d429e75f24d114d99223389d6ba7047e952e32

Unfortunately, removing that does nothing.

I will see if I can just extract all parts of the call chain tomorrow.

nathanchance commented 2 years ago

I took a whack at trying to get all pieces of the call chain into a standalone reproducer but I got lost trying to detangle drivers/char/random.c. I tried playing around with the simpler reproducer but could not get it to reproduce the same issue. This issue likely only shows up with multiple translation units and certain function sizes that affects inlining.

For what it's worth, if maintainability of this workaround is a concern, my workaround patch is way simpler, which would make it easier to revert. Additionally, it appears that Sami is working on a kernel-specific CFI implementation, which presumably will not have this issue, but I assume there is not a timeline for when that will be ready.

zx2c4 commented 2 years ago

my workaround patch is way simpler

But it introduces an additional jump in the non LTO case, I think. So I'll move ahead with my patch, and hopefully "5.17-rc1" is a good enough bookmark for this whenever the LTO people want to try to fix it.

zx2c4 commented 2 years ago

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ba6ef8af0f3be2e0318aac7e6e43d3a04a069a6a

zx2c4 commented 2 years ago

Low priority?

nathanchance commented 2 years ago

Low priority?

I removed the "low priority" from the label.

samitolvanen commented 2 years ago

Can we close this now that upstream no longer uses Clang CFI?

nathanchance commented 2 years ago

Yes, I think it can be.