Add UD1 masks to `CONFIG_UBSAN_TRAP`

kees commented 1 year ago

Right now, GCC uses a bare trap instruction (UD2 on x86, brk on arm64) for CONFIG_UBSAN_TRAP. Clang uses a trap with an encoded immediate (UD1 on x86, brk64 on arm64), where a bit field encodes which UBSAN check caused the trap. (For example, see commit 25b84002afb9dc9a91a7ea67166879c13ad82422.) The arm64 trap handler parses this immediate to give more information during a UBSAN trap, but this parsing is missing on x86, where we only generate very non-obvious "invalid opcode" reports.

[ ] GCC should emit a UD1 instruction with an encoded immediate.
[ ] x86 needs to parse the Clang UD1 immediate, as is done on arm64.

In the meantime, we could update the crash hint text in the kernel to mention UBSAN for "invalid opcode".
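To make the x86 parsing item concrete, here is a minimal user-space sketch of what decoding such a trap could look like. It assumes (this is not stated above) that Clang carries the check type as the displacement of a UD1 with a memory operand, for example the byte sequence 67 0f b9 40 02 for type 0x02. The names `decode_trap`, `trap_kind`, and the `TRAP_*`/`OPCODE_*` constants are hypothetical and exist only for this illustration; a real handler would read the bytes at the faulting instruction pointer instead of a buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for this sketch; none of them come from the kernel. */
enum trap_kind { TRAP_NONE, TRAP_UD1, TRAP_UD2 };

#define PREFIX_ADDR_SIZE 0x67 /* address-size override prefix */
#define OPCODE_ESCAPE    0x0f /* first byte of a two-byte opcode */
#define OPCODE_UD1       0xb9
#define OPCODE_UD2       0x0b

/*
 * Classify the instruction in insn[0..len). For a UD1 with an 8-bit or
 * 32-bit displacement, store that displacement in *imm (assumed to be the
 * UBSAN check type). Other UD1 addressing forms leave *imm at 0.
 */
static enum trap_kind decode_trap(const uint8_t *insn, size_t len, uint32_t *imm)
{
	size_t i = 0;
	uint8_t modrm, mod, rm;

	*imm = 0;

	/* Skip an optional address-size prefix. */
	if (i < len && insn[i] == PREFIX_ADDR_SIZE)
		i++;

	/* Need the escape byte plus the second opcode byte. */
	if (i + 1 >= len || insn[i] != OPCODE_ESCAPE)
		return TRAP_NONE;
	i++;

	if (insn[i] == OPCODE_UD2)
		return TRAP_UD2; /* bare trap: no payload */
	if (insn[i] != OPCODE_UD1)
		return TRAP_NONE;
	i++;

	if (i >= len)
		return TRAP_NONE;
	modrm = insn[i++];
	mod = modrm >> 6;
	rm = modrm & 0x7;

	/* A memory operand with rm == 4 is followed by a SIB byte. */
	if (mod != 3 && rm == 4)
		i++;

	if (mod == 1) {
		/* 8-bit displacement */
		if (i >= len)
			return TRAP_NONE;
		*imm = insn[i];
	} else if (mod == 2) {
		/* 32-bit little-endian displacement */
		if (i + 4 > len)
			return TRAP_NONE;
		*imm = (uint32_t)insn[i] |
		       ((uint32_t)insn[i + 1] << 8) |
		       ((uint32_t)insn[i + 2] << 16) |
		       ((uint32_t)insn[i + 3] << 24);
	}
	return TRAP_UD1;
}
```

With this shape, a trap handler could report the specific UBSAN check when the result is `TRAP_UD1` and fall back to the existing "invalid opcode" path for everything else.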

kees commented 2 months ago

FYI arm64 trap decoding happens in lib/ubsan.c's report_ubsan_failure() function.

kees commented 2 months ago

If, instead, we wanted to go the route of emitting a list of trap addresses in a specially named section, look at .kcfi_traps handling in Linux and LLVM. For example, see how is_cfi_trap() gets wired up: https://elixir.bootlin.com/linux/v6.8/source/kernel/cfi.c#L94
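As a rough illustration of the section-based approach, the sketch below looks up a faulting address in a table of trap locations. It assumes each table entry is a signed 32-bit offset relative to the entry's own address, which is my understanding of how the .kcfi_traps entries work; the names `trap_address` and `is_listed_trap` are hypothetical, and in the kernel the start and end pointers would come from linker-provided section bounds rather than a plain array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Each entry holds the distance from the entry itself to the trap
 * instruction, so the table stays valid wherever the image is loaded.
 */
static uintptr_t trap_address(const int32_t *entry)
{
	return (uintptr_t)((intptr_t)entry + *entry);
}

/* Return true if addr matches any entry in [start, end). */
static bool is_listed_trap(uintptr_t addr, const int32_t *start,
			   const int32_t *end)
{
	for (const int32_t *p = start; p < end; ++p) {
		if (trap_address(p) == addr)
			return true;
	}
	return false;
}
```

The trade-off against an encoded immediate is that the table only says "this address is a known trap"; any per-check detail would have to be stored separately or recovered some other way.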

To test UBSAN, the easiest approach is probably to use CONFIG_LKDTM=y in the kernel. If you build with CONFIG_UBSAN_BOUNDS=y (which requires CONFIG_UBSAN=y), and without CONFIG_UBSAN_TRAP, you'll get useful kernel Oopses.

To see the crash tests available for LKDTM, run: cat /sys/kernel/debug/provoke_crash/DIRECT

For example, to exercise the array bounds mitigation, run: echo ARRAY_BOUNDS | cat >/sys/kernel/debug/provoke_crash/DIRECT

FYI, the pipe to cat above is used because the kernel kills the process on an Oops. If you just did echo ... >/path/... it would kill your shell, which is very annoying. ;)

Without CONFIG_UBSAN_TRAP, the above should cause an Oops that starts with UBSAN: array-index-out-of-bounds .... If you've built with CONFIG_UBSAN_TRAP=y, then the x86 kernel will have no idea what happened. Seeing that unhelpful error is left as an exercise for the reader. ;)

Yusong-Gao commented 1 week ago

> FYI arm64 trap decoding happens in lib/ubsan.c's report_ubsan_failure() function.

Hi, kees. I have a question. I want to know why patch v2 removed the ubsan_cfi_check_fail support:

I ran into a similar problem:

Thanks.

kees commented 1 week ago

> > FYI arm64 trap decoding happens in lib/ubsan.c's report_ubsan_failure() function.

> Hi, kees. I have a question. I want to know why patch v2 removed the ubsan_cfi_check_fail support:

> I ran into a similar problem

I'm not sure I follow your question. ubsan_cfi_check_fail was not removed.

kees commented 1 week ago

Adding @gatlinnewhouse to CC, since he's working on supporting this: https://lore.kernel.org/lkml/20240625032509.4155839-1-gatlin.newhouse@gmail.com/

Yusong-Gao commented 6 days ago

> > > FYI arm64 trap decoding happens in lib/ubsan.c's report_ubsan_failure() function.

> > Hi, kees. I have a question. I want to know why patch v2 removed the ubsan_cfi_check_fail support: I ran into a similar problem

> I'm not sure I follow your question. ubsan_cfi_check_fail was not removed.

Thanks for your reply.

On arm64, the call chain is early_brk64() -> ubsan_handler() -> report_ubsan_failure(). report_ubsan_failure() checks every error type except ubsan_cfi_check_fail. I think report_ubsan_failure() should add a case for ubsan_cfi_check_fail, which corresponds to brk #0x5502 (0x5500 means it's a UB error, and 0x2 means the error type is ubsan_cfi_check_fail).

kees commented 6 days ago

> On arm64, the call chain is early_brk64() -> ubsan_handler() -> report_ubsan_failure(). report_ubsan_failure() checks every error type except ubsan_cfi_check_fail. I think report_ubsan_failure() should add a case for ubsan_cfi_check_fail, which corresponds to brk #0x5502 (0x5500 means it's a UB error, and 0x2 means the error type is ubsan_cfi_check_fail).

UBSAN is 0x55xx. KCFI is 0x8xxx:

#define UBSAN_BRK_IMM                   0x5500
...
#define CFI_BRK_IMM_BASE                0x8000

UBSAN has a ubsan_cfi_check_fail (0x02) enum but it shouldn't be getting used. Perhaps you're looking at a very old kernel with the deprecated Clang CFI (i.e. not the modern KCFI)?

If you're seeing a 0x5502 with the latest kernel, please open a new bug -- this bug is about x86 UD1 handling for UBSAN. :)
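For anyone trying to work out which range a given brk immediate falls in, here is a minimal sketch of the split described above. The two base values are the ones quoted in this comment; the mask widths (`UBSAN_TYPE_MASK`, `CFI_PAYLOAD_MASK`) and the names `brk_class` and `classify_brk` are assumptions made for the illustration, so check them against the arm64 headers before relying on them.

```c
#include <stdint.h>

#define UBSAN_BRK_IMM    0x5500u /* quoted above */
#define CFI_BRK_IMM_BASE 0x8000u /* quoted above */

/* Assumed widths for this sketch, matching "0x55xx" and "0x8xxx". */
#define UBSAN_TYPE_MASK  0x00ffu
#define CFI_PAYLOAD_MASK 0x0fffu

enum brk_class { BRK_OTHER, BRK_UBSAN, BRK_KCFI };

/*
 * Classify a brk immediate. For UBSAN traps, also return the check type
 * carried in the low byte (0x02 would be ubsan_cfi_check_fail).
 */
static enum brk_class classify_brk(unsigned int imm, unsigned int *ubsan_type)
{
	*ubsan_type = 0;

	if ((imm & ~UBSAN_TYPE_MASK) == UBSAN_BRK_IMM) {
		*ubsan_type = imm & UBSAN_TYPE_MASK;
		return BRK_UBSAN;
	}
	if ((imm & ~CFI_PAYLOAD_MASK) == CFI_BRK_IMM_BASE)
		return BRK_KCFI;

	return BRK_OTHER;
}
```

Under these assumptions, 0x5502 classifies as a UBSAN trap with type 0x02, while any 0x8xxx value is handled as KCFI and never reaches the UBSAN type decoding.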

KSPP/linux issue #328