anthraxx / linux-hardened

Minimal supplement to upstream Kernel Self Protection Project changes. Features already provided by SELinux + Yama and archs other than multiarch arm64 / x86_64 aren't in scope. Only tags have stable history. Shared IRC channel with KSPP: irc.libera.chat #linux-hardening

PAGE_SANITIZE_VERIFY conflicts with KASAN #72

Open tsautereau-anssi opened 2 years ago

tsautereau-anssi commented 2 years ago

PAGE_SANITIZE_VERIFY as currently implemented in linux-hardened does not handle KASAN correctly. It misses a call to kasan_disable_current() (resp. kasan_enable_current()) right before (resp. after) the call to memchr_inv() in verify_zero_highpage() because we are reading memory that is still poisoned by KASAN. In addition and for the same reason, the virtual kernel address passed to memchr_inv() must be untagged beforehand via a call to kasan_reset_tag().
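The ordering described above can be sketched as follows. This is a userspace model, not a patch against linux-hardened: kasan_disable_current(), kasan_enable_current(), kasan_reset_tag() and memchr_inv() are the kernel functions named in this report, but here they are replaced by simple stand-ins, and page_is_zeroed() and unsuppressed_reads are hypothetical names used only for illustration. In the kernel, the same bracketing would presumably sit around the memchr_inv() call in verify_zero_highpage().

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* Stand-ins (assumptions) for kernel state and helpers. */
static int kasan_depth;        /* models the per-task "checks suppressed" depth */
static int unsuppressed_reads; /* hypothetical counter: reads made with checks active */

static void kasan_disable_current(void) { kasan_depth++; }

static void kasan_enable_current(void)
{
	assert(kasan_depth > 0); /* catch an unbalanced enable */
	kasan_depth--;
}

/* Model only: the pointer is returned unchanged, no tag is stripped here. */
static void *kasan_reset_tag(const void *addr) { return (void *)addr; }

/* Returns a pointer to the first byte that differs from c, or NULL if none. */
static void *memchr_inv(const void *start, int c, size_t bytes)
{
	const unsigned char *p = start;
	size_t i;

	if (kasan_depth == 0)
		unsuppressed_reads++; /* a real KASAN build could report here */
	for (i = 0; i < bytes; i++)
		if (p[i] != (unsigned char)c)
			return (void *)(p + i);
	return NULL;
}

/* Hypothetical helper: 1 if the page is all zeroes, 0 otherwise. */
static int page_is_zeroed(const void *kaddr)
{
	void *found;

	kasan_disable_current();               /* page may still be poisoned */
	found = memchr_inv(kasan_reset_tag(kaddr), 0, PAGE_SIZE);
	kasan_enable_current();                /* restore checking */
	return found == NULL;
}
```

Calling page_is_zeroed() on a zero-filled PAGE_SIZE buffer returns 1 and leaves both kasan_depth and unsuppressed_reads at 0, which is the property the proposed fix is after: the read of the page happens only while checks are suppressed, and suppression is balanced afterwards.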

I noticed this as I was rebasing linux-hardened onto v5.18, after reviewing several changes made to KASAN-related code in post_alloc_hook(), which made me wonder why KASAN wasn't complaining about our use of memchr_inv() on poisoned pages. Long story short, it turns out that KASAN instrumentation of lib/string.c (where memchr_inv() is defined) is simply turned off when CONFIG_AMD_MEM_ENCRYPT is enabled, which is actually the case for Arch Linux's linux-hardened package as well as for the numerous test builds that my colleague @nbouchinet-anssi was generously doing to help us verify our various hypotheses.

Note that people trying to run a linux-hardened kernel on AArch64 with the default config plus hardware tag-based KASAN (which requires ARMv8.5 MTE), if any, would have encountered an error regardless of their CONFIG_AMD_MEM_ENCRYPT setting since this KASAN mode does not use compiler instrumentation to insert validity checks.

genbtc commented 10 months ago

I've been having my own issue for quite some time now; I finally decided to track it down and arrived here. OP seems correct, but there is more to the story, as the verify function appears to be bugged during this sequence. My computer works perfectly fine, except that every 3-4 weeks one of these oopses takes down the machine. I have captured it with pstore and netconsole, and it is always the same code path: prep_new_page() calls verify_zero_highpage() and hits the BUG_ON() that checks kaddr for zero via memchr_inv(). It seems almost as if the page it just got is invalid. Please advise. I don't have a reproducer, but it eventually triggers. I'm willing to compile in some debug code to pin it down.
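As one possible shape for such debug code, the sketch below reports where the first non-zero byte sits instead of only trapping. It is a userspace model with hypothetical names (first_nonzero_offset(), report_dirty_page()); a kernel version would need the kernel's own logging and mapping helpers, which are not shown here. If RAX in the register dump below holds the non-NULL return value of memchr_inv(), as the `test rax,rax` (48 85 c0) just before the trapping `ud2` (0f 0b) in the Code line suggests, then its low 12 bits would place the offending byte at offset 0xc9 within the page; that reading is an inference from the dump, not something confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PAGE_SIZE 4096

/* Hypothetical helper: offset of the first non-zero byte, or -1 if all zero. */
static long first_nonzero_offset(const unsigned char *kaddr)
{
	size_t i;

	assert(kaddr != NULL);
	for (i = 0; i < PAGE_SIZE; i++)
		if (kaddr[i] != 0)
			return (long)i;
	return -1;
}

/*
 * Hypothetical report: returns 1 if the page is clean; otherwise logs the
 * offset and value of the first non-zero byte to stderr and returns 0.
 */
static int report_dirty_page(const unsigned char *kaddr)
{
	long off = first_nonzero_offset(kaddr);

	if (off < 0)
		return 1;
	fprintf(stderr, "page %p not zeroed: offset 0x%lx value 0x%02x\n",
		(const void *)kaddr, (unsigned long)off, (unsigned)kaddr[off]);
	return 0;
}
```

Logging the offset and value before trapping would show whether the stray data is a single flipped byte, a recognizable pattern, or a larger overwrite, which narrows down whether the page was written to after being freed or was never sanitized.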

Oops#1 Part2
<7>[932535.670960] RAX: ffffa053ed0f70c9 RBX: ffffd59017b43dc0 RCX: ffffa053ed0f70c9
<7>[932535.670963] RDX: 0000000000001000 RSI: 0000000000000000 RDI: 0000000000000000
<7>[932535.670965] RBP: ffffd59017b43dc0 R08: 0101010101010101 R09: 0000000000000080
<7>[932535.670967] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
<7>[932535.670969] R13: 0000000000000001 R14: ffffa04f706f5a00 R15: ffffd59017b43e00
<7>[932535.670972] FS:  00006fc01d878bc0(0000) GS:ffffa0559fb40000(0000) knlGS:0000000000000000
<7>[932535.670975] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<7>[932535.670977] CR2: 00006fbfe932a000 CR3: 0000000366b2a000 CR4: 0000000000750ee0
<7>[932535.670979] PKRU: 55555554
<7>[932535.670981] Call Trace:
<7>[932535.670987]  ? __die_body.cold+0x1a/0x1f
<7>[932535.670991]  ? die+0x2a/0x50
<7>[932535.670995]  ? do_trap+0x83/0x100
<7>[932535.670998]  ? do_error_trap+0x65/0x80
<7>[932535.671000]  ? prep_new_page+0xf6/0x150
<7>[932535.671005]  ? exc_invalid_op+0x49/0x60
<7>[932535.671007]  ? prep_new_page+0xf6/0x150
<7>[932535.671010]  ? asm_exc_invalid_op+0x12/0x20
<7>[932535.671014]  ? prep_new_page+0xf6/0x150
<7>[932535.671017]  get_page_from_freelist+0xa45/0x1970
<7>[932535.671022]  __alloc_pages_nodemask+0x156/0x2f0
<7>[932535.671026]  handle_mm_fault+0x57b/0x14e0
<7>[932535.671031]  do_user_addr_fault+0x166/0x3a0
<7>[932535.671034]  exc_page_fault+0x78/0x160
<7>[932535.671038]  ? asm_exc_page_fault+0x8/0x30
<7>[932535.671097]  asm_exc_page_fault+0x1e/0x30
<7>[932535.671100] RIP: 0033:0x6fc01daf46a4
<7>[932535.671169] Code: 00 0f 1f 44 00 00 c5 fe 6f 4e 20 f7 c1 00 0e 00 00 75 65 49 89 c9 48 8d 4c 16 ff 48 83 ce 3f 4a 8d 7c 0e 01 48 29 f1 48 ff c6 <f3> a4 c4 c1 7e 7f 00 c4 c1 7e 7f 48 20 c5 f8 77 c3 66 66 2e 0f 1f
Oops#1 Part3
<7>[932535.670915] ------------[ cut here ]------------
<2>[932535.670929] kernel BUG at include/linux/highmem.h:290!
<7>[932535.670937] invalid opcode: 0000 [#1] SMP NOPTI
<7>[932535.670941] CPU: 5 PID: 6065 Comm: Isolated Web Co Tainted: G           O    T 5.10.202-gentoo-hardened1-ZEN3iGPU-REV10 #213
<7>[932535.670944] Hardware name: ASRock X570 Phantom Gaming 4, BIOS P4.30 02/23/2022
<7>[932535.670953] Code: 48 89 df 48 2b 3d 7a d5 d3 00 31 f6 ba 00 10 00 00 48 c1 ff 06 48 c1 e7 0c 48 03 3d 74 d5 d3 00 e8 bf f9 2a 00 48 85 c0 74 bf <0f> 0b e9 23 00 00 00 e9 3c 00 00 00 f7 44 24 04 00 01 00 00 0f 84
<7>[932535.670957] RSP: 0000:ffffb1d803853c50 EFLAGS: 00010286