anthraxx / linux-hardened

Minimal supplement to upstream Kernel Self Protection Project changes. Features already provided by SELinux + Yama and archs other than multiarch arm64 / x86_64 aren't in scope. Only tags have stable history. Shared IRC channel with KSPP: irc.libera.chat #linux-hardening

PAGE_SANITIZE_VERIFY conflicts with KASAN #72

Open tsautereau-anssi opened 2 years ago

tsautereau-anssi commented 2 years ago

PAGE_SANITIZE_VERIFY as currently implemented in linux-hardened does not handle KASAN correctly. It misses a call to kasan_disable_current() (resp. kasan_enable_current()) right before (resp. after) the call to memchr_inv() in verify_zero_highpage() because we are reading memory that is still poisoned by KASAN. In addition and for the same reason, the virtual kernel address passed to memchr_inv() must be untagged beforehand via a call to kasan_reset_tag().
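
As a rough illustration of the ordering described above, here is a minimal userspace model, not the actual patch. Every `sketch_*` name is hypothetical: they stand in for kasan_disable_current(), kasan_enable_current(), kasan_reset_tag() and memchr_inv(), the "KASAN report" is just a counter, and the BUG_ON() is replaced by a return value so the model can run outside the kernel. It assumes a 64-bit platform where the top byte of a normal pointer is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_TAG_MASK  ((uint64_t)0xff << 56)

static int sketch_kasan_depth;   /* models a per-task report-suppression counter */
static int sketch_kasan_reports; /* number of simulated KASAN reports */

/* Hypothetical stand-ins for kasan_disable_current() / kasan_enable_current(). */
static void sketch_kasan_disable_current(void)
{
    sketch_kasan_depth++;
}

static void sketch_kasan_enable_current(void)
{
    assert(sketch_kasan_depth > 0); /* catch unbalanced enable calls */
    sketch_kasan_depth--;
}

/* Hypothetical stand-in for kasan_reset_tag(): drop the tag in the top byte. */
static uint64_t sketch_kasan_reset_tag(uint64_t addr)
{
    return addr & ~SKETCH_TAG_MASK;
}

/*
 * Hypothetical stand-in for an instrumented memchr_inv() that reads a page
 * which is still poisoned. Returns the first byte that differs from c, or
 * NULL if every byte matches. A report is counted when checks are enabled
 * or when the pointer still carries a tag.
 */
static const unsigned char *sketch_memchr_inv(uint64_t addr, int c, size_t n)
{
    const unsigned char *p;
    size_t i;

    if (sketch_kasan_depth == 0 || (addr & SKETCH_TAG_MASK))
        sketch_kasan_reports++;

    /* Strip the tag so the model can dereference the address. */
    p = (const unsigned char *)(uintptr_t)(addr & ~SKETCH_TAG_MASK);
    for (i = 0; i < n; i++)
        if (p[i] != (unsigned char)c)
            return p + i;
    return NULL;
}

/* Model of the check without any KASAN handling. Returns 1 if all zero. */
static int sketch_verify_zero_page_unfixed(uint64_t kaddr)
{
    return sketch_memchr_inv(kaddr, 0, SKETCH_PAGE_SIZE) == NULL;
}

/* Model of the check with both changes applied. Returns 1 if all zero. */
static int sketch_verify_zero_page_fixed(uint64_t kaddr)
{
    int is_zero;

    sketch_kasan_disable_current();
    is_zero = sketch_memchr_inv(sketch_kasan_reset_tag(kaddr), 0,
                                SKETCH_PAGE_SIZE) == NULL;
    sketch_kasan_enable_current();
    return is_zero;
}
```

In this model the unfixed variant still returns the right answer but bumps the report counter, while the fixed variant reads the same page without a report and leaves the suppression counter balanced.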

I noticed this as I was rebasing linux-hardened onto v5.18, after reviewing several changes made to KASAN-related code in post_alloc_hook(), which made me wonder why KASAN wasn't complaining about our use of memchr_inv() on poisoned pages. Long story short, it turns out that KASAN instrumentation of lib/string.c (where memchr_inv() is defined) is simply turned off when CONFIG_AMD_MEM_ENCRYPT is enabled, which is actually the case for Arch Linux's linux-hardened package as well as for the numerous test builds that my colleague @nbouchinet-anssi was generously doing to help us verify our various hypotheses.

Note that people trying to run a linux-hardened kernel on AArch64 with the default config plus hardware tag-based KASAN (which requires ARMv8.5 MTE), if any, would have encountered an error regardless of their CONFIG_AMD_MEM_ENCRYPT setting since this KASAN mode does not use compiler instrumentation to insert validity checks.

genbtc commented 8 months ago

I've been having my own issue for quite some time now, and I finally decided to track it down, which is how I arrived here. OP seems correct, but there is more to the story, as the verify function is bugged during this sequence. My computer works perfectly fine, except that every 3-4 weeks one of these oopses takes down the machine. I have captured it with pstore and netconsole, and it is always the same code path: prep_new_page() calls verify_zero_highpage() and hits the BUG_ON() on memchr_inv() (kaddr, 0). It seems almost as if the page it just got is invalid. Please advise. I don't have a reproducer, but it eventually triggers. I'm willing to recompile with some debug code to pin it down.

Oops#1 Part2
<7>[932535.670960] RAX: ffffa053ed0f70c9 RBX: ffffd59017b43dc0 RCX: ffffa053ed0f70c9
<7>[932535.670963] RDX: 0000000000001000 RSI: 0000000000000000 RDI: 0000000000000000
<7>[932535.670965] RBP: ffffd59017b43dc0 R08: 0101010101010101 R09: 0000000000000080
<7>[932535.670967] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
<7>[932535.670969] R13: 0000000000000001 R14: ffffa04f706f5a00 R15: ffffd59017b43e00
<7>[932535.670972] FS:  00006fc01d878bc0(0000) GS:ffffa0559fb40000(0000) knlGS:0000000000000000
<7>[932535.670975] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<7>[932535.670977] CR2: 00006fbfe932a000 CR3: 0000000366b2a000 CR4: 0000000000750ee0
<7>[932535.670979] PKRU: 55555554
<7>[932535.670981] Call Trace:
<7>[932535.670987]  ? __die_body.cold+0x1a/0x1f
<7>[932535.670991]  ? die+0x2a/0x50
<7>[932535.670995]  ? do_trap+0x83/0x100
<7>[932535.670998]  ? do_error_trap+0x65/0x80
<7>[932535.671000]  ? prep_new_page+0xf6/0x150
<7>[932535.671005]  ? exc_invalid_op+0x49/0x60
<7>[932535.671007]  ? prep_new_page+0xf6/0x150
<7>[932535.671010]  ? asm_exc_invalid_op+0x12/0x20
<7>[932535.671014]  ? prep_new_page+0xf6/0x150
<7>[932535.671017]  get_page_from_freelist+0xa45/0x1970
<7>[932535.671022]  __alloc_pages_nodemask+0x156/0x2f0
<7>[932535.671026]  handle_mm_fault+0x57b/0x14e0
<7>[932535.671031]  do_user_addr_fault+0x166/0x3a0
<7>[932535.671034]  exc_page_fault+0x78/0x160
<7>[932535.671038]  ? asm_exc_page_fault+0x8/0x30
<7>[932535.671097]  asm_exc_page_fault+0x1e/0x30
<7>[932535.671100] RIP: 0033:0x6fc01daf46a4
<7>[932535.671169] Code: 00 0f 1f 44 00 00 c5 fe 6f 4e 20 f7 c1 00 0e 00 00 75 65 49 89 c9 48 8d 4c 16 ff 48 83 ce 3f 4a 8d 7c 0e 01 48 29 f1 48 ff c6 <f3> a4 c4 c1 7e 7f 00 c4 c1 7e 7f 48 20 c5 f8 77 c3 66 66 2e 0f 1f
Oops#1 Part3
<7>[932535.670915] ------------[ cut here ]------------
<2>[932535.670929] kernel BUG at include/linux/highmem.h:290!
<7>[932535.670937] invalid opcode: 0000 [#1] SMP NOPTI
<7>[932535.670941] CPU: 5 PID: 6065 Comm: Isolated Web Co Tainted: G           O    T 5.10.202-gentoo-hardened1-ZEN3iGPU-REV10 #213
<7>[932535.670944] Hardware name: ASRock X570 Phantom Gaming 4, BIOS P4.30 02/23/2022
<7>[932535.670953] Code: 48 89 df 48 2b 3d 7a d5 d3 00 31 f6 ba 00 10 00 00 48 c1 ff 06 48 c1 e7 0c 48 03 3d 74 d5 d3 00 e8 bf f9 2a 00 48 85 c0 74 bf <0f> 0b e9 23 00 00 00 e9 3c 00 00 00 f7 44 24 04 00 01 00 00 0f 84
<7>[932535.670957] RSP: 0000:ffffb1d803853c50 EFLAGS: 00010286