ClangBuiltLinux / linux

Linux kernel source tree
Other
241 stars 14 forks source link

CFI: Drop !kcfi_type metadata for non-address-taken globals with LTO #1737

Open samitolvanen opened 2 years ago

samitolvanen commented 2 years ago

Another feature suggestion from @lvwr:

When LTO is used, hashes may be suppressed for non-local + non-address taken functions.

This would work similarly to ibt-seal. Basically, with LTO we can tell for sure that non-address-taken globals are not indirectly called, which means we can drop the !kcfi_type metadata from them, thus making these functions invalid indirect call targets.

cc @kees

nickdesaulniers commented 2 years ago

That's nice; it sounds like LTO will no longer be strictly necessary for kCFI, but will still improve kCFI when used.

samitolvanen commented 2 years ago

Here's a quick draft: https://github.com/samitolvanen/llvm-project/commit/0d3cc88cc69ea859773f95a2b2e8ee1adfd4b6c8

With x86_64 defconfig + LTO_CLANG_FULL + CFI_CLANG, the number of indirectly callable functions dropped from 46647 to 34712 (-11935 = ~-25.6%) and the kernel booted just fine in my tests.

However, with LTO_CLANG_THIN, the kcfi-seal attribute resulted in the type hash getting dropped from another 1000+ functions, and I ended up with a kernel that failed to boot. The main issue appears to be that Function::hasAddressTaken() returns false for some address-taken globals (e.g. e820__memory_setup_default, which leads to an early boot crash).

@lvwr Do you remember running into similar issues with ThinLTO when you implemented ibt-seal? I'm not sure if ThinLTO even keeps track of address-taken functions across TUs.

samitolvanen commented 2 years ago

I confirmed that ThinLTO doesn't know about cross-TU address taking, which means -mibt-seal also doesn't work correctly. Here's a reproducer:

$ cat thinlto1.c 
extern void g(void);
void *f(void) { return &g; }
$ cat thinlto2.c 
void g(void) {}

Which I compiled with -flto=thin and IBT using the kernel code model, and then ensured Clang set the ibt-seal flag:

$ clang -c -flto=thin -fcf-protection=branch -mibt-seal -mcmodel=kernel thinlto?.c
$ llvm-dis -o - thinlto1.o | grep ibt-seal
!2 = !{i32 8, !"ibt-seal", i32 1}
$ llvm-dis -o - thinlto2.o | grep ibt-seal
!2 = !{i32 8, !"ibt-seal", i32 1}

When linked into a binary, we have no endbr64 instructions:

$ ld.lld -r -o thinlto.o thinlto?.o
$ objdump -d -r thinlto.o

thinlto.o:     file format elf64-x86-64

Disassembly of section .text.f:

0000000000000000 <f>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # b <f+0xb>
                        7: R_X86_64_PC32        g-0x4
   b:   5d                      pop    %rbp
   c:   c3                      ret

Disassembly of section .text.g:

0000000000000000 <g>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   5d                      pop    %rbp
   5:   c3                      ret

Since the address of g was taken in the other TU, the function should have an endbr64. However, when I compile the reproducer with -flto instead of -flto=thin, ibt-seal works correctly again:

$ objdump -d -r fulllto.o

fulllto.o:     file format elf64-x86-64

Disassembly of section .text.f:

0000000000000000 <f>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # b <f+0xb>
                        7: R_X86_64_PC32        g-0x4
   b:   5d                      pop    %rbp
   c:   c3                      ret

Disassembly of section .text.g:

0000000000000000 <g>:
   0:   f3 0f 1e fa             endbr64
   4:   55                      push   %rbp
   5:   48 89 e5                mov    %rsp,%rbp
   8:   5d                      pop  

So, it looks like the *-seal optimizations are only going to work with full LTO.

nickdesaulniers commented 2 years ago

ThinLTO doesn't know about cross-TU address taking

I asked Teresa Johnson about that, her thoughts:

ThinLTO might need to be taught something to propagate the necessary info. There should be a ref edge from f -> g in the index If this optimization is only IR scoped it might need to have a ThinLTO version to propagate the cross-TU ref info to the referenced func symbol for the ThinLTO backend to manage. But yes, if that is an optimization that relies on seeing references to functions only on IR, it sounds like an accurate statement that ThinLTO will not work automatically The ref info is there in the index (it should be), so just need a flag or something to propagate the info back and make the optimization look at the summary too, or something to that effect.

samitolvanen commented 2 years ago

Yes, that makes perfect sense, thanks for checking with Teresa. I'll disable kcfi-seal with ThinLTO initially and we can later see how much work it would be to make this work with ThinLTO as well.

samitolvanen commented 2 years ago

Testing this a bit more, it looks like this optimization doesn't play nicely with arm64's alternative_cb where the callback functions are defined in stand-alone assembly, but the indirect calls to them in __apply_alternatives have a KCFI check. I'll have to think of a reasonable solution here. At least I didn't run into any issues on x86_64... yet!

samitolvanen commented 1 year ago

Adding __used to the affected functions fixes the arm64 issue, but I decided it's probably best to add the compiled feature behind a codegen option nevertheless so it can be disabled for specific translation units if needed -- just like ibt-seal:

https://github.com/samitolvanen/llvm-project/commits/kcfi-seal

I'll try to clean this up this week, do some more testing, and upload a patch to Phabricator for review.

samitolvanen commented 1 year ago

https://reviews.llvm.org/D138337

samitolvanen commented 1 year ago

Based on feedback from @pcc, I hacked together a version of the patch that doesn't drop type information for functions with VisibleToRegularObj set in symbol resolution. While this definitely does solve the issue with functions that are only address-taken in assembly but called indirectly in C, it also results in almost no types getting dropped from vmlinux.

Here are updated numbers with ToT Linux + the latest Clang: __cfi_ symbols Configuration
46642 x86_64 defconfig without kcfi-seal
46610 With kcfi-seal, but VisibleToRegularObj not dropped
34513 With kcfi-seal

It seems to me like the only feasible path forward is to keep this behind a flag so we can disable it if needed, or sprinkle the relevant kernel functions with __used to make sure we don't drop their type information with LTO. Thoughts?

pcc commented 1 year ago

Do you understand why VisibleToRegularObj is being set on most of the symbols?

samitolvanen commented 1 year ago

Do you understand why VisibleToRegularObj is being set on most of the symbols?

Looking at the code in LLD, it's because the kernel is doing a relocatable link. What do you think about also adding isUsedInRegularObj to SymbolResolution. I didn't test that yet, but it looks like that would be a more useful signal here.

pcc commented 1 year ago

The problem with relocatable links is that basically every symbol must be pessimistically considered to be used, because the linker doesn't know which symbols will be used in the final link. If the final link takes the address of one of those symbols, that would cause a problem. So I don't think that exposing isUsedInRegularObj would help here.

What might be useful would be if it were possible to filter the symbols exported from the object file when the relocatable link occurs. This would also have benefits for regular LTO because the compiler would be able to optimize more aggressively (e.g. if A calls B and B has no other calls, it would almost always be beneficial to inline B into A because it would not be necessary to emit another copy of B). This could use the existing version script syntax to specify the list of symbols.

samitolvanen commented 1 year ago

So I don't think that exposing isUsedInRegularObj would help here.

You're correct, that didn't look useful after all.

What might be useful would be if it were possible to filter the symbols exported from the object file when the relocatable link occurs. This would also have benefits for regular LTO because the compiler would be able to optimize more aggressively (e.g. if A calls B and B has no other calls, it would almost always be beneficial to inline B into A because it would not be necessary to emit another copy of B). This could use the existing version script syntax to specify the list of symbols.

Sure, I can see how that could benefit LTO, but this requires coming up with a list of symbols. Using the module exports wouldn't be sufficient as none of the functions used with alternative_cb are exported, for example. Perhaps we'd have to additionally go through all the non-bitcode files included in the relocatable link and check for undefined symbols? Or did you have some clever idea in mind for this?

pcc commented 1 year ago

Perhaps we'd have to additionally go through all the non-bitcode files included in the relocatable link and check for undefined symbols?

That's what I had in mind, although I don't think this would be an additional step because isUsedInRegularObj is already being set if we see an undefined symbol in a non-bitcode file in order to implement the same thing for non-relocatable links. The condition for setting VisibleToRegularObj in lld would become something like

    r.VisibleToRegularObj = sym->isUsedInRegularObj ||
                            sym->referencedAfterWrap ||
                            (r.Prevailing && (config->relocatable ? (sym->computeBinding() != STB_LOCAL) : sym->includeInDynsym() )) ||
                            usedStartStop.count(objSym.getSectionName());

That would be after changing lld so that computeBinding() is influenced by version scripts in relocatable links.

samitolvanen commented 1 year ago

That's what I had in mind, although I don't think this would be an additional step because isUsedInRegularObj is already being set if we see an undefined symbol in a non-bitcode file in order to implement the same thing for non-relocatable links.

Hmm, so exposing isUsedInRegularObj to LTO and not dropping types for symbols where it's set should also fix the alternative_cb issue in that case. I did some testing and it looks like it fixes most of the problem, but I'm still seeing types getting dropped in the arm64 KVM code because of the funky binary editing they're doing.

Specifically, types are getting dropped from kvm_update_va_mask and friends because they're only referenced in an object file that's rewritten to have a __kvm_nvhe_ prefix on all symbols (using objcopy --prefix-symbols). To make this work, they append __kvm_nvhe_kvm_update_va_mask = kvm_update_va_mask; to vmlinux.lds, but since that's only used when linking vmlinux, LLD doesn't see any references to kvm_update_va_mask when we link vmlinux.o. Ugh.

I guess for any of this to work, we'd need to have a linker script with all the KVM aliases already when linking vmlinux.o.

    r.VisibleToRegularObj = sym->isUsedInRegularObj ||
                            sym->referencedAfterWrap ||
                            (r.Prevailing && (config->relocatable ? (sym->computeBinding() != STB_LOCAL) : sym->includeInDynsym() )) ||
                            usedStartStop.count(objSym.getSectionName());

That would be after changing lld so that computeBinding() is influenced by version scripts in relocatable links.

That sounds reasonable to me.

@MaskRay what do you think about adding an option to provide a list of exported symbols for a relocatable link, so we can better optimize vmlinux.o with LTO?

nickdesaulniers commented 1 year ago

but I'm still seeing types getting dropped in the arm64 KVM code because of the funky binary editing they're doing.

We could also disable LTO for wacky code (in the kernel makefiles).

lvwr commented 1 year ago

FTR; I just submitted https://reviews.llvm.org/D140035 to fix the problem with IBT.

samitolvanen commented 1 year ago

What might be useful would be if it were possible to filter the symbols exported from the object file when the relocatable link occurs. This would also have benefits for regular LTO because the compiler would be able to optimize more aggressively

I hacked together a prototype for this (llvm, kernel), and since we only have a list of exported functions and LTO would happily drop also data symbols, I limited it to executable sections.

With just x86_64 defconfig + LTO_CLANG_FULL, the resulting vmlinux.o has ~49% fewer global functions and the stripped vmlinux.o binary is ~5.8% smaller. However, in order to make this kernel bootable, I have to explicitly add __attribute__((no_stack_protector)) to start_kernel because the compiler now thinks this function needs stack checks, probably because something now gets inlined there. Otherwise the kernel seems to work fine.

Here are the updated numbers with CFI enabled: __cfi_ symbols vmlinux.o size Configuration
46452 30508 kB x86_64 defconfig
46900 47024 kB + LTO_CLANG_FULL
39484 44292 kB + --lto-export-symbol-list
34699 44216 kB + kcfi-seal and VisibleToRegularObj not dropped

Note that kcfi-seal here drops types from non-address-taken globals that are not visible to regular objects. All the additional types it drops are for functions with default visibility, so @MaskRay's suggestion to only drop types for functions with hidden visibility wouldn't improve the situation. Note that this is for a kernel compiled with -fvisibility=hidden.

lvwr commented 1 year ago

While thinking about the -mibt-seal removal patch, it occurred to me that perhaps -kcfi-seal may impact the stack depth tracking mitigation that, IIRC, uses the padding areas. In case you haven't yet, you might want to double-check with PeterZ if we really want to optimize the binary size or if we want to suppress the hash without preventing the emission of the extra bytes on the prologue.

samitolvanen commented 1 year ago

While thinking about the -mibt-seal removal patch, it occurred to me that perhaps -kcfi-seal may impact the stack depth tracking mitigation that, IIRC, uses the padding areas.

The last time I checked they used -fpatchable-function-entry to ensure there's always space before the function, so this use case shouldn't be affected, but I'll double check. Thanks for reminding me!

samitolvanen commented 1 year ago

The initial --lto-export-symbol-list patch: https://reviews.llvm.org/D142163

Once we have an acceptable solution for limiting the number of exported globals in relocatable LTO links, we can proceed with the *-seal patches.

samitolvanen commented 1 year ago

Updated https://reviews.llvm.org/D138337 to not drop type hashes for VisibleToRegularObj symbols.