Open samitolvanen opened 2 years ago
That's nice; it sounds like LTO will no longer be strictly necessary for kCFI, but will still improve kCFI when used.
Here's a quick draft: https://github.com/samitolvanen/llvm-project/commit/0d3cc88cc69ea859773f95a2b2e8ee1adfd4b6c8
With x86_64 defconfig + LTO_CLANG_FULL
+ CFI_CLANG
, the number of indirectly callable functions dropped from 46647 to 34712 (-11935 = ~-25.6%) and the kernel booted just fine in my tests.
However, with LTO_CLANG_THIN
, the kcfi-seal
attribute resulted in the type hash getting dropped from another 1000+ functions, and I ended up with a kernel that failed to boot. The main issue appears to be that Function::hasAddressTaken()
returns false for some address-taken globals (e.g. e820__memory_setup_default
, which leads to an early boot crash).
@lvwr Do you remember running into similar issues with ThinLTO when you implemented ibt-seal? I'm not sure if ThinLTO even keeps track of address-taken functions across TUs.
I confirmed that ThinLTO doesn't know about cross-TU address taking, which means -mibt-seal
also doesn't work correctly. Here's a reproducer:
$ cat thinlto1.c
extern void g(void);
void *f(void) { return &g; }
$ cat thinlto2.c
void g(void) {}
Which I compiled with -flto=thin
and IBT using the kernel
code model, and then ensured Clang set the ibt-seal
flag:
$ clang -c -flto=thin -fcf-protection=branch -mibt-seal -mcmodel=kernel thinlto?.c
$ llvm-dis -o - thinlto1.o | grep ibt-seal
!2 = !{i32 8, !"ibt-seal", i32 1}
$ llvm-dis -o - thinlto2.o | grep ibt-seal
!2 = !{i32 8, !"ibt-seal", i32 1}
When linked into a binary, we have no endbr64
instructions:
$ ld.lld -r -o thinlto.o thinlto?.o
$ objdump -d -r thinlto.o
thinlto.o: file format elf64-x86-64
Disassembly of section .text.f:
0000000000000000 <f>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # b <f+0xb>
7: R_X86_64_PC32 g-0x4
b: 5d pop %rbp
c: c3 ret
Disassembly of section .text.g:
0000000000000000 <g>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 5d pop %rbp
5: c3 ret
Since the address of g
was taken in the other TU, the function should have an endbr64
. However, when I compile the reproducer with -flto
instead of -flto=thin
, ibt-seal
works correctly again:
$ objdump -d -r fulllto.o
fulllto.o: file format elf64-x86-64
Disassembly of section .text.f:
0000000000000000 <f>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # b <f+0xb>
7: R_X86_64_PC32 g-0x4
b: 5d pop %rbp
c: c3 ret
Disassembly of section .text.g:
0000000000000000 <g>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 5d pop
So, it looks like the *-seal
optimizations are only going to work with full LTO.
ThinLTO doesn't know about cross-TU address taking
I asked Teresa Johnson about that, her thoughts:
ThinLTO might need to be taught something to propagate the necessary info. There should be a ref edge from f -> g in the index If this optimization is only IR scoped it might need to have a ThinLTO version to propagate the cross-TU ref info to the referenced func symbol for the ThinLTO backend to manage. But yes, if that is an optimization that relies on seeing references to functions only on IR, it sounds like an accurate statement that ThinLTO will not work automatically The ref info is there in the index (it should be), so just need a flag or something to propagate the info back and make the optimization look at the summary too, or something to that effect.
Yes, that makes perfect sense, thanks for checking with Teresa. I'll disable kcfi-seal with ThinLTO initially and we can later see how much work it would be to make this work with ThinLTO as well.
Testing this a bit more, it looks like this optimization doesn't play nicely with arm64's alternative_cb
where the callback functions are defined in stand-alone assembly, but the indirect calls to them in __apply_alternatives
have a KCFI check. I'll have to think of a reasonable solution here. At least I didn't run into any issues on x86_64... yet!
Adding __used
to the affected functions fixes the arm64 issue, but I decided it's probably best to add the compiled feature behind a codegen option nevertheless so it can be disabled for specific translation units if needed -- just like ibt-seal
:
https://github.com/samitolvanen/llvm-project/commits/kcfi-seal
I'll try to clean this up this week, do some more testing, and upload a patch to Phabricator for review.
Based on feedback from @pcc, I hacked together a version of the patch that doesn't drop type information for functions with VisibleToRegularObj
set in symbol resolution. While this definitely does solve the issue with functions that are only address-taken in assembly but called indirectly in C, it also results in almost no types getting dropped from vmlinux.
Here are updated numbers with ToT Linux + the latest Clang: | __cfi_ symbols |
Configuration |
---|---|---|
46642 | x86_64 defconfig without kcfi-seal |
|
46610 | With kcfi-seal , but VisibleToRegularObj not dropped |
|
34513 | With kcfi-seal |
It seems to me like the only feasible path forward is to keep this behind a flag so we can disable it if needed, or sprinkle the relevant kernel functions with __used
to make sure we don't drop their type information with LTO. Thoughts?
Do you understand why VisibleToRegularObj
is being set on most of the symbols?
Do you understand why
VisibleToRegularObj
is being set on most of the symbols?
Looking at the code in LLD, it's because the kernel is doing a relocatable link. What do you think about also adding isUsedInRegularObj
to SymbolResolution
. I didn't test that yet, but it looks like that would be a more useful signal here.
The problem with relocatable links is that basically every symbol must be pessimistically considered to be used, because the linker doesn't know which symbols will be used in the final link. If the final link takes the address of one of those symbols, that would cause a problem. So I don't think that exposing isUsedInRegularObj
would help here.
What might be useful would be if it were possible to filter the symbols exported from the object file when the relocatable link occurs. This would also have benefits for regular LTO because the compiler would be able to optimize more aggressively (e.g. if A calls B and B has no other calls, it would almost always be beneficial to inline B into A because it would not be necessary to emit another copy of B). This could use the existing version script syntax to specify the list of symbols.
So I don't think that exposing
isUsedInRegularObj
would help here.
You're correct, that didn't look useful after all.
What might be useful would be if it were possible to filter the symbols exported from the object file when the relocatable link occurs. This would also have benefits for regular LTO because the compiler would be able to optimize more aggressively (e.g. if A calls B and B has no other calls, it would almost always be beneficial to inline B into A because it would not be necessary to emit another copy of B). This could use the existing version script syntax to specify the list of symbols.
Sure, I can see how that could benefit LTO, but this requires coming up with a list of symbols. Using the module exports wouldn't be sufficient as none of the functions used with alternative_cb
are exported, for example. Perhaps we'd have to additionally go through all the non-bitcode files included in the relocatable link and check for undefined symbols? Or did you have some clever idea in mind for this?
Perhaps we'd have to additionally go through all the non-bitcode files included in the relocatable link and check for undefined symbols?
That's what I had in mind, although I don't think this would be an additional step because isUsedInRegularObj
is already being set if we see an undefined symbol in a non-bitcode file in order to implement the same thing for non-relocatable links. The condition for setting VisibleToRegularObj
in lld would become something like
r.VisibleToRegularObj = sym->isUsedInRegularObj ||
sym->referencedAfterWrap ||
(r.Prevailing && (config->relocatable ? (sym->computeBinding() != STB_LOCAL) : sym->includeInDynsym() )) ||
usedStartStop.count(objSym.getSectionName());
That would be after changing lld so that computeBinding()
is influenced by version scripts in relocatable links.
That's what I had in mind, although I don't think this would be an additional step because
isUsedInRegularObj
is already being set if we see an undefined symbol in a non-bitcode file in order to implement the same thing for non-relocatable links.
Hmm, so exposing isUsedInRegularObj
to LTO and not dropping types for symbols where it's set should also fix the alternative_cb
issue in that case. I did some testing and it looks like it fixes most of the problem, but I'm still seeing types getting dropped in the arm64 KVM code because of the funky binary editing they're doing.
Specifically, types are getting dropped from kvm_update_va_mask
and friends because they're only referenced in an object file that's rewritten to have a __kvm_nvhe_
prefix on all symbols (using objcopy --prefix-symbols
). To make this work, they append __kvm_nvhe_kvm_update_va_mask = kvm_update_va_mask;
to vmlinux.lds
, but since that's only used when linking vmlinux
, LLD doesn't see any references to kvm_update_va_mask
when we link vmlinux.o
. Ugh.
I guess for any of this to work, we'd need to have a linker script with all the KVM aliases already when linking vmlinux.o
.
r.VisibleToRegularObj = sym->isUsedInRegularObj || sym->referencedAfterWrap || (r.Prevailing && (config->relocatable ? (sym->computeBinding() != STB_LOCAL) : sym->includeInDynsym() )) || usedStartStop.count(objSym.getSectionName());
That would be after changing lld so that
computeBinding()
is influenced by version scripts in relocatable links.
That sounds reasonable to me.
@MaskRay what do you think about adding an option to provide a list of exported symbols for a relocatable link, so we can better optimize vmlinux.o
with LTO?
but I'm still seeing types getting dropped in the arm64 KVM code because of the funky binary editing they're doing.
We could also disable LTO for wacky code (in the kernel makefiles).
FTR; I just submitted https://reviews.llvm.org/D140035 to fix the problem with IBT.
What might be useful would be if it were possible to filter the symbols exported from the object file when the relocatable link occurs. This would also have benefits for regular LTO because the compiler would be able to optimize more aggressively
I hacked together a prototype for this (llvm, kernel), and since we only have a list of exported functions and LTO would happily drop also data symbols, I limited it to executable sections.
With just x86_64 defconfig + LTO_CLANG_FULL
, the resulting vmlinux.o has ~49% fewer global functions and the stripped vmlinux.o binary is ~5.8% smaller. However, in order to make this kernel bootable, I have to explicitly add __attribute__((no_stack_protector))
to start_kernel
because the compiler now thinks this function needs stack checks, probably because something now gets inlined there. Otherwise the kernel seems to work fine.
Here are the updated numbers with CFI enabled: | __cfi_ symbols |
vmlinux.o size | Configuration |
---|---|---|---|
46452 | 30508 kB | x86_64 defconfig | |
46900 | 47024 kB | + LTO_CLANG_FULL |
|
39484 | 44292 kB | + --lto-export-symbol-list |
|
34699 | 44216 kB | + kcfi-seal and VisibleToRegularObj not dropped |
Note that kcfi-seal
here drops types from non-address-taken globals that are not visible to regular objects. All the additional types it drops are for functions with default visibility, so @MaskRay's suggestion to only drop types for functions with hidden visibility wouldn't improve the situation. Note that this is for a kernel compiled with -fvisibility=hidden
.
While thinking about the -mibt-seal removal patch, it occurred to me that perhaps -kcfi-seal may impact the stack depth tracking mitigation that, IIRC, uses the padding areas. In case you haven't yet, you might want to double-check with PeterZ if we really want to optimize the binary size or if we want to suppress the hash without preventing the emission of the extra bytes on the prologue.
While thinking about the -mibt-seal removal patch, it occurred to me that perhaps -kcfi-seal may impact the stack depth tracking mitigation that, IIRC, uses the padding areas.
The last time I checked they used -fpatchable-function-entry
to ensure there's always space before the function, so this use case shouldn't be affected, but I'll double check. Thanks for reminding me!
The initial --lto-export-symbol-list
patch: https://reviews.llvm.org/D142163
Once we have an acceptable solution for limiting the number of exported globals in relocatable LTO links, we can proceed with the *-seal
patches.
Updated https://reviews.llvm.org/D138337 to not drop type hashes for VisibleToRegularObj
symbols.
Another feature suggestion from @lvwr:
This would work similarly to ibt-seal. Basically, with LTO we can tell for sure that non-address-taken globals are not indirectly called, which means we can drop the
!kcfi_type
metadata from them, thus making these functions invalid indirect call targets.cc @kees