Open nickdesaulniers opened 5 years ago
@pcc might be interested in it.
-fno-semantic-interposition might be of interest.
Sorry, @ardbiesheuvel , all of my links for this look irrelevant. Do you have a lore link or something written up for your idea for a kernel code model for aarch64? I think some of the aggressive optimizations @MaskRay has been working on for x86 might play in with some of your ideas for aarch64.
I don't have any links at hand, but I can provide some background.
This issue came up when I discussed the assumption in the Linux/arm64 build system that AArch64 code generated by GCC without the -fpic or -fpie flags set is suitable for linking with -pie, so that we can emit dynamic relocations into the bare metal binary, which it can use to self relocate at boot, for KASLR.
Ramana (who is [still] at ARM but no longer works on GCC so I won't pull him into this discussion) pointed out that this is risky, and it would be better to generate -fpic code. However, PIC code generation is heavily geared towards shared objects in hosted executables, resulting in suboptimal code: in a bare metal binary, there is no ELF symbol preemption, text relocations are not a problem, and executable code is never shared, so reducing the memory footprint of COW'ed sections is unnecessary. This means that emitting GOT indirections is pointless, but inhibiting that with -fpic is cumbersome: -fvisibility=hidden only affects definitions, not declarations, and the visibility pragma (which does affect declarations too) can only be emitted via a .h file, which needs to be pulled in using -include, and so on.
So this is when we first discussed introducing -mcmodel=kernel for AArch64, which could imply whichever internal options we need to get small model code but without all the GOT and .so stuff.
For a STB_GLOBAL/STB_WEAK symbol,
STV_DEFAULT: both compiler & linker need to assume such symbols can be preempted in -fpic mode. The compiler emits GOT indirection by default. GCC -fno-semantic-interposition uses local aliases on defined non-weak function symbols for x86 (unimplemented in other architectures). Clang -fno-semantic-interposition uses local aliases on defined non-weak symbols (both function and data) for x86.
STV_PROTECTED: GCC -fpic uses GOT indirection for data symbols, regardless of defined or undefined. This pessimization is to make a misfeature "copy relocation on protected data symbol" work (https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected#protected-data-symbols-and-direct-accesses). Clang code generation treats STV_PROTECTED the same way as STV_HIDDEN.
STV_HIDDEN: non-preemptible, regardless of defined or undefined. The compiler suppresses GOT indirection, unless undefined STB_WEAK.
For defined symbols, -fno-pic/-fpie can avoid GOT indirection for STV_DEFAULT (and GCC STV_PROTECTED). -fvisibility=hidden can change visibility.
For undefined symbols, -fpie/-fpic use GOT indirection by default. Clang -fdirect-access-external-data (discussed in my article) can avoid GOT indirection. If you compile with -fpic -fdirect-access-external-data and link with ld -shared, you'll need additional linker options to let the linker know that defined non-STB_LOCAL STV_DEFAULT symbols are non-preemptible.
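The rules above can be condensed into a small decision function. The Python sketch below is only a model of the prose, not of any compiler's actual implementation. The function name uses_got and all of its parameters are made up for illustration. It covers only -fpie and -fpic, ignores -fno-semantic-interposition, and makes two assumptions that are not stated above: undefined weak symbols always go through the GOT, and GCC treats STV_PROTECTED functions like STV_HIDDEN.

```python
def uses_got(mode, visibility, defined, weak=False, is_data=False,
             compiler="clang", direct_access=False):
    """Model: does a reference to a global/weak symbol go through the GOT?

    mode          -- "pie" or "pic" (hypothetical labels for -fpie / -fpic)
    visibility    -- "default", "protected" or "hidden"
    direct_access -- models Clang's -fdirect-access-external-data
    """
    if mode not in ("pie", "pic"):
        raise ValueError("this sketch only models -fpie and -fpic")

    if visibility == "protected":
        if compiler == "gcc" and is_data:
            # GCC keeps GOT indirection for protected data in -fpic,
            # so model it like a default-visibility symbol.
            visibility = "default"
        else:
            # Clang treats protected like hidden. For GCC functions this
            # is an assumption of the sketch.
            visibility = "hidden"

    if not defined and weak:
        # Stated above for hidden symbols; assumed for the other cases.
        return True

    if visibility == "hidden":
        return False          # non-preemptible, defined or not

    if defined:
        return mode == "pic"  # -fpie treats definitions as non-preemptible

    return not direct_access  # undefined: GOT unless direct access is on
```

For example, uses_got("pie", "default", defined=False) is True, while passing direct_access=True for the same symbol turns it into False.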
However, PIC code generation is heavily geared towards shared objects in hosted executables, resulting in suboptimal code: in a bare metal binary, there is no ELF symbol preemption, text relocations are not a problem, and executable code is never shared, so reducing the memory footprint of COW'ed sections is unnecessary.
The use case is similar to a userspace static no-pie executable (-fno-pic -no-pie) or static pie (-fpie -pie).
and it would be better to generate -fpic code.
Why is -fpie risky?
Thanks for more information. Questions (naive, perhaps, but I appreciate the feedback):
PIC code generation is heavily geared towards shared objects in hosted executables, resulting in suboptimal code
It seems to also produce an excessive growth in the number of relocations in debug info sections, which accounts for a significant growth in size of the binary (when debug info is not stripped or produced separately). The change in file size of vmlinux from enabling CONFIG_RELOCATABLE can be ~95% attributed to growth in .rela.debug_* sections, at least on x86 and DWARFv4.
there is no ELF symbol preemption
Does -fno-semantic-interposition help, or is there still more? How does -fvisibility differ from -fno-semantic-interposition (I should probably just go look up STV_PROTECTED)? How does -fpie differ from -fpic?
text relocations are not a problem
Right, hence -Wl,-z,notext, or are there additional problems?
executable code is never shared, so reducing the memory footprint of COW'ed sections is unnecessary.
I understand; does this result in suboptimal code gen, in your experience?
Also, I'm curious if such a code model would no longer support CONFIG_RELOCATABLE=n? As in, only PIC-like relative references? Or would there still be a use case for non-PIC like code? Perhaps folks don't want KASLR support (though that's what the command line option is for, I suppose).
Do Clang and GCC both not implement -fno-semantic-interposition for non-x86 architectures?
I believe this is the link in the first comment:
Here are lore links for all of the other posts:
https://lore.kernel.org/r/CAKv+Gu_tuYcikQ07QKP-N+rd+DpoucSYn6TG+OJ-jm9CVGaDxg@mail.gmail.com
https://lore.kernel.org/r/26a25069-ea1d-5fb3-549c-ab653f454a30@arm.com
https://lore.kernel.org/r/20171115213428.22559-7-samitolvanen@google.com
https://lore.kernel.org/r/20171103192634.u25go4tu7lgzl6ja@lakrids.cambridge.arm.com/
Thanks for more information. Questions (naive, perhaps, but I appreciate the feedback):
...
there is no ELF symbol preemption
Does -fno-semantic-interposition help, or is there still more? How does -fvisibility differ from -fno-semantic-interposition (I should probably just go look up STV_PROTECTED)? How does -fpie differ from -fpic?
I don't see a difference with -fno-semantic-interposition, either on GCC or Clang. In both cases, a reference to an undefined symbol is emitted using an entry in the GOT.
text relocations are not a problem
Right, hence -Wl,-z,notext, or are there additional problems?
Not to my knowledge, no.
executable code is never shared, so reducing the memory footprint of COW'ed sections is unnecessary.
I understand; does this result in suboptimal code gen, in your experience?
Yes, through the generation of GOT entries. With a GOT, all relocated quantities are close together, which reduces the footprint of pages that are CoW'ed due to relocation processing. Without CoW, this GOT just takes up more space and results in more memory accesses, but without the benefit.
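The footprint argument can be illustrated with simple page arithmetic. The Python sketch below uses an entirely hypothetical layout (1000 relocated 8-byte slots, 4 KiB pages, made-up addresses) and a made-up helper name, pages_written. It only counts how many distinct pages are written when relocations are applied in place versus collected in one GOT-like table.

```python
PAGE_SIZE = 4096  # assumed page size for this illustration


def pages_written(addresses, page_size=PAGE_SIZE):
    """Count the distinct pages touched by writes to the given addresses."""
    return len({addr // page_size for addr in addresses})


# Hypothetical: 1000 relocated slots spread out, one per page.
scattered = [0x10000 + i * PAGE_SIZE + 0x10 for i in range(1000)]

# Hypothetical: the same 1000 slots packed into one contiguous table.
got = [0x800000 + i * 8 for i in range(1000)]

print(pages_written(scattered), pages_written(got))  # prints: 1000 2
```

In a hosted process, each written page becomes a private CoW copy, so the packed table is much cheaper. In a bare metal image that is never shared, that saving does not apply, which is the point made above.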
Also, I'm curious if such a code model would no longer support CONFIG_RELOCATABLE=n? As in, only PIC-like relative references? Or would there still be a use case for non-PIC like code? Perhaps folks don't want KASLR support (though that's what the command line option is for, I suppose).
The point is really that AArch64's ADRP/ADD pairs are position independent by their very nature, which is why we currently don't need to use -fpic or -fpie to obtain object files that can be linked with -pie. In other words, the object code is identical, and the only difference is in the additional RELA sections and metadata emitted by the linker.
Fundamentally, this code model should equally support CONFIG_RELOCATABLE=n because that code model should codify the current behavior of -mcmodel=small, but with future guarantees that the resulting object files can always be linked using -pie, and that absolute references are only emitted when strictly needed (i.e., not for jump tables).
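To see why an ADRP/ADD pair needs no run-time fixup, it helps to spell out the arithmetic. The Python sketch below models the two instructions with made-up helper names (link_adrp_add, run_adrp_add) and made-up addresses. It assumes 4 KiB ADRP pages and that the whole image is moved by a multiple of 4 KiB.

```python
PAGE = 0x1000  # ADRP works on 4 KiB pages


def page(addr):
    """Round an address down to its 4 KiB page."""
    return addr & ~(PAGE - 1)


def link_adrp_add(place, target):
    """Link time: compute the immediates encoded into the instructions.

    place  -- address of the ADRP instruction
    target -- address of the referenced symbol
    """
    page_delta = (page(target) - page(place)) // PAGE  # ADRP immediate
    lo12 = target & (PAGE - 1)                         # ADD immediate
    return page_delta, lo12


def run_adrp_add(pc, page_delta, lo12):
    """Run time: what the ADRP/ADD pair computes when executed at pc."""
    return page(pc) + page_delta * PAGE + lo12


place, target = 0x801234, 0x9ABCD8          # hypothetical link-time addresses
imm, lo12 = link_adrp_add(place, target)

for delta in (0, 0x7F000, 0x200000):        # page-aligned load offsets
    assert run_adrp_add(place + delta, imm, lo12) == target + delta
```

The encoded immediates never change, yet the computed address follows the image. If the image were moved by an offset that is not a multiple of 4 KiB, the fixed low 12 bits would no longer be right, so the scheme relies on page-aligned placement.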
Do Clang and GCC both not implement -fno-semantic-interposition for non-x86 architectures?
Both accept it for AArch64 targets but I don't see any difference in the generated code.
Thanks for more information. Questions (naive, perhaps, but I appreciate the feedback):
...
there is no ELF symbol preemption
Does -fno-semantic-interposition help, or is there still more? How does -fvisibility differ from -fno-semantic-interposition (I should probably just go look up STV_PROTECTED)? How does -fpie differ from -fpic?
My previous comment mentioned the semantics.
Fundamentally, this code model should equally support CONFIG_RELOCATABLE=n because that code model should codify the current behavior of -mcmodel=small, but with future guarantees that the resulting object files can always be linked using -pie, and that absolute references are only emitted when strictly needed (i.e., not for jump tables).
clang -fpie -fdirect-access-external-data meets your needs. GCC feature request: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112
-fpie means defined symbols are non-preemptible. -fdirect-access-external-data references undefined symbols via direct relocation types.
Do Clang and GCC both not implement -fno-semantic-interposition for non-x86 architectures?
No, as my previous comment mentioned.
clang -fpie -fdirect-access-external-data meets your needs.
Is -fdirect-access-external-data currently only implemented for x86 in clang, like -fno-semantic-interposition?
and that absolute references are only emitted when strictly needed (i.e., not for jump tables)
Right, if a compiler uses absolute references for jump tables when compiling as -pie, then that's a compiler bug. Right?
clang -fpie -fdirect-access-external-data meets your needs.
Is -fdirect-access-external-data currently only implemented for x86 in clang, like -fno-semantic-interposition?
-fdirect-access-external-data is supported by most LLVM-supported targets.
The opposite combination, -fno-pic -fno-direct-access-external-data, has triggered an x86 FastISel bug and an ARM FastISel bug.
and that absolute references are only emitted when strictly needed (i.e., not for jump tables)
Right, if a compiler uses absolute references for jump tables when compiling as -pie, then that's a compiler bug. Right?
(Fixing a typo: -pie => -fpie. -pie is a linker mode.)
Yes. Absolute references should only be produced for -fno-pic code. (Technically, if a symbol is SHN_ABS, absolute references can be used in -fpie/-fpic mode as well. LLVM IR has !absolute_symbol for this, but clang does not emit it.)
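The jump table point can be made concrete with a small model. The Python sketch below uses hypothetical addresses and helper names. It compares a table of absolute case addresses with a table of offsets relative to the table's own address, and checks what each one yields after the image is loaded at a different base without any relocation processing.

```python
def build_absolute_table(case_addrs):
    """Table entries are link-time absolute addresses."""
    return list(case_addrs)


def build_relative_table(table_addr, case_addrs):
    """Table entries are offsets from the table's own address."""
    return [addr - table_addr for addr in case_addrs]


def dispatch_absolute(table, index):
    """Branch target computed from an absolute table entry."""
    return table[index]


def dispatch_relative(runtime_table_addr, table, index):
    """Branch target computed from the table's run-time address."""
    return runtime_table_addr + table[index]


table_addr = 0x5000                      # hypothetical link-time addresses
cases = [0x1000, 0x1040, 0x10A0]
delta = 0x200000                         # hypothetical load offset

abs_table = build_absolute_table(cases)
rel_table = build_relative_table(table_addr, cases)

for i, case in enumerate(cases):
    # Relative entries follow the image without any fixup.
    assert dispatch_relative(table_addr + delta, rel_table, i) == case + delta
    # Absolute entries still point at the link-time address.
    assert dispatch_absolute(abs_table, i) == case
```

The absolute table would need one dynamic relocation per entry to stay correct after the move, which is why relative tables are the expected output for -fpie/-fpic code.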
$ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make LLVM=1 LLVM_IAS=1 -j72 KCFLAGS=-fdirect-access-external-data
built, booted (in QEMU), and no one died (this time)(I think).
Checking the object files' relocations, which undefined symbols use relocations that reference the GOT? This is on the caller's side that bl should be using non R_AARCH64_CALL26 relocations, without -fno-direct-access-external-data? Or is there something I need to change in Kbuild first? I do have CONFIG_RELOCATABLE=y set.
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-November/613617.html
Updated links: https://github.com/ClangBuiltLinux/linux/issues/275#issuecomment-770430621