Open michaelopdenacker opened 2 years ago
Hi Michael! As far as I am aware, the size increase comes from the linker’s ability to inline more aggressively, which is why you see a larger increase with full LTO over ThinLTO, as full LTO is basically like cat-ing all object files together.
Consider enabling CONFIG_LD_DEAD_CODE_ELIMINATION on top of LTO, though the resulting image might not necessarily boot (LD_DEAD_CODE_ELIMINATION has been hit or miss IMO).
Thanks for the tip! On arm64, I tried to enable CONFIG_LD_DEAD_CODE_ELIMINATION (after adding "select HAVE_LD_DEAD_CODE_DATA_ELIMINATION" to arch/arm64/Kconfig) together with CONFIG_LTO_CLANG_FULL, and here's the size I get for Image.gz: 15301033 bytes (vs 16097599 bytes for LTO_CLANG_FULL alone). That's -5%, but that's still bigger than with LTO_NONE.
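For reference, the arithmetic behind the -5% figure can be checked with a few lines of Python. This is only a sketch: the sizes are the Image.gz numbers quoted in this thread (the LTO_NONE value is the clang 14 arm64 figure from the original report), and the helper name pct_change is made up for illustration.

```python
# Image.gz sizes in bytes, as reported in this thread (arm64, clang 14).
LTO_NONE = 12197028       # CONFIG_LTO_NONE=y, from the original report
LTO_FULL = 16097599       # CONFIG_LTO_CLANG_FULL alone
LTO_FULL_DCE = 15301033   # CONFIG_LTO_CLANG_FULL + CONFIG_LD_DEAD_CODE_ELIMINATION


def pct_change(new: int, old: int) -> float:
    """Relative size change of `new` versus `old`, in percent (hypothetical helper)."""
    return (new - old) / old * 100.0


# How much dead-code elimination saves on top of full LTO.
dce_vs_full = pct_change(LTO_FULL_DCE, LTO_FULL)
# How far the result still is from the non-LTO image.
dce_vs_none = pct_change(LTO_FULL_DCE, LTO_NONE)

print(f"full LTO + DCE vs full LTO: {dce_vs_full:+.1f}%")
print(f"full LTO + DCE vs LTO_NONE: {dce_vs_none:+.1f}%")
```

The second figure is the remaining gap to the non-LTO image, which is why the result is still described as bigger than LTO_NONE.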
Hi Nathan. It makes sense, thanks!
> That's -5%, but that's still bigger than with LTO_NONE.
Some other thoughts:
LTO enables -ffunction-sections and -fdata-sections (so does CONFIG_LD_DEAD_CODE_ELIMINATION IIRC). This does waste space IIRC due to ELF section alignment requirements.
Also, even if many kernel interfaces are actually unused ever during runtime, if the interface is exported, the linker can't know that a module will never need such a symbol at runtime. If symbols are rooted (referenced) by other symbols that are exported, the code does not appear dead to the linker. (Though I wonder about --gc-keep-exported since we don't set that.)
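The point about exported symbols can be pictured as a graph reachability problem. The following is a toy model in Python, not how any real linker is implemented: the symbol names and the reference graph are invented for illustration, and live_symbols is a hypothetical helper. It only shows why treating exported symbols as roots keeps their callees alive.

```python
from collections import deque

# Hypothetical symbol reference graph: symbol -> symbols it references.
# All names here are invented for illustration, not real kernel symbols.
REFERENCES = {
    "start_kernel": ["init_subsys"],
    "init_subsys": [],
    "exported_api": ["api_helper"],
    "api_helper": [],
    "orphan_fn": ["orphan_helper"],
    "orphan_helper": [],
}


def live_symbols(roots, references):
    """Return every symbol reachable from `roots` (what a section GC would keep)."""
    live = set()
    queue = deque(roots)
    while queue:
        sym = queue.popleft()
        if sym in live:
            continue
        live.add(sym)
        queue.extend(references.get(sym, []))
    return live


# Entry point only: the exported API and its helper look dead.
entry_only = live_symbols(["start_kernel"], REFERENCES)

# Exported symbols treated as additional roots: their callees stay alive
# even though no built-in code calls them.
with_exports = live_symbols(["start_kernel", "exported_api"], REFERENCES)

print(sorted(entry_only))
print(sorted(with_exports))
```

In this model, shrinking the set of exported roots is the only way to make more code collectable, which matches the observation that the linker cannot discard anything an exported interface still references.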
There's probably an -Rpass-missed= flag that can be passed to aid in debugging. IDK if the linker has a corresponding flag to debug, probably --print-gc-sections?
Also, in the Makefile:

    KBUILD_LDFLAGS += -mllvm -import-instr-limit=5

That 5 is a bit of a heuristic that can be played with to affect how frequently inlining is performed. From llvm/lib/Transforms/IPO/FunctionImport.cpp:

    /// Limit on instruction count of imported functions.
    static cl::opt<unsigned> ImportInstrLimit(
        "import-instr-limit", cl::init(100), cl::Hidden, cl::value_desc("N"),
        cl::desc("Only import functions with less than N instructions"));

Note these are LLVM IR instructions, not machine instructions.
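To get a feel for what changing that limit does, here is a deliberately simplified Python sketch. It only mirrors the wording of the option description ("less than N instructions"); the real importer in LLVM applies further heuristics that are not modelled here. The function names and instruction counts are invented, and import_candidates is a hypothetical helper.

```python
# Hypothetical functions and their LLVM IR instruction counts (invented values).
IR_INSTRUCTION_COUNTS = {
    "tiny_getter": 3,
    "small_wrapper": 12,
    "medium_helper": 80,
    "big_routine": 400,
}


def import_candidates(counts, limit):
    """Names of functions with fewer than `limit` IR instructions.

    Toy filter based only on the option's description string; it is an
    assumption that this is a useful approximation of the real behaviour.
    """
    return sorted(name for name, n in counts.items() if n < limit)


# Limit of 5, as set via KBUILD_LDFLAGS in the kernel Makefile.
kernel_setting = import_candidates(IR_INSTRUCTION_COUNTS, 5)
# Limit of 100, the cl::init default in FunctionImport.cpp.
llvm_default = import_candidates(IR_INSTRUCTION_COUNTS, 100)

print("limit=5:  ", kernel_setting)
print("limit=100:", llvm_default)
```

Under this toy model, raising the limit makes more functions eligible for cross-module import, and therefore for inlining, which is the knob being described above.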
> Also, even if many kernel interfaces are actually unused ever during runtime, if the interface is exported, the linker can't know that a module will never need such a symbol at runtime.
Ah, that's what CONFIG_TRIM_UNUSED_KSYMS is for.
BTW @michaelopdenacker nice talk! I've added a link to it from our wiki.
Greetings
I expected the LTO optimized kernels to be smaller because of the elimination of dead code, as described on https://www.llvm.org/docs/LinkTimeOptimization.html
However, the LTO optimized compressed kernels are actually bigger.
On arm64, Linux 5.18-rc7, defconfig configuration, the Image.gz file size is:
- using gcc 12: 12149327 bytes
- using clang 14 with CONFIG_LTO_NONE=y: 12197028 bytes
- using clang 14 with CONFIG_LTO_CLANG_THIN: 12419217 bytes
- using clang 14 with CONFIG_LTO_CLANG_FULL: 16097599 bytes
On x86_64, Linux 5.18-rc7, x86_64_defconfig configuration file, the bzImage file size is:
- using gcc 12: 10743648 bytes
- using clang 14 with CONFIG_LTO_NONE=y: 11011008 bytes
- using clang 14 with CONFIG_LTO_CLANG_THIN: 11166720 bytes
- using clang 14 with CONFIG_LTO_CLANG_FULL: 13650880 bytes
How would you explain that the LTO_THIN and above all the LTO_FULL optimized kernels are actually bigger than the LTO_NONE ones, at least on the x86_64 and arm64 architectures?
Thanks in advance for your insights.
Cheers,
Michael.
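To put the reported sizes in perspective, the relative overhead of each build against the clang 14 LTO_NONE image can be computed as below. This is a small sketch that uses only the byte counts from the post; the dictionary layout and the helper overhead_vs_baseline are assumptions made for illustration.

```python
# Sizes in bytes as reported in the post (Linux 5.18-rc7, defconfig).
SIZES = {
    "arm64 Image.gz": {
        "gcc 12": 12149327,
        "clang 14, LTO_NONE": 12197028,
        "clang 14, LTO_CLANG_THIN": 12419217,
        "clang 14, LTO_CLANG_FULL": 16097599,
    },
    "x86_64 bzImage": {
        "gcc 12": 10743648,
        "clang 14, LTO_NONE": 11011008,
        "clang 14, LTO_CLANG_THIN": 11166720,
        "clang 14, LTO_CLANG_FULL": 13650880,
    },
}
BASELINE = "clang 14, LTO_NONE"


def overhead_vs_baseline(sizes, baseline=BASELINE):
    """Percent size difference of each build relative to the baseline build."""
    base = sizes[baseline]
    return {name: (size - base) / base * 100.0 for name, size in sizes.items()}


RESULTS = {image: overhead_vs_baseline(s) for image, s in SIZES.items()}

for image, overheads in RESULTS.items():
    print(image)
    for name, pct in overheads.items():
        print(f"  {name:<26} {pct:+6.2f}%")
```

On both architectures the ThinLTO overhead comes out at under 2%, while full LTO adds roughly a quarter to a third to the image size.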