jacky8hyf opened 4 months ago
Sounds like #1618?
I wonder if it would be better to drop the ThinLTO caching flags instead? If it impacts correctness, it seems like it is worth avoiding. Thoughts @samitolvanen? Did caching make a huge difference for incremental compiles?
I'm not aware of anyone depending on ThinLTO caching for incremental builds. Android quickly switched back to FullLTO and most incremental builds during development happen with LTO disabled, so I don't have any performance numbers either. I'm fine with dropping the caching.
I think I would prefer to go that route. I'll do some benchmarking to see what kind of performance hit we will be taking by removing it. This should obviously still be fixed in LLVM, but the rate at which it is fixed will no longer impact the kernel, and if people want the cache brought back, it can be made conditional on having a fixed version of LLVM.
I have sent this patch upstream at https://lore.kernel.org/lkml/20240429220756.979347-2-elsk@google.com/T/#u.
For Android, we have disabled LTO by default since 6.1, so this is not an issue for us now. I will do some mitigation on 5.15 by backporting https://lore.kernel.org/lkml/20240429220756.979347-2-elsk@google.com/T/#u.
@samitolvanen and @nathanchance, do you mean that the --thinlto-cache-dir option should be dropped from the build command? https://github.com/torvalds/linux/blob/d03d4188908883e1705987795a09aeed31424f66/Makefile#L945
Yes, I think Nathan proposed dropping --thinlto-cache-dir. Just to clarify, do you see this problem also with LTO_CLANG_FULL, or only with ThinLTO? Android uses LTO_CLANG_FULL on 5.15, but this flag is only used for ThinLTO.
> @samitolvanen and @nathanchance, do you mean that the --thinlto-cache-dir option should be dropped from the build command? https://github.com/torvalds/linux/blob/d03d4188908883e1705987795a09aeed31424f66/Makefile#L945
Yes, I think that line should just be deleted to resolve this issue. If people notice the removal and want it brought back, we can always revert it and add workarounds like the change you sent, but if the impact to compile time for incremental builds is not noticeable with or without the cache, I would rather just avoid this problem and any other ones that may surface as a result of caching.
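For orientation, the flag in question is a single linker flag in the top-level Makefile's CONFIG_LTO_CLANG_THIN branch; a quick way to locate it (the quoted output is from the commit linked above, and line numbers shift between releases):

```sh
grep -n -- '--thinlto-cache-dir' Makefile
# Expected output, roughly:
# 945:KBUILD_LDFLAGS	+= --thinlto-cache-dir=$(extmod_prefix).thinlto-cache
```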
I did some benchmarking on my test machine, which has an i7-11700 (8c/16t), and came to the conclusion that removing the ThinLTO cache results in roughly a 20% wall-clock hit for incremental builds (measured by building a full kernel first, then just touching init/main.c and rebuilding).
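The output below is hyperfine's; a sketch of how such a harness might look (the config toggles, flags, and job counts are assumptions, not the exact setup used here):

```sh
# x86_64 defconfig with ThinLTO enabled.
make LLVM=1 ARCH=x86_64 defconfig
scripts/config -d LTO_NONE -e LTO_CLANG_THIN
make LLVM=1 ARCH=x86_64 olddefconfig

# Prime a full build (which also populates the ThinLTO cache), then time
# incremental rebuilds after touching init/main.c.
make LLVM=1 ARCH=x86_64 -j"$(nproc)"
hyperfine --runs 10 --prepare 'touch init/main.c' \
    'make LLVM=1 ARCH=x86_64 -j"$(nproc)"'
```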
ARCH=x86_64 defconfig + CONFIG_LTO_CLANG_THIN=y:
```
Benchmark 1: With ThinLTO cache
  Time (mean ± σ):      85.772 s ±  0.252 s    [User: 91.505 s, System: 8.408 s]
  Range (min … max):    85.447 s …  86.244 s    10 runs

Benchmark 2: Without ThinLTO cache
  Time (mean ± σ):     103.833 s ±  0.288 s    [User: 232.058 s, System: 8.569 s]
  Range (min … max):   103.286 s … 104.124 s    10 runs

Summary
  With ThinLTO cache ran
    1.21 ± 0.00 times faster than Without ThinLTO cache
```
ARCH=x86_64 allmodconfig + CONFIG_LTO_CLANG_THIN=y:
```
Benchmark 1: With ThinLTO cache
  Time (mean ± σ):     119.537 s ±  0.243 s    [User: 215.822 s, System: 34.958 s]
  Range (min … max):   118.970 s … 119.780 s    10 runs

Benchmark 2: Without ThinLTO cache
  Time (mean ± σ):     145.668 s ±  0.658 s    [User: 429.416 s, System: 35.013 s]
  Range (min … max):   144.787 s … 146.666 s    10 runs

Summary
  With ThinLTO cache ran
    1.22 ± 0.01 times faster than Without ThinLTO cache
```
I suspect the difference may be less noticeable on more powerful machines. I think this is an acceptable tradeoff for correctness, especially since I would not expect a developer to require ThinLTO for incremental development.
I've submitted my suggested workaround: https://lore.kernel.org/20240501-kbuild-llvm-drop-thinlto-cache-v1-1-c117cc50a24b@kernel.org/
Workaround patch has been accepted: https://git.kernel.org/masahiroy/linux-kbuild/c/aba091547ef6159d52471f42a3ef531b7b660ed8
Patch now in mainline: https://git.kernel.org/linus/aba091547ef6159d52471f42a3ef531b7b660ed8
Build time here on Linux v6.9.7 (which includes the patch) does NOT change noticeably:
```
$ cat 6.9.7-*/build-time.txt

 Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-2-amd64-clang18-kcfi KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@gmail.com KBUILD_BUILD_TIMESTAMP=2024-06-30 bindeb-pkg KDEB_PKGVERSION=6.9.7-2~trixie+dileks1':

       63494407.99 msec task-clock                #    3.525 CPUs utilized
           9062831      context-switches          #  142.734 /sec
            411788      cpu-migrations            #    6.485 /sec
         416054737      page-faults               #    6.553 K/sec
   110382034030285      cycles                    #    1.738 GHz
    80146145626267      stalled-cycles-frontend   #   72.61% frontend cycles idle
    57218874926432      stalled-cycles-backend    #   51.84% backend cycles idle
    76939532714144      instructions              #    0.70  insn per cycle
                                                  #    1.04  stalled cycles per insn
    13762387030701      branches                  #  216.750 M/sec
      720987778189      branch-misses             #    5.24% of all branches

   18013.732280936 seconds time elapsed

   59012.849516000 seconds user
    4461.140162000 seconds sys

 Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-3-amd64-clang18-kcfi KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@gmail.com KBUILD_BUILD_TIMESTAMP=2024-06-30 bindeb-pkg KDEB_PKGVERSION=6.9.7-3~trixie+dileks1':

       63243448.47 msec task-clock                #    3.502 CPUs utilized
          10036370      context-switches          #  158.694 /sec
            425935      cpu-migrations            #    6.735 /sec
         416947947      page-faults               #    6.593 K/sec
   110722495867292      cycles                    #    1.751 GHz
    80475696588834      stalled-cycles-frontend   #   72.68% frontend cycles idle
    57515842771175      stalled-cycles-backend    #   51.95% backend cycles idle
    76957508623789      instructions              #    0.70  insn per cycle
                                                  #    1.05  stalled cycles per insn
    13766924281493      branches                  #  217.681 M/sec
      722703988007      branch-misses             #    5.25% of all branches

   18058.162038615 seconds time elapsed

   58780.742314000 seconds user
    4429.886564000 seconds sys
```
But disk usage looks better, reduced by 5530M with the patch:
```
$ cat 6.9.7-*/disc-usage.txt
31237	/home/dileks/src/linux/git
1432	/home/dileks/src/linux/archives/6.9.7-2-amd64-clang18-kcfi
25707	/home/dileks/src/linux/git
1432	/home/dileks/src/linux/archives/6.9.7-3-amd64-clang18-kcfi
```
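The du figures above are megabytes per tree; the ~5530M delta is consistent with the ThinLTO cache no longer being kept around. A quick check on a pre-patch tree (cache path assumed):

```sh
du -sm .thinlto-cache   # only exists while --thinlto-cache-dir is in use
```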
Bug description
To reproduce the error:

Download clang-r522817. Then run:

PATH=$PATH:/path/to/clang-r522817/bin/ make ARCH=arm64 LLVM=1 menuconfig

In menuconfig, enable the following:

- LTO_CLANG_THIN=y: Enables LTO
- IKCONFIG=y: Enables /proc/config.gz
- HEADERS_INSTALL=y: Any config will do. I use this one as an example.

Then build vmlinux and check CONFIG_HEADERS_INSTALL both in .config and in the config embedded in vmlinux (a reconstructed sketch of these commands follows below). Both checks will say CONFIG_HEADERS_INSTALL=y.

Now, run PATH=$PATH:/path/to/clang-r522817/bin/ make ARCH=arm64 LLVM=1 menuconfig again, disable HEADERS_INSTALL, and build again. The .config check will correctly emit # CONFIG_HEADERS_INSTALL is not set, but the vmlinux check will still emit CONFIG_HEADERS_INSTALL=y.
Analysis

The config_data.gz file is embedded in vmlinux using a .incbin instruction: https://github.com/torvalds/linux/blob/96fca68c4fbf77a8185eb10f7557e23352732ea2/kernel/configs.c#L28

From the build log and from the output of the above commands, Kbuild correctly updates kernel/config_data.gz, but the change is not reflected in vmlinux, even though vmlinux is rebuilt. LTO (specifically caching) may play a part in this.
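kernel/configs.c brackets the embedded blob with the IKCFG_ST/IKCFG_ED magic strings, which gives a way to see where a stale copy could live; the cache path below is an assumption (it is whatever --thinlto-cache-dir points at):

```sh
# The freshly rebuilt object carries the *new* blob ...
scripts/extract-ikconfig kernel/configs.o | grep HEADERS_INSTALL

# ... while a stale cached native object can still carry the old one.
grep -rl IKCFG_ST .thinlto-cache/
```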
Observations:

- If LTO is disabled entirely between the two make menuconfig commands, vmlinux will correctly have # CONFIG_HEADERS_INSTALL is not set.
- If the .thinlto-cache directory (populated via --thinlto-cache-dir, see the Makefile line linked above) is deleted before rebuilding, vmlinux will also correctly have # CONFIG_HEADERS_INSTALL is not set.
- If configs.o is compiled with -fno-lto, vmlinux will also correctly have # CONFIG_HEADERS_INSTALL is not set.

Hence, I believe this is a bug in LTO caching. Please help, thanks!
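A command-level sketch of the second and third observations as workarounds; the cache path and the Kbuild wiring are assumptions, not the actual patch:

```sh
# Drop the ThinLTO cache so the linker cannot reuse a stale native object.
rm -rf .thinlto-cache

# Rebuild configs.o without LTO; in Kbuild this would live in kernel/Makefile
# as a per-object flag removal (assumed wiring, shown as a comment):
#     CFLAGS_REMOVE_configs.o = $(CC_FLAGS_LTO)
PATH=$PATH:/path/to/clang-r522817/bin/ make ARCH=arm64 LLVM=1 kernel/configs.o
```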
I am going to send this workaround patch to Linux upstream, but the root issue in LLVM's ThinLTO caching should still be fixed.