ClangBuiltLinux / linux

Linux kernel source tree
Other
241 stars 14 forks source link

With LTO and LTO cache, .incbin in assembly is not handled properly during incremental builds #2021

Open jacky8hyf opened 4 months ago

jacky8hyf commented 4 months ago

Bug description

To reproduce the error:

Then do the following:

PATH=$PATH:/path/to/clang-r522817/bin/ make ARCH=arm64 LLVM=1 tinyconfig
PATH=$PATH:/path/to/clang-r522817/bin/ make ARCH=arm64 LLVM=1 menuconfig

In menuconfig, enable the following:

Then:

PATH=$PATH:/path/to/clang-r522817/bin/ make ARCH=arm64 LLVM=1 -j64
zcat kernel/config_data.gz | grep HEADER
scripts/extract-ikconfig vmlinux | grep HEADER

The last two commands will both say CONFIG_HEADERS_INSTALL=y.

Now, run PATH=$PATH:/path/to/clang-r522817/bin/ make ARCH=arm64 LLVM=1 menuconfig again, and disable HEADERS_INSTALL, then build again:

PATH=$PATH:/path/to/clang-r522817/bin/ make ARCH=arm64 LLVM=1 -j64
zcat kernel/config_data.gz | grep HEADER
scripts/extract-ikconfig vmlinux | grep HEADER

The first command will emit # CONFIG_HEADERS_INSTALL is not set correctly, but the last command will still emit CONFIG_HEADERS_INSTALL=y.

Analysis

The config_data.gz file is embedded in vmlinux using a .incbin instruction: https://github.com/torvalds/linux/blob/96fca68c4fbf77a8185eb10f7557e23352732ea2/kernel/configs.c#L28

From the build log and from the output of the above commands, Kbuild correctly updates kernel/config_data.gz, but the change is not reflected in vmlinux, even though it is rebuilt.

LTO (specifically caching) may play a part in this.

From 10eacc104cb39c26a753796c981e46d0cba233c2 Mon Sep 17 00:00:00 2001
From: Yifan Hong <elsk@google.com>
Date: Mon, 8 Apr 2024 23:49:38 -0700
Subject: [PATCH] config: Disable LTO on kernel/configs.o

Signed-off-by: Yifan Hong <elsk@google.com>
---
 kernel/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/Makefile b/kernel/Makefile
index 2e0dac3da346..0db1e9357668 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -140,6 +140,10 @@ obj-$(CONFIG_SCF_TORTURE_TEST) += scftorture.o

 $(obj)/configs.o: $(obj)/config_data.gz

+# Disable LTO on configs.o so changes in .config is reflected
+# in vmlinux.
+CFLAGS_REMOVE_configs.o += $(CC_FLAGS_LTO)
+
 targets += config_data config_data.gz
 $(obj)/config_data.gz: $(obj)/config_data FORCE
    $(call if_changed,gzip)
--
2.44.0.769.g3c40516874-goog

Hence, I believe this is a bug in LTO caching. Please help, thanks!

I am going to upload this workaround patch to Linux upstream, but the root issue should be fixed.

nathanchance commented 4 months ago

Sounds like #1618?

I wonder if it would be better to drop the ThinLTO caching flags instead? If it impacts correctness, it seems like it is worth avoiding. Thoughts @samitolvanen? Did caching make a huge difference for incremental compiles?

samitolvanen commented 4 months ago

I'm not aware of anyone depending on ThinLTO caching for incremental builds. Android quickly switched back to FullLTO and most incremental builds during development happen with LTO disabled, so I don't have any performance numbers either. I'm fine with dropping the caching.

nathanchance commented 4 months ago

I think I would prefer to go that route. I'll do some benchmarking to see what kind of performance hit we will be taking by removing it. This should obviously still be fixed in LLVM but the rate at which it is fixed will no longer impact the kernel and if people want the cache brought back, it can be made conditional on having a fixed version of LLVM.

jacky8hyf commented 4 months ago

I have sent this patch upstream at https://lore.kernel.org/lkml/20240429220756.979347-2-elsk@google.com/T/#u.

For Android, we have disabled LTO by default since 6.1, so this is not an issue for us now. I will do some mitigation on 5.15 by backporting https://lore.kernel.org/lkml/20240429220756.979347-2-elsk@google.com/T/#u .

@samitolvanen and @nathanchance , do you mean that the --thinlto-cache-dir option should be dropped from the build command? https://github.com/torvalds/linux/blob/d03d4188908883e1705987795a09aeed31424f66/Makefile#L945

samitolvanen commented 4 months ago

Yes, I think Nathan proposed dropping --thinlto-cache-dir. Just to clarify, do you see this problem also with LTO_CLANG_FULL or only with ThinLTO? Android uses LTO_CLANG_FULL on 5.15, but this flag is only for ThinLTO.

nathanchance commented 4 months ago

@samitolvanen and @nathanchance , do you mean that the --thinlto-cache-dir option should be dropped from the build command? https://github.com/torvalds/linux/blob/d03d4188908883e1705987795a09aeed31424f66/Makefile#L945

Yes, I think that line should just be deleted to resolve this issue. If people notice the removal and want it brought back, we can always revert it and add the workarounds like the change you sent but if the impact to compile time for incremental builds is not noticeable with or without the cache, I would rather just avoid this problem and any other ones that may surface as a result of caching.

nathanchance commented 4 months ago

I did some benchmarking on my test machine that has an i7-11700 (8c / 16t) in it and came to the conclusion that removing the ThinLTO cache would result in about a 20% performance hit (measured with building a full kernel first then just touching init/main.c).

ARCH=x86_64 defconfig + CONFIG_LTO_CLANG_THIN=y:

Benchmark 1: With ThinLTO cache
  Time (mean ± σ):     85.772 s ±  0.252 s    [User: 91.505 s, System: 8.408 s]
  Range (min … max):   85.447 s … 86.244 s    10 runs

Benchmark 2: Without ThinLTO cache
  Time (mean ± σ):     103.833 s ±  0.288 s    [User: 232.058 s, System: 8.569 s]
  Range (min … max):   103.286 s … 104.124 s    10 runs

Summary
  With ThinLTO cache ran
    1.21 ± 0.00 times faster than Without ThinLTO cache

ARCH=x86_64 allmodconfig + CONFIG_LTO_CLANG_THIN=y:

Benchmark 1: With ThinLTO cache
  Time (mean ± σ):     119.537 s ±  0.243 s    [User: 215.822 s, System: 34.958 s]
  Range (min … max):   118.970 s … 119.780 s    10 runs

Benchmark 2: Without ThinLTO cache
  Time (mean ± σ):     145.668 s ±  0.658 s    [User: 429.416 s, System: 35.013 s]
  Range (min … max):   144.787 s … 146.666 s    10 runs

Summary
  With ThinLTO cache ran
    1.22 ± 0.01 times faster than Without ThinLTO cache

I suspect the difference may be less noticeable on more powerful machines. I think this is an acceptable tradeoff for correctness, especially since I would not expect a developer to require ThinLTO for incremental development.

nathanchance commented 4 months ago

I've submitted my suggested workaround: https://lore.kernel.org/20240501-kbuild-llvm-drop-thinlto-cache-v1-1-c117cc50a24b@kernel.org/

nathanchance commented 4 months ago

Workaround patch has been accepted: https://git.kernel.org/masahiroy/linux-kbuild/c/aba091547ef6159d52471f42a3ef531b7b660ed8

dileks commented 2 months ago

Patch now in mainline: https://git.kernel.org/linus/aba091547ef6159d52471f42a3ef531b7b660ed8

dileks commented 2 months ago

Build-time here on Linux v6.9.7 (includes patch) does NOT matter:

$ cat 6.9.7-*/build-time.txt
 Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-2-amd64-clang18-kcfi KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@gmail.com KBUILD_BUILD_TIMESTAMP=2024-06-30 bindeb-pkg KDEB_PKGVERSION=6.9.7-2~trixie+dileks1':

       63494407.99 msec task-clock                       #    3.525 CPUs utilized             
           9062831      context-switches                 #  142.734 /sec                      
            411788      cpu-migrations                   #    6.485 /sec                      
         416054737      page-faults                      #    6.553 K/sec                     
   110382034030285      cycles                           #    1.738 GHz                       
    80146145626267      stalled-cycles-frontend          #   72.61% frontend cycles idle      
    57218874926432      stalled-cycles-backend           #   51.84% backend cycles idle       
    76939532714144      instructions                     #    0.70  insn per cycle            
                                                  #    1.04  stalled cycles per insn   
    13762387030701      branches                         #  216.750 M/sec                     
      720987778189      branch-misses                    #    5.24% of all branches           

   18013.732280936 seconds time elapsed

   59012.849516000 seconds user
    4461.140162000 seconds sys

 Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-3-amd64-clang18-kcfi KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@gmail.com KBUILD_BUILD_TIMESTAMP=2024-06-30 bindeb-pkg KDEB_PKGVERSION=6.9.7-3~trixie+dileks1':

       63243448.47 msec task-clock                       #    3.502 CPUs utilized             
          10036370      context-switches                 #  158.694 /sec                      
            425935      cpu-migrations                   #    6.735 /sec                      
         416947947      page-faults                      #    6.593 K/sec                     
   110722495867292      cycles                           #    1.751 GHz                       
    80475696588834      stalled-cycles-frontend          #   72.68% frontend cycles idle      
    57515842771175      stalled-cycles-backend           #   51.95% backend cycles idle       
    76957508623789      instructions                     #    0.70  insn per cycle            
                                                  #    1.05  stalled cycles per insn   
    13766924281493      branches                         #  217.681 M/sec                     
      722703988007      branch-misses                    #    5.25% of all branches           

   18058.162038615 seconds time elapsed

   58780.742314000 seconds user
    4429.886564000 seconds sys

But disc-usage looks better - reduced by 5530M with patch:


$ cat 6.9.7-*/disc-usage.txt
31237   /home/dileks/src/linux/git
1432    /home/dileks/src/linux/archives/6.9.7-2-amd64-clang18-kcfi
25707   /home/dileks/src/linux/git
1432    /home/dileks/src/linux/archives/6.9.7-3-amd64-clang18-kcfi