ClangBuiltLinux / linux


Cannot find symbol for section 2: .text. #981

Open E5ten opened 4 years ago

E5ten commented 4 years ago

When building for x86_64 with AS=clang (the integrated assembler), scripts/recordmcount fails on certain objects with the error in the title. For me it happens with at least init/initramfs.o and kernel/elfcore.o.

nathanchance commented 4 years ago

Steps to reproduce (from #986):

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

$ cd linux

$ curl -LSs https://gist.github.com/nathanchance/171b7d672e311b56b4329821b8a43acd/raw/9a1dbb1f11552d0b6efec48ac29505dd0c768d1b/20200401_jpoimboe_objtool_fixes.mbx | git apply -3v

$ curl -LSs https://lore.kernel.org/lkml/20200325231250.99205-1-ndesaulniers@google.com/raw | git apply -3v

$ ./scripts/config --file arch/x86/configs/x86_64_defconfig -e FUNCTION_TRACER

$ make -j$(nproc) -s LLVM=1 LLVM_IAS=1 O=out/x86_64 distclean defconfig bzImage

E5ten commented 4 years ago

I did an integrated-as build and added CFLAGS_<object>.o += -no-integrated-as to the relevant Makefiles for init/initramfs.o and kernel/elfcore.o. The rest of the build then succeeded, so at least for my configuration, those are the only two objects affected by this issue.

E5ten commented 4 years ago

I assume something like this also needs to be done for recordmcount to fix this? https://lore.kernel.org/lkml/9a9cae7fcf628843aabe5a086b1a3c5bf50f42e8.1585761021.git.jpoimboe@redhat.com/

dileks commented 4 years ago

Just to clarify: are you using LLVM_IAS=1 together with LLVM=1 here?

E5ten commented 4 years ago

yeah.

dileks commented 4 years ago

@E5ten

I switched over to using LLVM_IAS=1 together with LLVM=1.

samitolvanen commented 3 years ago

I also ran into this with LLVM_IAS=1 when building x86_64 defconfig with dynamic ftrace. Testing Peter's objtool mcount patch, I noticed that objtool segfaults for several object files because the files are missing STT_SECTION symbols for some of the sections.

A random example, compiled with LLVM_IAS=1:

$ readelf --sections arch/x86/mm/hugetlbpage.o | grep PROGBITS
  [ 2] .text             PROGBITS         0000000000000000  00000240
  [ 4] .altinstructions  PROGBITS         0000000000000000  000007c8
  [ 6] .altinstr_re[...] PROGBITS         0000000000000000  00000890
  [ 8] .altinstr_aux     PROGBITS         0000000000000000  000008d0
  [10] .init.text        PROGBITS         0000000000000000  00000988
...
$ readelf --symbols arch/x86/mm/hugetlbpage.o | grep SECTION
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    6 
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    8 

Objtool fails here because .init.text doesn't have a corresponding STT_SECTION symbol. Without IAS, the symbol is generated:

$ readelf --sections arch/x86/mm/hugetlbpage.o | grep PROGBITS
  [ 1] .text             PROGBITS         0000000000000000  00000040
  [ 3] .data             PROGBITS         0000000000000000  000005c8
  [ 5] .altinstructions  PROGBITS         0000000000000000  000005c8
  [ 7] .altinstr_re[...] PROGBITS         0000000000000000  00000690
  [ 9] .altinstr_aux     PROGBITS         0000000000000000  000006d0
  [11] .init.text        PROGBITS         0000000000000000  00000788
...
$ readelf --symbols arch/x86/mm/hugetlbpage.o | grep SECTION
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    7 
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    9 
     9: 0000000000000000     0 SECTION LOCAL  DEFAULT   11 
... 

Edit: OK, my issue looks similar to issue #669, but just in a different part of objtool. Specifically, the new static call processing code and the proposed mcount patch both depend on section symbols, so if either of these occurs in a section for which a symbol is missing, objtool is going to segfault. This doesn't appear to be a problem with static calls right now (or we would have noticed it), but the mcount patch triggers this quite often. I fixed this in commit 54d837e5119bd5a15593820ca1585ca4e4f3e2a4 for now.
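
The readelf comparison in this comment can be automated. The following is a minimal Python sketch, not part of objtool or recordmcount: it takes the text printed by `readelf --sections` and `readelf --symbols` and lists the PROGBITS sections that have no STT_SECTION symbol. The function name `sections_without_symbol` and the regular expressions are assumptions made for illustration, written against the output format quoted here; other readelf versions may print rows differently.

```python
import re

# First line of a `readelf --sections` entry, for example:
#   [10] .init.text        PROGBITS         0000000000000000  00000988
SECTION_RE = re.compile(r"^\s*\[\s*(\d+)\]\s+(\S+)\s+(\S+)")

# A `readelf --symbols` row of type SECTION; captures the Ndx column, e.g.:
#   3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
SECSYM_RE = re.compile(
    r"^\s*\d+:\s+[0-9a-fA-F]+\s+\d+\s+SECTION\s+\S+\s+\S+\s+(\d+)"
)


def sections_without_symbol(sections_text, symbols_text, sh_type="PROGBITS"):
    """Return [(index, name)] for sections of sh_type lacking a section symbol."""
    # Collect the section indices that some STT_SECTION symbol points at.
    covered = set()
    for line in symbols_text.splitlines():
        match = SECSYM_RE.match(line)
        if match:
            covered.add(int(match.group(1)))

    # Report every section of the requested type that is not covered.
    missing = []
    for line in sections_text.splitlines():
        match = SECTION_RE.match(line)
        if match and match.group(3) == sh_type:
            index = int(match.group(1))
            if index not in covered:
                missing.append((index, match.group(2)))
    return missing


# Output quoted in this comment for hugetlbpage.o built with LLVM_IAS=1.
IAS_SECTIONS = """\
  [ 2] .text             PROGBITS         0000000000000000  00000240
  [ 4] .altinstructions  PROGBITS         0000000000000000  000007c8
  [ 6] .altinstr_re[...] PROGBITS         0000000000000000  00000890
  [ 8] .altinstr_aux     PROGBITS         0000000000000000  000008d0
  [10] .init.text        PROGBITS         0000000000000000  00000988
"""
IAS_SYMBOLS = """\
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    8
"""

if __name__ == "__main__":
    for index, name in sections_without_symbol(IAS_SECTIONS, IAS_SYMBOLS):
        print(f"section {index} ({name}) has no STT_SECTION symbol")
```

On the LLVM_IAS=1 listing this reports both .altinstructions and .init.text. Only the latter is the one named above as breaking objtool, so treat the result as a list of candidates rather than a list of failures.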

nickdesaulniers commented 3 years ago

It sounds like CrOS is hitting this now trying to move to LLVM_IAS=1: https://bugs.chromium.org/p/chromium/issues/detail?id=1148073 cc @jcai19

nickdesaulniers commented 3 years ago

With defconfig+FUNCTION_TRACER, I see this in:

init/initramfs.o kernel/elfcore.o

Sami, I think https://github.com/ClangBuiltLinux/linux/commit/54d837e5119bd5a15593820ca1585ca4e4f3e2a4 no longer applies on linux-next?

samitolvanen commented 3 years ago

> Sami, I think 54d837e no longer applies on linux-next?

That's because it only fixes the mcount pass (commit 0271fa5f8566b79f07c905922321ecc70b697b4c), which isn't upstream yet. You probably need an identical fix for the static call pass instead, assuming that's where it crashes.

jcai19 commented 3 years ago

> > Sami, I think 54d837e no longer applies on linux-next?
>
> That's because it only fixes the mcount pass (commit 0271fa5), which isn't upstream yet.

May I know which dependencies are needed to backport https://github.com/ClangBuiltLinux/linux/commit/0271fa5f8566b79f07c905922321ecc70b697b4c and https://github.com/ClangBuiltLinux/linux/commit/54d837e5119bd5a15593820ca1585ca4e4f3e2a4 to 5.4? While trying to test them on 5.4, I realized there were many dependencies I needed to cherry-pick or backport in order to apply these two patches cleanly. For example, https://github.com/ClangBuiltLinux/linux/commit/0271fa5f8566b79f07c905922321ecc70b697b4c seems to be based on upstream commit 0f1441b44e823a74f3f3780902a113e07c73fbf6, which is not in 5.4 yet, and I could not cherry-pick it into the stable/linux-5.4.y branch cleanly because its dependencies were also missing.

> You probably need an identical fix for the static call pass instead, assuming that's where it crashes.

Just to be clear, does that mean https://github.com/ClangBuiltLinux/linux/commit/0271fa5f8566b79f07c905922321ecc70b697b4c and https://github.com/ClangBuiltLinux/linux/commit/54d837e5119bd5a15593820ca1585ca4e4f3e2a4 are not enough to fix this issue? Thanks.

samitolvanen commented 3 years ago

> Just to be clear, does that mean 0271fa5 and 54d837e are not enough to fix this issue? Thanks.

After actually looking at the CrOS bug, I'm guessing it's the same as the original recordmcount issue and these objtool patches are not going to help here. Both issues have the same root cause (Clang does not always generate section symbols), but you'll need to fix this in recordmcount instead.

nickdesaulniers commented 3 years ago

I think @arndb just sent patches for this that got picked up by akpm: https://lore.kernel.org/lkml/20201204165742.3815221-1-arnd@kernel.org/

arndb commented 3 years ago

The patches I sent just work around the problem by avoiding the weak functions in those files. The bug is still there and could show up any time another file has only weak functions in the .text section.
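
Why a file with only weak functions trips recordmcount can be modelled in a few lines. The Python sketch below is a simplified illustration, not the actual code in scripts/recordmcount: it assumes that recordmcount needs a non-weak (local or global) symbol defined in the text section to use as a base, and that a section symbol, which has local binding, would qualify. The `Sym` type, the `find_base_symbol` helper, and the symbol names `example.c` and `weak_hook` are all hypothetical.

```python
from collections import namedtuple

# Hypothetical, minimal view of one ELF symbol table entry.
Sym = namedtuple("Sym", "name type bind shndx")


def find_base_symbol(symbols, text_shndx, text_name):
    """Return the first non-weak symbol defined in section text_shndx.

    Assumption of this model: weak symbols are never accepted as a base,
    so a section holding only weak functions needs a section symbol.
    """
    for sym in symbols:
        if sym.shndx == text_shndx and sym.bind in ("LOCAL", "GLOBAL"):
            return sym
    raise LookupError(
        f"Cannot find symbol for section {text_shndx}: {text_name}."
    )


# An object whose .text (section 2) contains a single weak function and
# for which the assembler emitted no section symbol.
ONLY_WEAK = [
    Sym("example.c", "FILE", "LOCAL", "ABS"),
    Sym("weak_hook", "FUNC", "WEAK", 2),
]

# The same object with a section symbol for .text, as GNU as would emit.
WITH_SECTION_SYMBOL = ONLY_WEAK + [Sym("", "SECTION", "LOCAL", 2)]

if __name__ == "__main__":
    print(find_base_symbol(WITH_SECTION_SYMBOL, 2, ".text"))
    try:
        find_base_symbol(ONLY_WEAK, 2, ".text")
    except LookupError as err:
        print(err)
```

Under this model, making one function in the file non-weak (the approach taken by the workaround patches) and emitting a section symbol are both enough to make the lookup succeed.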

E5ten commented 3 years ago

With these patches, I was able to build and boot an x86_64 kernel with LLVM=1 and LLVM_IAS=1.

dileks commented 3 years ago

Both patches are in Linux v5.10, and the linux-stable trees recently picked them up.

$ git log --oneline | grep 'initramfs: fix clang build failure'
55d5b7dd6451 initramfs: fix clang build failure
$ git describe --contains 55d5b7dd6451
v5.10~14^2~3

$ git log --oneline | grep 'elfcore: fix building with clang'
6e7b64b9dd6d elfcore: fix building with clang
$ git describe --contains 6e7b64b9dd6d
v5.10~14^2~2

nathanchance commented 2 years ago

Looks like the PowerPC folks are getting bit by this too:

https://github.com/linuxppc/issues/issues/388

https://lore.kernel.org/r/cd0f6bdfdf1ee096fb2c07e7b38940921b8e9118.1637764848.git.christophe.leroy@csgroup.eu/

@emojifreak reported issues with ARCH=mips allmodconfig + CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y:

$ echo "CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
CONFIG_MIPS32_O32=n" >>kernel/configs/repro.config

$ make -skj"$(nproc)" ARCH=mips LLVM=1 distclean allmodconfig repro.config init/calibrate.o
...
Cannot find symbol for section 8: .text.calibrate_delay_is_known.
init/calibrate.o: failed
...

KCOV helps reproduce it but I doubt it is strictly related to the issue. cvise spits out:

$ cat calibrate.i
long __attribute__((weak)) calibrate_delay_is_known() { return 0; }

$ clang --target=mipsel-linux-gnu -fsanitize-coverage=trace-pc -ffunction-sections -c calibrate.i

$ ./recordmcount calibrate.o
Cannot find symbol for section 4: .text.calibrate_delay_is_known.
calibrate.o: failed

$ llvm-objdump -x calibrate.o

calibrate.o:    file format elf32-mips
architecture: mipsel
start address: 0x00000000

Program Header:

Dynamic Section:

Sections:
Idx Name                               Size     VMA      Type
  0                                    00000000 00000000
  1 .strtab                            000000c0 00000000
  2 .text                              00000000 00000000 TEXT
  3 .mdebug.abi32                      00000000 00000000
  4 .text.calibrate_delay_is_known     00000034 00000000 TEXT
  5 .rel.text.calibrate_delay_is_known 00000008 00000000
  6 .pdr                               00000020 00000000
  7 .rel.pdr                           00000008 00000000
  8 .comment                           00000016 00000000
  9 .note.GNU-stack                    00000000 00000000
 10 .data                              00000000 00000000 DATA
 11 .bss                               00000000 00000000 BSS
 12 .reginfo                           00000018 00000000
 13 .MIPS.abiflags                     00000018 00000000
 14 .llvm_addrsig                      00000001 00000000
 15 .symtab                            00000040 00000000

SYMBOL TABLE:
00000000 l    df *ABS*  00000000 calibrate.i
00000000  w    F .text.calibrate_delay_is_known 00000034 calibrate_delay_is_known
00000000         *UND*  00000000 __sanitizer_cov_trace_pc

RELOCATION RECORDS FOR [.text.calibrate_delay_is_known]:
OFFSET   TYPE                     VALUE
00000010 R_MIPS_26                __sanitizer_cov_trace_pc

RELOCATION RECORDS FOR [.pdr]:
OFFSET   TYPE                     VALUE
00000000 R_MIPS_32                calibrate_delay_is_known

Without -fsanitize-coverage=trace-pc:

$ clang --target=mipsel-linux-gnu -ffunction-sections -c calibrate.i

$ ./recordmcount calibrate.o

$ llvm-objdump -x calibrate.o

calibrate.o:    file format elf32-mips
architecture: mipsel
start address: 0x00000000

Program Header:

Dynamic Section:

Sections:
Idx Name                           Size     VMA      Type
  0                                00000000 00000000
  1 .strtab                        000000a3 00000000
  2 .text                          00000000 00000000 TEXT
  3 .mdebug.abi32                  00000000 00000000
  4 .text.calibrate_delay_is_known 0000002c 00000000 TEXT
  5 .pdr                           00000020 00000000
  6 .rel.pdr                       00000008 00000000
  7 .comment                       00000016 00000000
  8 .note.GNU-stack                00000000 00000000
  9 .data                          00000000 00000000 DATA
 10 .bss                           00000000 00000000 BSS
 11 .reginfo                       00000018 00000000
 12 .MIPS.abiflags                 00000018 00000000
 13 .llvm_addrsig                  00000000 00000000
 14 .symtab                        00000030 00000000

SYMBOL TABLE:
00000000 l    df *ABS*  00000000 calibrate.i
00000000  w    F .text.calibrate_delay_is_known 0000002c calibrate_delay_is_known

RELOCATION RECORDS FOR [.pdr]:
OFFSET   TYPE                     VALUE
00000000 R_MIPS_32                calibrate_delay_is_known

nathanchance commented 1 year ago

There is a new instance of this problem after commit dbe69b299884 ("bpf: Fix dispatcher patchable function entry to 5 bytes nop") for certain configurations:

$ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- LLVM=1 mrproper powernv_defconfig all
Cannot find symbol for section 4: .init.text.
kernel/bpf/dispatcher.o: failed

nickdesaulniers commented 1 year ago

https://github.com/linuxppc/issues/issues/388 alludes to this issue. Looks like binutils reverted dropping section symbols just for ppc: https://github.com/bminor/binutils-gdb/commit/c09c8b42021180eee9495bd50d8b35e683d3901b cc @MaskRay

nathanchance commented 1 year ago

That's annoying :/ for what it's worth, I have seen that error on i386 as well, so it is not just powerpc that is affected by this.

I think recordmcount is only run for ftrace so maybe a diff like this would help out?

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e9e95c790b8e..233836893fd8 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -744,6 +744,7 @@ config FTRACE_MCOUNT_USE_RECORDMCOUNT
        depends on !FTRACE_MCOUNT_USE_CC
        depends on !FTRACE_MCOUNT_USE_OBJTOOL
        depends on FTRACE_MCOUNT_RECORD
+       depends on !AS_IS_LLVM

 config TRACING_MAP
        bool

nathanchance commented 1 year ago

While that diff stops the build error because it disables the use of recordmcount, it does not prevent ftrace from being selected altogether, which may lead to further reports of ftrace not working, despite being selected. We might be able to fix that error in a similar manner as Arnd's previous patches but I am not sure how to go about that...

nathanchance commented 1 year ago

> I am not sure how to go about that...

More specifically, I only tried removing __init from bpf_arch_init_dispatcher_early() in kernel/bpf/dispatcher.c but that is not enough since the declaration in include/linux/bpf.h wins. We cannot remove __init altogether as the x86 version of bpf_arch_init_dispatcher_early() calls text_poke_early(), which is marked __init_or_module, which expands to nothing if CONFIG_MODULES is enabled or __init if not. With that in mind, the following diff resolves the failure that I note above for that specific configuration; so far, I have only seen that failure in three different configurations. It will still be reproducible with CONFIG_MODULES disabled but that is probably okay for now. I can send this as a formal patch on Monday if it seems reasonable.

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 00127abd89ee..4145939bbb6a 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -389,7 +389,7 @@ static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
        return ret;
 }

-int __init bpf_arch_init_dispatcher_early(void *ip)
+int __init_or_module bpf_arch_init_dispatcher_early(void *ip)
 {
        const u8 *nop_insn = x86_nops[5];

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0566705c1d4e..4aa7bde406f5 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -971,7 +971,7 @@ struct bpf_trampoline *bpf_trampoline_get(u64 key,
                                          struct bpf_attach_target_info *tgt_info);
 void bpf_trampoline_put(struct bpf_trampoline *tr);
 int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_funcs);
-int __init bpf_arch_init_dispatcher_early(void *ip);
+int __init_or_module bpf_arch_init_dispatcher_early(void *ip);

 #define BPF_DISPATCHER_INIT(_name) {                           \
        .mutex = __MUTEX_INITIALIZER(_name.mutex),              \
diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c
index 04f0a045dcaa..e14a68e9a74f 100644
--- a/kernel/bpf/dispatcher.c
+++ b/kernel/bpf/dispatcher.c
@@ -91,7 +91,7 @@ int __weak arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int n
        return -ENOTSUPP;
 }

-int __weak __init bpf_arch_init_dispatcher_early(void *ip)
+int __weak __init_or_module bpf_arch_init_dispatcher_early(void *ip)
 {
        return -ENOTSUPP;
 }

nathanchance commented 1 year ago

Patch submitted: https://lore.kernel.org/20221031173819.2344270-1-nathan@kernel.org/

nathanchance commented 1 year ago

It sounds like the original patch that caused the recent bpf issue might get reverted in favor of a different fix:

https://lore.kernel.org/Y2DRVwI4bNUppmXJ@krava/

https://lore.kernel.org/87iljyyes6.fsf@all.your.base.are.belong.to.us/

arndb commented 1 year ago

Sent a fix for another instance of this problem: https://lore.kernel.org/lkml/20230414080418.110236-1-arnd@kernel.org/T/#u