dynup / kpatch

kpatch - live kernel patching
GNU General Public License v2.0
1.49k stars 305 forks source link

create-diff-object segfault #1257

Closed liu-song-6 closed 2 years ago

liu-song-6 commented 2 years ago

We hit a weird segfault on create-diff-object with a simple patch (attached at the end). It happens for gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-3), but not clang. After digging into it a little bit, I think it was caused by the following rela entry.

readelf -W -r btf.o
...
0000000000025342  0000000c00000001 R_X86_64_64            0000000000000000 .text.btf_show_name + 0
000000000002534a  0000000c00000001 R_X86_64_64            0000000000000000 .text.btf_show_name + e0
0000000000025355  000001df00000001 R_X86_64_64            000000000000000c .LC3 + 14            <<<   this one
000000000002535e  0000000c00000001 R_X86_64_64            0000000000000000 .text.btf_show_name + e0
0000000000025366  0000000c00000001 R_X86_64_64            0000000000000000 .text.btf_show_name + 1c4

How should we fix this case?

Thanks, Song

The patch:

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 7a7be8c057f2..5fc160288fa0 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -566,9 +566,11 @@ static inline u32 type_flag(u32 type)
        return type & ~BPF_BASE_TYPE_MASK;
 }

+/* only use after check_attach_btf_id() */
 static inline enum bpf_prog_type resolve_prog_type(struct bpf_prog *prog)
 {
-       return prog->aux->dst_prog ? prog->aux->dst_prog->type : prog->type;
+       return prog->type == BPF_PROG_TYPE_EXT ?
+               prog->aux->dst_prog->type : prog->type;
 }

 #endif /* _LINUX_BPF_VERIFIER_H */
jpoimboe commented 2 years ago

I also see this seg fault on Fedora with GCC 11.2.1. Looking...

jpoimboe commented 2 years ago

GDB shows:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7f18c8e in __strcmp_avx2 () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install elfutils-libelf-0.186-1.fc34.x86_64 zlib-1.2.11-26.fc34.x86_64
(gdb) bt
#0  0x00007ffff7f18c8e in __strcmp_avx2 () from /lib64/libc.so.6
#1  0x000000000040a0f7 in kpatch_is_core_module_symbol (name=0x0) at create-diff-object.c:3060
#2  0x000000000040a267 in need_dynrela (kelf=0x4669a0, table=0x92af30, sec=0x6d6b20, rela=0x8c7fd0) at create-diff-object.c:3117
#3  0x000000000040a4cc in kpatch_create_intermediate_sections (kelf=0x4669a0, table=0x92af30, objname=0x7fffffffcfc6 "vmlinux", pmod_name=0x7fffffffd020 "livepatch_a") at create-diff-object.c:3281
#4  0x000000000040c7c5 in main (argc=8, argv=0x7fffffffca48) at create-diff-object.c:3931

After a little digging, here's the relocation causing the seg fault, because rela->sym->name is NULL for some reason.

Relocation section [455] '.rela.debug_loclists' for section [454] '.debug_loclists' at offset 0xd0478 contains 2432 entries:
  Offset              Type            Value               Addend Name
...
  0x000000000000aad6  X86_64_64       000000000000000000     +32 .LC55                                                         

That .LC55 symbol corresponds to the beginning of section 104:

  428: 0000000000000000      0 NOTYPE  LOCAL  DEFAULT      104 .LC55

Which is the following section:

[104] .rodata.btf_show_end_aggr_type.str1.8 PROGBITS     0000000000000000 00003ef0 00000021  1 AMS    0   0  8

Which has the following contents:

$ readelf -x .rodata.btf_show_end_aggr_type.str1.8 orig/kernel/bpf/btf.o

Hex dump of section '.rodata.btf_show_end_aggr_type.str1.8':
  0x00000000 20202020 20202020 20202020 20202020                 
  0x00000010 20202020 20202020 20202020 20202020                 
  0x00000020 00                                  .

And notice that the '+32' relocation addend points it at that NULL byte at the end of the section. So it's a NULL string. But that's not actually the problem :-)

The problem is that create-diff-object doesn't include the .LC55 symbol in the output file, even though it includes its corresponding .rodata.btf_show_end_aggr_type.str1.8 section. So it seg faults trying to access the symbol in the output file. That should be an easy fix.

Another oddity is that need_dynrela() is even going that far with a DWARF debug section. It should instead always return false for debug sections, since they should only access symbols in the patch module itself. Another easy fix.

jpoimboe commented 2 years ago

Ha. This issue only happens with unstripped objects because the bad relocation is in a debug section. But our unit test framework requires objects to be stripped, which removes the debug sections. So maybe I can just skip the unit tests for this one ;-)

liu-song-6 commented 2 years ago

Ha. This issue only happens with unstripped objects because the bad relocation is in a debug section. But our unit test framework requires objects to be stripped, which removes the debug sections. So maybe I can just skip the unit tests for this one ;-)

Shall we just strip the debug section before running create-diff-object?

jpoimboe commented 2 years ago

We used to, but at least in the past there was a requirement to be able to debug (or run the 'crash' tool) with livepatch modules, in case something goes wrong at runtime.

liu-song-6 commented 2 years ago

Got it. Thanks for the information and the quick fix.