Closed liu-song-6 closed 2 years ago
I also see this seg fault on Fedora with GCC 11.2.1. Looking...
GDB shows:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7f18c8e in __strcmp_avx2 () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install elfutils-libelf-0.186-1.fc34.x86_64 zlib-1.2.11-26.fc34.x86_64
(gdb) bt
#0 0x00007ffff7f18c8e in __strcmp_avx2 () from /lib64/libc.so.6
#1 0x000000000040a0f7 in kpatch_is_core_module_symbol (name=0x0) at create-diff-object.c:3060
#2 0x000000000040a267 in need_dynrela (kelf=0x4669a0, table=0x92af30, sec=0x6d6b20, rela=0x8c7fd0) at create-diff-object.c:3117
#3 0x000000000040a4cc in kpatch_create_intermediate_sections (kelf=0x4669a0, table=0x92af30, objname=0x7fffffffcfc6 "vmlinux", pmod_name=0x7fffffffd020 "livepatch_a") at create-diff-object.c:3281
#4 0x000000000040c7c5 in main (argc=8, argv=0x7fffffffca48) at create-diff-object.c:3931
After a little digging, here's the relocation causing the seg fault, because rela->sym->name is NULL for some reason.
Relocation section [455] '.rela.debug_loclists' for section [454] '.debug_loclists' at offset 0xd0478 contains 2432 entries:
Offset Type Value Addend Name
...
0x000000000000aad6 X86_64_64 000000000000000000 +32 .LC55
That .LC55 symbol corresponds to the beginning of section 104:
428: 0000000000000000 0 NOTYPE LOCAL DEFAULT 104 .LC55
Which is the following section:
[104] .rodata.btf_show_end_aggr_type.str1.8 PROGBITS 0000000000000000 00003ef0 00000021 1 AMS 0 0 8
Which has the following contents:
$ readelf -x .rodata.btf_show_end_aggr_type.str1.8 orig/kernel/bpf/btf.o
Hex dump of section '.rodata.btf_show_end_aggr_type.str1.8':
0x00000000 20202020 20202020 20202020 20202020
0x00000010 20202020 20202020 20202020 20202020
0x00000020 00 .
And notice that the '+32' relocation addend points it at that NULL byte at the end of the section. So it's a NULL string. But that's not actually the problem :-)
The problem is that create-diff-object doesn't include the .LC55 symbol in the output file, even though it includes its corresponding .rodata.btf_show_end_aggr_type.str1.8
section. So it seg faults trying to access the symbol in the output file. That should be an easy fix.
Another oddity is that need_dynrela() is even going that far with a DWARF debug section. It should instead always return false for debug sections, since they should only access symbols in the patch module itself. Another easy fix.
Ha. This issue only happens with unstripped objects because the bad relocation is in a debug section. But our unit test framework requires objects to be stripped, which removes the debug sections. So maybe I can just skip the unit tests for this one ;-)
Ha. This issue only happens with unstripped objects because the bad relocation is in a debug section. But our unit test framework requires objects to be stripped, which removes the debug sections. So maybe I can just skip the unit tests for this one ;-)
Shall we just strip the debug section before running create-diff-object?
We used to, but at least in the past there was a requirement to be able to debug (or run the 'crash' tool) with livepatch modules, in case something goes wrong at runtime.
Got it. Thanks for the information and the quick fix.
We hit a weird segfault on create-diff-object with a simple patch (attached at the end). It happens for gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-3), but not clang. After digging into it a little bit, I think it was caused by the following rela entry.
How should we fix this case?
Thanks, Song
The patch: