davidlattimore / wild

Apache License 2.0

Wrong caching for debug info in get_merged_string_output_address since `8e608131778e` #158

Closed marxin closed 1 month ago

marxin commented 1 month ago

I noticed this while preparing a gimli verification check for linker-diff. If I build a simple C program with Wild, I get the following:

$ eu-readelf -w a.out
...
DWARF section [23] '.debug_info' at offset 0x1180:
 [Offset]
 Compilation unit at offset 0:
 Version: 5, Abbreviation section offset: 0, Address size: 8, Offset size: 4
 Unit type: compile (1)
 [     c]  compile_unit         abbrev: 1
           stmt_list            (sec_offset) 0
           low_pc               (addr) 0x0000000000401d00 <_start>
           high_pc              (udata) 38 (0x0000000000401d26)
           name                 (strp) "../sysdeps/x86_64/start.S"
           comp_dir             (strp) "../sysdeps/x86_64/start.S"
           producer             (strp) "../sysdeps/x86_64/start.S"
           language             (data2) Mips_Assembler (32769)
 [    28]    subprogram           abbrev: 2
             name                 (strp) "../sysdeps/x86_64/start.S"
             external             (flag_present) yes
             type                 (ref_udata) [    37]
             low_pc               (addr) 0x0000000000401d00 <_start>
             high_pc              (udata) 38 (0x0000000000401d26)
 [    37]    unspecified_type     abbrev: 3

while the correct output should look like:

DWARF section [23] '.debug_info' at offset 0x1180:
 [Offset]
 Compilation unit at offset 0:
 Version: 5, Abbreviation section offset: 0, Address size: 8, Offset size: 4
 Unit type: compile (1)
 [     c]  compile_unit         abbrev: 1
           stmt_list            (sec_offset) 0
           low_pc               (addr) 0x0000000000401d00 <_start>
           high_pc              (udata) 38 (0x0000000000401d26)
           name                 (strp) "../sysdeps/x86_64/start.S"
           comp_dir             (strp) "/home/abuild/rpmbuild/BUILD/glibc-2.40/csu"
           producer             (strp) "GNU AS 2.42.0"
           language             (data2) Mips_Assembler (32769)
 [    28]    subprogram           abbrev: 2
             name                 (strp) "_start"
             external             (flag_present) yes
             type                 (ref_udata) [    37]
             low_pc               (addr) 0x0000000000401d00 <_start>
             high_pc              (udata) 38 (0x0000000000401d26)
 [    37]    unspecified_type     abbrev: 3

The problem is that there is now one cache per output section, whereas there should be one per output section part.

@davidlattimore Will you fix it, please? In the meantime, I plan to integrate a basic gimli checker to handle this problem.

davidlattimore commented 1 month ago

Thanks for noticing that! You're right. However, using PartId rather than OutputSectionId won't help: in cases where we're doing string merging, alignment is always 1-1, so we only have a single string-merge PartId in the output section. There can, however, be multiple string-merge input sections from the same input file that map to the same PartId / OutputSectionId. I've switched to keying by the offset in the input file rather than the offset in the input section. That removes the need for OutputSectionId.
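
To illustrate the collision described above, here is a minimal toy sketch in Rust. It is not Wild's actual code: `InputSection`, `resolve_all`, the sample offsets and both key functions are hypothetical names invented for this example. It assumes two string-merge input sections from the same file that feed the same output section, each with a string at section offset 0. A cache keyed by section offset hands back the first string for both lookups, while a cache keyed by file offset keeps them apart.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for one string-merge input section of an object file.
struct InputSection {
    // Where this section's data starts within the input file.
    file_offset: u64,
    // (offset within the section, string stored at that offset)
    strings: Vec<(u64, &'static str)>,
}

// Look up every string through a cache whose key is computed by `key_fn`.
// On a cache hit, the previously stored string is returned, right or wrong.
fn resolve_all(
    sections: &[InputSection],
    key_fn: impl Fn(&InputSection, u64) -> u64,
) -> Vec<&'static str> {
    let mut cache: HashMap<u64, &'static str> = HashMap::new();
    let mut resolved = Vec::new();
    for section in sections {
        for &(offset, s) in &section.strings {
            let key = key_fn(section, offset);
            resolved.push(*cache.entry(key).or_insert(s));
        }
    }
    resolved
}

// Two sections from the same file, both mapping to the same output section
// (assumed layout, chosen only to trigger the collision).
fn sample_sections() -> Vec<InputSection> {
    vec![
        InputSection {
            file_offset: 0x100,
            strings: vec![(0, "../sysdeps/x86_64/start.S")],
        },
        InputSection {
            file_offset: 0x200,
            strings: vec![(0, "_start")],
        },
    ]
}

// Key = offset within the input section: both strings share key 0.
fn keyed_by_section_offset() -> Vec<&'static str> {
    resolve_all(&sample_sections(), |_, offset| offset)
}

// Key = offset within the input file: keys 0x100 and 0x200 are distinct.
fn keyed_by_file_offset() -> Vec<&'static str> {
    resolve_all(&sample_sections(), |section, offset| section.file_offset + offset)
}
```

With the section-offset key, the second lookup returns the first section's string, which is the kind of repeated-string symptom shown in the broken dump. File offsets are unique within an input file, so no output section identifier is needed in the key.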

I diffed a dump of the debug info for a trivial C program with and without the fix and didn't see any change, but maybe I just got lucky (or unlucky) and didn't get any collisions.