Open Quuxplusone opened 5 years ago
CC'd a couple of folks (Adrian and Caroline) who may care about the statistics
issue. (Neither is probably too invested in statistics on binutils-ld-linked
binaries, but they might be interested in the general issue.)
This, I /think/, is due to how binutils ld, gold, and lld resolve relocations to
discarded sections.
Looking at debug_ranges (because it's easier for me to make small test cases
that have debug_ranges than ones that have debug_loc),
binutils-ld resolves /all/ relocations to discarded sections to the constant
'1', regardless of any addend in the relocation, e.g.:
00000070 0000000000000001 0000000000000001
00000070 0000000000000001 0000000000000001
00000070 <End of list>
000000a0 0000000000401150 0000000000401155
000000a0 0000000000000001 0000000000000001
000000a0 <End of list>
Here we see the range list for a scope inside a discarded function, followed by
the ranges for the CU containing that discarded function (and a non-discarded
function).
Whereas with gold or lld:
00000070 0000000000000001 000000000000000c
00000070 0000000000000013 0000000000000018
00000070 <End of list>
000000a0 0000000000201130 0000000000201135
000000a0 0000000000000000 000000000000001a
000000a0 <End of list>
You can see that gold and lld have simply not applied any relocation here,
leaving the addend that was in the relocatable bytes alone (so the address at
the start of the section is 0, but any addresses beyond the start keep their
offset).
I'm /guessing/... ah, nope.
So it looks like binutils ld might be special-casing relocations in
debug_ranges to be non-zero, but doesn't do the same thing in debug_loc.
Instead, in debug_loc it resolves them all to zero, dropping the addend. This
causes the list to terminate early, and the following location description
gets interpreted as another address range (& then llvm-dwarfdump attempts to
apply a relocation to that address, but it's outside the range of the .text
section, so that's where you get the "unexpected end of data" error).
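To make the early-termination mechanism concrete, here is a minimal sketch of a DWARF v4 .debug_loc list reader. It assumes a little-endian target with 8-byte addresses, and `LocEntry`, `readLE` and `parseLocList` are hypothetical names for illustration, not LLVM's DWARFDebugLoc API. Because a (0, 0) pair ends a list, an entry whose begin and end were both resolved to zero looks exactly like a terminator, and whatever follows it gets read as addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// One DWARF v4 .debug_loc entry: [begin, end) plus the raw location expression.
struct LocEntry {
  uint64_t begin;
  uint64_t end;
  std::vector<uint8_t> expr;
};

// Read a little-endian unsigned integer of `size` bytes at `off`, advancing `off`.
// Returns std::nullopt if that would run past the end of the buffer.
static std::optional<uint64_t> readLE(const std::vector<uint8_t> &buf, size_t &off, size_t size) {
  if (off + size > buf.size()) return std::nullopt;
  uint64_t v = 0;
  for (size_t i = 0; i < size; ++i)
    v |= uint64_t(buf[off + i]) << (8 * i);
  off += size;
  return v;
}

// Parse one location list starting at `off`, assuming 8-byte addresses.
// Returns std::nullopt on truncated data ("unexpected end of data").
std::optional<std::vector<LocEntry>> parseLocList(const std::vector<uint8_t> &buf, size_t off) {
  std::vector<LocEntry> list;
  while (true) {
    auto begin = readLE(buf, off, 8);
    auto end = readLE(buf, off, 8);
    if (!begin || !end) return std::nullopt;
    if (*begin == 0 && *end == 0) return list;   // end-of-list entry
    if (*begin == ~uint64_t(0)) continue;        // base address selection (new base in `end`, ignored here)
    auto len = readLE(buf, off, 2);              // 2-byte expression length
    if (!len || off + *len > buf.size()) return std::nullopt;
    list.push_back({*begin, *end, std::vector<uint8_t>(buf.begin() + off, buf.begin() + off + *len)});
    off += *len;
  }
}
```

Fed the ld-style bytes (begin and end both zeroed, then a 1-byte DW_OP_reg5 expression and a real terminator), the list at offset 0 comes back empty, and resuming at offset 16 reads the length and opcode bytes as an address and eventually runs off the end of the buffer. With the gold/lld-style bytes (end = 6, the addend) the same reader returns one entry.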
So... yeah, a few things:
* We generally shouldn't need to try to read debug_loc by itself (without using
debug_info to know which offsets contain valid starting points for loc lists).
I wouldn't think we'd need to do that for the statistics mode: rather than
reading/caching up-front, perhaps we need to lazily read/cache based on the
offsets used for queries found in the debug_info section.
* When dumping debug_loc without debug_info, there's not much we can do if it's
linked with binutils-ld. We'll terminate early. We could avoid printing the
extra error message, but we still can't parse the whole debug_loc section,
because binutils ld mangles it beyond repair. (But yeah, technically you could
have empty regions or other garbage in the section, so long as the locations
referenced from debug_info are valid & appropriately terminated. So parsing
debug_loc in isolation is never guaranteed to work, and this is the sort of
place where it doesn't work in practice.)
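A minimal sketch of the lazy approach from the first bullet, assuming some per-offset list parser already exists (modelled here as a callback). `LazyListCache` is a hypothetical name, not an existing LLVM class. Lists are parsed only at offsets that debug_info actually references, so padding or stale lists between them are never visited, and a failed parse is remembered for that one offset instead of aborting the whole section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical cache: a list is parsed the first time a DW_AT_location offset
// taken from debug_info asks for it, never by walking debug_loc front to back.
template <typename List>
class LazyListCache {
public:
  using Parser = std::function<std::optional<List>(uint64_t offset)>;
  explicit LazyListCache(Parser parse) : parse_(std::move(parse)) {}

  // Returns the list at `offset`, parsing on first use. A failed parse is
  // cached as std::nullopt, so it affects only the references to that offset.
  const std::optional<List> &get(uint64_t offset) {
    auto it = cache_.find(offset);
    if (it == cache_.end())
      it = cache_.emplace(offset, parse_(offset)).first;
    return it->second;
  }

  // Number of distinct offsets parsed so far.
  size_t parsedCount() const { return cache_.size(); }

private:
  Parser parse_;
  std::map<uint64_t, std::optional<List>> cache_;
};
```

Since nothing scans the section sequentially, the hole binutils ld leaves behind only matters if a DW_AT_location actually points into it.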
Yep, binutils-ld special-cases relocations to text from the debug_ranges
section:
Contents of section .debug_info:
0000 00000000 00000000 00000000 00000000 ................
Contents of section .debug_abbrev:
0000 00000000 00000000 00000000 00000000 ................
Contents of section .debug_str:
0000 00000000 00000000 00000000 00000000 ................
Contents of section .debug_loc:
0000 00000000 00000000 00000000 00000000 ................
Contents of section .debug_ranges:
0000 01000000 00000000 01000000 00000000 ................
(Here I placed the same two relocations in each of these debug_ sections, and
nothing else: one to the start and one to the end of a discarded function.)
In the other sections, it specifically resolves the relocations to zero, even
when they have an addend.
Whereas gold and lld are writing the addend from the relocation into the
location, even though the base address is zero:
Contents of section .debug_str:
0000 00000000 00000000 06000000 00000000 ................
Contents of section .debug_abbrev:
0000 00000000 00000000 06000000 00000000 ................
Contents of section .debug_info:
0000 00000000 00000000 06000000 00000000 ................
Contents of section .debug_ranges:
0000 00000000 00000000 06000000 00000000 ................
Contents of section .debug_loc:
0000 00000000 00000000 06000000 00000000 ................
The addend isn't actually stored in those bytes in the object file (as I'd
assumed); it's stored in the relocation record. So gold and lld are making an
intentional choice to write it into those zero bytes.
That choice does keep the parsers from seeing a double zero and assuming it's
the end of a list (& then trying to read the following location description as
an address).
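The observed behaviour can be summarised as a small model. This is reconstructed from the section dumps above, not taken from the linkers' sources, and `Linker`, `resolveDiscarded` and `isEndOfList` are hypothetical names. The two test relocations point at the start (addend 0) and the end (addend 6, matching the 0x06 in the gold/lld dumps) of the discarded function.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class Linker { BinutilsLd, Gold, Lld };

// Value written for an absolute relocation whose target lies in a discarded
// section, as observed in the dumps above (a model, not the linkers' code).
uint64_t resolveDiscarded(Linker linker, const std::string &section, int64_t addend) {
  if (linker == Linker::BinutilsLd)
    return section == ".debug_ranges" ? 1 : 0;  // addend ignored either way
  return static_cast<uint64_t>(addend);         // gold/lld: base 0 plus the addend
}

// In a DWARF v4 range or location list, begin == end == 0 terminates the list.
bool isEndOfList(uint64_t begin, uint64_t end) { return begin == 0 && end == 0; }
```

With binutils ld the debug_loc pair comes out as (0, 0), i.e. a terminator, while the debug_ranges pair is (1, 1), an empty but valid entry. gold and lld produce (0, 6) everywhere, which is not a terminator either.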
This isn't high enough on my priority list to fix at the moment - but perhaps
it is for someone else.
Essentially, we should probably be careful about parsing any part of the DWARF
that isn't debug_info without specified offsets parsed from debug_info.
That makes dumping non-debug_info sections a bit slow (because we still have to
parse debug_info to figure out which parts of them to dump) & makes the error
handling trickier (because we can't do the error handling up-front - though
there could always be an error in the reference from debug_info (out of bounds
reference), so it's not a new error path, as such).
To do this completely would require some reworking of DWARFContext and such -
with limited value. Most producers don't actually end up with garbage in these
debug_* sections, so it'd be academically pure, but rarely important.
It seems that LLVM's location-list reader chokes when there's padding between lists, which is what GNU LD produces by default when it deletes a COMDAT function. This situation can be replicated with the files at the bottom of this ticket on a fresh Ubuntu 19.04 VM, building them with:
gcc test1.cpp -o test1.o -g -O1 -fno-inline -c -gdwarf-4 -gstrict-dwarf
gcc test2.cpp -o test2.o -g -O1 -fno-inline -c -gdwarf-4 -gstrict-dwarf
gcc test3.cpp -o test3.o -g -O1 -fno-inline -c -gdwarf-4 -gstrict-dwarf
gcc test1.o test2.o test3.o -o a.out -gstrict-dwarf
The code in the test files is meaningless, but crafted to:
When linking this, GNU LD (the default on Linux) appears to keep the location lists from the duplicate function in the .debug_loc section; it just nulls the addresses out. Using
readelf --debug-dump=loc a.out
and trimming the start and end:
--------8<--------
00000087 00000000000011dd 00000000000011f0 (DW_OP_reg5 (rdi))
0000009a
000000aa
readelf: Warning: There is a hole [0xba - 0xe8] in .debug_loc section.
000000e8 00000000000011f7 000000000000120a (DW_OP_reg5 (rdi))
-------->8--------
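readelf can report the hole because it knows where each referenced list starts and ends. A rough sketch of that check, assuming you already have the [begin, end) extent of every list reachable from .debug_info (`Extent` and `findHoles` are hypothetical names, and readelf's actual implementation may differ): sort the extents by start offset and report any bytes no list covers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// A parsed location list occupies the half-open byte range [begin, end) of .debug_loc.
struct Extent { uint64_t begin, end; };

// Returns the byte ranges of the section not covered by any referenced list.
std::vector<std::pair<uint64_t, uint64_t>> findHoles(std::vector<Extent> lists, uint64_t sectionSize) {
  std::sort(lists.begin(), lists.end(),
            [](const Extent &a, const Extent &b) { return a.begin < b.begin; });
  std::vector<std::pair<uint64_t, uint64_t>> holes;
  uint64_t covered = 0;  // every offset below this is accounted for
  for (const Extent &e : lists) {
    if (e.begin > covered) holes.emplace_back(covered, e.begin);
    covered = std::max(covered, e.end);
  }
  if (covered < sectionSize) holes.emplace_back(covered, sectionSize);
  return holes;
}
```

With 8-byte addresses, the list at 0x87 is one 19-byte entry plus a 16-byte terminator (ending at 0xaa), and the list at 0xaa is a bare terminator (ending at 0xba). So `findHoles({{0x0, 0x87}, {0x87, 0xaa}, {0xaa, 0xba}, {0xe8, 0x10b}}, 0x10b)` returns the single hole [0xba, 0xe8), matching readelf's warning. The first and last extents are assumed, since the start and end of the dump were trimmed.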
The "hole" seems to be a problem for llvm-dwarfdump. I'm using LLVM 8 on the VM, but this also reproduces with trunk. If I run
llvm-dwarfdump-8 -debug-loc a.out
I first get an error message, and then, after the gap, some nonsense location lists:
--------8<--------
0x00000087: [0x00000000000011dd, 0x00000000000011f0): DW_OP_reg5 RDI
0x000000aa:
0x000000ba: [0x0000000000540001, 0x0000000000000000): [0x1414330074000900, 0x000000009f1c1e1b):
0x000000ee: [0x0000000000130000, 0x0000135500010000): [0x000000001c000000, 0x001c510001000000): [0x0000002000000000, 0x2053000100000000): [0x0000210000000000, 0x5000010000000000):
-------->8--------
Happily, if one runs
llvm-dwarfdump-8 a.out --name=b
location lists past the gap are read without error, so looking up a location list directly still works. What breaks, however, is the --statistics option to llvm-dwarfdump, which I've been getting weird numbers out of for a while. It looks up [0] location lists by offset through this [1] API call, which appears to pre-read all location lists and trips over the gap. When fed a binary such as the above, I get the error message, and getLocationListAtOffset fails for some DW_AT_locations. These then get interpreted as a location list fully covering all scope bytes.
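One way a statistics pass could avoid counting an unreadable list as full coverage is to keep "lookup failed" distinct from "fully covered". A minimal sketch, assuming the location list has already been fetched by offset (or failed to be); `Range` and `coveredBytes` are hypothetical names, not the code in Statistics.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

struct Range { uint64_t begin, end; };  // half-open [begin, end)

// Bytes of `scope` covered by a variable's location list. `list` is std::nullopt
// when the list at the DW_AT_location offset could not be read; the result is then
// std::nullopt too, rather than "the whole scope". Assumes non-overlapping ranges.
std::optional<uint64_t> coveredBytes(const std::optional<std::vector<Range>> &list, Range scope) {
  if (!list) return std::nullopt;  // unknown, not 100% coverage
  uint64_t total = 0;
  for (const Range &r : *list) {
    uint64_t b = std::max(r.begin, scope.begin);  // clip each range to the scope
    uint64_t e = std::min(r.end, scope.end);
    if (b < e) total += e - b;
  }
  return total;
}
```

With an illustrative scope of [0x11dd, 0x11f7) and the single entry [0x11dd, 0x11f0) from the dumps above, this reports 0x13 bytes; a failed lookup yields std::nullopt, which the caller could count separately instead of as full coverage.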
I'm not hugely familiar with DWARF, but I nerd-sniped PaulR by asking him about this, and he seemed to reckon there's nothing in the spec that prohibits padding between location lists.
One solution would be "don't link things with LD", but it is still the default in many places.
Neither GOLD nor LLD leave a gap in .debug_loc in this situation.
[0] https://github.com/llvm/llvm-project/blob/c714a88a4dc4dadc16409986a7e275b86142622b/llvm/tools/llvm-dwarfdump/Statistics.cpp#L251
[1] https://github.com/llvm/llvm-project/blob/88b4e28a679a5aaa14ef41a1901d3d24ddd8946b/llvm/lib/DebugInfo/DWARF/DWARFDebugLoc.cpp#L202
Code to replicate this situation is below. I think the namespace might be unnecessary.
test1.cpp --------8<--------
#include "tmp.h"
int main() {
  thin::floogie foo;
  somefunc(foo);
  return foo.a_method(3);
}
-------->8--------
test2.cpp --------8<--------
#include "tmp.h"
void somefunc(thin::floogie &foo) {
  foo.somemember = 12;
  foo.a_method(foo.somemember);
}
void externfunc() { }
-------->8--------
test3.cpp --------8<--------
#include <stdio.h>
int unrelated(int a, int b) {
  printf("%d, %d\n", a, b);
  return a;
};
-------->8--------
tmp.h --------8<--------
void externfunc();
namespace thin {
class floogie {
public:
  floogie() : somemember(0) { }
  int somemember;
  bool a_method(int another) {
    another += 13;
    externfunc();
    another %= 3;
    somemember += 3;
    return somemember + another;
  }
};
}
void somefunc(thin::floogie &bar);
-------->8--------