Bugzilla Link	PR43290
Status	NEW
Importance	P normal
Reported by	Jeremy Morse (jeremy.morse.llvm@gmail.com)
Reported on	2019-09-12 05:00:13 -0700
Last modified on	2019-09-12 16:24:15 -0700
Version	trunk
Hardware	PC Linux
CC	aprantl@apple.com, cmtice@google.com, dblaikie@gmail.com, jdevlieghere@apple.com, keith.walker@arm.com, llvm-bugs@lists.llvm.org, paul_robinson@playstation.sony.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

It seems that LLVM's location-list reader chokes when there is padding between lists, which is what GNU LD produces by default when it deletes a COMDAT function. This can be replicated with the three files at the bottom of this ticket on a fresh Ubuntu 19.04 VM by building them with:

gcc test1.cpp -o test1.o -g -O1 -fno-inline -c -gdwarf-4 -gstrict-dwarf
gcc test2.cpp -o test2.o -g -O1 -fno-inline -c -gdwarf-4 -gstrict-dwarf
gcc test3.cpp -o test3.o -g -O1 -fno-inline -c -gdwarf-4 -gstrict-dwarf
gcc test1.o test2.o test3.o -o a.out -gstrict-dwarf

The code in the test files is meaningless, but crafted to:

* generate some location lists,
* which are in a COMDAT / weak function, specifically "a_method",
* that get de-duplicated at link time,
* and have more location lists after a_method's (test3.cpp).

When linking this, GNU LD (the default on Linux) appears to keep the location lists from the duplicate function in the .debug_loc section; it just nulls the addresses out. Using readelf --debug-dump=loc a.out and trimming the start and end:

--------8<--------
00000087 00000000000011dd 00000000000011f0 (DW_OP_reg5 (rdi))
0000009a
000000aa
readelf: Warning: There is a hole [0xba - 0xe8] in .debug_loc section.
000000e8 00000000000011f7 000000000000120a (DW_OP_reg5 (rdi))
-------->8--------

The "hole" seems to be a problem for llvm-dwarfdump. I'm using LLVM 8 on the VM, but this replicates with trunk. If I run llvm-dwarfdump-8 -debug-loc a.out, I first get these error messages:

error: location list overflows the debug_loc section.
error: failed to consume entire .debug_loc section

And then after the gap some nonsense location lists:

--------8<--------
0x00000087:
    [0x00000000000011dd, 0x00000000000011f0): DW_OP_reg5 RDI

0x000000aa:

0x000000ba:
    [0x0000000000540001, 0x0000000000000000):
    [0x1414330074000900, 0x000000009f1c1e1b):

0x000000ee:
    [0x0000000000130000, 0x0000135500010000):
    [0x000000001c000000, 0x001c510001000000):
    [0x0000002000000000, 0x2053000100000000):
    [0x0000210000000000, 0x5000010000000000):
-------->8--------

Happily, if one runs llvm-dwarfdump-8 a.out --name=b, location lists past the gap are read without error, so looking up a location list directly still works.

What breaks, however, is the --statistics option to llvm-dwarfdump, which I've been getting weird numbers out of for a while. It looks up location lists by offset [0] through this API call [1], which appears to pre-read all location lists and trips over the gap. When fed a binary such as the above, I get the error message, and getLocationListAtOffset fails for some DW_AT_locations. These failures are then interpreted as location lists fully covering all scope bytes.

I'm not hugely familiar with DWARF, but I nerd-sniped PaulR by asking him about this, and he seemed to reckon there's nothing in the spec that prohibits padding between location lists.

One solution would be "don't link things with LD", but it is still the default in many places.

Neither GOLD nor LLD leave a gap in .debug_loc in this situation.

[0] https://github.com/llvm/llvm-project/blob/c714a88a4dc4dadc16409986a7e275b86142622b/llvm/tools/llvm-dwarfdump/Statistics.cpp#L251
[1] https://github.com/llvm/llvm-project/blob/88b4e28a679a5aaa14ef41a1901d3d24ddd8946b/llvm/lib/DebugInfo/DWARF/DWARFDebugLoc.cpp#L202

Code to replicate this situation is below. I think the namespace might be unnecessary.

test1.cpp --------8<--------
#include "tmp.h"

int main() {
  thin::floogie foo;
  somefunc(foo);
  return foo.a_method(3);
}
-------->8--------

test2.cpp --------8<--------
#include "tmp.h"

void somefunc(thin::floogie &foo) {
  foo.somemember = 12;
  foo.a_method(foo.somemember);
}

void externfunc() {
}
-------->8--------

test3.cpp --------8<--------
#include <stdio.h>

int unrelated(int a, int b) {
  printf("%d, %d\n", a, b);
  return a;
};
-------->8--------

tmp.h --------8<--------
void externfunc();

namespace thin {
class floogie {
public:
  floogie() : somemember(0) { }
  int somemember;
  bool a_method(int another) {
    another += 13;
    externfunc();
    another %= 3;
    somemember += 3;
    return somemember + another;
  }
};
}

void somefunc(thin::floogie &bar);
-------->8--------

CC'd a couple of folks (Adrian and Caroline) who perhaps care about the statistics issue. (Well, neither is probably too invested in statistics on binutils-ld-linked binaries, but they might be interested in the general issue.)

This, I /think/, is due to how binutils ld, gold, and lld resolve relocations to discarded sections. Looking at debug_ranges (because it's easier for me to make small test cases that have debug_ranges than ones that have debug_loc), binutils ld resolves /all/ relocations to discarded sections to the constant '1', regardless of the addend in the relocation, e.g.:

    00000070 0000000000000001 0000000000000001
    00000070 0000000000000001 0000000000000001
    00000070 <End of list>
    000000a0 0000000000401150 0000000000401155
    000000a0 0000000000000001 0000000000000001
    000000a0 <End of list>

Here we see the range list for a scope inside a discarded function, then the ranges for the CU containing that discarded function (and a non-discarded function). Whereas with gold or lld:

    00000070 0000000000000001 000000000000000c
    00000070 0000000000000013 0000000000000018
    00000070 <End of list>
    000000a0 0000000000201130 0000000000201135
    000000a0 0000000000000000 000000000000001a
    000000a0 <End of list>

You can see that gold and lld have simply not applied any relocation here, leaving the addend that was in the relocatable bytes alone (so the address at the start of the section is 0, but any addresses beyond the start keep their offset).

I'm /guessing/... ah, nope. So it looks like binutils ld might be special-casing relocations in debug_ranges to be non-zero, but doesn't do the same thing in debug_loc. Instead, in debug_loc it resolves them all to zero, dropping the addend.
This causes the list to terminate early, and the following location description gets interpreted as another address range (and then llvm-dwarfdump attempts to apply a relocation to that address, but it's outside the range of the .text section, so that's where you get the "unexpected end of data" error).

So... yeah, a few things:

* We generally shouldn't need to read debug_loc by itself (without using debug_info to know which offsets contain valid starting points for location lists). I wouldn't think we'd need to do that for the statistics mode: rather than reading/caching up front, perhaps we should lazily read/cache based on the offsets used in queries found in the debug_info section.
* When dumping debug_loc without debug_info, there's not much we can do if it's linked with binutils ld. We'll terminate early. We could avoid printing the extra error message, but we still can't parse the whole debug_loc section, because binutils ld has mangled it beyond repair. (Technically you could have empty regions or other garbage in the section, so long as the locations referenced from debug_info are valid and appropriately terminated, so parsing debug_loc in isolation is never guaranteed to work, and this is the sort of place where it doesn't work in practice.)
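To make the early-termination mechanism concrete, here is a minimal sketch of a DWARF v4 .debug_loc list reader, assuming 64-bit little-endian addresses. LocEntry, readLocList and exampleSection are hypothetical names for illustration, not LLVM's DWARFDebugLoc API, and the bytes are a hand-made reduction rather than data taken from a.out. The section holds one list whose address pairs have all been zeroed, as binutils ld does in debug_loc. Because (0, 0) is the end-of-list marker, the reader stops after 16 bytes; a whole-section reader that then resumes there reads the location description's length and DW_OP_reg4 byte as a begin address of 0x540001, much like the 0x000000ba entry above, and finally runs off the end of the section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical entry type for illustration only.
struct LocEntry {
  std::uint64_t begin, end;
  std::vector<std::uint8_t> expr;
};

// Reads a little-endian integer of `size` bytes at `off`; fails if out of bounds.
static bool readLE(const std::vector<std::uint8_t>& buf, std::size_t& off,
                   std::size_t size, std::uint64_t& out) {
  if (off + size > buf.size()) return false;
  out = 0;
  for (std::size_t i = 0; i < size; ++i)
    out |= std::uint64_t(buf[off + i]) << (8 * i);
  off += size;
  return true;
}

// Parses one DWARF v4 location list starting at `off` (64-bit addresses).
// A (0, 0) pair ends the list; the new base from a base-address selection
// entry is ignored for brevity. Returns false if the list overflows the section.
static bool readLocList(const std::vector<std::uint8_t>& sec, std::size_t& off,
                        std::vector<LocEntry>& out) {
  for (;;) {
    std::uint64_t begin, end, len;
    if (!readLE(sec, off, 8, begin) || !readLE(sec, off, 8, end)) return false;
    if (begin == 0 && end == 0) return true;           // end-of-list entry
    if (begin == ~std::uint64_t(0)) continue;          // base address selection
    if (!readLE(sec, off, 2, len) || off + len > sec.size()) return false;
    out.push_back({begin, end,
                   std::vector<std::uint8_t>(sec.data() + off, sec.data() + off + len)});
    off += len;
  }
}

// Hypothetical section: one two-entry list whose addresses the linker zeroed.
static std::vector<std::uint8_t> exampleSection() {
  std::vector<std::uint8_t> s(16, 0);       // entry 1: begin/end zeroed
  s.insert(s.end(), {0x01, 0x00, 0x54});    // length 1, DW_OP_reg4
  s.insert(s.end(), 16, 0);                 // entry 2: begin/end zeroed
  s.insert(s.end(), {0x01, 0x00, 0x55});    // length 1, DW_OP_reg5
  s.insert(s.end(), 16, 0);                 // the list's real terminator
  assert(s.size() == 54);
  return s;
}
```

Called once from offset 0, readLocList returns an empty but successfully terminated list and leaves the offset at 16. Called again from there, it produces bogus entries starting at 0x540001 and then reports failure, which is the same shape as the nonsense lists and the "overflows" error reported above.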

Yep, binutils ld special-cases relocations to text from the debug_ranges section:

    Contents of section .debug_info:
     0000 00000000 00000000 00000000 00000000  ................
    Contents of section .debug_abbrev:
     0000 00000000 00000000 00000000 00000000  ................
    Contents of section .debug_str:
     0000 00000000 00000000 00000000 00000000  ................
    Contents of section .debug_loc:
     0000 00000000 00000000 00000000 00000000  ................
    Contents of section .debug_ranges:
     0000 01000000 00000000 01000000 00000000  ................

(placing the same two relocations in each of these debug_ sections, and nothing else: one to the start and another to the end of a discarded function). In the other sections they specifically resolve the relocations to zero, even when they have an addend. Whereas gold and lld write the addend from the relocation into the location, even though the base address is zero:

    Contents of section .debug_str:
     0000 00000000 00000000 06000000 00000000  ................
    Contents of section .debug_abbrev:
     0000 00000000 00000000 06000000 00000000  ................
    Contents of section .debug_info:
     0000 00000000 00000000 06000000 00000000  ................
    Contents of section .debug_ranges:
     0000 00000000 00000000 06000000 00000000  ................
    Contents of section .debug_loc:
     0000 00000000 00000000 06000000 00000000  ................

The addend isn't (as I'd assumed) actually stored in those bytes in the object file; it's stored in the relocation record, so gold and lld are making an intentional choice to update those zero bytes. That choice does fix the parsers, so they don't see a double zero, assume it's the end of a list, and then try to read the following location description as an address.
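The observed behaviour can be summarised as a small model. This is a hypothetical sketch of the outputs shown in this thread (Linker, resolveToDiscarded and isEndOfList are made-up names), not code taken from binutils ld, gold or lld. It assumes the two experimental relocations have addends 0 and 6, matching the 06000000 bytes in the gold/lld dump.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical model of the linker behaviour reported in this issue.
enum class Linker { BinutilsLd, Gold, Lld };

// Value written for a relocation whose target lives in a discarded
// (de-duplicated COMDAT) section, as observed in the dumps above.
std::uint64_t resolveToDiscarded(Linker linker, const std::string& section,
                                 std::uint64_t addend) {
  if (linker == Linker::BinutilsLd)
    // Observed: 1 in .debug_ranges, 0 everywhere else; the addend is dropped.
    return section == ".debug_ranges" ? 1 : 0;
  // Observed for gold and lld: symbol value 0 plus the relocation's addend.
  return addend;
}

// In DWARF v4 .debug_loc and .debug_ranges, a (0, 0) pair ends a list.
bool isEndOfList(std::uint64_t begin, std::uint64_t end) {
  return begin == 0 && end == 0;
}
```

With addends 0 and 6, binutils ld yields (0, 0) in .debug_loc, which a DWARF v4 reader treats as end-of-list, and (1, 1) in .debug_ranges, which it does not; gold and lld yield (0, 6) in every section, so the list continues.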

This isn't high enough on my priority list to fix at the moment - but perhaps it is for someone else. Essentially, we should probably be careful about parsing any part of the DWARF that isn't debug_info without specified offsets parsed from debug_info. That makes dumping non-debug_info sections a bit slow (because we still have to parse debug_info to figure out which parts of them to dump) & makes the error handling trickier (because we can't do the error handling up-front - though there could always be an error in the reference from debug_info (out of bounds reference), so it's not a new error path, as such). To do this completely would require some reworking of DWARFContext and such - with limited value. Most producers don't actually end up with garbage in these debug_* sections, so it'd be academically pure, but rarely important.
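The lazy, offset-driven approach suggested in this thread might look roughly like the sketch below. It assumes the caller already has the offsets from DW_AT_location attributes in debug_info and a function that parses one list at a given offset; LazyLocListCache, List and parseAt are hypothetical names, not LLVM's DWARFDebugLoc or Statistics.cpp interfaces. Only referenced offsets are ever parsed, and a malformed list fails its own lookup instead of aborting the read of the whole section.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical: a location list reduced to its (begin, end) address pairs.
using List = std::vector<std::pair<std::uint64_t, std::uint64_t>>;

// Parses location lists on demand at offsets taken from debug_info and
// caches each result, including failures (stored as std::nullopt).
class LazyLocListCache {
public:
  using Parser = std::function<std::optional<List>(std::uint64_t)>;

  explicit LazyLocListCache(Parser parseAt) : parseAt_(std::move(parseAt)) {}

  // Parses the list at `offset` on first use; later calls hit the cache.
  const std::optional<List>& get(std::uint64_t offset) {
    auto it = cache_.find(offset);
    if (it == cache_.end())
      it = cache_.emplace(offset, parseAt_(offset)).first;
    return it->second;
  }

private:
  Parser parseAt_;
  std::map<std::uint64_t, std::optional<List>> cache_;
};
```

A consumer such as the statistics code could then treat a std::nullopt result as "unknown coverage" for that one variable rather than as full coverage, though that policy is a separate design decision.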


[DWARF] Padding between location lists confuses list-reader #42260
