gimli-rs / gimli

A library for reading and writing the DWARF debugging format
https://docs.rs/gimli/
Apache License 2.0

Workaround for weird [DW_OP_deref, DW_OP_stack_value] sequences #738

Closed: al13n321 closed this 4 months ago

al13n321 commented 4 months ago

I've seen this situation a few times (clang-18, x64, Linux): a variable's type is a 24-byte struct (std::vector), but its location expression ends with DW_OP_deref, DW_OP_stack_value. That is, the expression says the value of a 24-byte struct is the result of reading 8 bytes (the address size) from memory, which is indeed what gimli does. I couldn't figure out what LLVM intended when emitting such an expression.

This PR adds a workaround: if the expression ends with [DW_OP_deref, DW_OP_stack_value], pretend those two operations are not there, i.e. assume the whole value is available at the address that would have been dereferenced.
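A minimal Rust sketch of the workaround, operating on an already-decoded list of operations. The `Op` enum and `strip_trailing_deref_stack_value` are hypothetical names for illustration, not gimli's API. Real code would decode the operations first (gimli provides an operation parser), because checking only the last two raw bytes could misfire when an operand happens to end in the DW_OP_deref opcode byte.

```rust
/// A few DWARF expression operations, already decoded.
/// Hypothetical type for illustration; this is not gimli's `Operation`.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Breg { reg: u16, offset: i64 }, // DW_OP_bregN
    DerefSize(u8),                  // DW_OP_deref_size
    Deref,                          // DW_OP_deref
    StackValue,                     // DW_OP_stack_value
}

/// If the expression ends with [DW_OP_deref, DW_OP_stack_value], drop both
/// operations so the rest yields a memory location (the address that would
/// have been dereferenced). Otherwise return the expression unchanged.
fn strip_trailing_deref_stack_value(ops: &[Op]) -> &[Op] {
    match ops {
        [rest @ .., Op::Deref, Op::StackValue] => rest,
        _ => ops,
    }
}

fn main() {
    // Second location of `nodes` from the clang-18 example
    // (DWARF register 6 is rbp on x86-64).
    let ops = vec![
        Op::Breg { reg: 6, offset: -248 },
        Op::DerefSize(8),
        Op::Deref,
        Op::StackValue,
    ];
    let fixed = strip_trailing_deref_stack_value(&ops);
    println!("{:?}", fixed);
}
```

Applied to the appendix example, this leaves DW_OP_breg6 RBP-248, DW_OP_deref_size 0x8, which evaluates to a memory location at &nodes.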

(If this workaround doesn't belong in gimli, that's ok, I can just do the same in my code instead. Merge this only if it seems useful for other users.)


Appendix: example of clang output with this problem.

Here's debug info about a variable (produced by clang-18, x64, Linux):

% llvm-dwarfdump-18 --debug-info=0x3fd6660c ~/2ClickHouse/build/programs/clickhouse
/home/ubuntu/2ClickHouse/build/programs/clickhouse:     file format elf64-x86-64

.debug_info contents:

0x3fd6660c: DW_TAG_formal_parameter
              DW_AT_location    (indexed (0x189) loclist = 0x11a72d60: 
                 [0x0000000011db5bc0, 0x0000000011db5c1c): DW_OP_breg4 RSI+0
                 [0x0000000011db5c1c, 0x0000000011db5fd2): DW_OP_breg6 RBP-248, DW_OP_deref_size 0x8, DW_OP_deref, DW_OP_stack_value)
              DW_AT_name        ("nodes")
              DW_AT_decl_file   ("./build/./src/Storages/MergeTree/KeyCondition.cpp")
              DW_AT_decl_line   (699)
              DW_AT_type        (0x3fccef28 "NodeRawConstPtrs")

% llvm-dwarfdump-18 --debug-info=0x3fccef28 ~/2ClickHouse/build/programs/clickhouse
/home/ubuntu/2ClickHouse/build/programs/clickhouse:     file format elf64-x86-64

.debug_info contents:

0x3fccef28: DW_TAG_typedef
              DW_AT_type        (0x3fcfd5fd "std::__1::vector<const DB::ActionsDAG::Node *, std::__1::allocator<const DB::ActionsDAG::Node *> >")
              DW_AT_name        ("NodeRawConstPtrs")
              DW_AT_accessibility       (DW_ACCESS_public)
              DW_AT_decl_file   ("./build/./src/Interpreters/ActionsDAG.h")
              DW_AT_decl_line   (66)

% llvm-dwarfdump-18 --debug-info=0x3fcfd5fd ~/2ClickHouse/build/programs/clickhouse
/home/ubuntu/2ClickHouse/build/programs/clickhouse:     file format elf64-x86-64

.debug_info contents:

0x3fcfd5fd: DW_TAG_class_type
              DW_AT_calling_convention  (DW_CC_pass_by_reference)
              DW_AT_name        ("vector<const DB::ActionsDAG::Node *, std::__1::allocator<const DB::ActionsDAG::Node *> >")
              DW_AT_byte_size   (0x18)
              DW_AT_decl_file   ("./build/./contrib/llvm-project/libcxx/include/vector")
              DW_AT_decl_line   (341)

So the type is a 24-byte struct.
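A quick check of the 0x18 figure: libc++'s vector holds three pointers (the gdb output further down shows `__begin_`, `__end_` and `__end_cap_`), so on x86-64 it is 3 × 8 = 24 bytes. The Rust struct below is a hypothetical stand-in for that layout, not the real libc++ type; the allocator is an empty class that, as gdb's output suggests, occupies no storage.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for libc++'s vector<const Node*> layout:
/// three pointer-sized fields and nothing else.
#[allow(dead_code)]
#[repr(C)]
struct VectorLayout {
    begin: *const *const u8,   // __begin_
    end: *const *const u8,     // __end_
    end_cap: *const *const u8, // __end_cap_
}

fn main() {
    // On x86-64: 3 pointers * 8 bytes = 24 = 0x18, matching DW_AT_byte_size.
    assert_eq!(size_of::<VectorLayout>(), 3 * size_of::<usize>());
    println!("{:#x}", size_of::<VectorLayout>());
}
```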

The first location is DW_OP_breg4 RSI+0, which makes sense as this variable is the first argument of the function (and since the type is DW_CC_pass_by_reference, rsi holds a pointer to the object). DW_OP_breg4 pushes the value of rsi onto the DWARF stack; then, by convention (there is no DW_OP_stack_value), the final value on top of the stack is the address of the variable, i.e. &nodes.
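The convention here is the split between a memory location description (no DW_OP_stack_value: the top of the stack is the variable's address) and an implicit location (DW_OP_stack_value: the top of the stack is the value itself). A minimal Rust sketch of that final step, assuming a hypothetical `Location` enum and a made-up rsi value:

```rust
/// How the result of a DWARF location expression is interpreted (simplified).
/// Hypothetical type for illustration; not gimli's API.
#[derive(Debug, PartialEq)]
enum Location {
    /// Memory location description: the variable lives at this address.
    Memory { address: u64 },
    /// Implicit location (DW_OP_stack_value): this *is* the value.
    Value(u64),
}

/// DW_OP_bregN: push (contents of register N) + signed offset.
fn breg(reg_value: u64, offset: i64) -> u64 {
    reg_value.wrapping_add(offset as u64)
}

/// Interpret the top of the DWARF stack once evaluation finishes.
fn finish(top: u64, ends_with_stack_value: bool) -> Location {
    if ends_with_stack_value {
        Location::Value(top)
    } else {
        Location::Memory { address: top }
    }
}

fn main() {
    // Hypothetical address of the caller's object, passed by reference in rsi.
    let rsi = 0x7ffd_0000_0800u64;
    // First location: DW_OP_breg4 RSI+0, no DW_OP_stack_value.
    let loc = finish(breg(rsi, 0), false);
    assert_eq!(loc, Location::Memory { address: rsi }); // i.e. &nodes
    println!("{:?}", loc);
}
```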

The second location starts at pc 0x0000000011db5c1c. The instruction just before that is:

mov [rbp-0F8h],rsi

So the address of the struct is spilled to the stack at [rbp-0F8h] (0F8h = 248), and then the DWARF location changes. Makes sense.

But the new location DW_OP_breg6 RBP-248, DW_OP_deref_size 0x8, DW_OP_deref, DW_OP_stack_value seems to say:

  1. DW_OP_breg6 RBP-248 - push RBP-248 (aka rbp-0F8h) onto the DWARF stack. We know that [rbp-0F8h] holds the address of the struct, so rbp-0F8h is the address of the address, &&nodes. Makes sense.
  2. DW_OP_deref_size 0x8 - dereference it, placing [rbp-0F8h] at the top of the DWARF stack. That's &nodes. Makes sense.
  3. DW_OP_deref - dereference it again? As an 8-byte value??
  4. DW_OP_stack_value - report that 8-byte value as the value of the struct. But it's only the first 8 bytes of the struct!
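To make the four steps concrete, here is a toy Rust evaluator for just the four operations involved. The register and memory contents are invented (a hypothetical frame at rbp = 0x7ffd_0000_1000 with the spill at rbp-248), memory is modelled as 8-byte cells, and DW_OP_deref reads the address size (8 bytes) as DWARF 5 section 2.5.1.3 specifies. It shows that the emitted expression yields only `nodes.__begin_`, while the same expression without the trailing [DW_OP_deref, DW_OP_stack_value] yields the address &nodes.

```rust
use std::collections::HashMap;

/// The four DWARF operations in the second location (hypothetical decoded form).
#[derive(Debug, Clone, Copy)]
enum Op {
    Breg { reg: u16, offset: i64 }, // DW_OP_bregN
    DerefSize(u8),                  // DW_OP_deref_size
    Deref,                          // DW_OP_deref
    StackValue,                     // DW_OP_stack_value
}

#[derive(Debug, PartialEq)]
enum Location {
    Memory { address: u64 }, // memory location description
    Value(u64),              // implicit location (DW_OP_stack_value)
}

const RBP: u16 = 6; // DWARF register number for rbp on x86-64
const RBP_VALUE: u64 = 0x7ffd_0000_1000; // hypothetical frame pointer
const NODES_ADDR: u64 = 0x7ffd_0000_0800; // hypothetical &nodes

/// Toy target: registers plus memory made of 8-byte cells.
struct Target {
    regs: HashMap<u16, u64>,
    mem: HashMap<u64, u64>,
}

impl Target {
    /// Read `size` (1..=8) bytes at `addr`, zero-extended to 64 bits.
    fn read(&self, addr: u64, size: u8) -> u64 {
        let word = self.mem[&addr];
        if size >= 8 { word } else { word & ((1u64 << (8 * size)) - 1) }
    }
}

fn example_target() -> Target {
    let mut mem = HashMap::new();
    mem.insert(RBP_VALUE - 248, NODES_ADDR); // mov [rbp-0F8h], rsi
    mem.insert(NODES_ADDR, 0x1000); // nodes.__begin_
    mem.insert(NODES_ADDR + 8, 0x1010); // nodes.__end_
    mem.insert(NODES_ADDR + 16, 0x1040); // nodes.__end_cap_
    Target { regs: HashMap::from([(RBP, RBP_VALUE)]), mem }
}

fn eval(t: &Target, ops: &[Op]) -> Location {
    let mut stack: Vec<u64> = Vec::new();
    let mut stack_value = false;
    for op in ops {
        match *op {
            Op::Breg { reg, offset } => stack.push(t.regs[&reg].wrapping_add(offset as u64)),
            Op::DerefSize(n) => {
                let addr = stack.pop().expect("stack underflow");
                stack.push(t.read(addr, n));
            }
            // DW_OP_deref always reads address-size bytes: 8 on x86-64.
            Op::Deref => {
                let addr = stack.pop().expect("stack underflow");
                stack.push(t.read(addr, 8));
            }
            Op::StackValue => stack_value = true,
        }
    }
    let top = *stack.last().expect("empty stack");
    if stack_value { Location::Value(top) } else { Location::Memory { address: top } }
}

fn main() {
    let t = example_target();
    let emitted = [
        Op::Breg { reg: RBP, offset: -248 },
        Op::DerefSize(8),
        Op::Deref,
        Op::StackValue,
    ];
    // Steps 1-4: an 8-byte implicit value, which is just nodes.__begin_.
    println!("{:?}", eval(&t, &emitted));
    // Without the trailing pair: a memory location at &nodes (all 24 bytes).
    println!("{:?}", eval(&t, &emitted[..2]));
}
```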

philipc commented 4 months ago

Can lldb and/or gdb correctly display the parameter?

I think we need to understand exactly what is going on before doing a workaround like this. I'll spend some time on it later.

al13n321 commented 4 months ago

lldb:

(DB::ActionsDAG::NodeRawConstPtrs) nodes = <extracting data from value failed>

gdb:

nodes = {
  __begin_ = 0x7ffe90cc2088,
  __end_ = 0x0,
  __end_cap_ = {
    <std::__1::__compressed_pair_elem<DB::ActionsDAG::Node const**, 0, false>> = {
      __value_ = 0x0
    }, 
    <std::__1::__compressed_pair_elem<std::__1::allocator<DB::ActionsDAG::Node const*>, 1, true>> = {
      <std::__1::allocator<DB::ActionsDAG::Node const*>> = {
        <std::__1::__non_trivial_if<true, std::__1::allocator<DB::ActionsDAG::Node const*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}
}

That is, both got the 8-byte value: lldb refused to extend it to 24 bytes, and gdb zero-padded it to 24 bytes. With this workaround I get the correct value (a valid std::vector with the expected contents).
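The gdb result can be reproduced with simple byte arithmetic: take the 8-byte stack value, append zero bytes up to DW_AT_byte_size (0x18), and reinterpret the result as the vector's three pointers. This is only a sketch of the observable effect on a little-endian target, not gdb's actual code; `zero_pad` and `vector_fields` are hypothetical helpers.

```rust
/// Widen an 8-byte DWARF stack value to `byte_size` bytes by appending
/// zeros (little-endian), mimicking the effect seen in gdb's output.
fn zero_pad(value: u64, byte_size: usize) -> Vec<u8> {
    let mut bytes = value.to_le_bytes().to_vec();
    if byte_size > bytes.len() {
        bytes.resize(byte_size, 0);
    }
    bytes
}

/// Reinterpret 24 bytes as the vector's three pointer-sized fields
/// (__begin_, __end_, __end_cap_), assuming 8-byte pointers.
fn vector_fields(bytes: &[u8]) -> [u64; 3] {
    let mut fields = [0u64; 3];
    for (field, chunk) in fields.iter_mut().zip(bytes.chunks_exact(8)) {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        *field = u64::from_le_bytes(word);
    }
    fields
}

fn main() {
    // The 8-byte value both debuggers received (gdb's __begin_ above).
    let padded = zero_pad(0x7ffe90cc2088, 0x18);
    let [begin, end, end_cap] = vector_fields(&padded);
    // prints: __begin_ = 0x7ffe90cc2088, __end_ = 0x0, __end_cap_ = 0x0
    println!("__begin_ = {:#x}, __end_ = {:#x}, __end_cap_ = {:#x}", begin, end, end_cap);
}
```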

philipc commented 4 months ago

Thanks. I think that confirms it is an LLVM bug.

I found this in the LLVM tests:

# This becomes a problem when values move onto the stack and we emit
# DW_OP_deref: there is no information about how large a value the consumer
# should load from the stack. The convention today appears to be the size of
# the variable, ...

which is at odds with what the DWARF V5 spec says in 2.5.1.3:

The DW_OP_deref operation pops the top stack entry and treats it as an
address. The popped value must have an integral type. The value retrieved
from that address is pushed, and has the generic type. The size of the data
retrieved from the dereferenced address is the size of an address on the target
machine.
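Per the quoted text, DW_OP_deref always reads address-size bytes, and DW_OP_deref_size reads the number of bytes given by its 1-byte operand (no larger than the generic type) and zero-extends the result. Neither can push a 24-byte value. A small Rust sketch of both reads over a simulated little-endian memory; `Memory`, `op_deref` and `op_deref_size` are hypothetical names for illustration:

```rust
/// Simulated little-endian target memory: `bytes[0]` lives at `base`.
/// Hypothetical model for illustration.
struct Memory<'a> {
    base: u64,
    bytes: &'a [u8],
}

impl Memory<'_> {
    /// Read `size` bytes at `addr` and zero-extend them to 64 bits.
    fn read_zext(&self, addr: u64, size: usize) -> u64 {
        assert!(size <= 8, "a DWARF stack entry is at most 8 bytes on x86-64");
        let start = (addr - self.base) as usize;
        let mut buf = [0u8; 8];
        buf[..size].copy_from_slice(&self.bytes[start..start + size]);
        u64::from_le_bytes(buf)
    }
}

/// DW_OP_deref: the read size is always the target's address size.
fn op_deref(mem: &Memory<'_>, addr: u64, address_size: usize) -> u64 {
    mem.read_zext(addr, address_size)
}

/// DW_OP_deref_size n: the read size is the 1-byte operand `n`.
fn op_deref_size(mem: &Memory<'_>, addr: u64, n: u8) -> u64 {
    mem.read_zext(addr, n as usize)
}

fn main() {
    // A 24-byte object at hypothetical address 0x1000 holding bytes 0, 1, ..., 23.
    let object: Vec<u8> = (0u8..24).collect();
    let mem = Memory { base: 0x1000, bytes: &object };
    // Both operations return at most 8 bytes, however large the variable is.
    println!("{:#x}", op_deref(&mem, 0x1000, 8));
    println!("{:#x}", op_deref_size(&mem, 0x1000, 4));
}
```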

https://github.com/llvm/llvm-project/issues/64093 is a similar problem, but with DW_OP_deref_size, where the dereferenced value is large.

philipc commented 4 months ago

It seems like LLVM is using DW_OP_stack_value for variables that have simply been spilled to the stack (that's the program stack, not the DWARF stack), which makes no sense to me: if the variable is spilled to the stack, then that is its new location, and there is no need to treat it as an implicit location.

al13n321 commented 4 months ago

Good find. Sounds like this doesn't belong in gimli then. Or maybe it could be something on the side, e.g. gimli::quirks::preprocess_expression(Expression) -> Expression.

I moved the workaround into my own code instead (which has already accumulated ~7 similar workarounds for debug info quirks; I'm not even sure why I tried adding this particular one to gimli). Sorry for taking your time, but I appreciate the confirmation that the problem is in LLVM!