davidlattimore / wild

Apache License 2.0

Add support for DWARF debug info #37

Closed: davidlattimore closed this 2 days ago

davidlattimore commented 4 weeks ago

I'm not 100% sure what's involved here. I gather that eh_frame info, which we already support, is somewhat related to, or a more limited form of, DWARF debug info.

marxin commented 2 weeks ago

I would like to try implementing this feature ;)

Debug info is made up of a collection of sections that are interlinked and concatenated: [image: Screenshot from 2024-08-26 06-46-18]

The format heavily depends on relocations:

$ gcc ~/Programming/main.c -g -c
$ readelf -r main.o

Relocation section '.rela.debug_info' at offset 0x3c0 contains 8 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000008  00040000000a R_X86_64_32       0000000000000000 .debug_abbrev + 0
00000000000d  00060000000a R_X86_64_32       0000000000000000 .debug_str + 5
000000000012  00070000000a R_X86_64_32       0000000000000000 .debug_line_str + 5
000000000016  00070000000a R_X86_64_32       0000000000000000 .debug_line_str + 0
00000000001a  000200000001 R_X86_64_64       0000000000000000 .text + 0
00000000002a  00050000000a R_X86_64_32       0000000000000000 .debug_line + 0
00000000002f  00060000000a R_X86_64_32       0000000000000000 .debug_str + 0
00000000003a  000200000001 R_X86_64_64       0000000000000000 .text + 0

Relocation section '.rela.debug_aranges' at offset 0x480 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000006  00030000000a R_X86_64_32       0000000000000000 .debug_info + 0
000000000010  000200000001 R_X86_64_64       0000000000000000 .text + 0

Relocation section '.rela.debug_line' at offset 0x4b0 contains 5 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000022  00070000000a R_X86_64_32       0000000000000000 .debug_line_str + 25
000000000026  00070000000a R_X86_64_32       0000000000000000 .debug_line_str + 2a
000000000030  00070000000a R_X86_64_32       0000000000000000 .debug_line_str + 43
000000000035  00070000000a R_X86_64_32       0000000000000000 .debug_line_str + 4a
00000000003f  000200000001 R_X86_64_64       0000000000000000 .text + 0

Relocation section '.rela.eh_frame' at offset 0x528 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000020  000200000002 R_X86_64_PC32     0000000000000000 .text + 0
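For the R_X86_64_32 and R_X86_64_64 entries above, resolution is the plain absolute formula S + A, truncated to the relocation width and written at the relocation offset. The sketch below illustrates that under simplifying assumptions: the Reloc type and apply_reloc function are hypothetical (not wild's API), only little-endian x86-64 is handled, and S is assumed to be already known (the output address for .text, or the offset of the input section's contribution within its output section for non-alloc targets such as .debug_str).

```rust
// Relocation type numbers from the x86-64 psABI (they match the Info column above).
const R_X86_64_64: u32 = 1;
const R_X86_64_32: u32 = 10;

/// Hypothetical, simplified relocation against a section symbol, e.g. `.debug_str + 5`.
struct Reloc {
    offset: usize,     // where in the debug section to patch
    r_type: u32,       // relocation type
    symbol_value: u64, // S: value of the (section) symbol in the output
    addend: i64,       // A
}

/// Writes S + A into `section` at `r.offset`, little-endian.
fn apply_reloc(section: &mut [u8], r: &Reloc) -> Result<(), String> {
    // Negative addends wrap correctly via two's complement.
    let value = r.symbol_value.wrapping_add(r.addend as u64);
    let bytes: Vec<u8> = match r.r_type {
        R_X86_64_64 => value.to_le_bytes().to_vec(),
        // R_X86_64_32 is zero-extended, so the value must fit in an unsigned 32-bit field.
        R_X86_64_32 => u32::try_from(value)
            .map_err(|_| format!("R_X86_64_32 value {value:#x} does not fit in 32 bits"))?
            .to_le_bytes()
            .to_vec(),
        other => return Err(format!("unsupported relocation type {other}")),
    };
    let end = r.offset + bytes.len();
    let dest = section
        .get_mut(r.offset..end)
        .ok_or_else(|| format!("relocation at {:#x} is out of bounds", r.offset))?;
    dest.copy_from_slice(&bytes);
    Ok(())
}
```

No runtime (dynamic) relocations are produced here: the result is simply written into the section bytes.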

The implementation should include the following steps:

  1. add support for compressed sections (for now this can be worked around with objcopy --decompress-debug-sections)
  2. non-alloc sections should not be assigned addresses (I'm going to file a separate issue for that)
  3. GC of the sections should not be inhibited by a debug-info section, and relocations in a debug-info section that point to discarded sections should use a tombstone value (0 or 1, depending on the section name). Can you help me work out how to integrate that into the GC algorithm?
  4. Should we list all the supported debug-info section names (~20 sections across DWARF 4 and 5) as OutputSectionId::regular entries, or do we want to use a different mechanism?
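For step 1, a compressed section is one with SHF_COMPRESSED set; its data starts with an Elf64_Chdr header (ch_type, ch_reserved, ch_size, ch_addralign) followed by the compressed bytes. Below is a minimal sketch of just the header parsing, assuming little-endian ELF64 input. parse_chdr and the surrounding types are hypothetical, the actual decompression would need a zlib or zstd decoder (for example the flate2 or zstd crates) that is not shown, and the older GNU-style .zdebug_* sections use a different format and are not handled.

```rust
const SHF_COMPRESSED: u64 = 0x800;
const ELFCOMPRESS_ZLIB: u32 = 1;
const ELFCOMPRESS_ZSTD: u32 = 2;

#[derive(Debug, PartialEq)]
enum Compression {
    Zlib,
    Zstd,
}

/// Decoded Elf64_Chdr (hypothetical Rust-side representation).
#[derive(Debug, PartialEq)]
struct CompressionHeader {
    kind: Compression,
    uncompressed_size: u64,  // ch_size
    uncompressed_align: u64, // ch_addralign
}

/// Returns Ok(None) for an uncompressed section, otherwise the header and the
/// compressed payload that follows the 24-byte Elf64_Chdr.
fn parse_chdr(sh_flags: u64, data: &[u8]) -> Result<Option<(CompressionHeader, &[u8])>, String> {
    if sh_flags & SHF_COMPRESSED == 0 {
        return Ok(None); // use the section bytes as they are
    }
    if data.len() < 24 {
        return Err("truncated Elf64_Chdr".to_string());
    }
    let u32_at = |o: usize| u32::from_le_bytes(data[o..o + 4].try_into().unwrap());
    let u64_at = |o: usize| u64::from_le_bytes(data[o..o + 8].try_into().unwrap());
    let kind = match u32_at(0) {
        ELFCOMPRESS_ZLIB => Compression::Zlib,
        ELFCOMPRESS_ZSTD => Compression::Zstd,
        other => return Err(format!("unknown ch_type {other}")),
    };
    // Bytes 4..8 are ch_reserved and are ignored.
    let header = CompressionHeader {
        kind,
        uncompressed_size: u64_at(8),
        uncompressed_align: u64_at(16),
    };
    Ok(Some((header, &data[24..])))
}
```

After decompressing, the linker would treat the uncompressed bytes (of length ch_size) as the section's contents and ch_addralign as its alignment.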

davidlattimore commented 2 weeks ago

> I would like to try implementing this feature ;)

Great! Thanks!

> 3. GC of the sections should not be inhibited by a debug-info section, and relocations in a debug-info section that point to discarded sections should use a tombstone value (0 or 1, depending on the section name). Can you help me work out how to integrate that into the GC algorithm?

Do you know whether we need to split the sections and then GC the parts of the sections that belong to functions that get GCed? I had to do that for the .eh_frame support, and it certainly added some complexity.

If you don't need to split the sections up, and you also don't want relocations in those sections to prevent other things from being GCed, then you can possibly skip reading the relocations for those sections during the layout phase. For example, Section::create could skip the part where it iterates over relocations. That also assumes that the relocations don't need any allocations, i.e. that none of them need to be turned into runtime relocations.
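A toy model of this idea: the mark phase follows relocations only out of non-debug sections, debug sections are kept unconditionally, and a debug relocation whose target was discarded is resolved to a tombstone. InputSection, mark_live, tombstone and debug_reloc_value are hypothetical names, not wild's actual Section::create logic. The tombstone rule (1 for .debug_loc and .debug_ranges, 0 otherwise) is an assumption following the "0 or 1 depending on the section name" rule above, motivated by the fact that a (0, 0) pair terminates a DWARF 4 location or range list.

```rust
use std::collections::HashSet;

/// Hypothetical input section: a name plus the sections its relocations point at.
struct InputSection {
    name: &'static str,
    reloc_targets: Vec<usize>, // indices into the section list
    is_root: bool,             // e.g. contains the entry point
}

fn is_debug(name: &str) -> bool {
    name.starts_with(".debug_")
}

/// Mark phase: relocations in debug sections are never followed, so debug
/// info cannot keep otherwise-unused code alive.
fn mark_live(sections: &[InputSection]) -> HashSet<usize> {
    let mut live = HashSet::new();
    let mut stack: Vec<usize> = (0..sections.len()).filter(|&i| sections[i].is_root).collect();
    while let Some(i) = stack.pop() {
        if !live.insert(i) || is_debug(sections[i].name) {
            continue; // already visited, or a debug section: don't follow its relocations
        }
        stack.extend(sections[i].reloc_targets.iter().copied());
    }
    // Debug sections are always retained in the output.
    for (i, s) in sections.iter().enumerate() {
        if is_debug(s.name) {
            live.insert(i);
        }
    }
    live
}

/// Assumed tombstone: 1 where 0 would terminate a DWARF 4 list, else 0.
fn tombstone(debug_section_name: &str) -> u64 {
    match debug_section_name {
        ".debug_loc" | ".debug_ranges" => 1,
        _ => 0,
    }
}

/// Value written for a relocation in a debug section (addend ignored if discarded).
fn debug_reloc_value(debug_section_name: &str, target_live: bool, s_plus_a: u64) -> u64 {
    if target_live { s_plus_a } else { tombstone(debug_section_name) }
}
```

Since nothing is followed out of debug sections, their relocations also never require GOT entries or other allocations, which matches the assumption stated above.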

> 4. Should we list all the supported debug-info section names (~20 sections across DWARF 4 and 5) as OutputSectionId::regular entries, or do we want to use a different mechanism?

If you need to refer to the sections from code (e.g. if you need special logic for a particular section), then yes. Making them "regular" sections means that they'll be split by alignment. If a particular section only ever needs one particular alignment, then I guess we could use a "generated" section instead, although that would make the name "generated" slightly less appropriate.

If you don't need special logic for debug sections, or if the only special logic is that you don't want to process their relocations during the layout (GC) phase, then you could add a field to SectionDetails, such as is_debug_info.
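A sketch of the is_debug_info approach, assuming a hypothetical, heavily reduced SectionDetails (wild's real struct has other fields). Classifying by the ".debug_" name prefix avoids enumerating every DWARF 4 and 5 section name; that prefix rule is an assumption of this sketch.

```rust
/// Hypothetical stand-in for SectionDetails; only the suggested field is modelled.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SectionDetails {
    name: &'static str,
    is_debug_info: bool,
}

/// Classify by prefix instead of listing ~20 DWARF section names.
fn section_details(name: &'static str) -> SectionDetails {
    SectionDetails { name, is_debug_info: name.starts_with(".debug_") }
}

/// During layout (GC), relocations are only scanned for non-debug sections.
fn should_scan_relocations(details: &SectionDetails) -> bool {
    !details.is_debug_info
}
```

If a specific section later needs special handling (for example string merging of .debug_str), that one name could still be matched explicitly on top of the prefix rule.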

marxin commented 2 weeks ago

> Do you know whether we need to split the sections and then GC the parts of the sections that belong to functions that get GCed? I had to do that for the .eh_frame support, and it certainly added some complexity.

No. For debug info sections, we only need to resolve the relocations (with the exception of .debug_str and .debug_line_str, which are candidates for string merging).
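String merging here means concatenating the input .debug_str (or .debug_line_str) sections while deduplicating identical NUL-terminated strings, then remapping old offsets to new ones. A minimal sketch with a hypothetical merge_debug_str function, assuming relocations only point at the start of a string; tail merging (sharing suffixes) and offsets into the middle of a string are not handled.

```rust
use std::collections::HashMap;

/// Merge NUL-terminated strings from several input sections, deduplicating
/// exact matches. Returns the merged bytes plus, for each input, a map from
/// the old offset of each string to its offset in the merged section.
fn merge_debug_str(inputs: &[&[u8]]) -> (Vec<u8>, Vec<HashMap<u64, u64>>) {
    let mut merged = Vec::new();
    let mut seen: HashMap<&[u8], u64> = HashMap::new();
    let mut offset_maps = Vec::with_capacity(inputs.len());
    for &input in inputs {
        let mut map = HashMap::new();
        let mut old_offset = 0u64;
        // Each chunk is one string including its trailing NUL.
        for string in input.split_inclusive(|&b| b == 0) {
            let new_offset = *seen.entry(string).or_insert_with(|| {
                let off = merged.len() as u64;
                merged.extend_from_slice(string);
                off
            });
            map.insert(old_offset, new_offset);
            old_offset += string.len() as u64;
        }
        offset_maps.push(map);
    }
    (merged, offset_maps)
}
```

With this, a relocation such as .debug_str + 5 from the first input would resolve to offset_maps[0][&5] in the output .debug_str.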

> That also assumes that the relocations don't need any allocations, i.e. that none of them need to be turned into runtime relocations.

Yep, that's how I understand it too, since debug info is never read by the dynamic linker. It's consumers like gdb or valgrind that load the debug info, potentially from a different file.
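Because nothing maps debug sections into memory, step 2 from the list above amounts to giving non-alloc sections a file offset while leaving their sh_addr at 0. A simplified sketch, assuming hypothetical OutSection and Placement types and ignoring details a real linker must handle, such as keeping file offsets congruent with addresses modulo the page size and building program headers.

```rust
const SHF_ALLOC: u64 = 0x2;

/// Hypothetical simplified output section.
struct OutSection {
    flags: u64,
    size: u64,
    align: u64, // must be >= 1
}

#[derive(Debug, PartialEq)]
struct Placement {
    file_offset: u64,
    addr: u64, // becomes sh_addr; 0 for non-alloc sections
}

fn align_up(x: u64, align: u64) -> u64 {
    (x + align - 1) / align * align
}

/// Every section gets a file offset; only SHF_ALLOC sections consume address space.
fn layout(sections: &[OutSection], base_offset: u64, base_addr: u64) -> Vec<Placement> {
    let mut offset = base_offset;
    let mut addr = base_addr;
    sections
        .iter()
        .map(|s| {
            offset = align_up(offset, s.align);
            let placement = if s.flags & SHF_ALLOC != 0 {
                addr = align_up(addr, s.align);
                let p = Placement { file_offset: offset, addr };
                addr += s.size;
                p
            } else {
                Placement { file_offset: offset, addr: 0 }
            };
            offset += s.size;
            placement
        })
        .collect()
}
```

For example, an allocated .text followed by .debug_info and .debug_str would give .text a real address while both debug sections keep addr 0 and simply follow it in the file.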

> If you don't need special logic for debug sections, or if the only special logic is that you don't want to process their relocations during the layout (GC) phase, then you could add a field to SectionDetails, such as is_debug_info.

Ok, I'm going to start with the suggested approach. Thanks.

marxin commented 2 days ago

Please close this as implemented.