dotnet / runtime

Native AOT DWARF info contains many overlapping ranges #83233

Closed agocke closed 1 year ago

agocke commented 1 year ago

dwarfdump --verify shows examples of this:

error: DIE address ranges are not contained in its parent's ranges:
0x0000000b: DW_TAG_compile_unit [1] *
              DW_AT_producer [DW_FORM_string]       ("CoreRT")
              DW_AT_language [DW_FORM_data2]        (DW_LANG_C_plus_plus)
              DW_AT_name [DW_FORM_string]           ("IL.c")
              DW_AT_comp_dir [DW_FORM_string]       ("/tmp")
              DW_AT_low_pc [DW_FORM_addr]           (0x0000000000000000)
              DW_AT_high_pc [DW_FORM_data8]         (0x0000000000000000)
              DW_AT_stmt_list [DW_FORM_sec_offset]  (0x00000000)

0x0006cece: DW_TAG_subprogram [11] * (0x0000000b)
              DW_AT_specification [DW_FORM_ref4]    (cu + 0x518ab => {0x000518ab} "System_IO_Pipes_Interop__GetExceptionForIoErrno")
              DW_AT_low_pc [DW_FORM_addr]           (0x0000000000072500)
              DW_AT_high_pc [DW_FORM_data8]         (0x0000000000000569)
              DW_AT_frame_base [DW_FORM_exprloc]    (DW_OP_reg6)

ranges: (occurs 1683 times)

This seems like it's related to inlining. If so, I would expect DW_TAG_inlined_subroutine entries, but I'm not seeing anything from the JITed code, only from the linked C++.

dotnet-issue-labeler[bot] commented 1 year ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 1 year ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak See info in area-owners.md if you want to be subscribed.

Issue Details
Author: agocke
Assignees: -
Labels: `area-CodeGen-coreclr`
Milestone: -

jakobbotsch commented 1 year ago

I'm struggling to read the output here -- what is the error saying? What are the source mappings returned from the JIT that resulted in these DWARF ranges (or is this from variable ranges?)

The JIT does not support reporting debug information in inlinees unless rich debug information is enabled (DOTNET_RichDebugInfo=1). However, even without rich debug information things are normalized back to the root method.

jakobbotsch commented 1 year ago

Maybe ILC is emitting an incorrect DW_AT_high_pc (which is 0) for the DW_TAG_compile_unit? Is the error saying that it expected the DW_TAG_subprogram to be contained in the compile unit?
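
One way to read that first error, as a minimal sketch: assuming the verifier treats a constant-class DW_AT_high_pc as an offset from DW_AT_low_pc (which is how DWARF 4 defines it), the compile unit covers an empty range and so cannot contain the subprogram. The helper names below are hypothetical; only the numeric values come from the dump above.

```python
def pc_range(low_pc, high_pc_offset):
    """Half-open [start, end) range for a DIE whose DW_AT_high_pc is an offset."""
    return (low_pc, low_pc + high_pc_offset)


def contains(parent, child):
    """True if the child range lies entirely inside the parent range."""
    return parent[0] <= child[0] and child[1] <= parent[1]


# Values copied from the dwarfdump output in the issue.
compile_unit = pc_range(0x0, 0x0)      # DW_TAG_compile_unit at 0x0000000b
subprogram = pc_range(0x72500, 0x569)  # DW_TAG_subprogram at 0x0006cece

print("subprogram ends at", hex(subprogram[1]))
print("contained in compile unit:", contains(compile_unit, subprogram))
```

Under that reading, any subprogram with a non-empty range would fail the containment check against this compile unit.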

agocke commented 1 year ago

Sorry, that was a bad example. Maybe this one is better:

error: DIEs have overlapping address ranges:
0x0006cff1: DW_TAG_lexical_block [35] * (0x0006cfae)
              DW_AT_low_pc [DW_FORM_addr]       (0x0000000000072bf5)
              DW_AT_high_pc [DW_FORM_data8]     (0x0000000000000007)

0x0006cfd2: DW_TAG_lexical_block [35] * (0x0006cfae)
              DW_AT_low_pc [DW_FORM_addr]       (0x0000000000072bbc)
              DW_AT_high_pc [DW_FORM_data8]     (0x000000000000003c)

agocke commented 1 year ago

The nesting is a different issue.

agocke commented 1 year ago

If you use the dwarfdump binary from the native AOT smoke tests, you should be able to get an actual binary.

agocke commented 1 year ago

The emit for these things is at https://github.com/dotnet/llvm-project/blob/eb39d072da0ffac93bf087e4d8196b7133ee3a0f/llvm/tools/objwriter/debugInfo/dwarf/dwarfGen.cpp#L564

Lexical scopes are created at https://github.com/dotnet/llvm-project/blob/eb39d072da0ffac93bf087e4d8196b7133ee3a0f/llvm/tools/objwriter/debugInfo/dwarf/dwarfGen.cpp#L858

And here is an example of where the variable infos come from: https://github.com/dotnet/llvm-project/blob/eb39d072da0ffac93bf087e4d8196b7133ee3a0f/llvm/tools/objwriter/debugInfo/dwarf/dwarfGen.cpp#L697

And that all maps back to https://github.com/dotnet/llvm-project/blob/eb39d072da0ffac93bf087e4d8196b7133ee3a0f/llvm/tools/objwriter/cordebuginfo.h#L329

jakobbotsch commented 1 year ago

Gotcha, thank you, I'll try to take a closer look. It sounds feasible that the JIT is emitting an incorrect variable range here.

jakobbotsch commented 1 year ago

Looking at the following error:

error: DIEs have overlapping address ranges:
0x0006a2a3: DW_TAG_lexical_block [35] * (0x0006a260)
              DW_AT_low_pc [DW_FORM_addr]   (0x0000000000072e05)
              DW_AT_high_pc [DW_FORM_data8] (0x0000000000000007)

0x0006a284: DW_TAG_lexical_block [35] * (0x0006a260)
              DW_AT_low_pc [DW_FORM_addr]   (0x0000000000072dcc)
              DW_AT_high_pc [DW_FORM_data8] (0x000000000000003c)

Those lexical scopes are clearly not nested correctly in the DWARF info.

The function is System.IO.Pipes.Interop+Sys:StrError. The variable ranges returned by the JIT are:

VarLocInfo count is 4
; Variable debug info: 3 live ranges, 3 vars for method Interop+Sys:StrError(int):System.String
(V00 arg0) : From 00000000h to 0000002Ch, in rdi
(V01 loc0) : From 0000002Ch to 00000068h, in rbx
(V02 loc1) : From 00000065h to 0000006Ch, in rdi

which seems quite reasonable. So the error is probably in the translation of these ranges into DWARF format. Live var ranges are not guaranteed to be lexically nested, so if DwarfGen creates a lexical scope for each live var range, it makes sense that this causes errors. I will try to look into why these lexical scopes are created and what the correct way to represent these live ranges in DWARF is.
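
To make the mismatch concrete, here is a small sketch that classifies the three live ranges reported by the JIT and rebases them onto the addresses in the DWARF dump. The function start address is not stated in the thread; it is inferred here by subtracting V01's start offset (0x2C) from the low_pc of the second lexical block (0x72dcc). All function and variable names are illustrative.

```python
from itertools import combinations

# Live ranges from the JIT dump: name -> (start offset, end offset), half-open.
live_ranges = {
    "V00 arg0": (0x00, 0x2C),
    "V01 loc0": (0x2C, 0x68),
    "V02 loc1": (0x65, 0x6C),
}


def relation(a, b):
    """Classify two half-open ranges as disjoint, nested, or partially overlapping."""
    if a[1] <= b[0] or b[1] <= a[0]:
        return "disjoint"
    if (a[0] <= b[0] and b[1] <= a[1]) or (b[0] <= a[0] and a[1] <= b[1]):
        return "nested"
    return "partial overlap"


for (n1, r1), (n2, r2) in combinations(live_ranges.items(), 2):
    print(f"{n1} vs {n2}: {relation(r1, r2)}")

# Assumption: function start inferred from the dump, not stated in the thread.
FUNC_START = 0x72DCC - 0x2C


def to_dwarf(r):
    """Convert a (start, end) offset pair to (low_pc, high_pc expressed as a length)."""
    start, end = r
    return (FUNC_START + start, end - start)


for name in ("V01 loc0", "V02 loc1"):
    low_pc, length = to_dwarf(live_ranges[name])
    print(f"{name}: low_pc={low_pc:#x} high_pc={length:#x}")
```

With that inferred base, V01 and V02 map exactly onto the two DW_TAG_lexical_block entries in the error, and they overlap for three bytes (offsets 0x65 to 0x68) without either containing the other.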

jakobbotsch commented 1 year ago

So I guess those inner scopes are created here: https://github.com/dotnet/llvm-project/blob/eb39d072da0ffac93bf087e4d8196b7133ee3a0f/llvm/tools/objwriter/debugInfo/dwarf/dwarfGen.cpp#L531-L547

I would expect that we shouldn't be creating these scopes at all and should instead rely on the code that adds the variables to .debug_loc that @agocke linked above: https://github.com/dotnet/llvm-project/blob/eb39d072da0ffac93bf087e4d8196b7133ee3a0f/llvm/tools/objwriter/debugInfo/dwarf/dwarfGen.cpp#L697
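
As a rough illustration of that alternative (not the objwriter's actual code), each variable could carry its own list of address ranges and locations, in the spirit of a .debug_loc location list, with no per-variable lexical block. The class and function names below are hypothetical; the ranges and registers are the ones from the JIT dump above.

```python
from dataclasses import dataclass, field


@dataclass
class LocEntry:
    begin: int     # offset from function start, inclusive
    end: int       # offset from function start, exclusive
    location: str  # where the value lives in this range, e.g. a register name


@dataclass
class Variable:
    name: str
    entries: list = field(default_factory=list)


def build_variables(jit_ranges):
    """Group JIT live ranges by variable; ranges of different variables may overlap."""
    variables = {}
    for name, begin, end, loc in jit_ranges:
        var = variables.setdefault(name, Variable(name))
        var.entries.append(LocEntry(begin, end, loc))
    return list(variables.values())


def locations_at(variables, pc):
    """Return {variable name: location} for every variable live at the given offset."""
    return {
        v.name: e.location
        for v in variables
        for e in v.entries
        if e.begin <= pc < e.end
    }


jit_ranges = [
    ("V00 arg0", 0x00, 0x2C, "rdi"),
    ("V01 loc0", 0x2C, 0x68, "rbx"),
    ("V02 loc1", 0x65, 0x6C, "rdi"),
]
variables = build_variables(jit_ranges)
print(locations_at(variables, 0x66))
```

Because the ranges hang off each variable rather than off a scope, the overlap between V01 and V02 needs no nesting relationship at all.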

ghost commented 1 year ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas See info in area-owners.md if you want to be subscribed.

Issue Details
Author: agocke
Assignees: agocke
Labels: `area-CodeGen-coreclr`, `area-NativeAOT-coreclr`
Milestone: 8.0.0

MichalStrehovsky commented 1 year ago

I fixed most of the overlapping range issues in #89488. It fell into a JitInterface trap where on JitInterface we have fields named "Length" that are not actually populated as lengths in RyuJIT (#5282).
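
A sketch of how such a naming trap can produce overlapping ranges. It assumes, purely for illustration, that a field called length actually carries an end offset; the thread does not spell out what the fields hold, and the record layout and names below are hypothetical.

```python
from collections import namedtuple

# Hypothetical record: 'length' is ASSUMED to really hold an end offset.
Region = namedtuple("Region", ["offset", "length"])


def range_if_length(r):
    """Interpret the field literally, as a length."""
    return (r.offset, r.offset + r.length)


def range_if_end_offset(r):
    """Interpret the field as an end offset, per the assumption above."""
    return (r.offset, r.length)


def overlaps(a, b):
    """True if two half-open ranges share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


first = Region(offset=0x10, length=0x20)   # under the assumption: [0x10, 0x20)
second = Region(offset=0x20, length=0x30)  # under the assumption: [0x20, 0x30)

print(overlaps(range_if_end_offset(first), range_if_end_offset(second)))
print(overlaps(range_if_length(first), range_if_length(second)))
```

Two adjacent regions stay disjoint when the field is read as an end offset, but reading the same numbers as lengths stretches both regions until they overlap.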

I believe the only remaining issue is with overlapping regions that are related to CORINFO_EH_CLAUSE_SAMETRY. Not sure how to represent that with DWARF.

agocke commented 1 year ago

Yeah, the vast majority of the issues discussed here have been fixed. I filed a new bug for the last few issues that are left. https://github.com/dotnet/runtime/issues/90209