dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

DWARF errors with `nm` on linux-musl #98377

Closed: am11 closed this issue 9 months ago

am11 commented 9 months ago

On linux-musl-arm64, using dotnet9:

$ dotnet9 --version
9.0.100-preview.2.24113.6

$ dotnet9 new console -n nine
$ cd nine
$ dotnet9 publish -o dist -c Release -p:PublishAot=true -p:StripSymbols=true

$ nm --version | head -1
GNU nm (GNU Binutils) 2.40

$ nm --portability --line-numbers dist/nine  > /dev/null
nm: DWARF error: offset (2468611840) greater than or equal to .debug_str size (949493)
nm: DWARF error: offset (93531399) greater than or equal to .debug_str size (949493)
nm: DWARF error: offset (1248010) greater than or equal to .debug_str size (949493)
nm: DWARF error: could not find abbrev number 85028

Notes:

* No errors with .NET 8
* No errors with llvm-nm (`llvm-nm --portability --print-file-name dist/nine`)
* On ubuntu/glibc/gnu-nm arm64, both .NET 8 and 9 produce:

nm: DWARF error: invalid or unhandled FORM value: 0x23
nm: DWARF error: invalid or unhandled FORM value: 0x22

ps - found this while analyzing the size difference between linux-arm64 (1.3M) and linux-musl-arm64 (1.6M) naot binaries using this python package https://github.com/jedrzejboczar/elf-size-analyze (which makes use of `nm` among other things)

cc @filipnavara

ghost commented 9 months ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas See info in area-owners.md if you want to be subscribed.

Issue Details
Author: am11
Assignees: -
Labels: `untriaged`, `area-NativeAOT-coreclr`
Milestone: -

filipnavara commented 9 months ago

Can you attach the produced binaries please?

> nm: DWARF error: could not find abbrev number 85028

We never produce abbreviations with such high numbers, they are all < 100.

> nm: DWARF error: invalid or unhandled FORM value: 0x23
> nm: DWARF error: invalid or unhandled FORM value: 0x22

I have seen that from some tools which were cross-arch. Likewise, we do not generate these forms (0x22 is DW_FORM_loclistx, 0x23 is DW_FORM_rnglistx). They are DWARF 5 forms, which are produced by the native compilers.
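For readers mapping the raw FORM values in the nm output to names, here is a minimal sketch. The table is a small subset of the form codes from the DWARF 5 specification; the helper name `describe_form` is made up for illustration and is not part of any tool mentioned in this thread.

```python
# Subset of DWARF 5 form codes (values as listed in the DWARF 5 spec).
# Only a few entries are included; this is not a complete table.
DWARF5_FORMS = {
    0x1F: "DW_FORM_line_strp",
    0x21: "DW_FORM_implicit_const",
    0x22: "DW_FORM_loclistx",
    0x23: "DW_FORM_rnglistx",
}


def describe_form(code: int) -> str:
    """Return the DWARF 5 name for a form code, or a placeholder if it is
    not in this (deliberately incomplete) subset."""
    return DWARF5_FORMS.get(code, f"unknown form 0x{code:02x}")


# The two values reported by GNU nm on the glibc arm64 binaries.
for code in (0x23, 0x22):
    print(f"0x{code:02x} -> {describe_form(code)}")
```

A consumer that only understands DWARF 4 forms would not recognize either value, which fits the "invalid or unhandled FORM value" wording.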

am11 commented 9 months ago

> Can you attach the produced binaries please?

linux-musl-arm64_dotnet8_dotnet9.zip linux-arm64_dotnet8_dotnet9.zip

am11 commented 9 months ago

With DOTNET_USE_LLVM_OBJWRITER=1 and dotnet9 on linux-musl, gnu nm doesn't report any error either:

linux-musl-arm64_dotnet9llvm.zip

filipnavara commented 9 months ago

Thanks for the files. I'll have a look in the coming days.

filipnavara commented 9 months ago

There's definitely garbage in the DWARF information when dumped with llvm-dwarfdump:

0x0000095b:     DW_TAG_subprogram
                  DW_AT_name    ("get_ValueTypeFieldPadding")
                  DW_AT_linkage_name    ("Internal.Runtime.MethodTable__get_ValueTypeFieldPadding")
                  DW_AT_decl_file       ("/_/src/libraries/Common/src/Interop/Unix/Interop.IOErrors.cs")
                  DW_AT_decl_line       (1)
                  DW_AT_type    (0x0000095e)
                  DW_AT_external        (true)
                  DW_AT_declaration     (true)
                  DW_AT_object_pointer  (0x00000aeb)

0x0000096e:       DW_TAG_formal_parameter
                    DW_AT_type  (0x00021fdd)
                    DW_AT_artificial    (true)

0x00000973:       NULL

...

0x00000a85:     DW_TAG_subprogram
                  DW_AT_name    (".ctor")
                  DW_AT_linkage_name    ("Object___ctor")
                  DW_AT_decl_file       ("/_/src/libraries/Common/src/Interop/Unix/Interop.IOErrors.cs")
                  DW_AT_decl_line       (1)
                  DW_AT_type    (0x0000096a)
                  DW_AT_external        (true)
                  DW_AT_declaration     (true)
                  DW_AT_object_pointer  (0x00000c15)

Notice that DW_AT_type (0x0000096a) points into the middle of another record.

Do you have the .o file from ILC for linux-musl-arm64_dotnet8_dotnet9/nine by any chance? Either the relocations are wrong or the linker processed them incorrectly. Notably, the old ObjWriter did not produce relocations for references within the .debug_info section. Omitting them is technically wrong, but maybe the relocations trigger some linker bug.
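The "points into the middle of another record" observation can be checked mechanically. The sketch below uses only the three DIE start offsets visible in the first llvm-dwarfdump excerpt (0x95b, 0x96e, 0x973); the function name `locate` is hypothetical, and a real check would need the full list of DIE offsets for the unit.

```python
from bisect import bisect_right

# DIE start offsets copied from the llvm-dwarfdump excerpt in this comment.
die_offsets = [0x95B, 0x96E, 0x973]


def locate(offset, starts):
    """Return (nearest_preceding_die_start, is_exact) for a .debug_info offset.

    `starts` must be sorted. The result is only meaningful if `starts`
    contains every DIE in the range of interest.
    """
    i = bisect_right(starts, offset) - 1
    if i < 0:
        raise ValueError("offset precedes the first known DIE")
    return starts[i], starts[i] == offset


start, exact = locate(0x96A, die_offsets)
print(hex(start), exact)  # 0x95b False: 0x96a falls inside the DIE at 0x95b
```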

filipnavara commented 9 months ago

The incorrect DW_AT_type (0x0000096a) offset should have pointed to

0x000007ed:   DW_TAG_unspecified_type
                DW_AT_name      ("void")

The two offsets differ by 0x17D, which is exactly the offset where the IL.c compile unit starts:

0x0000017d: Compile Unit: length = 0x0005279c, format = DWARF32, version = 0x0004, abbr_offset = 0x00fc, addr_size = 0x08 (next unit at 0x0005291d)
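The arithmetic behind that observation is easy to verify. This is a minimal sketch using only the numbers quoted in this comment; the helper `next_unit_offset` is a made-up name and relies on the 32-bit DWARF rule that the 4-byte unit length field does not count itself.

```python
def next_unit_offset(unit_start: int, unit_length: int) -> int:
    """In 32-bit DWARF the unit_length field is 4 bytes and excludes itself,
    so the next unit begins at start + 4 + length."""
    return unit_start + 4 + unit_length


cu_start = 0x17D   # start of the IL.c compile unit
bad_ref = 0x96A    # DW_AT_type value shown by llvm-dwarfdump
void_die = 0x7ED   # where the 'void' DW_TAG_unspecified_type actually is

delta = bad_ref - void_die
print(hex(delta))                                # 0x17d
print(hex(next_unit_offset(cu_start, 0x5279C)))  # 0x5291d
```

The difference matching the compile unit's start offset suggests that the unit base was added once more than it should have been, though the numbers alone do not say which tool added it.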
am11 commented 9 months ago

With .o files: linux-musl-arm64_dotnet8_dotnet9-with.o.zip linux-musl-arm64_dotnet9llvm-with.o.zip

filipnavara commented 9 months ago

The object file still has the correct info:

0x00000670:   DW_TAG_unspecified_type
                DW_AT_name      ("void")

0x00000908:     DW_TAG_subprogram
                  DW_AT_name    (".ctor")
                  DW_AT_linkage_name    ("Object___ctor")
                  DW_AT_decl_file       ("/_/src/libraries/Common/src/Interop/Unix/Interop.IOErrors.cs")
                  DW_AT_decl_line       (1)
                  DW_AT_type    (0x00000670 "void")
                  DW_AT_external        (true)
                  DW_AT_declaration     (true)
                  DW_AT_object_pointer  (0x0000091b)

with a relocation:

0000000000000913  0000000c00000102 R_AARCH64_ABS32        0000000000000000 .debug_info + 670

However, it seems that neither clang nor gcc generates these .debug_info relocations. I'll try to make a patch to see if it makes any difference.
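One reading that is consistent with all the numbers in this thread can be sketched as follows. It rests on two assumptions that the thread does not confirm: that the object's .debug_info contribution is placed at offset 0x17d in the linked file, and that the consumer treats the DW_AT_type value as relative to the start of the compile unit. The helper `split_r_info` is a hypothetical name; the field layout is the standard ELF64 one.

```python
R_AARCH64_ABS32 = 258  # 0x102 in the AArch64 ELF ABI


def split_r_info(r_info: int):
    """ELF64 r_info: symbol index in the high 32 bits, type in the low 32."""
    return r_info >> 32, r_info & 0xFFFFFFFF


# Relocation entry from the readelf-style line above.
sym_index, rel_type = split_r_info(0x0000000C00000102)
print(sym_index, rel_type == R_AARCH64_ABS32)

# Assumption: the object's .debug_info lands at 0x17d in the linked output.
section_base = 0x17D
addend = 0x670                      # ".debug_info + 670" in the relocation
resolved = section_base + addend    # value the linker would store

# Assumption: the consumer adds the compile unit start to the stored value.
cu_start = 0x17D
displayed = cu_start + resolved

print(hex(resolved), hex(displayed))
```

Under these assumptions the linker stores 0x7ed, which is already the correct section offset of the "void" DIE, and adding the unit start a second time yields the 0x96a seen in the linked binary. Without the relocation the stored value would remain 0x670, and a unit-relative interpretation would land on 0x7ed.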