dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[NativeAOT] xcode15+ linker issue #97745

Open VSadov opened 7 months ago

VSadov commented 7 months ago

There is a chance that this is something with my machine, but it looks like I cannot build and run tests with 8.0. It could be related to Xcode 15+.

I have 15.2 and tried downgrading to 15.0, with the same result. It does not seem possible to downgrade further, as older versions claim to be incompatible with the new OS.

I am using the following command:

./build.sh clr.alljits+clr.tools+clr.nativeaotlibs+clr.nativeaotruntime+libs -rc Release -lc Release ; src/tests/build.sh nativeaot Release 'tree nativeaot' ; src/tests/run.sh --runnativeaottests Release

That works fine on main branch (product and tests build and tests pass). But on release/8.0 branch I get a bunch of errors like:

/Users/vs/Hosting01/runtime/artifacts/bin/coreclr/osx.arm64.Release/build/Microsoft.NETCore.Native.targets(308,5): error MSB3073: The command ""/Users/vs/Hosting01/runtime/artifacts/bin/coreclr/osx.arm64.Release/ilc-published/ilc" @"/Users/vs/Hosting01/runtime/artifacts/tests/coreclr/obj/osx.arm64.Release/Managed/nativeaot/SmokeTests/StackTraceMetadata/StackTraceMetadata_Stripped/native/StackTraceMetadata_Stripped.ilc.rsp"" exited with code 139.

ghost commented 7 months ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Author: VSadov
Assignees: -
Labels: `untriaged`, `area-NativeAOT-coreclr`
Milestone: -

VSadov commented 7 months ago

@filipnavara - could this be another linker issue similar to https://github.com/dotnet/runtime/pull/92520? Or maybe even the same issue, just showing up in a different way?

VSadov commented 7 months ago

I think the lab is still on xcode 14 and that could be the reason this does not happen in CI

MichalStrehovsky commented 7 months ago

> The command "".../ilc" @".../StackTraceMetadata_Stripped.ilc.rsp"" exited with code 139.

This is ILC crashing. One option to troubleshoot would be just to re-run this line under a debugger.
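
For context on the exit code: shells conventionally report a process killed by signal N as exit status 128 + N, so 139 corresponds to signal 11 (SIGSEGV). That is consistent with ILC crashing rather than reporting a compilation error. A minimal sketch of the decoding follows; `describe_exit_code` is a hypothetical helper name, not part of any .NET tooling.

```python
import signal

def describe_exit_code(code: int) -> str:
    """Decode a shell-style exit status (128 + N means killed by signal N)."""
    if code > 128:
        signum = code - 128
        try:
            return f"killed by signal {signum} ({signal.Signals(signum).name})"
        except ValueError:
            # The number does not map to a signal known on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited normally with status {code}"

print(describe_exit_code(139))
```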

VSadov commented 6 months ago

Passing -ld64 to the ILC build and to Microsoft.NETCore.Native.Unix.targets makes the problem go away: ILC is functional, it can build the tests, and the tests pass.

This does seem to be some kind of incompatibility with the new linker.

filipnavara commented 6 months ago

> could this be another linker issue similar to #92520?

Generally speaking, yes. I won’t have time to look into it until next week, unfortunately.

VSadov commented 6 months ago

I think nothing is blocked on this yet, so this is mostly a heads up. Since the lab uses xcode14, it is not affected. Also I am unsure if xcode15+ is a supported end user configuration for 8.0. Since the lab is still on 14, probably not yet.

jkotas commented 6 months ago

> I am unsure if xcode15+ is a supported end user configuration for 8.0.

Yes, it is supported. Prerequisites in our documentation do not ask for a specific old Xcode version.

VSadov commented 6 months ago

> Yes, it is supported. Prerequisites in our documentation do not ask for a specific old XCode version.

That is desirable, but in practice it is hard to guarantee support for something newer than what the lab runs. That said, moving to a new Xcode has been mostly uneventful in the past. It is just that v15 brings a new linker with a bunch of incompatibilities. I would not rule out that some of those incompatibilities are unintentional and are basically bugs that may eventually be fixed.

Maybe the right course of action for 8.0 is to use -ld_classic until ld_prime becomes more stable?

jkotas commented 6 months ago

> it is hard to guarantee support for something newer than what the lab runs

Yes, we have the same problem with support for new OS versions. We take fixes for these types of issues in servicing.

filipnavara commented 6 months ago

Shouldn't the milestone be set to 8.0.x instead of 9.0.0 since this apparently doesn't happen on main?

filipnavara commented 6 months ago

Never mind, there is definitely something broken even on main with the new linker. I updated to Xcode 15.2 and I get a crash at the end of the nativeaot/SmokeTests/FrameworkStrings/Baseline test. Relinking with -ld_classic and no other changes makes it go away. So does linking against the Debug build of libRuntime.WorkstationGC.a. I'll investigate next week.

I did a debug build earlier and started seeing this, which may be related:

        --------------------------------------------------
        Debug Assertion Violation

        Expression: 'm_pInstance->IsManaged(m_ControlPC) && "unwind from throw site stub failed"'

        File: /Users/filipnavara/Projects/runtime/src/coreclr/nativeaot/Runtime/StackFrameIterator.cpp, Line: 1289
        --------------------------------------------------

filipnavara commented 6 months ago

The issue I get in the Baseline test is trashed SP after a caught exception (before: 0x000000016fdff1a0; after: 0x00000001814790e0). The unwinding information may be corrupted.

filipnavara commented 6 months ago

So far I traced it to UnwindHelpers::GetUnwindProcInfo returning bogus info belonging to a different method than asked for. The unwinding data itself in the executable is likely fine. I did a spot check on it, and lldb unwinds the same stack trace just fine.

filipnavara commented 6 months ago

So, the linker indeed produces garbage unwind tables (as verified by objdump -u):

    Second level index[6]: offset in section=0x0000a3f8, base function offset=0x000f55f0
      [0]: function offset=0x000f55f0, encoding=0x44000000
      [1]: function offset=0x000f5640, encoding=0x03011500
      [2]: function offset=0x000f56f0, encoding=0x03011528
      [3]: function offset=0x000f5740, encoding=0x0301154c
...
      [106]: function offset=0x000f92d0, encoding=0x03011e00
      [107]: function offset=0x000f9320, encoding=0x03011e24
      [108]: function offset=0x0018a5c8, encoding=0x00000000
      [109]: function offset=0x000f93e0, encoding=0x03011e4c
      [110]: function offset=0x0018a608, encoding=0x00000000
      [111]: function offset=0x000f9440, encoding=0x03011e70
      [112]: function offset=0x0018a668, encoding=0x00000000
      [113]: function offset=0x000f94c0, encoding=0x03011e94
      [114]: function offset=0x0018a6e8, encoding=0x00000000
      [115]: function offset=0x000f9580, encoding=0x03011eb8
      [116]: function offset=0x0018a7a8, encoding=0x00000000
      [117]: function offset=0x000f9620, encoding=0x03011ee0
      [118]: function offset=0x0018a848, encoding=0x00000000
      [119]: function offset=0x000f96b0, encoding=0x44000000
      [120]: function offset=0x0018a8d8, encoding=0x00000000
      [121]: function offset=0x000f96e0, encoding=0x44000000
      [122]: function offset=0x0018a908, encoding=0x00000000
      [123]: function offset=0x000f9710, encoding=0x44000000
      [124]: function offset=0x0018a938, encoding=0x00000000
      [125]: function offset=0x000f9740, encoding=0x44000000
      [126]: function offset=0x0018a968, encoding=0x00000000
      [127]: function offset=0x000f9790, encoding=0x44000000
      [128]: function offset=0x0018a9b8, encoding=0x00000000
      [129]: function offset=0x000f97b0, encoding=0x03011f04
      [130]: function offset=0x0018a9d8, encoding=0x00000000
      [131]: function offset=0x000f9960, encoding=0x03011f30
      [132]: function offset=0x0018ab88, encoding=0x00000000
…

The function offsets are supposed to be sorted. I'll collect the build artifacts and submit a report to Apple.
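
To illustrate why the ordering matters, here is a minimal sketch that assumes the unwinder locates a function's entry by binary search over the function offsets (a common approach for sorted tables; the actual lookup code is not shown in this thread). The `lookup` helper and the probe address are hypothetical, and the table is a small subset of the entries from the dump above (indices 106 to 112).

```python
# (function offset, encoding) pairs copied from the dump above, in table order.
table = [
    (0x000f92d0, 0x03011e00),
    (0x000f9320, 0x03011e24),
    (0x0018a5c8, 0x00000000),
    (0x000f93e0, 0x03011e4c),
    (0x0018a608, 0x00000000),
    (0x000f9440, 0x03011e70),
    (0x0018a668, 0x00000000),
]

def lookup(entries, pc_offset):
    """Binary search for the last entry whose offset is <= pc_offset.

    The result is only meaningful when entries are sorted by offset.
    """
    lo, hi = 0, len(entries)
    while lo < hi:
        mid = (lo + hi) // 2
        if entries[mid][0] <= pc_offset:
            lo = mid + 1
        else:
            hi = mid
    return entries[lo - 1] if lo > 0 else None

# Hypothetical address, assumed to fall inside the function at 0x0018a5c8.
pc = 0x0018a5d0
broken = lookup(table, pc)
fixed = lookup(sorted(table), pc)
print(f"unsorted table: function offset={broken[0]:#010x}")
print(f"sorted table:   function offset={fixed[0]:#010x}")
```

With these seven entries, the search over the unsorted table lands on the entry for 0x000f9440, which belongs to a different function, while the same search over the sorted table finds 0x0018a5c8. This matches the earlier symptom of unwind info being returned for a different method than the one asked for.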

filipnavara commented 6 months ago

Feedback sent to Apple (FB13584275). For reference, here's the repro with files and a build.sh script: ld64-repro.zip

A broken linker produces unsorted/corrupted unwind tables, which can be verified with objdump -u Baseline.
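
A minimal sketch of automating that check, assuming the `objdump -u` output uses the line format shown in the dump earlier in this thread. The `find_unsorted` helper and the regular expression are illustrative assumptions, not part of any existing tool. The sketch checks each second-level page independently by resetting at every "Second level index" line.

```python
import re

# Matches second-level entries such as:
#   [108]: function offset=0x0018a5c8, encoding=0x00000000
ENTRY = re.compile(
    r"\[(\d+)\]: function offset=(0x[0-9a-fA-F]+), encoding=(0x[0-9a-fA-F]+)"
)

def find_unsorted(dump_text):
    """Return (index, offset, previous_offset) for entries that break ascending order."""
    problems = []
    prev = None
    for line in dump_text.splitlines():
        if "Second level index" in line:
            prev = None  # start of a new page; check it on its own
            continue
        m = ENTRY.search(line)
        if not m:
            continue
        index, offset = int(m.group(1)), int(m.group(2), 16)
        if prev is not None and offset < prev:
            problems.append((index, offset, prev))
        prev = offset
    return problems

sample = """\
Second level index[6]: offset in section=0x0000a3f8, base function offset=0x000f55f0
  [106]: function offset=0x000f92d0, encoding=0x03011e00
  [107]: function offset=0x000f9320, encoding=0x03011e24
  [108]: function offset=0x0018a5c8, encoding=0x00000000
  [109]: function offset=0x000f93e0, encoding=0x03011e4c
"""

for index, offset, prev in find_unsorted(sample):
    print(f"entry [{index}]: offset {offset:#010x} is below previous {prev:#010x}")
```

The input text could come from capturing the output of `objdump -u Baseline`; an empty result would suggest the table is in ascending order, at least for the entries this pattern recognizes.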

filipnavara commented 6 months ago

The bogus unwinding information comes from functions in the __unbox section. We currently don't generate any unwind information for that so presumably the linker generates its own "dummy" one and fails to sort it properly. I'll investigate whether generating it ourselves makes a difference.

Update: It does not. :/

filipnavara commented 6 months ago

Response from Apple:

> Thank you for filing the feedback report. This is a bug in our linker.
>
> You can use the following workaround until a fix becomes available: Please use the -Wl,-ld_classic linker option.

ivanpovazan commented 2 months ago

Regarding:

> Maybe the right course of action for 8.0 is to use -ld_classic until ld_prime becomes more stable?

Apple announced that they are removing this linker option in https://developer.apple.com/documentation/xcode-release-notes/xcode-16-release-notes so this problem becomes more pressing.

filipnavara commented 2 months ago

Apple updated my Feedback tickets with a request to try Xcode 16 and report back.

filipnavara commented 2 months ago

The original problem with corrupted unwind tables seems to be fixed [with Xcode 16 Beta 1] on the few small samples that I tried. We need to do a full run to make sure that there are no other issues.