dotnet / runtime


.NET NativeAOT x64 builds on macOS adds dependency on @rpath/libswiftCore.dylib instead of correct install path #79454

Closed christianscheuer closed 1 year ago

christianscheuer commented 1 year ago

When building an app with NativeAOT via latest bits, the x64 and arm64 builds link differently to libswiftCore.dylib and libswiftFoundation.dylib, causing x64 builds to not load.

Selected output from otool -L mybinaryname – for the x64 build it looks for libswiftCore and libswiftFoundation in the @rpath:

@rpath/libswiftCore.dylib (compatibility version 1.0.0, current version 5.7.1)
@rpath/libswiftFoundation.dylib (compatibility version 1.0.0, current version 120.100.0)

This only works if you manually copy/paste these dylibs into your Frameworks folder of the macOS bundle you're producing, or manually add /usr/lib/swift to the @rpath somehow.

The arm64 builds correctly reference their install locations:

/usr/lib/swift/libswiftCore.dylib (compatibility version 1.0.0, current version 5.7.1)
/usr/lib/swift/libswiftFoundation.dylib (compatibility version 1.0.0, current version 120.100.0)

Workaround:

- We can manually run install_name_tool and overwrite the reference for now, but it seems like this should be fixed so the x64 builds also produce absolute references.

It would be even better if there were no dependency on the Swift dynamic libraries at all. Not sure why this got added recently? I know that Xcode has an "Always Embed Swift Standard Libraries" option, which is when the @rpath switch would make sense (if you're building a macOS bundle, the Swift dylibs then get copied into the Frameworks folder). But given that most people would be making console apps with .NET rather than bundles, it's unfortunate to depend on these dylibs if embedding them is the only good way to get forwards/backwards compatibility with whatever Apple installs in the global location. I'm assuming the embed option exists because linking against the system-installed ones doesn't always work.
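
The otool -L check described above can be automated by filtering the dependency list for @rpath-relative Swift entries. This is a minimal sketch in Python, assuming the output has the usual `otool -L` shape; the function name `rpath_swift_deps` is made up for illustration, and the sample strings are the lines quoted in this post:

```python
import re

# Matches one dependency line of `otool -L` output, for example:
#   @rpath/libswiftCore.dylib (compatibility version 1.0.0, current version 5.7.1)
_DEP_LINE = re.compile(r"^\s*(.+?)\s+\(compatibility version")


def rpath_swift_deps(otool_output):
    """Return the @rpath-relative Swift dylib references in `otool -L` output."""
    found = []
    for line in otool_output.splitlines():
        match = _DEP_LINE.match(line)
        if not match:
            continue  # header line such as "mybinaryname:" or a blank line
        install_name = match.group(1)
        if install_name.startswith("@rpath/libswift"):
            found.append(install_name)
    return found


# Sample inputs copied from the otool -L output quoted in this issue.
x64_output = """mybinaryname:
\t@rpath/libswiftCore.dylib (compatibility version 1.0.0, current version 5.7.1)
\t@rpath/libswiftFoundation.dylib (compatibility version 1.0.0, current version 120.100.0)
"""
arm64_output = """mybinaryname:
\t/usr/lib/swift/libswiftCore.dylib (compatibility version 1.0.0, current version 5.7.1)
\t/usr/lib/swift/libswiftFoundation.dylib (compatibility version 1.0.0, current version 120.100.0)
"""

print(rpath_swift_deps(x64_output))
print(rpath_swift_deps(arm64_output))
```

A non-empty result for a build means the binary will only load if something puts the Swift dylibs on its rpath.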

dotnet-issue-labeler[bot] commented 1 year ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

christianscheuer commented 1 year ago

cc @jkotas

ghost commented 1 year ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas See info in area-owners.md if you want to be subscribed.

Author: christianscheuer
Assignees: -
Labels: `untriaged`, `area-NativeAOT-coreclr`
Milestone: -

am11 commented 1 year ago

cc @vcsjones

christianscheuer commented 1 year ago

According to this answer: https://developer.apple.com/forums/thread/697688

If your deployment target is 10.14.4 or later, the Swift runtime is built in to macOS and so you’ll never need to include a copy in your product.

If your deployment target is earlier than that, you’ll need to include a copy of the Swift runtime in your product. This will be used if your code runs on an early system.

Since we try to target older macOS versions we would need to embed these dylibs. I reckon most people using NativeAOT officially would target 10.15+, so they'd probably prefer that the references use absolute paths to /usr/lib/swift/.... Since we want to target 10.12 we would probably embed the frameworks and manually overwrite the references so they get picked up from our Frameworks dir.

So, in short: what I proposed originally, that the x64 builds should behave as the arm64 ones (linking to the system installed /usr/lib/swift/...), is what I'd recommend.
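
The rule from the forum answer quoted above comes down to a version comparison. This is a minimal sketch in Python, assuming the deployment target is given as a dotted string; the names `parse_version` and `needs_embedded_swift` are made up for illustration:

```python
# First macOS release with the Swift runtime built in, per the forum answer
# quoted above.
SWIFT_IN_OS = (10, 14, 4)


def parse_version(text):
    """Turn '10.14.4' into (10, 14, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def needs_embedded_swift(deployment_target):
    """True if the product must carry its own copy of the Swift runtime."""
    target = parse_version(deployment_target)
    # Pad so that "10.15" is compared as (10, 15, 0).
    target = target + (0,) * (3 - len(target))
    return target < SWIFT_IN_OS


for target in ("10.12", "10.14.4", "10.15"):
    print(target, needs_embedded_swift(target))
```

With a 10.12 target this returns True, which is why we would have to embed; with 10.15 it returns False and the system copy in /usr/lib/swift is enough.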

am11 commented 1 year ago

not sure why this got added recently?

It was introduced in https://github.com/dotnet/runtime/pull/76317. Apple doesn't provide C implementation for some crypto primitives. Perhaps we can add an option to exclude swift dependency (similar to LinkStandardCPlusPlusLibrary in src/coreclr/nativeaot/BuildIntegration/Microsoft.NETCore.Native.Unix.targets) if it is viable?

christianscheuer commented 1 year ago

It would be awesome to not have to link to them at all for sure!

Edit: Or if possible, it could be made a dynamic/passive/lazy dependency? Not sure what the right terminology is but the kind of reference that only needs to be resolved if that part of the code gets run. I seem to recall Jan talking sometime in the past about DllImports being able to opt in/out of static vs. dynamic loading.

jkotas commented 1 year ago

The prebuilt libraries in the nuget packages target 10.15+ for .NET 8: https://github.com/dotnet/runtime/blob/main/eng/native/configurecompiler.cmake#L512-L518

Targeting lower version using the prebuilt libraries is not going to work well. It is likely that the prebuilt libraries are going to take dependency on 10.15+ in more places.

I think the best option may be to build your own native libraries if you would like to target lower version than what's officially supported.

jkotas commented 1 year ago

Perhaps we can add an option to exclude swift dependency

I think it would require building and shipping a version of the native libraries for pre-10.15 and include those in the nuget package. Or is there some other way to exclude the swift dependency?

christianscheuer commented 1 year ago

Thanks @jkotas. I know targeting pre-10.15 is sort of our own trouble. I should probably start a separate thread asking for advice on how we would go about building these ourselves. That being said, we've been running in production on OSes as old as 10.12 using the .NET 7 ILC alphas, which also had libraries targeting 10.15+, without any issues, so while it is not guaranteed to work, we've managed to make it so. Building the libs ourselves would definitely be a step up in ensuring we can keep doing it though.

The issue raised in this thread about the @rpath linking is still valid though on 10.15+ builds as the problem comes from /usr/lib/swift not being in the @rpath so x64 builds fail to load.

Edit: I used install_name_tool manually as a post-build step to change from @rpath/... to /usr/lib/swift/... and the x64 builds now load correctly.
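
For reference, the post-build step can be scripted roughly as follows. This is only a sketch, assuming the two Swift libraries named in this issue are the only ones to rewrite; the binary path and the function names are hypothetical. It relies on the `install_name_tool -change OLD NEW FILE` form of the command:

```python
import shlex
import subprocess

SYSTEM_SWIFT_DIR = "/usr/lib/swift"
SWIFT_LIBS = ("libswiftCore.dylib", "libswiftFoundation.dylib")


def install_name_tool_args(binary_path):
    """Build one `install_name_tool -change OLD NEW FILE` argv per Swift library."""
    return [
        ["install_name_tool", "-change",
         f"@rpath/{lib}", f"{SYSTEM_SWIFT_DIR}/{lib}", binary_path]
        for lib in SWIFT_LIBS
    ]


def rewrite_swift_references(binary_path):
    """Run the commands. Only meaningful on macOS with the developer tools installed."""
    for argv in install_name_tool_args(binary_path):
        subprocess.run(argv, check=True)


# Print the commands for a hypothetical publish output instead of running them.
for argv in install_name_tool_args("bin/Release/osx-x64/publish/MyApp"):
    print(shlex.join(argv))
```

Calling `rewrite_swift_references` on the published x64 binary applies the change; running otool -L afterwards should show the /usr/lib/swift/... paths.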

am11 commented 1 year ago

Or is there some other way to exclude the swift dependency?

Two options come to mind, which we are currently using in different components:

filipnavara commented 1 year ago

TL;DR: The @rpath is added by the linker if targeting older macOS version. The linker seems to use correct system path if the minimum version is 10.15+.

filipnavara commented 1 year ago

@am11 Using dlopen/dlsym for Swift is basically no-go. Writing Swift in C directly against the poorly documented runtime API is not an option :-)

The only viable approach is marking the Swift libraries as weak references. You can keep using the SwiftC compiler and the linker will then load them only if they exist. You just have to add a layer of guards against calling the function on a downlevel OS. These guards can be in native or managed code, either way works.

filipnavara commented 1 year ago

This is the bit that can make it work on downlevel OSes (https://github.com/filipnavara/runtime/commit/628c8908420e07ed2bd0dd2596be1a89f7a09569#diff-828eeb39399c8691f9401ceb7e3c774470ef67be60b5ab165b9c45e2408c7677):

    # Overrides for weak linking to Swift core libraries. The Swift libraries ship
    # with OS only from macOS 10.14.4 and iOS 12.2. Additionally when targeting
    # down-level platforms the toolchain libraries specify "@rpath" based paths
    # to facilitate fallback to locally shipped Swift runtime. We don't ship
    # the runtime and just point to the system one.
    list(APPEND ${NativeLibsExtra} -L/usr/lib/swift -lobjc -weak-lswiftCore -weak-lswiftFoundation -Wl,-rpath,/usr/lib/swift)

You would need to adapt it for NativeAOT and pass the flags through LinkerArg items.

filipnavara commented 1 year ago

The issue raised in this thread about the @rpath linking is still valid though on 10.15+ builds as the problem comes from /usr/lib/swift not being in the @rpath so x64 builds fail to load.

It is not, really. When targeting macOS 10.15+ the linker doesn't use @rpath-based paths for the Swift libraries. (Actually, not the linker; the SDK has the changed linker scripts.)

vcsjones commented 1 year ago

targeting pre-10.15 is sort of our own trouble

If all of this is because of trying to target down level macOS, I'm not sure how well that is going to work long term. We have removed other compatibility things for 10.14- related to cryptography. 10.15 should be a hard requirement.

This trend is likely to continue.

christianscheuer commented 1 year ago

Thanks @filipnavara for clearing this up. Apologies, I wasn't aware the linker would act like this (but in hindsight, it makes total sense).

@vcsjones I fully understand the issues of targeting older versions of macOS. If we were able to require 10.15+ minimum from our customers we would definitely do that, but in our industry (global film- and media industry), there's a ton of legacy hardware that will take years still to be replaced. This is even true for large enterprise customers.

That being said, it's good to know that this is essentially a non-issue as long as you're targeting 10.15+ as per the official guidelines.

For us, if we could get to November and 8.0 official support and end up with just a few source code changes (or even none, as today) in the runtime to keep macOS 10.12 working, that would be enough to keep us going for several years to allow customers more time to upgrade. As mentioned, even with the changes that have been made in the runtime to drop pre-10.15 support, as long as we aren't hitting those code paths, it doesn't cause us any trouble. I guess that's my way of saying that if those changes that drop pre-10.15 support could generally wait until after 8.0 that would be my vote - but otherwise, we would be fine to keep a fork with a minimum amount of divergences if ultimately needed.

am11 commented 1 year ago

The minimum supported version of macOS in .NET 7 is 10.15 https://github.com/dotnet/core/blob/main/release-notes/7.0/supported-os.md, so I think we don't need to worry about down level versions.

The only viable approach is marking the Swift libraries as weak references. You can keep using the SwiftC compiler and the linker will then load them only if they exist. You just have to add a layer of guards against calling the function on a downlevel OS. These guards can be in native or managed code, either way works.

If it removes both core and foundation from otool -L ./simple-nativeaot-app, that would be a good option to consider for apps which do not use those features.

filipnavara commented 1 year ago

If it removes both core and foundation from otool -L ./simple-nativeaot-app

It doesn't remove them. It marks them "weak". dyld binds it if the libraries exist or leaves it null. It's your responsibility to not dereference the nulls.

Apple's tooling uses the same mechanism for targeting features from newer SDKs when targeting down level OS versions. It has a set of C/Obj-C extensions that mark the availability of APIs, and then enforces their usage only through if (__builtin_available(...)) blocks where the new APIs are weakly linked. (Essentially, it's very similar to .NET's SupportedOSPlatformAttribute/UnsupportedOSPlatformAttribute attributes and Platform Compatibility Analyzer.)

vcsjones commented 1 year ago

if we could get to November and 8.0 official support and end up with just a few source code changes (or even none, as today) in the runtime to keep macOS 10.12 working,

It's not my call, but I would oppose any changes to the runtime to attempt to support versions of macOS that are not supported. We would have no formal way to make sure these OS versions continue to work since they are not present in CI or validated.

am11 commented 1 year ago

It doesn't remove them.

Then it is not what I was talking about. I was only interested in removing the dynamic linkage, which option 1 should be able to do.

filipnavara commented 1 year ago

It doesn't remove them.

Then it is not what I was talking about. I was only interested in removing the dynamic linkage, which option 1 should be able to do.

I am not sure I understand you then. If the end goal is to enable running the app on a machine that lacks swiftCore/swift* then that's exactly what weak linking is intended for. It's like dlopen+dlsym done for you by the linker+dyld. In fact, it's better because you also get the option to ship the Swift libraries with the app, if you choose to do so, and you would not need an option to exclude the Swift code.
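
To make the dlopen comparison concrete, here is the guard pattern written by hand. This is an analogy in Python, not weak linking itself: `ctypes.CDLL` loads the library explicitly at runtime, whereas with weak linking dyld does the binding for you. The function names are made up for illustration, and the library path is the system location mentioned in this thread:

```python
import ctypes


def load_optional(path):
    """Return a library handle, or None when the library cannot be loaded."""
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


# On a system without the Swift runtime the handle is simply None,
# much like an unbound weak reference.
swift_core = load_optional("/usr/lib/swift/libswiftCore.dylib")


def swift_backed_feature_supported():
    """The guard: callers check this before touching anything Swift-backed."""
    return swift_core is not None


def use_swift_backed_feature():
    """Refuse to proceed instead of dereferencing a missing library."""
    if not swift_backed_feature_supported():
        raise RuntimeError("Swift runtime not available on this system")
    return swift_core


print(swift_backed_feature_supported())
```

The app still starts when the library is missing; only the guarded feature reports itself as unavailable.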

vcsjones commented 1 year ago

Weak linking

Weak linking would require dropping the deployment target in the runtime to 10.14 again, I think. My understanding of Apple weak linking is that it won't actually weak link if the deployment target guarantees its availability.

libswiftcompat.a: to provide a dummy implementation.

I don't know how that is feasible with Swift. We use many APIs implicitly. Here are four random dependencies we get from macOS that we never actually call anywhere in our own code.

(undefined) external _$sS2SycfC (from libswiftCore)
(undefined) external _$sSZss17FixedWidthIntegerRzrlEyxqd__cSzRd__lufC (from libswiftCore)
(undefined) external _$sSiN (from libswiftCore)
(undefined) external _$sSiSZsMc (from libswiftCore)

filipnavara commented 1 year ago

Weak linking would require dropping the deployment target in the runtime to 10.14 again, I think. My understanding of Apple weak linking is that it won't actually weak link if the deployment target guarantees its availability.

Not necessarily, you can force any library to be weakly linked by passing -weak-l instead of -l to the linker, irrespective of the deployment target version.

That solves half of the problem. The SDK chosen by the deployment target still decides on the path (fixed or @rpath-relative). For macOS 10.14- you would need one more linker parameter to add the system path into rpath so it looks in the right location.

All of this should be doable without any change to the runtime at all. As long as you don't use AEAD APIs it would allow the app to run on older OSes (+/- other unrelated quirks). If you want AesGcm.IsSupported and other APIs to handle the situation correctly then you would need a runtime change.

(I realized that I forgot to mention that the weak linking can work either on per-symbol or per-whole-library level; the per library option avoids the need to list the unknown mangled symbols)
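
Put together, the flag selection described above looks roughly like this. It is a sketch in Python that only assembles the argument list, using the flags from the CMake snippet earlier in the thread; the function name is hypothetical. Whether these take precedence over the -l arguments the NativeAOT build already passes is an assumption I have not verified:

```python
def swift_linker_args(deployment_target):
    """Linker arguments for weakly linking the Swift libraries.

    deployment_target is a dotted macOS version string such as "10.12".
    """
    major_minor = tuple(int(part) for part in deployment_target.split(".")[:2])
    # Per-library weak linking works irrespective of the deployment target.
    args = ["-L/usr/lib/swift", "-lobjc", "-weak-lswiftCore", "-weak-lswiftFoundation"]
    if major_minor < (10, 15):
        # Below 10.15 the references are @rpath-relative, so the system
        # location has to be added to the rpath.
        args.append("-Wl,-rpath,/usr/lib/swift")
    return args


for target in ("10.12", "10.15"):
    print(target, " ".join(swift_linker_args(target)))
```

Each string would go to the linker as a separate argument, for example one LinkerArg item per entry in a NativeAOT project.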

christianscheuer commented 1 year ago

All of this should be doable without any change to the runtime at all. As long as you don't use AEAD APIs it would allow the app to run on older OSes (+/- other unrelated quirks). If you want AesGcm.IsSupported and other APIs to handle the situation correctly then you would need a runtime change.

That is great news if this could work. The combination of a forced weak link and the rpath fix (which we can alternatively fix via install_name_tool) yet not requiring runtime changes, sounds like a very nice pragmatic sweet spot that will keep customers like us going for quite a while longer, and at the same time not incur increased maintenance on behalf of the runtime team.

vcsjones commented 1 year ago

I am still apprehensive because there is no way we can guarantee this is going to continue to work. We just don't have macOS 10.14 CI. .NET 8 hasn't even seen a preview1 release. It would be very likely that sometime between now and .NET 8 RTM we manage to introduce another change that breaks < 10.15.

/cc @jeffhandley and @akoeplinger for input (and @bartonjs, but he's out until January).

christianscheuer commented 1 year ago

@vcsjones I fully understand that. We're not talking about officially supporting older versions in any way. For context, we have been using unsupported CoreRT in production since 2017. But if there was a preview build of .NET 8, even just for a few days this winter, before any such future potential changes, that would still most likely buy us several years of ability to support customers on legacy hardware. The challenge is that there's no window where this was ever supported - we just need a snapshot of a runtime that once builds NativeAOT with arm64 and x64 and has no hard dependency on something that causes it to fail loading. If any particular feature fails at runtime, that's much easier to deal with as opposed to the binary not loading at all.

filipnavara commented 1 year ago

I agree with @vcsjones that you are in uncharted, unsupported territory. Things are likely to break, CI won't verify it or catch regressions, etc. Apple already dropped support for the older macOS versions and the .NET support policy just follows that.

am11 commented 1 year ago

libswiftcompat.a: to provide a dummy implementation.

I don't know how that is feasible with Swift.

@vcsjones, we did the same thing with C++: https://github.com/dotnet/runtime/commit/546fad95f1474673832b8bbf459b78cd8a2598da. It took some back and forth to isolate the symbols and tweak the linker options. It should be possible to do a similar thing with Swift. I'm not saying it would be straightforward, because I know from experience that it won't. ;)

vcsjones commented 1 year ago

The challenge is that there's no window where this was ever supported - we just need a snapshot of a runtime that once builds NativeAOT with arm64 and x64 and has no hard dependency on something that causes it to fail loading. If any particular feature fails at runtime, that's much easier to deal with as opposed to the binary not loading at all.

Thanks. And I apologize if I have sounded unsympathetic and uncaring to your problem. I get it, and I understand that the real world of support doesn't match the ideals of software vendors saying something is no longer supported.

My first thought now is that you appear to have identified a workaround using install_name_tool. If this workaround is working, it seems like a better solution than the runtime adding a non-trivial workaround whose only purpose is to unblock an unsupported scenario anyway.

Secondly, .NET has not supported Native AOT on macOS so far, anyway. How have you been supporting your application on out-of-support macOS versions thus far? Is it feasible at all to not use Native AOT, since you are attempting to use it in an unsupported fashion, anyway?

christianscheuer commented 1 year ago

@vcsjones yea real world support with customers is quite often a very different place indeed. We have customers in 90 countries, many of which can't afford to upgrade as frequently as Apple likes them to do.

Correct, hopefully, we'll be able to work around this using a combination of install_name_tool to change the load path, and/or using forced weak links to the swift libs (just verified I can do that with <LinkerArg>s). I'll be testing this asap.

More detailed answer below to your question on how/why etc.

Thanks. And I apologize if I have sounded unsympathetic and uncaring to your problem

Not at all, I really value the openness of the dialogue. Huge props to everybody participating here, it comes off as very professional to me, and there's nothing in your remarks I wouldn't have said myself if I were in your seat.

In terms of supporting on macOS versions that are no longer supported by Apple, our official policy towards customers is that we don't support them either. That being said, 16% of our user base uses those versions, so it's a delicate balance. There's also a significant push for us to support arm64 by the other end of the customer spectrum. Accomplishing both things at once is difficult, as viewed from this thread. Given that we're a hybrid cloud+native application, we also can't leave customers stuck on older versions of our software as it would drift out of sync with the cloud components. We could fork and start producing two different builds, a legacy and a modern build, but that would be higher maintenance.

We've been using NativeAOT since it was CoreRT and exceptions and generics didn't work. Initially we wrote code generators to make up for the lack of generics for example. Native compilation is crucial for our use case, for reasons I don't want to dive too much into in a public forum so it was never an alternative to go CoreCLR. As you say, it has never been officially supported by Microsoft - until it will be from .NET 8, hence our attention to this now. The reason this has worked for us for 5 years has been due to the outstandingly excellent support from @jkotas, @MichalStrehovsky, @janvorli etc. who over the years helped mature the CoreRT product and took a very pragmatic and customer friendly approach to making things work for us and others.

What we're trying to do now is to ensure for ourselves, and indeed our customers, a smooth transition to the new phase where most of our use cases will be officially supported.

agocke commented 1 year ago

Since this only occurs on 10.14, which is unsupported, closing this out as unsupported. We will likely not do anything here.

christianscheuer commented 1 year ago

Thanks @agocke, that makes sense. Just to update, we got this working (all the way down to macOS 10.12) by using the methods suggested (install_name_tool and weak linking). Been used in production for a few months now with no issues.