apple / swift-atomics

Low-level atomic operations for Swift
Apache License 2.0
1.06k stars 50 forks source link

[Bazel] Segfault from ManagedAtomic.__allocating_init(_:) when compiled for linux but not macOS #86

Closed AttilaTheFun closed 1 year ago

AttilaTheFun commented 1 year ago

Hi folks! I'm trying to update Bazel's rules_swift to use a newer version of grpc-swift which in turn uses a newer version of swift-nio which pulls in swift-atomics.

If you're not familiar, Bazel is a build system open-sourced by Google used to build a number of large iOS applications (Snapchat, Spotify, Uber etc). Under the hood it's ultimately just invoking swiftc, though.

The segfault I'm getting looks like this:

ubuntu@focal:~/rules_swift$ lldb bazel-bin/examples/xplatform/grpc/echo_server
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'lldb'
(lldb) target create "bazel-bin/examples/xplatform/grpc/echo_server"
Current executable set to '/home/ubuntu/rules_swift/bazel-bin/examples/xplatform/grpc/echo_server' (aarch64).
(lldb) r
Process 4888 launched: '/home/ubuntu/rules_swift/bazel-bin/examples/xplatform/grpc/echo_server' (aarch64)
Process 4888 stopped
* thread #1, name = 'echo_server', stop reason = signal SIGSEGV: invalid address (fault address: 0x38)
    frame #0: 0x0000aaaaabbd17ac echo_server`ManagedAtomic.__allocating_init(_:) at <compiler-generated>:0
Note: this address is compiler-generated code in function Atomics.ManagedAtomic.__allocating_init(τ_0_0) -> Atomics.ManagedAtomic<τ_0_0> that has no source code associated with it.
Target 0: (echo_server) stopped.

I've verified that the exact same code builds and runs correctly on linux when built with swift build in this demo project: https://github.com/AttilaTheFun/grpc_swift_test

Also the code builds and runs correctly on macOS with both swift build and building with Bazel.

I tried diffing the compiler invocations by Bazel between linux and macOS and these were the flags which macOS had which linux did not:

-Xcc
-O0
-Xcc
-DDEBUG=1
-Xcc
-fstack-protector
-Xcc
-fstack-protector-all

There were no flags that linux had which macOS did not. I tried manually adding the stack protector flags to the linux build and it didn't help.

I also tried diffing the swiftc invocation from swift build and I saw these flags were provided by swift build but not Bazel:

-j2
-swift-version
5

I tried manually adding those to Bazel and it didn't help either.

Information

Linux: swift --version Swift version 5.8 (swift-5.8-RELEASE) Target: aarch64-unknown-linux-gnu

MacOS: swift --version swift-driver version: 1.75.2 Apple Swift version 5.8.1 (swiftlang-5.8.0.124.5 clang-1403.0.22.11.100) Target: arm64-apple-macosx13.0

Checklist

Steps to Reproduce

I created an Apache 2.0 licensed demo project here: https://github.com/AttilaTheFun/grpc_swift_test

If you build the examples_xplatform_grpc_echo_server binary on macOS or ubuntu with swift build you should be able to run the binary and it will log that it is listening on port 9000.

If you build the same target with Bazel on macOS it will work but on ubuntu it will segfault with:

bazelisk build -s --compilation_mode=dbg //examples/xplatform/examples_xplatform_grpc_echo_server

Expected behavior

The executable should not crash on linux.

Actual behavior

The executable crashes on linux.

AttilaTheFun commented 1 year ago

Update: I was able to reproduce this with an even more minimal example that only pulls in Swift Atomics: https://github.com/AttilaTheFun/swift_atomics_test

lorentey commented 1 year ago
* thread #1, name = 'echo_server', stop reason = signal SIGSEGV: invalid address (fault address: 0x38)
    frame #0: 0x0000aaaaabbd17ac echo_server`ManagedAtomic.__allocating_init(_:) at <compiler-generated>:0
Note: this address is compiler-generated code in function Atomics.ManagedAtomic.__allocating_init(τ_0_0) -> Atomics.ManagedAtomic<τ_0_0> that has no source code associated with it.

Interesting. I have not heard of this happening in builds produced by SwiftPM, which strongly indicates that Bazel has something to do with it. On first smell, this seems like a potential miscompilation; the fault address is quite clearly bad -- however, it being specific to Bazel may indicate a build issue, such as the Atomics library and its client somehow disagreeing on the ABI of this call.

It would be good to know what the code expected to be at that location to isolate the problem.

On the source level, one vaguely foul smelling bit is the @_fixed_layout attribute on the ManagedAtomic class. Perhaps it would be worth trying to remove that, to see if it has any effect to the symptoms.

lorentey commented 1 year ago

Update: I was able to reproduce this with an even more minimal example that only pulls in Swift Atomics: https://github.com/AttilaTheFun/swift_atomics_test

Hm, I don't seem to be able to reproduce this issue using this sample code on any of the systems I tried. Does this need any special steps I'm missing?

For example, I found the target //Sources/SwiftAtomicsTest by trial and error -- is that the correct one to build to exhibit the issue? (Keep in mind that I have zero experience using Bazel, and it isn't a supported build system for this package. I want to help figure out what goes wrong, but I'll need a bit of help.)

macOS 13.4.1 with Swift 5.8.1 in Xcode 14.3.1, running on Apple Silicon:

$ sw_vers
ProductName:        macOS
ProductVersion:     13.4.1
BuildVersion:       22F82
$ xcrun swift --version
swift-driver version: 1.75.2 Apple Swift version 5.8.1 (swiftlang-5.8.0.124.5 clang-1403.0.22.11.100)
Target: arm64-apple-macosx13.0
$ bazelisk build -s --compilation_mode=dbg //Sources/SwiftAtomicsTest
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Analyzed target //Sources/SwiftAtomicsTest:SwiftAtomicsTest (50 packages loaded, 628 targets configured).
INFO: Found 1 target...
[...]
SUBCOMMAND: # //Sources/SwiftAtomicsTest:SwiftAtomicsTest [action 'Compiling Swift module //Sources/SwiftAtomicsTest:SwiftAtomicsTest', configuration: e1a919c1e09176eabe5b9ba9607a636d027f4c6af43cfcf8b693eb7ef6f2d773, execution platform: @local_config_platform//:host]
(cd /private/var/tmp/_bazel_lorentey/84d0aed4f838e90fed38d04036eda429/execroot/swift_atomics_test && \
  exec env - \
    APPLE_SDK_PLATFORM=MacOSX \
    APPLE_SDK_VERSION_OVERRIDE=13.3 \
    SWIFT_AVOID_WARNING_USING_OLD_DRIVER=1 \
    XCODE_VERSION_OVERRIDE=14.3.1.14E300c \
  bazel-out/darwin_arm64-opt-exec-2B5CBBC6/bin/external/build_bazel_rules_swift/tools/worker/worker swiftc @bazel-out/darwin_arm64-dbg/bin/Sources/SwiftAtomicsTest/SwiftAtomicsTest.swiftmodule-0.params)
# Configuration: e1a919c1e09176eabe5b9ba9607a636d027f4c6af43cfcf8b693eb7ef6f2d773
# Execution platform: @local_config_platform//:host
INFO: From Compiling Swift module //Sources/SwiftAtomicsTest:SwiftAtomicsTest:
Sources/SwiftAtomicsTest/SwiftAtomicsTestMain.swift:8:9: warning: initialization of immutable value 'managedAtomic' was never used; consider replacing with assignment to '_' or removing it
    let managedAtomic = ManagedAtomic(false)
    ~~~~^~~~~~~~~~~~~
    _
[...]
Target //Sources/SwiftAtomicsTest:SwiftAtomicsTest up-to-date:
  bazel-bin/Sources/SwiftAtomicsTest/SwiftAtomicsTest
INFO: Elapsed time: 21.300s, Critical Path: 3.72s
INFO: 33 processes: 10 internal, 21 darwin-sandbox, 2 worker.
INFO: Build completed successfully, 33 total actions
$ ./bazel-bin/Sources/SwiftAtomicsTest/SwiftAtomicsTest
before atomic
after atomic
$

Bazelisk 1.17.0 manually downloaded to a Swift 5.8 docker image for Ubuntu 20.04/Aarch64:

# uname -a
Linux 70f1c5b7-598e-4766-a1d8-e01dc9f892b1 5.15.111 #2 SMP Mon May 15 16:54:21 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
# swift --version
Swift version 5.8.1 (swift-5.8.1-RELEASE)
Target: aarch64-unknown-linux-gnu
# export CC=clang
# [...]/bazelisk-linux-arm64 build -s --compilation_mode=dbg //Sources/SwiftAtomicsTest
2023/07/11 00:36:42 Downloading https://releases.bazel.build/6.2.1/release/bazel-6.2.1-linux-arm64...
Extracting Bazel installation...
[...]
INFO: From Compiling Swift module //Sources/SwiftAtomicsTest:SwiftAtomicsTest:
Sources/SwiftAtomicsTest/SwiftAtomicsTestMain.swift:8:9: warning: initialization of immutable value 'managedAtomic' was never used; consider replacing with assignment to '_' or removing it
    let managedAtomic = ManagedAtomic(false)
    ~~~~^~~~~~~~~~~~~
    _
[...]
# Configuration: bb3cd44152ff91d79df6912150f2a2b45b75a3992cf7930671d836f3a6087d77
# Execution platform: @local_config_platform//:host
Target //Sources/SwiftAtomicsTest:SwiftAtomicsTest up-to-date:
  bazel-bin/Sources/SwiftAtomicsTest/SwiftAtomicsTest
INFO: Elapsed time: 19.632s, Critical Path: 2.56s
INFO: 38 processes: 20 internal, 18 linux-sandbox.
INFO: Build completed successfully, 38 total actions

# ./bazel-bin/Sources/SwiftAtomicsTest/SwiftAtomicsTest
before atomic
after atomic

I also tried a 5.9 nightly build on Ubuntu 22.04, as well as a prerelease version of macOS Sonoma with Swift 5.9.

AttilaTheFun commented 1 year ago

@lorentey sorry I forgot to come back and update this issue, but we found the source of the issue (or at least how to mitigate it). What fixed it was adding the alwayslink = True flag. I believe what might have been happening was that the object file compiled containing the protocol conformance was not being linked. That would explain why it worked when I put the protocol conformance in the same file as ManagedAtomic. I can close this issue now.

https://github.com/bazelbuild/rules_swift/issues/1059#issuecomment-1595847116 https://stackoverflow.com/questions/48653517/what-does-bazel-alwayslink-true-mean

lorentey commented 1 year ago

Ah! That explains it -- a protocol conformance that got lost during linkage would indeed lead to code jumping to small addresses.

Protocol conformance symbols are needed at runtime even if they aren't ever referenced explicitly, so the link optimization that Bazel is trying to do here is not compatible with Swift. It sounds like the alwayslink option ought to be automatically and unconditionally force-enabled for Swift targets.

AttilaTheFun commented 1 year ago

Ah interesting, I'll let the Bazel maintainers know -- maybe we should just set alwayslink to true for all swift libraries.