I thought about this a bit while working on #85.
I think there are actually two separate things:
* `clang` compiled for x86_64 to produce binaries for arm64 (i.e. cross compiling)
* `clang` compiled for arm64 to produce binaries for arm64

I think the first can be done with the feature #85 adds but I also think it's really the second thing that you'd want when running macOS on ARM (so that `clang` doesn't needlessly have to run through Rosetta).
To add support for arm64 darwin host platforms, we'd need to:

* fix the spots in `configure.bzl` that hardcode the CPU constraints to x86_64 (note that we register both the macOS and linux toolchains regardless of the OS of the host platform; we let toolchain resolution pick the right one)
  * detecting the host CPU without toolchain resolution is awkward (an `alias` on a `select` on the CPU or just running `arch` are the two ways that come to mind, both of which seem bad); users who aren't using toolchain resolution yet won't have their setups broken since we can add the appropriate entries to the `toolchain_suite` (even when running `bazel` under Rosetta, I think)
* change `cc_toolchain_config` to take a `host_cpu` arg and use it for the `host_system_name`, and the default `target_system_name`, `target_cpu`, `abi_version`, and `abi_libc_version`
* `sysroot.bzl` might need some changes depending on whether the sysroots for arm64 macOS installs are any different
* have `llvm_distributions.bzl` grow entries for arm64 macOS (more on this below)
* update `BUILD.tpl` to match `configure.bzl`
And I think that should be about it.
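To make the toolchain-resolution side concrete, here's a minimal sketch of the platform/toolchain wiring that resolution would match on (target names here are illustrative, not this repo's actual ones):

# BUILD (sketch): an arm64 macOS platform plus a toolchain registered under
# OS/CPU constraints; toolchain resolution matches these against the host
platform(
    name = "macos_arm64",
    constraint_values = [
        "@platforms//os:macos",
        "@platforms//cpu:arm64",
    ],
)

toolchain(
    name = "cc-toolchain-darwin-arm64",
    exec_compatible_with = [
        "@platforms//os:macos",
        "@platforms//cpu:arm64",
    ],
    target_compatible_with = [
        "@platforms//os:macos",
        "@platforms//cpu:arm64",
    ],
    toolchain = ":cc-clang-darwin-arm64",  # a cc_toolchain target (hypothetical)
    toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
)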
I think the bigger obstacle right now is that LLVM doesn't publish binaries for arm64 macOS (even though it's definitely possible to build LLVM for arm64 macOS). This effectively means that you'd have to provide your own download URL to `llvm_toolchain`.
Adding tests for this in CI will also be tricky since (afaik) GitHub Actions doesn't yet have ARM worker machines, let alone macOS ARM worker machines, but that's okay.
Another question is whether we'd want to support creating universal binaries on macOS; I think it makes sense not to default to doing so (users can opt into this by setting some `--copts` or adding some `extra_compile_flags` with the features in #85 for now; eventually we can make it an option on `llvm_toolchain` if there's interest).
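(For the curious, the manual opt-in would presumably look like building each slice separately and gluing them together with `lipo`; the `--config` names below are the ones used by the test workspace later in this thread:)

# build an x86_64 slice and an arm64 slice, then combine them into a
# universal (fat) Mach-O binary with lipo
bazel build //:test --config=x86   && cp bazel-bin/test /tmp/test-x86_64
bazel build //:test --config=arm64 && cp bazel-bin/test /tmp/test-arm64
lipo -create -output test-universal /tmp/test-x86_64 /tmp/test-arm64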
@jez I'm happy to try to put together these changes if you're willing to test (I don't have access to an arm64 macOS machine).
I’m more than happy to test! Do you have a sense of when you’d have time to work on this?
Also: I think that even being able to produce arm64 binaries would be an improvement, even if they were built by x86_64 clang. I see that #85 is a draft—is that something you’d like me to test? Or is there something else holding it back from landing?
Okay! I put together a thing that uses #85 to set up a toolchain for arm64 (still x86_64 based so it's cross-compiling).
It took a little doing; I'm not on macOS 11 so my sysroot didn't have the right stuff and it took me a while to realize that the `ld` that ships in LLVM releases doesn't have support for the `.tbd` (TAPI) files that newer macOS SDKs ship. I ended up having to use `lld` as the linker; I'm not sure this is totally right:
# newer macOS SDKs use `.tbd` files in their sysroots; `lld` has support for this
# (on recent versions) but `ld` (that ships in the LLVM releases) does not
#
# so, we need to specify that we want to use `lld`
#
# historically, we have not done this for macOS by default because of incomplete
# support for Mach-O in `lld` but newer versions seem to have good support.
"extra_linker_flags": ["-fuse-ld=lld"],
I'd love to know what Apple clang does (just the output of `g++ -v -xc++ - <<<"int main() { }"` on an M1 machine should give it away I think).
But anyways, with the above, it does get all the way through linking on my x86_64 macOS machine. I can't actually run the generated binaries but hopefully they do actually work 🤞.
I've attached the workspace I put together to this comment.
* `bazel run //:test --config=x86` to check that the regular old x86_64 -> x86_64 toolchain works, etc.
* `bazel build //:test --config=arm64` to make sure the x86_64 -> arm64 toolchain can generate binaries
* `bazel run //:test --config=arm64` to make sure the binaries emitted can actually run on arm64

In theory running `bazel run //:test` (without any `--config` to manually set the target platform) should pick up the arm64 toolchain and use it, depending on what constraints Bazel gives M1 machines. I have no idea how this interacts with Rosetta though; `bazel build //:test --toolchain_resolution_debug` should give us some hints.
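(The `--config` shorthands above presumably boil down to `--platforms` flags along these lines; a guess based on the platform labels that show up later in this thread, since the workspace itself isn't reproduced here:)

# .bazelrc (sketch)
build:x86   --platforms=@//:apple-x86
build:arm64 --platforms=@//:apple-silicon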
@jez re: your other questions:
I should have time to try to get the arm64 -> arm64 toolchain working over the weekend; assuming ^ works, that should be fairly straightforward. The trickiest part will probably be finding/building an arm64 LLVM toolchain to use.
> I'd love to know what Apple clang does (just the output of `g++ -v -xc++ - <<<"int main() { }"` on an M1 machine should give it away I think).
Here's the output:
I'll test that workspace out now and see what happens.
> Here's the output:
Thanks! Can you run `/Library/Developer/CommandLineTools/usr/bin/ld -v` also? I'm pretty sure it's just `ld64` but just in case.
❯ /Library/Developer/CommandLineTools/usr/bin/ld -v
@(#)PROGRAM:ld PROJECT:ld64-650.9
BUILD 13:09:13 May 28 2021
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em
LTO support using: LLVM version 12.0.5, (clang-1205.0.22.11) (static support for 27, runtime is 27)
TAPI support using: Apple TAPI version 12.0.5 (tapi-1205.0.7.1)
Is the `@macos-11.3-sdk//` repo basically the same set of files I have at `/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk`, judging from the output.log above?
yup, exactly
I had to grab it externally because I'm on an older version of macOS that doesn't have an SDK with the right stuff to build for arm64
but for host arm64 toolchains (as in arm64 -> arm64) we shouldn't actually need to grab it; like we do with the other host toolchains, we can just assume what's on the host system actually works when you're targeting the host system
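(A sketch of what "use what's on the host" can mean in practice; this may not be exactly how this repo locates the SDK:)

# prints where the host's macOS SDK lives; clang can then be handed this
# directory as its sysroot
xcrun --sdk macosx --show-sdk-path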
~/stripe/sandbox/rrbutani-workspace 19s
❯ ./bazel run //:test --config=x86
Starting local Bazel server and connecting to it...
ERROR: /private/var/tmp/_bazel/e9cb4153e0861999826e8879b02ae2cc/external/local_config_cc/BUILD:48:19: in cc_toolchain_suite rule @local_config_cc//:toolchain: cc_toolchain_suite '@local_config_cc//:toolchain' does not contain a toolchain for cpu 'darwin'
ERROR: Analysis of target '//:test' failed; build aborted: Analysis of target '@local_config_cc//:toolchain' failed
INFO: Elapsed time: 69.390s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (16 packages loaded, 61 targets configured)
~/stripe/sandbox/rrbutani-workspace
❯ ./bazel --version
bazel 4.2.1
(I had to make a slight change to the repo, which is to make it use a script that has contents identical to this:
https://github.com/jez/ragel-bison-parser-sandbox/blob/master/bazel
because our company laptops prevent us from installing bazelisk) but otherwise the above is the result of running things.
Interestingly enough, that's the same error I get when trying to build a normal bazel project on my macbook. For example this tiny project shows the same problems.
hmm
Does `/private/var/tmp/_bazel/e9cb4153e0861999826e8879b02ae2cc/external/local_config_cc/BUILD:48` look like:
cc_toolchain_suite(
name = "toolchain",
toolchains = {
"k8|clang": ":cc-clang-linux",
"darwin|clang": ":cc-clang-darwin",
"k8": ":cc-clang-linux",
"darwin": ":cc-clang-darwin",
},
)
for you?
oh whoops, nvm; I totally missed that that's @local_config_cc
It's not clear to me why it's even analyzing `@local_config_cc//:toolchain`; can you post what `bazel build //:test --toolchain_resolution_debug` prints out?
~/stripe/sandbox/rrbutani-workspace
❯ bazel build //:test --toolchain_resolution_debug
INFO: Build options --platforms and --toolchain_resolution_debug have changed, discarding analysis cache.
INFO: ToolchainResolution: Type @bazel_tools//tools/cpp:toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @llvm_toolchain//:cc-clang-linux; mismatching values: linux
INFO: ToolchainResolution: Type @bazel_tools//tools/cpp:toolchain_type: target platform @local_config_platform//:host: execution @local_config_platform//:host: Selected toolchain @llvm_toolchain//:cc-clang-darwin
INFO: ToolchainResolution: Type @bazel_tools//tools/cpp:toolchain_type: target platform @local_config_platform//:host: Rejected toolchain //:clang-darwin-arm64-toolchain; mismatching values: arm64
INFO: ToolchainResolution: Type @bazel_tools//tools/cpp:toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @local_config_cc//:cc-compiler-armeabi-v7a; mismatching values: arm, android
INFO: ToolchainResolution: Target platform @local_config_platform//:host: Selected execution platform @local_config_platform//:host, type @bazel_tools//tools/cpp:toolchain_type -> toolchain @llvm_toolchain//:cc-clang-darwin
INFO: ToolchainResolution: Target platform @local_config_platform//:host: Selected execution platform @local_config_platform//:host,
INFO: ToolchainResolution: Target platform @local_config_platform//:host: Selected execution platform @local_config_platform//:host,
INFO: ToolchainResolution: Target platform @local_config_platform//:host: Selected execution platform @local_config_platform//:host,
ERROR: /private/var/tmp/_bazel/e9cb4153e0861999826e8879b02ae2cc/external/local_config_cc/BUILD:48:19: in cc_toolchain_suite rule @local_config_cc//:toolchain: cc_toolchain_suite '@local_config_cc//:toolchain' does not contain a toolchain for cpu 'darwin'
ERROR: Analysis of target '//:test' failed; build aborted: Analysis of target '@local_config_cc//:toolchain' failed
INFO: Elapsed time: 0.924s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 60 targets configured)
Also in your original post you mentioned that you have an x86_64 -> x86_64 setup working; did you have to manually set your host platform or change anything toolchain related to get that to work?
Yeah, overnight that seems to have stopped working. I can't explain that.
That's odd; toolchain resolution happens exactly as we'd expect (it picks the x86 clang toolchain for macOS) but it still pulls in `@local_config_cc` 😕.
Can you try running `bazel cquery 'deps(//:test)' --output=graph --config=x86`? It should at least tell us what's pulling it in.
`@local_config_cc//:toolchain` shouldn't be broken though :-/
(actually the contents of `/private/var/tmp/_bazel/e9cb4153e0861999826e8879b02ae2cc/external/local_config_cc/BUILD` would also be interesting to look at)
Here's `local_config_cc/BUILD`, and the other one:
❯ bazel cquery 'deps(//:test)' --output=graph --config=x86
INFO: Build options --platforms and --toolchain_resolution_debug have changed, discarding analysis cache.
ERROR: /private/var/tmp/_bazel/e9cb4153e0861999826e8879b02ae2cc/external/local_config_cc/BUILD:48:19: in cc_toolchain_suite rule @local_config_cc//:toolchain: cc_toolchain_suite '@local_config_cc//:toolchain' does not contain a toolchain for cpu 'darwin'
ERROR: Analysis of target '//:test' failed; build aborted: Analysis of target '@local_config_cc//:toolchain' failed
INFO: Elapsed time: 0.791s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 61 targets configured)
oh whoops; does it get any further if you add `--keep_going`?
wait, ^ makes sense actually; there isn't an x86_64 darwin toolchain in `@local_config_cc`
for some reason I was under the impression that Bazel was running under Rosetta, in which case there would be, I think
does the error go away if you don't specify `--config=x86`?
I get the same output for all combinations of `--keep_going` and `--config={x86,arm}`
❯ bazel cquery 'deps(//:test)' --output=graph --config=arm64 --keep_going
INFO: Build option --platforms has changed, discarding analysis cache.
ERROR: /private/var/tmp/_bazel/e9cb4153e0861999826e8879b02ae2cc/external/local_config_cc/BUILD:48:19: in cc_toolchain_suite rule @local_config_cc//:toolchain: cc_toolchain_suite '@local_config_cc//:toolchain' does not contain a toolchain for cpu 'darwin'
WARNING: errors encountered while analyzing target '//:test': it will not be built
INFO: Analyzed target //:test (1 packages loaded, 4140 targets configured).
INFO: Found 0 targets...
INFO: Empty query results
digraph mygraph {
node [shape=box];
}
ERROR: command succeeded, but not all targets were analyzed
INFO: Elapsed time: 16.547s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
is the output when `--keep_going` is present
😕
how about with `--host_cpu=darwin_arm64` and/or `--cpu=darwin_arm64`?
Looks like that works?
~/stripe/sandbox/rrbutani-workspace
❯ bazel build //:test --cpu=darwin_arm64 --config=arm64
INFO: Build option --cpu has changed, discarding analysis cache.
INFO: Analyzed target //:test (0 packages loaded, 4144 targets configured).
INFO: Found 1 target...
INFO: From Linking test:
ld64.lld: warning: ignoring unknown argument: -headerpad_max_install_names
ld64.lld: warning: -sdk_version is required when emitting min version load command. Setting sdk version to match provided min version
Target //:test up-to-date:
bazel-bin/test
INFO: Elapsed time: 15.586s, Critical Path: 11.68s
INFO: 5 processes: 3 internal, 2 darwin-sandbox.
INFO: Build completed successfully, 5 total actions
~/stripe/sandbox/rrbutani-workspace 15s
❯ file bazel-bin/test
bazel-bin/test: Mach-O 64-bit executable arm64
oh neat
that's super weird though; I thought ^ wouldn't be necessary anymore 😕
Does just --host_cpu=darwin_arm64
also work?
Neither of those works
~/stripe/sandbox/rrbutani-workspace
❯ bazel build //:test --host_cpu=darwin_arm64
INFO: Build options --cpu, --host_cpu, and --platforms have changed, discarding analysis cache.
ERROR: /private/var/tmp/_bazel/e9cb4153e0861999826e8879b02ae2cc/external/local_config_cc/BUILD:48:19: in cc_toolchain_suite rule @local_config_cc//:toolchain: cc_toolchain_suite '@local_config_cc//:toolchain' does not contain a toolchain for cpu 'darwin'
ERROR: Analysis of target '//:test' failed; build aborted: Analysis of target '@local_config_cc//:toolchain' failed
INFO: Elapsed time: 1.699s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 60 targets configured)
~/stripe/sandbox/rrbutani-workspace
❯ bazel build //:test --host_cpu=darwin_arm64 --config=arm64
INFO: Build option --platforms has changed, discarding analysis cache.
ERROR: /private/var/tmp/_bazel/e9cb4153e0861999826e8879b02ae2cc/external/local_config_cc/BUILD:48:19: in cc_toolchain_suite rule @local_config_cc//:toolchain: cc_toolchain_suite '@local_config_cc//:toolchain' does not contain a toolchain for cpu 'darwin'
ERROR: Analysis of target '//:test' failed; build aborted: Analysis of target '@local_config_cc//:toolchain' failed
INFO: Elapsed time: 1.270s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 4133 targets configured)
You might also be curious to see the cquery output with --cpu=darwin_arm64
thanks, I was just about to ask 😛
that's super weird; maybe something leftover from the `_toolchain`/CROSSTOOL era makes `cc_binary`s have an implicit dep on `@local_config_cc//:toolchain` or something
regardless – I'm glad it works! even though it really does seem like you shouldn't have to pass in --cpu=darwin_arm64
Just to be sure: does actually running the binary that's produced work?
oh wait
I think your script hardcodes `x86_64` for the installer it grabs IIUC. Do you know if Bazel is actually getting run through Rosetta?
I think that'd explain the `--cpu` stuff, maybe
> I think your script hardcodes `x86_64` for the installer it grabs IIUC. Do you know if Bazel is actually getting run through Rosetta?

Interesting, I'll take a look.
> Just to be sure: does actually running the binary that's produced work?
uhh...
~/stripe/sandbox/rrbutani-workspace
❯ bazel build //:test --cpu=darwin_arm64 --config=arm64
INFO: Build options --cpu and --host_cpu have changed, discarding analysis cache.
INFO: Analyzed target //:test (0 packages loaded, 4144 targets configured).
INFO: Found 1 target...
Target //:test up-to-date:
bazel-bin/test
INFO: Elapsed time: 1.107s, Critical Path: 0.04s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
~/stripe/sandbox/rrbutani-workspace
❯ ls -lh bazel-bin/test
-r-xr-xr-x 1 jez wheel 39K Sep 2 22:32 bazel-bin/test
~/stripe/sandbox/rrbutani-workspace
❯ file bazel-bin/test
bazel-bin/test: Mach-O 64-bit executable arm64
~/stripe/sandbox/rrbutani-workspace
❯ bazel-bin/test
[1] 30775 killed bazel-bin/test
~/stripe/sandbox/rrbutani-workspace
❯ lldb -- bazel-bin/test
(lldb) target create "bazel-bin/test"
Current executable set to '/Users/jez/stripe/sandbox/rrbutani-workspace/bazel-bin/test' (arm64).
(lldb) r
error: process exited with status -1 (no such process.)
(lldb) ^D
~/stripe/sandbox/rrbutani-workspace 14s
❯ sudo dtruss -- bazel-bin/test
Password:
dtrace: system integrity protection is on, some features will not be available
dtrace: failed to execute bazel-bin/test: (os/kern) failure
oh no
maybe it's the dylibs? usually that prints a real error though
what does `otool -L bazel-bin/test` say?
> I think your script hardcodes `x86_64` for the installer it grabs IIUC. Do you know if Bazel is actually getting run through Rosetta?
Good catch. When I fix the script to use `uname -p` to detect the host processor, it downloads an arm64 version of bazel and then I can build C++ targets without needing the `--cpu` flag, and the targets actually run:
❯ ./bazel build //:test --config=arm64
Starting local Bazel server and connecting to it...
INFO: Analyzed target //:test (16 packages loaded, 70 targets configured).
INFO: Found 1 target...
Target //:test up-to-date:
bazel-bin/test
INFO: Elapsed time: 2.691s, Critical Path: 0.34s
INFO: 8 processes: 6 internal, 2 darwin-sandbox.
INFO: Build completed successfully, 8 total actions
~/stripe/sandbox/rrbutani-workspace
❯ file bazel-bin/test
bazel-bin/test: Mach-O 64-bit executable arm64
~/stripe/sandbox/rrbutani-workspace
❯ bazel-bin/test
yo!
I tried this before I got a chance to run `otool` on the old executable. I'll probably skip that, because we should be using arm64 bazel releases anyways.
👌
It's definitely strange that the executable produced with x86_64 bazel didn't work (it should be the exact same as the executable you have now, I'm pretty sure – the toolchain is the same either way) but I'll take it.
Thanks for testing!
Yep, thanks for helping debug!
For what it's worth, I also tried replacing the http_archive with this:
new_local_repository(
name = "macos-11.3-sdk",
path = "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk",
build_file_content = """
filegroup(
name = "sysroot",
srcs = glob(["usr/**"], exclude = ["usr/share/**"]),
visibility = ["//visibility:public"],
)
""",
)
and it also builds and runs successfully (figured I'd mention it at least for posterity's sake, even if you don't end up incorporating it in a change).
Though (not sure how this stuff works) maybe that's the sort of thing that wouldn't even be required to declare explicitly if you were ok with an SDK being too old.
Probably tomorrow or this weekend I'll see if I can take the mini-workspace you made and incorporate it into the build system for https://github.com/sorbet/sorbet. It's not the craziest C++ codebase, but it'll be a little more of a stress test than a hello world program.
Anyways, thanks again!
Also for posterity's sake: the final contents of the workspace that we got working in the end:
and the full build command:
bazel build //:test --config=arm64
> For what it's worth, I also tried replacing the http_archive with this: [...] and it also builds and runs successfully (figured I'd mention it at least for posterity's sake, even if you don't end up incorporating it in a change).
Thanks; good to know that works as expected.
> Though (not sure how this stuff works) maybe that's the sort of thing that wouldn't even be required to declare explicitly if you were ok with an SDK being too old.
For host toolchains this logic picks up the SDK and refers to it by absolute path in the generated toolchains.
For `extra_targets` toolchains I think it makes sense to fetch the sysroot, like we do for WASI. That way if someone tries to, for example, specify `aarch64-apple-darwin` as an extra target on a Linux host machine, they won't need to go find and fetch an SDK themselves.
> Probably tomorrow or this weekend I'll see if I can take the mini-workspace you made and incorporate it into the build system for https://github.com/sorbet/sorbet. It's not the craziest C++ codebase, but it'll be a little more of a stress test than a hello world program.
Thanks! Looking forward to seeing what else I broke 😄
@jez do you have the commit that made it work in sorbet?
I haven’t done any further work on this
I've been working on compiling the Sorbet project using the provided ARM64 toolchains. It seems that the playground workspace changes work as intended, but there is one thing missing for me to be able to fully verify the build completes and works.
The only change I had to make to the playground code was setting `custom_target_triple = "arm64-apple-macosx12.1.0"` in the `cc_toolchain_config`. If I don't do that, we end up having the triple as iOS for some reason (which doesn't happen in the playground, but does happen in Sorbet).
Also, despite being on the most recent macOS, I still have to use `-mlinker-version=450` or else linking fails because the flag `platform_version` is not recognized.
The last remaining error is when linking we get the error `unable to find framework Foundation`. This framework is used by one of Sorbet's dependencies (abseil). I tried a few things to fix this, but have been unable so far. Here is what I tried:

* Adding the framework paths using `-F/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks` does not work (the search paths are correct, but linking still can't find it)
* Same thing for `-F/System/Library/Frameworks`
* Tried compiling with the `-framework Foundation` flag using LLVM 12 and 13 standalone outside of `bazel-toolchain`. My understanding based on the release notes for LLD 13 was that support for ARM64 was added in that version. However, I was able to compile using the framework flag using LLVM standalone, both version 12 and 13. I can't test LLVM 13 with the playground changes though, because that version doesn't support LLVM 13

Would it be possible to rebase that playground version with the current LLVM 13 support that is already in main? Do you believe that is likely the cause of frameworks not working or could it be something else that I am missing?
I did try to use the current version in main to try to upgrade to LLVM 13, but doing so fails with this error, so maybe we need something else too.
Error in fail: Unknown LLVM release: clang+llvm-13.0.0-arm64-apple-darwin.tar.xz
Note: another thing worth noting is that, even when using the playground, we must sign the resulting binary with `codesign` or else M1 machines will not execute it. I noticed I don't have to do that when using LLVM standalone, so I wonder if this is within the scope of `bazel-toolchain` or not (or if LLVM 13 fixes it).
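(For anyone following along, the ad-hoc signing workaround presumably looks like this, with `bazel-bin/test` standing in for the binary being produced:)

# ad-hoc code signing ("-" means no real identity); Apple Silicon refuses to
# run unsigned arm64 binaries, so this is enough to make them executable
codesign --force --sign - bazel-bin/test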
@vinistock which playground workspace are you referring to? Do you have a fork/branch that you could share?
I think they're referring to the workspace attached in this comment.
> The only change I had to make to the playground code was setting `custom_target_triple = "arm64-apple-macosx12.1.0"` in the `cc_toolchain_config`. If I don't do that, we end up having the triple as iOS for some reason (which doesn't happen in the playground, but does happen in Sorbet).
Hmm. Are there maybe other toolchains registered by the Sorbet workspace? Can you verify that the `aarch64-apple-darwin` toolchain actually gets used (`--toolchain_resolution_debug`)? The constraints are generated from the target triple so it's definitely possible something shifted/is broken and is causing `arm64-apple-*` to generate constraints that result in the toolchain being used but not `aarch64-apple-darwin` (iirc `arm64` and `aarch64` are the same CPU constraint though). I'll have to look into it more, later.
> Also, despite being on the most recent macOS, I still have to use `-mlinker-version=450` or else linking fails because the flag `platform_version` is not recognized.
This is odd; lld has definitely had support for `-platform_version` for several releases. 😕
> The last remaining error is when linking we get the error `unable to find framework Foundation`. This framework is used by one of Sorbet's dependencies (abseil). I tried a few things to fix this, but have been unable so far. Here is what I tried: [...]
Not sure what's going on here; `bazel-toolchain` should add that path to the search directories anyways (through the sysroot, which is set to be `/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk` in the playground workspace).
Maybe this has finally become an issue? It'd be helpful to compare against Sorbet on regular x86_64 macOS with the same toolchain (if that builds successfully).
> Would it be possible to rebase that playground version with the current LLVM 13 support that is already in main?
The playground is based on #85; `main` has grown some significant changes in the meantime that make rebasing that PR not entirely trivial.
> Do you believe that is likely the cause of frameworks not working or could it be something else that I am missing?
I'm not really sure what's going on but I don't think it's an LLVM version issue.
> I did try to use the current version in main to try to upgrade to LLVM 13, but doing so fails with this error, so maybe we need something else too.
> `Error in fail: Unknown LLVM release: clang+llvm-13.0.0-arm64-apple-darwin.tar.xz`
The version of this repo currently in `main` is trying to grab `arm64` binaries for LLVM on macOS, for which there aren't official LLVM releases. The playground workspace just uses the `x86_64` macOS LLVM binaries through Rosetta IIRC. (`main` also doesn't have the custom toolchain stuff #85 has).
> Note: another thing worth noting is that, even when using the playground, we must sign the resulting binary with `codesign` or else M1 machines will not execute it. I noticed I don't have to do that when using LLVM standalone, so I wonder if this is within the scope of `bazel-toolchain` or not (or if LLVM 13 fixes it).
My understanding is that `lld` does sign the binaries it produces but that this only landed in LLVM 13.
@jez do you remember if you needed to explicitly sign the binaries you got out of Bazel?
Regardless, I think this is very much in scope for `bazel-toolchain`.
@vinistock Thanks for the notes! I now have access to an Apple Silicon machine; I'm hoping to get back to working on this issue early next week.
If possible, can you post your modified workspace/the commands you are running?
@rrbutani thank you so much for the detailed and quick response. I really appreciate your assistance.
I'm not exactly sure how I should be reading this output; please let me know if you need more information. But running a build with the `--toolchain_resolution_debug` flag prints these statements related to the ARM64 toolchain (plus other information related to other things).
It seems to reject the toolchain, but then selects it afterwards. Not sure if there's an issue in toolchain resolution here.
INFO: ToolchainResolution: Type @bazel_tools//tools/cpp:toolchain_type: target platform //:apple-silicon: execution @local_config_platform//:host: Selected toolchain //:clang-darwin-arm64-toolchain
...
INFO: ToolchainResolution: Target platform //:apple-silicon: Selected execution platform @local_config_platform//:host, type @bazel_tools//tools/cpp:toolchain_type -> toolchain //:clang-darwin-arm64-toolchain
...
INFO: ToolchainResolution: Target platform @local_config_platform//:host: Selected execution platform @local_config_platform//:host, type @bazel_tools//tools/cpp:toolchain_type -> toolchain @llvm_toolchain_12_0_0//:cc-clang-darwin
With the same style of configuration as the playground, if I use `--platforms=@//:apple-x86` I can compile Sorbet and do not have the error related to the Foundation framework. So, for some reason, this indeed seems like an ARM64 related issue.
The playground example I meant is `rrbutani-workspace`. I actually forgot to upload it, but I modified that workspace to reproduce the framework issue. It's the same workspace, but I changed the `triple` and added a `linkopt` for `-framework Foundation`, which is exactly what `abseil` does. It will probably be easier to debug this in the playground, since it has a much simpler configuration.
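(The reproduction described above presumably amounts to something like this in the playground's BUILD file; the target name and source file are assumed:)

# link against Foundation the same way abseil does
cc_binary(
    name = "test",
    srcs = ["main.cc"],
    linkopts = ["-framework Foundation"],
)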
In this playground repo, if the link option for the framework is present, compilation fails with the error `unable to find framework Foundation` (despite the search paths being correct).
If I remove the `-framework Foundation` flag, I can compile, but cannot run the executable without force signing it with `codesign`.
Notice that in this playground repo, I can also compile the x86 version successfully even with the framework flag (exactly as in Sorbet).
The commands I'm using to compile are:
# Successfully compiles even with the -framework Foundation linkopt
bazel build //:test --config=x86
# Fails with `unable to find framework Foundation`
bazel build //:test --config=arm64
Since we determined that we'll need LLVM 13 for signing binaries, I have begun looking into what we'll need to upgrade Sorbet to LLVM 13, using the latest version of `bazel-toolchain`.
The upgrade is relatively smooth, but I bumped into another issue that I'm having trouble figuring out. Please let me know if this is not within the scope of `bazel-toolchain`.
Basically, there were two steps:

* upgrade LLVM to `13.0.0`
* enable `incompatible_enable_cc_toolchain_resolution` as instructed in the README

After the upgrade, I was able to determine that this invocation to find_cpp_toolchain started returning the wrong toolchain.
In version 12, the toolchain object we get has all the LLVM paths in it (includes, `cc_wrapper` and so on), all pointing to `(bazel info output_base)/external/llvm_toolchain/...`.
With version 13, the latest `bazel-toolchain` and the `incompatible_enable_cc_toolchain_resolution` flag, the same call returns a different toolchain, where the compiler path is different and points to `external/local_config_cc/wrapped_clang`. The object includes none of the LLVM paths.
In addition to returning the wrong toolchain, I also noticed that all paths are now relative, whereas when using version 12 they are all absolute.
In the Bazel issue linked in the README describing the migration for `incompatible_enable_cc_toolchain_resolution`, they mention that `find_cpp_toolchain` has been deprecated and that we should instead use `find_cc_toolchain`.
I tried doing the migration, making sure that the rules depended on the right toolchains and that we were now using the new `rules_cc` dependency. Unfortunately, this did not fix the problem and the new `find_cc_toolchain` still returns the same incorrect response.
I believe this might be related to the `incompatible_enable_cc_toolchain_resolution` flag and not LLVM 13 itself.
Any ideas on why the LLVM toolchain is not being returned by `find_cc_toolchain`? Also, please let me know if I can provide more information to be more helpful in the investigations.
> Please let me know if this is not within the scope of `bazel-toolchain`.
This is a little out of scope for this project but that's fine.
> After the upgrade, I was able to determine that this invocation to find_cpp_toolchain started returning the wrong toolchain.
> In version 12, the toolchain object we get has all the LLVM paths in it (includes, `cc_wrapper` and so on), all pointing to `(bazel info output_base)/external/llvm_toolchain/...`.
> With version 13, the latest `bazel-toolchain` and the `incompatible_enable_cc_toolchain_resolution` flag, the same call returns a different toolchain, where the compiler path is different and points to `external/local_config_cc/wrapped_clang`. The object includes none of the LLVM paths.
> In addition to returning the wrong toolchain, I also noticed that all paths are now relative, whereas when using version 12 they are all absolute.
It sounds like toolchain resolution is giving you back the toolchain installed on your machine instead of the one from `bazel-toolchain`.
Did you also remember to add a call to `llvm_register_toolchains()` in your `WORKSPACE` (or add `--extra_toolchains=...` to your .bazelrc)? These don't seem to be in the upstream version of `sorbet`.
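(For reference, a minimal sketch of that registration, going off this repo's README; the repository names are the conventional ones rather than Sorbet's:)

# WORKSPACE (sketch)
load("@com_grail_bazel_toolchain//toolchain:rules.bzl", "llvm_toolchain")

llvm_toolchain(
    name = "llvm_toolchain",
    llvm_version = "13.0.0",
)

load("@llvm_toolchain//:toolchains.bzl", "llvm_register_toolchains")

llvm_register_toolchains()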
In case you haven't already come across it, these docs do a good job explaining toolchains, toolchain resolution, and how rulesets should use toolchains. For C/C++ toolchains the actual toolchain lookup (`ctx.toolchains`) is handled for you by `find_cpp_toolchain` in a way that's compatible with both workspaces that are and are not using toolchain resolution.
Making the changes described in ^ (i.e. using `ctx.toolchains` instead of the legacy hidden attribute) isn't required to use toolchain resolution for C/C++ but if you do end up making those changes you may also want to add `incompatible_use_toolchain_transition = True` to your rule definition.
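(A sketch of that pattern; the rule itself is illustrative, but the load path, hidden attribute, and toolchain type are the standard `@bazel_tools` spellings:)

load("@bazel_tools//tools/cpp:toolchain_utils.bzl", "find_cpp_toolchain")

def _my_rule_impl(ctx):
    # works with and without --incompatible_enable_cc_toolchain_resolution:
    # with resolution on it consults ctx.toolchains, otherwise it falls back
    # to the hidden _cc_toolchain attribute below
    cc_toolchain = find_cpp_toolchain(ctx)
    print(cc_toolchain.compiler_executable)
    return []

my_rule = rule(
    implementation = _my_rule_impl,
    attrs = {
        # legacy fallback for workspaces not using toolchain resolution
        "_cc_toolchain": attr.label(
            default = Label("@bazel_tools//tools/cpp:current_cc_toolchain"),
        ),
    },
    toolchains = ["@bazel_tools//tools/cpp:toolchain_type"],
    incompatible_use_toolchain_transition = True,
)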
Building a target that uses that rule with `--toolchain_resolution_debug` can also be a good way to try to figure out exactly why Bazel is picking another toolchain to give to that rule.
> In the Bazel issue linked in the README describing the migration for `incompatible_enable_cc_toolchain_resolution`, they mention that `find_cpp_toolchain` has been deprecated and that we should instead use `find_cc_toolchain`.
> I tried doing the migration, making sure that the rules depended on the right toolchains and that we were now using the new `rules_cc` dependency. Unfortunately, this did not fix the problem and the new `find_cc_toolchain` still returns the same incorrect response.
I'm fairly confident this is unrelated; `find_cpp_toolchain` in `@rules_cc` is "deprecated" but just calls `find_cc_toolchain` anyways, and `@rules_cc`'s `find_cc_toolchain` is essentially a verbatim copy of `@bazel_tools`'s. Both have the logic to use `ctx.toolchains` when C/C++ toolchain resolution is enabled.
There used to be a bug in the `@rules_cc` impl caused by a bug in how Bazel handles aliases in string form labels when used with `ctx.toolchains` but it's since been "fixed" with this workaround; just make sure you're using a version of `@rules_cc` newer than that commit if you're planning to keep your `@rules_cc` changes.
@rrbutani once again, thank you for the quick and detailed response. I had indeed forgotten to invoke `llvm_register_toolchains` in the branch I'm working on for the LLVM upgrade. I apologize for the confusion; I'm working on multiple branches and got lost. Invoking `llvm_register_toolchains` fixes the issues with finding the right toolchains.
I'm now hitting one last error before successfully compiling the custom Ruby build. One of Ruby's arguments during compilation is `-install_name @execution_path/../lib/libruby.2.7.dylib`. The error is

.../llvm_toolchain_13_0_0/bin/cc_wrapper.sh: line 54: executable_path/../lib/libruby.2.7.dylib: No such file or directory

The reason this happens is that `cc_wrapper` tries to read the paths for arguments beginning with `@` here. However, Ruby adds the `install_name` flag when compiling `libruby.2.7.dylib` itself, which means the file indeed doesn't exist at that step. The command has this form

.../cc_wrapper ... -install_name @execution_path/../lib/libruby.2.7.dylib -o libruby.2.7.dylib
Notice that the file not existing only fails because `cc_wrapper` tries to read it. Invoking `clang` directly from the `llvm_toolchain_llvm/bin` folder works. Commenting out the part of `cc_wrapper` that attempts to read paths starting with `@` also makes the build succeed. The step that tries to read the paths was added in #97.
Do you have any context as to why that is necessary or if we can work around it somehow?
> I apologize for the confusion; I'm working on multiple branches and got lost. Invoking `llvm_register_toolchains` fixes the issues with finding the right toolchains.
No worries! Glad to hear it was a simple fix.
> I'm now hitting one last error before successfully compiling the custom Ruby build. One of Ruby's arguments during compilation is `-install_name @execution_path/../lib/libruby.2.7.dylib`. The error is
> `.../llvm_toolchain_13_0_0/bin/cc_wrapper.sh: line 54: executable_path/../lib/libruby.2.7.dylib: No such file or directory`
> The reason this happens is that `cc_wrapper` tries to read the paths for arguments beginning with `@` here. However, Ruby adds the `install_name` flag when compiling `libruby.2.7.dylib` itself, which means the file indeed doesn't exist at that step. The command has this form
> `.../cc_wrapper ... -install_name @execution_path/../lib/libruby.2.7.dylib -o libruby.2.7.dylib`
> Notice that the file not existing only fails because `cc_wrapper` tries to read it. Invoking `clang` directly from the `llvm_toolchain_llvm/bin` folder works. Commenting out the part of `cc_wrapper` that attempts to read paths starting with `@` also makes the build succeed. The step that tries to read the paths was added in #97. Do you have any context as to why that is necessary or if we can work around it somehow?
This is a good catch!
That snippet was added to support parameter files.
The macOS cc wrapper script inspects the full command line in order to remap libraries that are being linked against to their fully resolved paths, taking into account the `rpath`s added to the binary. I don't have first-hand experience with this but this is allegedly because of some oddness having to do with runpaths added to binaries (this has some context; I think it's that the paths are relative to the build dir and not the workspace that's causing the issue but I have no idea what's introducing the `-Wl,-rpath`s in the first place).
Anyways, for that reason we need to actually read what's in the parameter file. The PR in this repo you linked to was essentially copied from upstream (this commit); in general the logic in the macOS wrapper mostly comes from upstream.
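(For anyone unfamiliar with parameter files: when a link line gets too long, Bazel writes the arguments to a file and passes `@<file>` instead, so the wrapper has to expand the file before it can inspect and rewrite the link line. The path below is made up:)

# instead of thousands of individual args, the wrapper may be invoked as:
#   cc_wrapper.sh @bazel-out/darwin-fastbuild/bin/test-2.params
# it reads the real arguments out of that file before remapping library paths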
The issue here, of course, is that the `@` in `-install_name @executable_path/...` does not signify a parameter file!
What's peculiar to me is that upstream Bazel seems to fail on this in the exact same way (here's a minimal test case). Perhaps it's simply not common for users to want to generate dylibs with `install_name`s from Bazel and it hasn't come up? Not sure.
I think extending the logic in the macOS wrapper to skip processing args starting with `@` (like `@executable_path/...`, `@loader_path/...`, `@rpath/...`, etc.) when the preceding arg is `-install_name` or `-rpath` would fix the error you're running into.
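Something roughly like this, say (a sketch against a hypothetical wrapper loop, not the actual `cc_wrapper.sh` code):

args=()
prev=""
for arg in "$@"; do
  case "$arg" in
    @*)
      if [[ "$prev" == "-install_name" || "$prev" == "-rpath" ]]; then
        # Mach-O tokens (@executable_path/..., @loader_path/..., @rpath/...)
        # are not parameter files; pass them through untouched
        args+=("$arg")
      else
        # a real parameter file: splice its contents into the argument list
        while IFS= read -r line; do args+=("$line"); done < "${arg#@}"
      fi
      ;;
    *) args+=("$arg") ;;
  esac
  prev="$arg"
done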
I have a few concerns though:

* Does this come up elsewhere, with `bazel-toolchain` or with any other toolchains?
* Are `-install_name` and `-rpath` the only args that accept `@` form args?
* Should we still do the path remapping for `-install_name` and `-rpath`? We'd skip `-install_name` but what about `-rpath`? Don't we have the same issues as with `-Wl,-rpath`? `-rpath` certainly does seem to just expand out into `-rpath` to the linker, experimentally. Are we just banking on users using the not-macOS specific `-Wl` form?
* What about `@loader_path`?

@vinistock If possible, it'd be super helpful if you could point me to where in your workspace that flag is getting added.
I just got an M1 MacBook Pro today, and am looking into how to use this project to generate arm64 binaries (for the record: everything works fine using this project to generate x86_64 binaries, which then run under Rosetta).
In the Apple Developer Documentation, they make it out to be as simple as passing a `-target` flag to the C compiler, though I'm sure it'll be more work to do the same thing in Bazel: https://developer.apple.com/documentation/apple-silicon/building-a-universal-macos-binary
Has anyone put thought or time into how this project might be extended to support generating arm64 binaries on M1 Macs? I'm probably going to be spending some time getting this working, and I'd love any tips, ideas, or places to start.