f0rmiga / gcc-toolchain

A fully-hermetic Bazel GCC toolchain for Linux.
Apache License 2.0

[Bug]: Keeping `--cpu=k8` on macOS throws `does not contain a toolchain for cpu 'k8'`. Removing the `--cpu` flag breaks cross-platform caching for remote builds with the same remote_host architecture. #130

Closed: freetheinterns closed this issue 1 year ago.

freetheinterns commented 1 year ago

### What happened?

Since switching from the native C++ toolchain to this hermetic one, we have noticed that the configuration output directory under `bazel-out` changed from `k8-fastbuild` to `darwin_arm64-fastbuild` on local macOS machines. This may not look like a problem on the surface, but we also have CI that runs Bazel, and there the host is still `k8`.
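A rough sketch of why the directory name moves: Bazel names the per-configuration output directory after the target CPU and the compilation mode, and `--cpu` defaults to the host CPU, so the same build lands in different directories on an Apple Silicon Mac and on an x86_64 Linux CI host. This Python model is a simplification under that assumption (it ignores extra suffixes Bazel can append, for example after configuration transitions); `output_dir_name` and `output_path` are hypothetical helpers, not Bazel APIs.

```python
def output_dir_name(cpu: str, compilation_mode: str = "fastbuild") -> str:
    """Simplified model of Bazel's per-configuration output directory name.

    Assumption: the name is "<cpu>-<compilation_mode>". Real Bazel may append
    further suffixes, which this sketch deliberately ignores.
    """
    return f"{cpu}-{compilation_mode}"


def output_path(cpu: str, package_path: str) -> str:
    # Outputs live under bazel-out/<configuration dir>/bin/<package path>.
    return f"bazel-out/{output_dir_name(cpu)}/bin/{package_path}"


if __name__ == "__main__":
    jar = "path/to/some/java_library/package/package_lib-sources.jar"
    # Local Apple Silicon host without --cpu: the host CPU is used.
    print(output_path("darwin_arm64", jar))
    # Linux CI host, or any host that passes --cpu=k8.
    print(output_path("k8", jar))
```

In this model, pinning the CPU to `k8` on every host yields one shared directory name, which matches the report's observation that dropping `--cpu=k8` is what changed the paths.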

As a result, CI and local runs no longer share cache hits, because the output paths are part of the cache keys. In addition, our remote workers see the JDK paths change as well, which makes the workers restart for no reason and leads to churn and delays.
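To illustrate why a path change alone defeats the cache: a remote-execution action key is a digest that covers the command line, and the command line here contains the `bazel-out/<config>` output path. The sketch below uses a deliberately simplified, hypothetical `action_key` (a SHA-256 over the mnemonic and arguments); real Bazel and Remote Execution API keys digest more fields (environment, input digests, platform properties), but they do include the arguments and output paths.

```python
import hashlib
from typing import List


def action_key(mnemonic: str, arguments: List[str]) -> str:
    """Hypothetical, simplified action key: SHA-256 over mnemonic and argv.

    Each part is length-prefixed so that ["ab", "c"] and ["a", "bc"]
    produce different digests.
    """
    h = hashlib.sha256()
    for part in [mnemonic, *arguments]:
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()


def sourcejar_argv(config_dir: str) -> List[str]:
    # Hypothetical, trimmed-down version of the JavaSourceJar command line.
    out = f"bazel-out/{config_dir}/bin/path/to/pkg/package_lib-sources.jar"
    return ["singlejar_local", "--output", out, "--compression", "--normalize"]


if __name__ == "__main__":
    mac = action_key("JavaSourceJar", sourcejar_argv("darwin_arm64-fastbuild"))
    ci = action_key("JavaSourceJar", sourcejar_argv("k8-fastbuild"))
    # Same action, different output directory, therefore a different key.
    print(mac == ci)  # False
```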

We suspect the root cause is that we had to remove `--cpu=k8 --host_cpu=k8` when switching to the new C++ toolchain; these flags also appear to determine the `bazel-out` paths. Setting them with the new toolchain produces this error:

```
/private/var/tmp/_bazel_ted_tenedorio/544c8a90b96ef2c7562b61f1f903de27/external/local_config_cc/BUILD:28:19: in cc_toolchain_suite rule @local_config_cc//:toolchain: cc_toolchain_suite '@local_config_cc//:toolchain' does not contain a toolchain for cpu 'k8'.
```

Ideally, this toolchain could be changed so that these flags can be used again.
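For context on the `cc_toolchain_suite` error: in legacy, `--cpu`-based C++ toolchain selection, a `cc_toolchain_suite` maps a `"<cpu>"` key (or `"<cpu>|<compiler>"` when `--compiler` is also set) to a `cc_toolchain`, and analysis fails when `--cpu` names a key the suite lacks. The autoconfigured `@local_config_cc` suite is generated for the local machine, so on a macOS host it presumably has no `k8` entry. The sketch below models only that lookup; `select_cc_toolchain` and the suite contents are hypothetical, not the real generated BUILD file.

```python
from typing import Dict, Optional


class ToolchainSuiteError(Exception):
    """Raised when the suite has no toolchain for the requested key."""


def select_cc_toolchain(
    suite: str, toolchains: Dict[str, str], cpu: str, compiler: Optional[str] = None
) -> str:
    # Keys are "<cpu>" when only --cpu is set, or "<cpu>|<compiler>"
    # when --compiler is set as well.
    key = f"{cpu}|{compiler}" if compiler else cpu
    try:
        return toolchains[key]
    except KeyError:
        # Simplified message modeled on the error quoted in this issue.
        raise ToolchainSuiteError(
            f"cc_toolchain_suite '{suite}' does not contain a toolchain for cpu '{cpu}'"
        ) from None


if __name__ == "__main__":
    # Hypothetical contents of an autoconfigured suite on an Apple Silicon Mac.
    local_suite = {"darwin_arm64": "@local_config_cc//:cc-compiler-darwin_arm64"}
    print(select_cc_toolchain("@local_config_cc//:toolchain", local_suite, "darwin_arm64"))
    try:
        select_cc_toolchain("@local_config_cc//:toolchain", local_suite, "k8")
    except ToolchainSuiteError as err:
        print(err)
```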

### Version

Development (host) and target OS/architectures:
- Host: macOS, cpu = `darwin_arm64`
- Target: Linux, cpu = `k8`

Output of `bazel --version`: 6.0.0

Version of the Aspect rules, or other relevant rules from your WORKSPACE or MODULE.bazel file:
- GCC toolchain: `aspect-build/gcc-toolchain/archive/refs/tags/0.4.2.tar.gz`
- rules_java: `bazelbuild/rules_java/releases/download/0.1.1/rules_java-0.1.1.tar.gz`
- rules_kotlin: `bazelbuild/rules_kotlin/releases/download/v1.7.1/rules_kotlin_release.tgz`
- JDK (Amazon Corretto 8.275.01.1):
  - https://corretto.aws/downloads/resources/8.275.01.1/amazon-corretto-8.275.01.1-linux-x64.tar.gz
  - https://corretto.aws/downloads/resources/8.275.01.1/amazon-corretto-8.275.01.1-macosx-x64.tar.gz

Language(s) and/or frameworks involved: C++ and Java/Kotlin. However, the issue affects caching for every language, because it changes the relative output paths under `bazel-out`.

### How to reproduce

Run this on a macOS machine to see the `bazel-out` paths in question:

```
bazel aquery //path/to/some/java_library/package --config remote
```

Then look at the `--output` argument of the source jar action:

```
action 'Building source jar path/to/some/java_library/package/package_lib-sources.jar'
  Mnemonic: JavaSourceJar
  Target: //path/to/some/java_library/package:package_lib
  Configuration: darwin_arm64-fastbuild
  Execution platform: @aspect_gcc_toolchain//platforms:x86_64_linux_remote
  ActionKey: 29ca9f15b7392f2feddf3804b63bae3341fa053f43482c2384bf630347b62682
  Inputs: [path/to/some/java_library/package/PackageSourceFile.kt, external/remote_java_tools_linux/java_tools/src/tools/singlejar/singlejar_local]
  Outputs: [bazel-out/darwin_arm64-fastbuild/bin/path/to/some/java_library/package/package_lib-sources.jar]
  ExecutionInfo: {OSFamily: Linux, container-image: REDACTED}
  Command Line: (exec external/remote_java_tools_linux/java_tools/src/tools/singlejar/singlejar_local \
    --output \
    bazel-out/darwin_arm64-fastbuild/bin/bazel/path/to/some/java_library/package/package_lib-sources.jar \
    --compression \
    --normalize \
    --exclude_build_data \
    --warn_duplicate_resources \
    --resources \
```

Notice that `darwin_arm64-fastbuild` is part of the output path: `bazel-out/darwin_arm64-fastbuild/bin/bazel/path/to/some/java_library/package/package_lib-sources.jar`

Running the same command on a Linux machine produces `bazel-out/k8-fastbuild/bin/bazel/path/to/some/java_library/package/package_lib-sources.jar`.

You can also run this to see the `--cpu` flag error:

```
bazel build //path/to/some/java_library/package --config remote --cpu=k8 --host_cpu=k8
```

The remote config here is essentially:

```
--java_language_version=8
--java_runtime_version=remotejdk_11
--tool_java_language_version=11
--tool_java_runtime_version=remotejdk_11
--action_env=BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1
--incompatible_strict_action_env=true
--incompatible_enable_cc_toolchain_resolution
--extra_execution_platforms=@aspect_gcc_toolchain//platforms:x86_64_linux_remote
--host_platform=@aspect_gcc_toolchain//platforms:x86_64_linux_remote
--symlink_prefix=dist/
--remote_timeout=3600
--jobs=1024
--remote_local_fallback=false
--compiler_warnings=off
--define=EXECUTOR=remote
--disk_cache=
--remote_download_minimal
--remote_default_exec_properties="container-image=REDACTED"
--remote_default_exec_properties="OSFamily=Linux"
--experimental_remote_mark_tool_inputs=true
--remote_default_exec_properties="dockerReuse=True"
--remote_executor=grpcs:REMOTE_URL
--remote_cache=grpcs:REMOTE_URL
```


### Any other information?

_No response_
f0rmiga commented 1 year ago

Fixed by https://github.com/aspect-build/gcc-toolchain/commit/2d7e1039cba8f1d6e630e4ffa3694a08aec87055 and https://github.com/aspect-build/gcc-toolchain/commit/727fc04bd7a48a871db3382743328a71ec6888fa.