bazel-contrib / rules_go

Go rules for Bazel
Apache License 2.0

Go rules run cpp toolchain from cgo_context_data on an incompatible execution platform #4127

Open jesses-canva opened 1 month ago


What version of rules_go are you using?

v0.47

What version of gazelle are you using?

v0.36.0

What version of Bazel are you using?

7.3.0

Does this issue reproduce with the latest releases of all the above?

Untested

What operating system and processor architecture are you using?

Ubuntu 22.04 x86_64

Any other potentially useful information about your toolchain?

Remote execution

What did you do?

We have a remote execution setup with two execution platforms (registered with --host_platform and --extra_execution_platforms). A Go toolchain is registered and compatible with both execution platforms, but only the second execution platform has a compatible cpp toolchain registered.

Execution platform    | Available toolchains
----------------------+----------------------------------------------------------------------------
1. A                  | @io_bazel_rules_go//go:toolchain
2. B                  | @io_bazel_rules_go//go:toolchain, @bazel_tools//tools/cpp:toolchain_type

What we observed is that rules_go tries to use the cpp toolchain, which is only compatible with the second execution platform, in an action running on the first one. The action fails because the cpp toolchain isn't installed on that platform.

Note that the execution platform ordering is important: we get the error because Bazel prefers the first platform if it thinks that platform is compatible with the action.

What did you expect to see?

Successful build

What did you see instead?

ERROR: /var/lib/blah/bazel/a8584ebfb3d6ff0dfe61abfbfa5bb4d3/external/io_bazel_rules_go/BUILD.bazel:42:7: GoStdlib external/io_bazel_rules_go/stdlib_/pkg failed: (Exit 1): builder failed: error executing GoStdlib command (from target @@io_bazel_rules_go//:stdlib)
...
cgo: C compiler "/usr/bin/clang-13" not found: exec: "/usr/bin/clang-13": stat /usr/bin/clang-13: no such file or directory

Discussion

The underlying cause is that the stdlib target depends on the cgo_context_data target, and cgo_context_data depends on the cpp toolchain, so its execution platform is constrained to the platforms compatible with the selected toolchain. However, cgo_context_data does not execute the compiler itself; it only returns the path to the compiler in its provider.

In this case the rule that actually executes the compiler is stdlib, but it has no dependency on the cpp toolchain, so Bazel doesn't know it has to run on a platform that is compatible with the cpp toolchain. Bazel therefore defaults to the first platform, and the action fails because /usr/bin/clang-13 doesn't exist there.
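
To make the mismatch concrete, here is a deliberately simplified Python model of the selection described above. It is not Bazel's actual resolution algorithm: it only assumes that Bazel picks the first registered execution platform that satisfies every toolchain type a rule declares. The function name and data layout are hypothetical, and the platform contents are taken from the table earlier in this report.

```python
# Simplified model of execution platform selection (an assumption for
# illustration, not Bazel's real implementation).
CPP = "@bazel_tools//tools/cpp:toolchain_type"
GO = "@io_bazel_rules_go//go:toolchain"

# Execution platforms in registration order, with the toolchain types
# that have a compatible toolchain on each platform.
PLATFORMS = [
    ("A", {GO}),
    ("B", {GO, CPP}),
]


def select_exec_platform(required_toolchain_types):
    """Return the first platform that satisfies every declared toolchain type."""
    for name, available in PLATFORMS:
        if required_toolchain_types <= available:
            return name
    raise LookupError("no compatible execution platform")


# cgo_context_data declares the cpp toolchain, so it resolves to B
# and records B's compiler path in its provider.
cgo_context_data_platform = select_exec_platform({CPP})

# stdlib declares only the Go toolchain, so it resolves to A,
# even though its action runs the compiler path that came from B.
stdlib_platform = select_exec_platform({GO})

print(cgo_context_data_platform, stdlib_platform)
```

In this model the two rules land on different platforms, which matches the failure: the compiler path is valid on B but the action that uses it runs on A.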

The patch to rules_go we have used adds the cpp toolchain dependency to all the rules that depend on cgo_context_data, so their execution platform is also constrained to the platforms compatible with the selected cpp toolchain. (https://github.com/bazelbuild/rules_go/pull/4128)
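
The effect of that patch can be sketched with a small Python model. It is a simplification that assumes the first registered platform satisfying all declared toolchain types wins; it is not Bazel's real algorithm. The rule table and helper names are hypothetical, and only stdlib is listed as a dependent rule because it is the only one named in this report.

```python
# Simplified before/after model of the patch (illustrative assumptions only).
CPP = "@bazel_tools//tools/cpp:toolchain_type"
GO = "@io_bazel_rules_go//go:toolchain"

# Execution platforms in registration order and the toolchain types they support.
PLATFORMS = [("A", {GO}), ("B", {GO, CPP})]


def select_exec_platform(required_toolchain_types):
    """Return the first platform that satisfies every declared toolchain type."""
    for name, available in PLATFORMS:
        if required_toolchain_types <= available:
            return name
    raise LookupError("no compatible execution platform")


# Toolchain types each rule declares before the patch (hypothetical table).
declared = {
    "cgo_context_data": {CPP},
    "stdlib": {GO},
}
# Rules that consume cgo_context_data.
depends_on_cgo_context_data = ["stdlib"]

before = {rule: select_exec_platform(types) for rule, types in declared.items()}

# The patch: every dependent rule also declares the cpp toolchain type.
patched = {rule: set(types) for rule, types in declared.items()}
for rule in depends_on_cgo_context_data:
    patched[rule].add(CPP)

after = {rule: select_exec_platform(types) for rule, types in patched.items()}

print("before:", before)
print("after:", after)
```

Under these assumptions, stdlib moves from platform A to platform B once it declares the cpp toolchain type, so it runs where the compiler is installed.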

An even better fix would be to make cgo_context_data a toolchain itself so that it influences the execution platform of the rules that depend on it, but that would be a bigger diff.