Open rickystewart opened 2 years ago
Just for clarity, the linked issue is https://github.com/cockroachdb/cockroach/issues/81314 as the link in the OP text is broken.
race = "off"
not having an effect sounds like a bug in itself. Overall, however, I think that this FR is better served by a different mechanism: We could introduce @io_bazel_rules_go//go/config:tool_race
to govern race detection for targets build in the exec configuration.
@rickystewart Given that you are experiencing the pain of the issue, would you be interested in submitting a PR to fix the race = "off"
issue and/or add the tool_race
flag? If you are interested, I could provide pointers and guidance along the way.
We encountered this issue in the Figma monorepo.
The fact that the stdlib needs to be recompiled with race detection is a problem
For us, this is the primary issue, as building the stdlib causes Bazel to get OOM-killed on our CI executor (4GB RAM), even with --local_{ram,cpu}_resources
configured appropriately. For some reason it seems to get built twice in parallel, once to bazel-out/k8-opt-exec-2B5CBBC6
and once to bazel-out/k8-opt-exec-2B5CBBC6-ST-8f6eb6207b9d
.
go test -race
does not have this issue; presumably it uses a stdlib prebuilt with race detection. Could rules_go do the same?
Bazel now has an API for adding resource requirements to actions. It is still experimental, but we could optionally use it for stdlib actions.
I am working on fixing the issue that the stdlib is rebuilt more often than necessary, but that may also require flags.
We ran into this as well recently. The speed is not an issue thanks to RBE. However, we got this error when race is enabled via a flag.
ERROR: /private/var/tmp/_bazel_sluongng/06e573a93bc2d6a9cad4ad41f00b4310/external/io_bazel_rules_go/BUILD.bazel:42:7: in stdlib rule @@io_bazel_rules_go//:stdlib:
Traceback (most recent call last):
File "/private/var/tmp/_bazel_sluongng/06e573a93bc2d6a9cad4ad41f00b4310/external/io_bazel_rules_go/go/private/rules/stdlib.bzl", line 33, column 20, in _stdlib_impl
go = go_context(ctx)
File "/private/var/tmp/_bazel_sluongng/06e573a93bc2d6a9cad4ad41f00b4310/external/io_bazel_rules_go/go/private/context.bzl", line 458, column 20, in go_context
mode = get_mode(ctx, toolchain, cgo_context_info, go_config_info)
File "/private/var/tmp/_bazel_sluongng/06e573a93bc2d6a9cad4ad41f00b4310/external/io_bazel_rules_go/go/private/mode.bzl", line 107, column 13, in get_mode
fail("race instrumentation can't be enabled when cgo is disabled. Check that pure is not set to \"off\" and a C/C++ toolchain is configured.")
Error in fail: race instrumentation can't be enabled when cgo is disabled. Check that pure is not set to "off" and a C/C++ toolchain is configured.
ERROR: /private/var/tmp/_bazel_sluongng/06e573a93bc2d6a9cad4ad41f00b4310/external/io_bazel_rules_go/BUILD.bazel:42:7: Analysis of target '@@io_bazel_rules_go//:stdlib' failed
The issue goes away if I were to remove the go_binary included in //...
that has pure = "on"
enabled explicitly.
Setting race = "off"
directly on the target does not have any effect. I think the flag --@io_bazel_rules_go//go/config:race
would override the setting on the target.
I think a surefire way to fix this is to provide a mechanism to enforce race
setting on a binary, regardless of manual flag set. It's probably gonna be a transition
on top of the go_binary
target to enforce the setting.
I was looking into transition.bzl
as well as go_cross_binary
for this purpose, but I don't think these rules can enforce config:race
to a certain value for my use case. @fmeum do you know if a new transition is required for this purpose? Or we could reuse go_cross_binary
somehow?
I created a quick POC of the transition mentioned above ☝️ here https://github.com/buildbuddy-io/buildbuddy/pull/5748 if anyone wants a quick solution.
@sluongng I would support a PR that simply ignores the value of race
if pure
is set to on
explicitly instead of failing. Would that allow you to get rid of the additional transition?
Regarding the original issue (and @rickystewart): I now think that we should probably just disable race
for targets built in the exec configuration instead of making this comfortable.
More generally, we would want to revisit and specify how our build settings should change under the exec transition. For example, linkmode
can currently affect (that is, break) aspect tools, which is something that @tyler-french ran into.
I see two ways to achieve this today:
go_context
, check ctx.bin_dir
for the substring -exec-
and set other defaults/overrides depending on this.go_transition
, inspect the value of //command_line_option:is exec configuration
(no typo, there really are spaces in this name) as a transition input (if this can be used) and modify the transition behavior based on that information.Contributions would be very welcome.
@sluongng I would support a PR that simply ignores the value of
race
ifpure
is set toon
explicitly instead of failing. Would that allow you to get rid of the additional transition?
Would ignoring race
value work? or do we need a "corrective" transition to disable race
when pure
is explicitly set to on
? 🤔
Looking at this https://github.com/bazelbuild/rules_go/blob/30099a6add3c43706b4eec82b773b78310874935/go/private/actions/stdlib.bzl#L127-L131
It seems like the diff would be whether we gonna go install runtime/cgo
with/without -race
flag added. I blindly ran
> GOCACHE=/tmp/ go install -x -race runtime/cgo`
and it seems to just work? 🤔
oh nvm, I misread it, the diff is when we have both pure
and race
, the runtime/cgo
package will be missing from go stdlib, which is a requirement for race detection tests.
So the question would be, are there any use cases for such a combination: i.e. setting up a race-detection test with pure=on
? or could we assume that pure=on
will always be set in a non-test binary?
My guess is it's fine to "ignore the value of race if
pureis set to
onexplicitly" as Fabian suggested. But there could still be the concern of adding
-race` to the compile/link actions without needing race detection feature at runtime.
So I think the additional transition would still be a surefire approach to get the "expected" result in this case.
Is there a path forward with this problem? It's making testing with race very expensive.
@aaomidi Do you have a reproducer (or at least a representative example of your build setup)? This issue can manifest itself in a number of different ways.
Over at Cockroach we are having this issue where building a
go_test
with--@io_bazel_rules_go//go/config:race
takes a very long time. The fact that the stdlib needs to be recompiled with race detection is a problem, but more relevant seems to be that building a target with race detection invalidates the cache for allgo_binary
s used to produce that target, includinggo_binary
s that are built asexec_tools
for agenrule
, and causes all of them to be re-built with race detection. If you use Go code to generate other Go code (as we do), the problem is now twofold:go_binary
for code generation cannot be reused, so the binary (and presumably all or most of its dependencies) needs to be rebuilt with race detection enabled (a slower build process in and of itself).Since our code generators produce identical output regardless of whether race detection is enabled, it's very wasteful to re-build the code generators and then run a much slower version of the binary as part of the race build.
Presumably this behavior is OK by default, but there should be a workaround. I have tried updating the
go_binary
declaration for these code generating binaries to haverace = "off"
but it appears to have no effect -- setting the value on the command line overrides everything inBUILD
files. It would be sufficient if I could have a way to specify that certain targets really do not need to be built with race detection despite the command-line flags.What version of rules_go are you using?
0.32
What version of gazelle are you using?
0.25
What version of Bazel are you using?
5.1.0
Does this issue reproduce with the latest releases of all the above?
Yes
What operating system and processor architecture are you using?
Darwin ARM64
Any other potentially useful information about your toolchain?
(None)
What did you do?
Tried setting
race = "off"
for code-generatinggo_binary
target when building a target that depends on that target as anexec_tool
.What did you expect to see?
Ideally the code-generating binary would not be built with race-detection especially if I set
race = "off"
manually.What did you see instead?
Code-generating binary is built with race detection, invalidating the cache and slowing the entire build down.