bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
23.21k stars 4.06k forks source link

Platform-based flags cannot reset Starlark flags to the default value #23147

Closed tpudlik closed 3 months ago

tpudlik commented 3 months ago

(Original title: "A 'cursed' label flag")

Every Starlark flag has a default value defined when it is declared. If a different value is set, platform-based flags should be able to explicitly set the default and have that override the previous value, but currently they are ignored.

See https://github.com/katre/bazel/commit/2aa5c32c9390160f9ea62a796de06a00786be565 for a small demo.

Original description

Description of the bug:

I found a truly bizarre bug that I can't make heads or tails of. If a custom host platform explicitly sets a particular label flag in Pigweed to a value, and a platform set via platform_data attempts to override it, then the override will fail.

What's bizarre about it is that,

  1. It only happens when the target platform is set via platform_data, not if it's set on the command line.
  2. I've not been able to reproduce this with a minimal upstream repository, only with Pigweed itself.
  3. In fact I've not been able to reproduce it with other label flags in Pigweed, including label flags defined in the same BUILD.bazel file. As far as I can tell, there's only one unlucky flag that cannot be overridden!

@katre I know you have a lot on your plate, but if you have any suggestions for how to debug this, they'd be very appreciated. This one is a real head-scratcher, and also prevented me from deploying platform-based flags.

Which category does this issue belong to?

Configurability

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

https://github.com/tpudlik/cursed_flag

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

release 8.0.0-pre.20240618.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

katre commented 3 months ago

Looks like your device platform is setting the label_flag back to the default: is that also what you're doing with the other flag you tested (@pigweed//pw_system:device_handler_backend)?

I wonder if there's something that says "oh, this is the default, so it's not set"? I'll also try debugging, but if you can test that case it'd be handy.

katre commented 3 months ago

A better way to cquery for these things is to use filter and deps:

$ bazel-dev cquery --implicit_deps --tool_deps -- 'filter("@pigweed//pw_system:", deps(//:binary_for_device)'

This shows all the deps in @pigweed//pw_system, so you can see whether it's the one you want, rather than probing via somepath. The one restriction is that you need --implicit_deps --tool_deps to make sure nothing is filtered out.

tpudlik commented 3 months ago

Looks like your device platform is setting the label_flag back to the default: is that also what you're doing with the other flag you tested (@pigweed//pw_system:device_handler_backend)?

Indeed it's not! And if I fix that, i.e. do,

$ sed -i 's/pw_system:io_backend/pw_system:device_handler_backend/g' BUILD.bazel
$ sed -i 's/pw_system:sys_io_target_io/pw_system:unknown_device_handler/g' BUILD.bazel

I reproduce the bug with this flag, too. (I had to add build --@rules_python//python/config_settings:exec_tools_toolchain=enabled to the .bazelrc and rules_python 0.34.0 to MODULE.bazel to avoid some unrelated build failures.) So it is something about setting the flag back to its default.

katre commented 3 months ago

Good to know, that should help narrow it down.

katre commented 3 months ago

Okay, I have a very minimal test that shows the same problem, so now I can debug and fix that. I'm going to update the issue name and description to clarify the problem.

tpudlik commented 3 months ago

Ah, that's great. I tried to produce such a minimal test (having the host platform set the flag to default and the target platform override it), but couldn't get it to work. That is, my minimal repro didn't actually reproduce the bug, everything worked as intended.

katre commented 3 months ago

@tpudlik Can you build bazel from https://github.com/katre/bazel/tree/i23147-cursed-flag-fix and let me know if it addresses your reproduction?

tpudlik commented 3 months ago

Yes, I verified that Bazel built from that branch doesn't exhibit the bug, either in my minimal example or in the larger project that originally motivated it. Thank you!

katre commented 3 months ago

Excellent, I'll clean it up a bit and get it merged.

katre commented 3 months ago

Okay, got more involved after @gregestren's comment (https://github.com/bazelbuild/bazel/pull/23159#discussion_r1697576017).

Sending out a bunch of changes (unfortunately internally, not in GH), hopefully will address this properly.