bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
23.01k stars 4.03k forks source link

Avoid different transition output directories when build settings have the same values #14023

Open torgil opened 3 years ago

torgil commented 3 years ago

Description of the problem / feature request:

To avoid duplicated actions on dependencies when using build-settings it's necessary to reset build-settings to it's default value. This is currently not enough as we still can get conflicts on the "transition directory name fragment" for instance when two subsystems using different build-settings depend on a common library.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

The below example shows the issue with two build settings (used by B and C) triggering and action conflict in a common library (ExtDep) which appears in three branches from the top target:

  1. A -> B -> ExtDep
  2. A -> B -> C -> ExtDep
  3. A -> ExtDep (issue also provoked with A -> C )

B sets a non-default value for bs2 in the input transition and resets it to default value in the edge transition to ExtDep.

Rule names below indicate which transitions they perform, eg the name "input_bs1_extdeps_bs2" indicates that it sets "bs1" build setting in the input transition and "bs2" build setting in the edge transition on the "extdeps" attribute.

rules.bzl

"Rules with different transitions to showcase analyze phase configuration conflicts"

def _dummy_rule_impl(ctx):
    return []

def _input_transition_impl(settings, attr):
    build_settings = {setting: settings[setting] for setting in attr._input_settings}
    build_settings.update(attr.input_build_settings)
    return build_settings

def _extdeps_transition_impl(settings, attr):
    build_settings = {setting: settings[setting] for setting in attr._extdeps_settings}
    build_settings.update(attr.extdeps_build_settings)
    return build_settings

def _create_custom_rule(input, extdeps):
    return rule(
        implementation = _dummy_rule_impl,
        cfg = transition(inputs = input, outputs = input, implementation = _input_transition_impl),
        attrs = {
            "deps": attr.label_list(),
            "extdeps": attr.label_list(cfg = transition(
                inputs = extdeps, outputs = extdeps, implementation = _extdeps_transition_impl)),
            "extdeps_build_settings": attr.string_dict(),
            "input_build_settings": attr.string_dict(),
            "_extdeps_settings": attr.string_list(default = extdeps),
            "_input_settings": attr.string_list(default = input),
            "_whitelist_function_transition": attr.label(
                default = "@bazel_tools//tools/whitelists/function_transition_whitelist",
            ),
        },
    )

build_setting = rule(
    implementation = _dummy_rule_impl,
    build_setting = config.string(flag = False),
)

input_outdir = _create_custom_rule(
    ["//command_line_option:output directory name"],
    [],
)

input_outdir_bs1_bs2_extdeps_bs1_bs2 = _create_custom_rule(
    ["//:bs1", "//:bs2", "//command_line_option:output directory name"],
    ["//:bs1", "//:bs2"],
)

BUILD.bazel

load(":rules.bzl", "build_setting", "input_outdir", "input_outdir_bs1_bs2_extdeps_bs1_bs2")
load("@rules_cc//cc:defs.bzl", "cc_library")

build_setting(
    name = "bs1",
    build_setting_default = "bs1_default",
)

build_setting(
    name = "bs2",
    build_setting_default = "bs2_default",
)

input_outdir(
    name = "A",
    input_build_settings = {
        "//command_line_option:output directory name": "any",
    },
    deps = [":B", ":ExtDep"],  # C instead of ExtDep triggers the same issue
)

input_outdir_bs1_bs2_extdeps_bs1_bs2(
    name = "B",
    input_build_settings = {
        "//command_line_option:output directory name": "any",
        "//:bs1": "bs1_default",
        "//:bs2": "bs2_custom",  # Input transition sets custom bs2 value
    },
    deps = [":C"],
    extdeps = [":ExtDep"],  # Edge transition sets default bs2 value
    extdeps_build_settings = {"//:bs1": "bs1_default", "//:bs2": "bs2_default"},
)

input_outdir_bs1_bs2_extdeps_bs1_bs2(
    name = "C",
    input_build_settings = {
        "//command_line_option:output directory name": "any",
        "//:bs1": "bs1_default",
        "//:bs2": "bs2_default",
    },
    extdeps = [":ExtDep"],
    extdeps_build_settings = {"//:bs1": "bs1_default", "//:bs2": "bs2_default"},
)

# Dependency with action to trigger the action conflict (any rule generating actions may apply here)
cc_library(
    name = "ExtDep",
    srcs = ["extdep.c"],
)

Example run:

$ bazel build //:A
ERROR: file '_objs/ExtDep/extdep.pic.o' is generated by these conflicting actions:
Label: //:ExtDep
RuleClass: cc_library rule
Configuration: 0ef19c895a80334be3739ffbe96bbd7f6766cbb2944f2a649b67acb81d261d43, 643272508e08a8348ad8387f39b923f0972d5a26bb916ec22e0cc03e63c97511
<...>
$ bazel config 0ef19c895a80334be3739ffbe96bbd7f6766cbb2944f2a649b67acb81d261d43 643272508e08a8348ad8387f39b923f0972d5a26bb916ec22e0cc03e63c97511
INFO: Displaying diff between configs 0ef19c895a80334be3739ffbe96bbd7f6766cbb2944f2a649b67acb81d261d43 and 643272508e08a8348ad8387f39b923f0972d5a26bb916ec22e0cc03e63c97511
Displaying diff between configs 0ef19c895a80334be3739ffbe96bbd7f6766cbb2944f2a649b67acb81d261d43 and 643272508e08a8348ad8387f39b923f0972d5a26bb916ec22e0cc03e63c97511
FragmentOptions com.google.devtools.build.lib.analysis.config.CoreOptions {
  affected by starlark transition: [//:bs2, //command_line_option:output directory name], [//command_line_option:output directory name]
  transition directory name fragment: ST-1e8dc79add0a, ST-a28aa8ec4aa8

Opts passed to transitionDirectoryNameFragment function:

ST-a28aa8ec4aa8: [//command_line_option:output directory name=any]
ST-1e8dc79add0a: [//:bs2=bs2_default, //command_line_option:output directory name=any]

What operating system are you running Bazel on?

Linux

What's the output of bazel info release?

development version

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

1. git checkout f0da3ecd186a33439b6255521e4d13a0e65ce88f  # current master
2. git revert commit af0e20f9d9808d8fba03b712491ad8851b3d5c26  # revert "Remove outdated "output directory name" option."
3. bazel-4.1.0-linux-x86_64 build //src:bazel-dev

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

not applicable

$ git remote get-url bazelbuild; git rev-parse bazelbuild/master ; git rev-parse HEAD ; git rev-parse HEAD~1
https://github.com/bazelbuild/bazel.git
f0da3ecd186a33439b6255521e4d13a0e65ce88f
b7308ccc7d763088c2efe997ac111bd9e09cec6a
f0da3ecd186a33439b6255521e4d13a0e65ce88f

Issue isolated on a custom version of v5.0.0-pre.20210604.6 but originally found in 4.x versions

Have you found anything relevant by searching the web?

Related issues: #12171 #12731

Any other information, logs, or outputs that you want to share?

Prior to #13580 it's possible to trigger the issue with only B -> ExtDeps and B -> C -> ExtDeps given that bs1 is not included in the extdeps edge transition (found with git bisect building //:B above but using input_outdir_bs1_bs2_extdeps_bs2 instead of input_outdir_bs1_bs2_extdeps_bs1_bs2)

13587 in combination with a custom output directory scheme using transition on "//command_line_option:output directory name" is designed to address all issues leading to conflicting "transition directory name fragment". This transition turns these types of issues into build failures due to action conflicts rather than silently duplicating actions in different output folders.

sdtwigg commented 2 years ago

Ok, I have a somewhat long update to this after spending a few weeks off-and-on contemplating this and related issues:

Core Background First, as core background, note that ConfiguredTargetFunction (which, when given a BuildConfigurationKey and Label performs analysis for a rule) is highly sensitive. Skyframe memoization is required for performant operation and, in the case of unsharable actions (cc_, etc.) correctness. New comments in https://github.com/bazelbuild/bazel/blob/33f7648fbaa875f73416be47f0b3c10ed93f30d6/src/main/java/com/google/devtools/build/lib/skyframe/ConfiguredTargetKey.java#L32 hopes to clarify this. Summarizing, Bazel has a key correctness constraint that, for every active Target across an entire build, the associated ConfiguredTargetKeys (created with a Label of that Target) must result in different output directories being computed during rule analysis*. Note that Skyframe infrastructure is sufficiently intelligent to ensure that, between builds, stale keys do not cause action conflicts if their output directories match active keys.

Interlude on computing the output directory name As per code in OutputDirectories.java, the output directory path name is a function of a subset of specifically-called-out fragment options (cpu, compilation mode, python version, etc.) and hash of the values of all options listed in "affected by Starlark transition".

Deeper dive on ensuring correctness of build Transitions and associated infrastructure are currently ensuring the key correctness constraint noted above is followed by either:

Interlude about "transition directory name fragment leading to violations of A. Currently, "transition directory name fragment" is part of BuildOptions. It is essentially the hashed value of all values in options referred to by "affected by Starlark transition". Excluding synthetic (i.e. should not happen in real builds) cases where a transition is itself on "transition directory name fragment", "transition directory name fragment" is itself a computed value based on all the other BuildOptions. Failure to update this when other build options change (e.g. it is discovered the exec transition fails to update this when appropriate https://github.com/bazelbuild/bazel/issues/13464) can lead to inconsistent BuildOptions. A change is already in motion to move "transition directory name fragment" out of BuildOptions and instead have it be computed as part of constructing the BuildConfigurationValue (formerly called BuildConfiguration), a collection of a BuildOptions and values all computed from that given BuildOptions). So, for the rest of this issue, assume that "transition directory name fragment" is always consistent with the rest of the BuildOptions.

Interlude about 'impossible' BuildOptions leading to violations of B. Ultimately, due to bugs, Bazel internal infrastructure can potentially create 'impossible' ConfiguredTargets; namely, analysis of a specific target done with BuildOptions that can't seemingly have been attached to a target (e.g. Target has a self-transition or trimming but they failed to apply). Failing to apply a self-transition won't necessarily cause an action conflict; however, it can lead to incorrect actions. Failing to apply trimming can lead to action conflicts because trimmed off fragments often aren't part of computing the output directory name. Avoiding 'impossible' ConfiguredTargets is not directly relevant to this issue and moreso relevant to https://github.com/bazelbuild/bazel/issues/7450. Assume for the rest of this issue that the given configuration had all the currently scheduled transitions and trimming operations run on it.

Additional constraints for more performant builds Now, while that constraint is necessary for correctness, there are additional constraints in order to maximize build performance. For every active Target across an entire BUILD, it should be analyzed as few times as necessary. If the build actions (commandline and input file contents) of two ConfiguredTargets are the same except for their output directories then they are clearly redundant. Fully solving this problem is generally understood to require generic trimming (i.e. not rebuilding a C++ library just because the python version changed) and outside the scope of this issue for now. However, we can look at avoiding redundancy when only affected by Starlark transition is different.

Further, presuming the use of action caching, there is another optimization available: avoid changing the output directory names as much as possible between incremental builds when command-line options change if those command-line options would have no other effect on the generated actions besides changing the output directory name. This leads to the following observations:

"affected by Starlark transition" is not quite right It is currently the list of all native or Starlark options changed by any previous Starlark transition. This is sufficient to ensure the output directory name changes enough after any Starlark transition. However, it is overly pessimistic:

Potential Resolutions and Related work

Addressing [3] is potentially more complex and I have two competing solutions at this time:

Appendix about the exec transmission Currently, the exec transition changes a lot of values (both those in the subset of those handled by OutputDirectories and those outside of that subset) but makes no explicit changes to the output directory name fragment or "affected by Starlark transition". In practice, it may be best for long-term correctness to treat the exec transition like a Starlark transition and audit the fromOptions and toOptions to ensure "affected by Starlark transition" is updated as appropriate.

In practice, this could be enough to rename "affected by Starlark transition" to something closer to intended like "changed options to hash" and, if doing the diff is cheap enough (or cachable), treat all native transitions like Starlark transitions and also audit their changes.

sdtwigg commented 2 years ago

As a quick update after conversation with the Bazel core team (experts on the underlying Skyframe infrastructure).

Likely to skip the first proposal to address [3] "Change "affected by Starlark transition" to instead be a map from option name to original (command-line) value."

Instead, the second proposal should be technically feasible with similar effort (and better performance). BuildConfigurationFunction will properly request/register a dependency on the top-level configuration via the Skyframe environment (likely going to injected by using the PrecomputedValue Skyframe feature).

In order for this to work correctly (and avoid unnecessary recomputation of ConfiguredTargetFunction), will need to make sure BuildConfigurationValue.equals is sensitive to the OutputDirectories object it holds (as well as the underlying BuildOptions, of course). At the moment, it looks like it relies on reference equality (and thus is overly sensitive to re-computation).

gregestren commented 2 years ago

@fmeum

gregestren commented 2 years ago

Thanks for this update, @sdtwigg .

My notes on the above:

Values:

Proposal:

gregestren commented 2 years ago

Instead, the second proposal should be technically feasible with similar effort (and better performance). BuildConfigurationFunction will properly request/register a dependency on the top-level configuration via the Skyframe environment (likely going to injected by using the PrecomputedValue Skyframe feature).

Copying some notes I took on the problem a while back: Do these match your plan?


There's another place we can set hash values: BuildConfigurationFunction This may be cleaner, in that it gives BuildConfigurationmore decisive ownership over its output paths, like in here.

In other words, we can exclusively say that a BuildConfiguration's output path is determined in BuildConfiguration construction, end of story. Something about that sounds appealing: more clear, focused ownership of a subtle concept.

BuildConfigurationFunction also has the advantage that it can easily access the build's top-level configuration, or defaults, or whatever.

Just follow the precedent of ConfiguredTargetFunction, which does this here. That code instantiates the Function with a BuildViewProvider, which Skyframe fulfills and keeps current with every build.

That includes stuff like here, which provides the top-level host configuration. It's pretty trivial to expand that to provide any other top-level context.

gregestren commented 2 years ago

Can we simply remove affected by starlark transition and replace it with BuildConfigurationFunction logic that creates transition directory name fragment?

eric-skydio commented 2 years ago

Thanks for putting in all the work to write this up, I'm excited to see this is on a path to eliminate the horrible hacks we are currently using in our codebase to make this work.

In the process of implementing a number of "unifying" starlark transitions in our codebase, one realization we've had is that it's often useful to be able to explicitly "delete" a configuration setting, meaning reset it to the command-line value. For now, we can get by with hard-coding those default values into the various transitions that need to access them, but it would eventually be nice to be able to explicitly perform a "reset" operation from a Starlark transition. It sounds like the proposed Skyframe changes would open the path to that, but I'm not familiar enough with that system to know for sure.

fmeum commented 2 years ago

Can we simply remove affected by starlark transition and replace it with BuildConfigurationFunction logic that creates transition directory name fragment?

To answer this question, I quickly prototyped this approach in https://github.com/fmeum/bazel/commit/5eefd23b0a0710e717ac09651df295c393bc5c03. This has the obvious noted deficencies and contains debug prints, but allowed me to get a better understanding of what the computed affected by starlark transition would look like. It does successfully analyze rules_go as well as our codebase at work, printing diffs such as:

Differing options:
//command_line_option:compilation_mode: opt -> fastbuild
//command_line_option:is host configuration: true -> false
//command_line_option:fat_apk_cpu: [] -> [armeabi-v7a]
//command_line_option:copt: [-g0] -> []
//command_line_option:cxxopt: [-g0] -> []
//command_line_option:strip: always -> sometimes
//command_line_option:java_runtime_version: remotejdk_11 -> local_jdk
//command_line_option:jvmopt: [-XX:ErrorFile=/dev/stderr] -> []
//go/config:static: null -> true
//go/private:request_nogo: null -> true
transitionDirectoryFragment = "ST-7ee17c9866ed"

Based on the contents of the typical diffs, I wonder whether getHostConfiguration() is the correct baseline. Is there a way to get the top-level target config rather than the host config? It could be a better fit.

Edit: This seems to be available from PrepareAnalysisPhaseValue, but I don't know the consequences of declaring a dependency on that in Skyframe.

@gregestren @sdtwigg Feel free to use (or ignore) this code under the CLA terms. If you are still worried about licensing issues, I could also submit it as a draft PR.

gregestren commented 2 years ago

Based on the contents of the typical diffs, I wonder whether getHostConfiguration() is the correct baseline. Is there a way to get the top-level target config rather than the host config? It could be a better fit.

Yes, there is a way. And I agree the top-level config makes a better baseline.

Since @sdtwigg has plans along these lines I'd leave it to you two to converge.

sdtwigg commented 2 years ago

As another piece of work (to be done after removal of "affected by Starlark transition" so tjat the results native transitions are also being properly diffed against the top-level configuration), we can then remove the ad hoc output directories distinguishers used by AppleConfiguration and AndroidConfiguration in their configurationDistinguisher option. (Morally, the configurationDistinguisher is very similar to "affected by Starlark transition" and "transition directory name fragment" in that they are wholly functions of the current configuration and its diff to other configurations. Thus, their function should be wholly subsumed by the more generic calculation of ST-hash.)

fmeum commented 2 years ago

@sdtwigg @gregestren Could you provide a brief update on the efforts to make affected by starlark transition a pure function? Do you still expect to implement that this year? If there should be parts that could be done independently, I would be happy to help.

gregestren commented 2 years ago

@sdtwigg has some pending code at https://bazel-review.googlesource.com/c/bazel/+/181730 that I'm not aware of anything blocking it from merging. I don't know his ETA for the necessary followups.

I'm taking a holiday break through the new year but I'll make sure to follow up after Jan 4.

sdtwigg commented 2 years ago

As a quick update, work continues on this. I have pending work, which you can peek at by going to https://bazel-review.googlesource.com/c/bazel/+/189290. The simple unit tests and preliminary tests all work; however, integration tests still show potential issues (mostly revolving around ST-hashes now appearing in previous unexpected places). The inability to set EXPLICIT_IN_OUTPUT_PATH for --platforms also seems to have come back to haunt me here.

If you are feeling adventurous, feel free to look and even patch that CL and try it. I need to decide when to actually submit that CL and then just do follow-up changes (as all the new behaviors are guarded behind an experimental flag anyway).

gregestren commented 2 years ago

I agree: the --experimental prefix gives more latitude to submit, then do follow ups. And makes it easier for everyone interested to play around with it.

gregestren commented 2 years ago

FYI @sdtwigg is currently exploring the implications of which config is the "baseline" for the new algorithm.

To recap:

@sdtwigg is writing up more details. I'm curious if anyone here prefers one vs. the other, and for what reasons.

fmeum commented 2 years ago

It seems that at least some of the potential downsides of the "diff against options' default values" approach stem from the fact that the default output directory does not include an ST hash. This can probably be solved with special handling to ensure that transitions can return to that output directory and/or a somewhat simpler application of path stripping only applied to the top-level configuration.

I'm looking forward to the document, which should provide a good base for us to determine what's possible and what's (currently) out of reach.

gregestren commented 2 years ago

To be clear, I believe https://bazel-review.googlesource.com/c/bazel/+/189290 will be submitted fairly soon, using the "diff from the build's top-level settings" approach. So we can all get experience even if conversation about the right baseline continues.

It's true that lack of hashes in the top-level directory is a weird complication. Ideally it shouldn't matter if a configuration is "top-level", "not-top-level", "part of build A", "part of build B", or anything. But there are these distinctions. And the "no hash" variant is a pretty deep one which has a big impact on remote execution since every build anyone ever does has a top-level config (and usually most of its graph is in that config).

fmeum commented 2 years ago

@sdtwigg Thanks again for this great improvement to transition efficiency. We've been using the new diff_against_baseline mode for some time now and have found it to be both very reliable and quite efficient at reducing the number of distinct configurations.

Do you already have any plans on how to stabilize this feature? At least based on my experience with it so far, it seems clearly superior to the legacy strategy. Are there any backwards compatibility concerns associated with it?

sdtwigg commented 2 years ago

My current plan is to deploy the feature internally here (all signs pointing to this going fine) then make diff_against_baseline the default. After that, can start the flag deprecation process so it is the permanent behavior.

There was actually hard-to-concisely-explain-but-also-extremely-complex internal blocker on the feature (essentially related to BuildConfigurationFunction now registering a dependency on the floating top-level configuration). Fortunately this has been thoroughly dealt with (although was a major reason for the implementation delay).

sdtwigg commented 2 years ago

PS: I am glad to hear that this is working well for you all externally. Please feel free to update here with any issues or feedback.

konste commented 2 years ago

Legacy transition behavior looks like a bug to me. It doesn't make sense.

@sdtwigg I am so glad you put the effort to figure it out and rework it to make it much more sensible.

My team moved our huge project from Bazel 5.2 to pre-release of Bazel 6 solely because of this major improvement.

BTW don't know if it is intended or not, but diff_against_baseline mode seems to be the default in the current pre-release of Bazel 6. Fine with us, but feels a little unexpected.

konste commented 1 year ago

We use --experimental_output_directory_naming_scheme=diff_against_baseline and noticed that it works well for Bazel built-in (or just well known) platforms, but does not work for custom platforms. For example we have very simple custom platform for Emscripten:

constraint_value(
    name = "asmjs",
    constraint_setting = "@platforms//cpu",
)

platform(
    name = "emscripten", 
    constraint_values = [
        "@tab_toolchains//bazel/platforms:asmjs",
    ],
)

When we build for that platform directly output goes to asmjs-opt folder, but when we arrive at exactly the same configuration through transition output goes to asmjs-opt-ST-b23356f6edc4 Is it a bug or we are missing something?

I tried bazel config --dump_all to check if transition somehow gives us not exactly the same configuration, but no - we only have exactly one asmj configuration in the dump. Help?

sdtwigg commented 1 year ago

Having an ST hash with platforms changes is a known behavior (as simply removing the hash was causing action conflicts in rare, but predictable circumstances). There are some ideas for a more intricate changes that would obviate the need for the hash here.

Were you seeing correctness or performance issues because of the hash in this case? Or, is this mostly an ergonomics concern?

konste commented 1 year ago

I don't understand how come diff_against_baseline works without appending a hash for some platforms and causes hash appended for other platforms.

It is not about ergonomics, it is really about build performance. Here is the scenario on the high level. We have a multiplatform build for the target platforms A, B, C. Many targets are built for the platform A, but they are also reachable through the chain of transitions which in the end is exact equivalent of A. If the name of the output folder differs in those cases we get the same targets built for the same platform A multiple times!

torgil commented 1 year ago

@konste We have the exact same multi-platform scenario with custom platforms. As discussed under #14236 we use "output directory name" transition to resolve this situation (in combination with a few other patches referenced in that thread).

konste commented 1 year ago

@torgil thank you for the link! Complexity of those interconnected problems is mind bending! Unfortunately using custom built Bazel is out of the question for us. Even using rolling builds was only accepted as temporary. I understand you also deal with C++, so in case "tree artifacts" get in the picture be aware of this conected bug: https://github.com/bazelbuild/bazel/issues/15770.

torgil commented 1 year ago

@konste Thank you for the link. "output directory name" option was removed in 5.0.0-pre.20210929.1 so if you can live with an earlier version you should be good. You can also set "platform_suffix" to "/../output_directory_name" as a workaround. Also, watch out for the "Remove AggregatingMiddleman from toolchains." patch that entered 5.0.0-pre.20210708.4, that can have performance regressions on the execution time (wasn't able to easily reproduce that in an example though).

fmeum commented 1 year ago

I just talked to @matloob about how we can support Go 1.20 (due February 2023) in rules_go. In order to get non-pathological caching behavior, we would need to get rid of affected by starlark transition - otherwise we would end up rebuilding the Go stdlib, which with Go 1.20 no longer ships in precompiled form, many times.

@sdtwigg I talked to @gregestren about the reason behind --experimental_output_directory_naming_scheme=diff_against_baseline not being the default yet. Do you think that the results at Google would generalize to Bazel users or could we change the default in Bazel, but not Blaze? If that could regress performance for Bazel users in some common scenarios, would you expect diffing against the global defaults rather than the top-level defaults to fare better?

Since this kind of change could probably only be made in a major release of Bazel, it would be very helpful if we could arrive at an informed decision before the Bazel 6 release.

gregestren commented 1 year ago

Not obvious to me at all that defaulting --experimental_output_directory_naming_scheme=diff_against_baseline in Bazel is unsafe. I think most orgs don't suffer from resource constraints in their remote clusters as the dominant variable.

I'm very interested in global defaults but I don't expect algorithmic changes to be realistic in a pressured time frame.

Even if --experimental_output_directory_naming_scheme=diff_against_baseline, which I suspect is plausible, we need to be careful not to throw a potential last-minute monkey wrench into Bazel 6.0 stability.

fmeum commented 1 year ago

@gregestren Would you consider flipping this flag in a minor release (e.g. Bazel 6.1.0) to be an option (after a number of rolling releases with it flipped)? I know that rules_nodejs setups sometimes enumerate particular values of the ST hash, but that hash can change for a number of unrelated reasons.

gregestren commented 1 year ago

I don't think the release policy would allow that because it's not backward-compatible? We'd have to check that with the release manager.

I'll make sure we chat about this next week (note a US holiday's putting a bunch of people offline rest of this week) and follow up. We definitely need to lean on @sdtwigg 's intuition.

fmeum commented 1 year ago

Thanks, that is very much appreciated.

I don't think the release policy would allow that because it's not backward-compatible? We'd have to check that with the release manager.

If it is considered not backwards-compatible, then that would definitely not be possible. I was wondering whether it would have to be considered breaking though as both affected by starlark transition and the precise value of the ST are probably not part of Bazel's promised API surface.

I can add that we have been using this flag internally since it first landed in Bazel and never had any issues with it, but of course our rules_go-related build times reduced noticeably.

konste commented 1 year ago

I can confirm that we rely on this flag for a huge C++ build and it works without a hitch. This flag is our primary reason for Bazel 6.0 adoption.

gregestren commented 1 year ago

@konste - reading https://github.com/bazelbuild/bazel/issues/14023#issuecomment-1266241503 again, it's not clear to me that the behavior you describe is expected vs. a bug.

In that comment, are you saying that when you run $ bazel build //foo --platforms=//:emscripten, then top-level targets build to bazel-out/asmjs-opt, but other targets that match that same exact platform under a transition look like asmjs-opt-ST-b23356f6edc4? That seems unnecessary to me.

Or are you saying the difference is across different builds, where --platforms is set differently on each one?

Re: https://github.com/bazelbuild/bazel/issues/14023#issuecomment-1325751008 -we didn't get to it (considering Bazel 6.x) at today's sync. Next chance is Friday.

konste commented 1 year ago

@gregestren I am working on a standalone repro to demo the problem.

konste commented 1 year ago

I took a stab at the repro and here is what I got: https://github.com/Bazel-snippets/14023 Command to run: bazel build demo --platforms=//platforms:platform_1 Honestly I am not sure what I see - it looks like a different bug!

target_1 is not transitioned and happily outputs to the folder cpu_1-fastbuild

target_2 is transitioned to platform_2 and outputs to the folder cpu_2-fastbuild-ST-a63e98ce9886 This is not great - I'd like to see just cpu_1-fastbuild

target_3 transitions to platform_2 and then back to platform_1. Prints from transition function show that it is indeed transition there and back, BUT the target itself still reports that it is executed for the platform_2 and not for the platform_1 as expected.

Can you make sense of it?

gregestren commented 1 year ago

All - we support flipping this for Bazel 6.0.

Would someone be willing to put up a patch for the release managers? I haven't talked with the release managers - they may have their own opinion. But I'm happy to argue this doesn't count as a breaking change.

fmeum commented 1 year ago

All - we support flipping this for Bazel 6.0.

Would someone be willing to put up a patch for the release managers? I haven't talked with the release managers - they may have their own opinion. But I'm happy to argue this doesn't count as a breaking change.

I will create a patch and ask the release managers.

@gregestren When you say "But I'm happy to argue this doesn't count as a breaking change", do you think that this could also get into a 6.1.0 release? If so, that would probably be preferable to a cherry pick into 6.0.0 at this point.

gregestren commented 1 year ago

Sure, 6.1.0's fine to me. I'm just saying I don't think we need to delay until 7.0.

torgil commented 1 year ago

Nice. It looks like diff_against_baseline is a solution to my example above. Good job!

@konste

target_2 is transitioned to platform_2 and outputs to the folder cpu_2-fastbuild-ST-a63e98ce9886 This is not great - I'd like to see just cpu_1-fastbuild

It looks correct to me since target2 is built with platform2 for the demo target.

target_3 transitions to platform_2 and then back to platform_1. Prints from transition function show that it is indeed transition there and back, BUT the target itself still reports that it is executed for the platform_2 and not for the platform_1 as expected.

Can you make sense of it?

Yes. Since your code first transitions to platform1 and then platform 2, you'll end up with platform 2. If you change the order like this ...

transitioner_rule(
    name = "target_3_to_platform_1",
    deps = "target_3",
    transition_to_platform = "//platforms:platform_1"
)

transitioner_rule(
    name = "target_3_back_to_platform_1",
    deps = "target_3_to_platform_1",
    transition_to_platform = "//platforms:platform_2"
)

... you'll get back to cpu_1-fastbuild. Works for me with master branch as of Dec 8 (f82fc5b05f8e086c26b82e29cd91008a9b5af8c0)

.../execroot/main/bazel-out/cpu_1-fastbuild/bin/manifest_target_1 .../execroot/main/bazel-out/cpu_2-fastbuild-ST-a63e98ce9886/bin/manifest_target_2 .../execroot/main/bazel-out/cpu_1-fastbuild/bin/manifest_target_3

konste commented 1 year ago

Thanks a lot @torgil !! I got it now. I was looking at the order of the targets in the BUILD file and forgot that transitions applied backwards starting from the target requested.

konste commented 1 year ago

But can somebody explain why transition to another platform produces output folder with ST- suffix instead of the name without the suffix we get if the same platform is requested from the top? For example:

fmeum commented 1 year ago

That's because the diff_against_baseline mode computes the ST part as a hash of a diff between the current target and the top-level configuration. In particular, the hash will always be present if your current platform differs from the one specified on the command-line.

@sdtwigg Could you give an example where EXPLICIT_IN_OUTPUT_PATH broke --platforms?

konste commented 1 year ago

@fmeum But at least the hash is stable so no matter which transition path bring us to the particular configuration they all end up in the same output folder, right?

I wonder what could possibly be done to fix the difference in the output folder name between the case when configuration is the top one or transitioned to? Without some workaround we still get double building in this case.

fmeum commented 1 year ago

@konste I think that Stephen has been working on this. Theoretically speaking, --platforms wouldn't need to be part of the hash as it is already represented in the human-readable part of the output path. But realizing this seemed to be difficult (see https://github.com/bazelbuild/bazel/issues/14023#issuecomment-1268581256).