bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
22.99k stars 4.03k forks source link

Add a flag to always add st-hashes diffed against some static configuration #19337

Open qc-benjamin opened 1 year ago

qc-benjamin commented 1 year ago

Description of the feature request:

Add a naming schema to --output_directory_naming_schema, diff_against_null that gives a known st-hash no matter which baseline configuration was used.

Which category does this issue belong to?

Configurability

What underlying problem are you trying to solve with this feature?

The current default value diff_against_baseline is great but has less than ideal behaviour for incremental builds in a situation where you have transitions in a build which gets invoked with multiple different configurations.

By using a diff_against_null naming schema which always gives a known st-hash no matter your baseline invocation a developer can transition into the same configuration from multiple entry points without having to perform rebuilds of actions that was already completed.

Consider a situation where you have main and lib, where lib is built with its own completely standalone configuration:

# BUILD
load("//:rules.bzl", "lib_rule")

genrule(
    name = "main",
    srcs = [":lib"],
    outs = ["main.out"],
    cmd = "echo 'Main depends on $(location :lib)' > $@",
)

lib_rule(name = "lib")

# rules.bzl
def _my_transition_impl(settings, attr):
    return {"//command_line_option:features": ["x"], "//command_line_option:host_features": ["y"]}

my_transition = transition(
    implementation = _my_transition_impl,
    inputs = [],
    outputs = ["//command_line_option:features", "//command_line_option:host_features"]
)

def _lib_impl(ctx):
    f = ctx.actions.declare_file(ctx.label.name)
    ctx.actions.write(f, "lib content")
    return [DefaultInfo(files = depset([f]))]

lib_rule = rule(
    implementation = _lib_impl,
    cfg = my_transition, 
    attrs = {"_allowlist_function_transition": attr.label(default = "@bazel_tools//tools/allowlists/function_transition_allowlist")},
)

Should you build main with bazel build :main:

$ bazel build :main && cat bazel-bin/main.out
Main depends on bazel-out/k8-fastbuild-ST-84310ec50020/bin/lib
$ bazel cquery 'deps(:main,1)'
//:main (eac6935)
//:lib (c4aca37)

And with bazel build :main --features=x:

$ bazel build :main --features=x && cat bazel-bin/main.out
Main depends on bazel-out/k8-fastbuild-ST-e13a06a5f591/bin/lib
$ bazel cquery --features=x 'deps(:main,1)'
//:main (18a980e)
//:lib (c4aca37)

As you can see lib will then be put in a different folder with another hash, forcing a rebuild of lib even though it is being run with an identical configurations (c4aca37).

For a transition heavy codebase, where most work involves some transition, diff_against_null has no downsides. However, for builds which use little to no transitions you lose the implicit optimization of configuration independent actions being cache hits when no transition has been performed. This is lost since tacking on an st-hash to the output would invalidate an otherwise identical action.

Notes

Attentive readers might notice that this is a request for a hash of the absolute value of the configuration rather than a diff. diff_against_null is simply a name I chose to contrast it with diff_against_baseline. I haven't explored all the edgecases but using the configuration id as suffix might be sufficient

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

release 7.0.0-pre.20230810.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

fmeum commented 1 year ago

Cc @sdtwigg

sdtwigg commented 1 year ago

I had considered this in the past; however, it has horrible performance for incremental builds where flags are changing. In particular; here, if you changed a flag that is not touched by the transitions (i.e. not features or host_features) then the whole build will also be redone.

If your whole codebase is riddled with many transitions then I think you will fall into "transition touches options not touched by commandline" more often and thus see more builds. That said, it is not especially expensive to add the option for testing purposes but I am really hesitant that it will really solve the problem. I will bring this up with the team but am curious as to what others think as well.

One thing that gregce and I have discussed (in varying terms and somewhat independently) is the concept of named transitions/configurations. Basically, it would be a generalization of diff_against_dynamic_baseline so that other Starlark transitions could register themselves to have a special name as well.

The design is not fully formed yet as there are a bunch of edge cases (e.g. if two named transitions are taken or a named transition after the exec transition, and so on) as well as the exact mechanism to do these 'registrations' (WORKSPACE file? the transition declaration itself?).