Closed tetromino closed 1 month ago
This seems like a great idea, but more commonly than not, these "finalizers" don't actually run at package level granularity. They run on whole trees of files.
For example, I've previously worked on a project where we required every build file to have a conformance_checks()
rule at the bottom of a file. The problem was, this was kind of a pain to put at the bottom of every build file - It was very easily forgotten, and very frequently would result in having to re-upload several minutes later after your presubmits failed.
I'd propose that finalizers may optionally be transitive. That is, if //foo:my_finalizer
is a transitive finalizer, native.existing_rules(transitive=True)
would return //foo/bar/baz:*
in addition to //foo:*
. This means that you could have a single finalizer for your whole project, instead of having to specify it in every single build file.
The only concern I have with such an approach would be the potential longer loading phase time increase when loading top-level build files, but I think that in practice, most of the time when you load top-level build files, they depend on the whole tree under them anyway.
Also, side question: How willl finalizers interact with export_files
and non-exported files?
@matts1 - the idea is to run finalizers in the loading phase. Loading phase means each package is in a separate Starlark (and Java) thread, and the packages cannot see each others' content. See https://bazel.build/extending/concepts#evaluation-model
But what we could do instead is register default finalizers.
package_group
-like specification (for example, you may only want the android team's finalizers under the directory tree they own, you wouldn't want any default finalizers at all in //experimental, etc.)This default finalizer registration a part of my proposal internally, but we decided to shelve it for now because of worries about performance.
That sounds like an interesting idea.
What were the specific performance concerns? Was it hundreds of matchers which you have to match every package against being a problem in a large monorepo?
I would have imagined performance concerns could be mitigated with sub-matchers, but maybe I'm misunderstanding the problem
@matts1 - the performance problem is that in a monorepo, a change to REPO.bazel or WORKSPACE invalidates all loading/analysis caches company-wide, including the CI/CD systems :)
For the non-monorepo case, default registration could probably be useful - but were weren't sure how useful, how many people would be interested.
Could that potentially be solved by moving the finalizers outside of REPO.bazel, with REPO.bazel only referencing them? This seems like a nice solution, but I don't know how feasible interleaving the loading and analysis phase would be, which might be required for this. That being said, my design doc regular rules in module extensions was approved, and does require solving that problem.
# REPO.bazel
register_default_finalizers("//:default_finalizers")
# //BUILD
default_finalizer_group(
name = "default_finalizers",
finalizers = [
"//foo/bar:default_finalizer",
]
)
# //foo/bar/BUILD
finalizer_rule(
name = "default_finalizer",
default_finalizer = True,
...
)
@matts1 - if package //foo can theoretically get a default finalizer registered in by a specification defined in \<bar>, then a change to bar must invalidate the loading and analysis cache for //foo. It does not matter how \<bar> is structured - whether it's a single file or a group of files, whether it's REPO.bazel or some .bzl file referenced by REPO.bazel, or whatever other mechanism. The cache will be invalidated. And if all packages in a repo can theoretically get a default finalizer registered in them by a specification defined in \<bar>, then any change to bar will invalidate all caches for that repo.
The solution for big monorepos might be along the lines of sharding the repo into components or projects which have independent config files. It will take some designing.
Ah, didn't realize that. Adding to your sharding idea, we could do something like the following, where we only allow a finalizer to come from parent packages:
# REPO.bazel
register_default_finalizers("//:default_finalizers")
# //BUILD
default_finalizer_group(
name = "default_finalizers",
finalizers = [
"//foo:default_finalizers",
# We *may* want to disallow this, to ensure that only changes to the direct parent c
]
)
# //foo/BUILD
default_finalizer_group(
name = "default_finalizers",
finalizers = [
"//foo/bar:default_finalizers",
# This would be disallowed because //baz is not a subpackage of //foo.
"//baz:default_finalizers",
]
)
# //foo/bar/BUILD
finalizer_rule(
name = "default_finalizer",
default_finalizer = True,
...
)
Though thinking about it now, this seems very similar to my original proposal, with the only real distinction being that if you don't configure finalizers, the parent BUILD files don't invalidate the child ones.
I might be misunderstanding how things work, but if, for example, you had a global configuration file:
register_default_finalizer("//foo/bar/...", "foobar_finalizer")
register_default_finalizer("//foo/...", "foo_finalizer")
Then it could evaluate to:
{
SkyKey("all_finalizers"): SkyValue(["//foo", "//foobar"])
SkyKey("default_finalizer://foo/bar"): SkyValue("foobar_finalizer"),
SkyKey("default_finalizer://foo"): SkyValue("foo_finalizer")
}
Then the SkyFunction for the package "//foo/bar/baz" would take in the SkyValues for the default finalizers for ["//", "//foo", "//foo/bar", //foo/bar/baz"]
. This seems like it'd prevent a cache invalidation without needing to shard, since the SkyValues for those things wouldn't change unless they'd actually been changed in the config file, instead of if they could have theoretically been changed.
That being said, this might need a bit of a rework to skyframe itself to allow you to map SkyKey(default_finalizer://unknown_key)
to SkyValue(None)
by checking if it's contained within "all_finalizers". I don't have enough experience with skyframe to know whether this proposal is viable or not.
@tetromino, should this bug be closed now that finalizers (as we scoped them for Symbolic Macros) is implemented and set for release with 8.0? Or is there ongoing design work or extensions you want to use this bug for?
@brandjon - thanks for the reminder!
We wish to add rule finalizers - special symbolic macros (see #19922) which - regardless of lexical position in a BUILD file - are evaluated in the final stage of loading a package, after all non-finalizer targets have been defined. Unlike ordinary symbolic macros, a finalizer can call
native.existing_rules()
, where it behaves slightly differently than in legacy macros: it only returns the set of non-finalizer rule targets. The finalizer may assert on the state of that set or define new targets.Evaluating a rule finalizer necessarily requires loading the entire package (except for any other finalizers), preventing lazy symbolic macro evaluation. To maximize the benefit of lazy evaluation, we will need a flag to control whether to force finalizers to be evaluated.
Motivation
Many teams at Google have project- or framework-specific "epilogue macros", which must be invoked at the end of a BUILD file, and which use
native.existing_rules()
to do some of the following:I suspect that the same holds in other sufficiently large Bazel (mono)repos. Even rule set repos sometimes declare epilogue macros - for example, flags.FLAGS() in rules_android.
Use of epilogue macros has the following drawbacks:
native.existing_rules()
with its current semantics, they make it fundamentally impossible to lazy-evaluate the package, even if a user is not interested in running any of the epilogue's tests.Concrete proposal
Consider a theoretical finalizer:
The
native.existing_rules()
call in the context of a finalizer will always return a deeply immutable map which contains all non-finalizer-defined rule targets, and does not contain any targets declared in any finalizer (including the currently running one!), ensuring that (a) we can cache the existing_rules map and reuse it for multiple finalizers and (b) finalizer order does not matter.Implementation in Bazel
A finalizer is a special kind of macro, which can be implemented as a subclass of MacroFunction or as a MacroFunction with a special flag. In particular, lazy/optional evaluation of finalizers (when allowed by a Bazel flag) is taken care of by lazy evaluation of macros.
Package.Builder already has some support for multi-stage building (see buildPartial()); this proposal would add a finalizer stage, the first step of which (if any finalizers are found in the package) would be to save a snapshot of the existing_rules snapshottable bimap.
Interaction with symbolic macros
Finalizers may not be called in a symbolic macro; but finalizers may call symbolic macros and other finalizers (and, of course, legacy macros).
The existing_rules map accessible to a finalizer contains targets regardless of their visibility or which macro they were declared in (note that symbolic macros may declare a target which is not visible to the package where it lives) - a finalizer might assert on a rule target's attributes rather than use it as a dependency, and hiding targets from the finalizer when they are visible to a genquery would be an odd choice.
Lazy evaluation
To evaluate a finalizer, Bazel must load the complete set of the package's non-finalizer targets - we don't want the finalizer's behavior to vary unpredictably depending on which part of the package was evaluated. In this, a finalizer is no different from an epilogue macro. But the advantage of the finalizer is that it's self-contained, intentional, isolated from other finalizers, and there is a way to tell that we don't need to evaluate it:
if we don't need need any targets declared under the finalizer's namespace if a user doesn't care about running finalizer checks (as expressed by a Bazel flag)
In situations where a repo's policies need to be enforced - for example, in CI/CD - Bazel would need to run with a flag that forces evaluation of all macros and finalizers in the package.
By contrast, in situations where users need low latency of loading - for example, in an IDE when opening a BUILD file - one would certainly want to make evaluation lazy (and therefore not evaluate finalizers) unless a target under the finalizer's namespace was requested.