Open tyler-french opened 2 months ago
This was mentioned in my proposal for lazy indexing (#1891) as an alternative method. I think that this conventions system offers a better way to get faster incremental results, as it allows repos to more easily implement their own custom plugins to form their own conventions. The "automatic resolves" act as a versioned partial index, which solves the cache consistency issue that #1181 faces. Nice idea here!
I have a few questions:
- `--use_conventions` and asserts no changes are made to the index?
- `Convention` interface is somewhat abstract from what an actual import->label convention is, so having an example or two would help.
- `import` statements or considering all python targets as not adhering to any conventions. The latter essentially would make the conventions index have every python target in it.
See implementation: https://github.com/bazelbuild/bazel-gazelle/pull/1900
Why Gazelle's index doesn't scale
Currently, `gazelle fix-update` runs with a configurable `--index` option. Certain extensions can use the index to properly resolve imports in a repo. The `ruleIndex` is essentially a large structure containing all of the rules that have an associated `Resolver`.

If the index is enabled, the rule index must load and store information from the entire repository. If disabled, `gazelle` can rely (in most cases) on the `# gazelle:prefix` directive or `-go_prefix` flag to resolve dependencies.

There are two reasons this does not scale:

1. With `index=true`, every gazelle run takes `O(size of repo)` rather than `O(size of changes)` time, because it must have a fully populated index.
2. With `index=false`, any exceptions to the `go_prefix` or `# gazelle:prefix` require manually added `# gazelle:resolve` statements, which almost defeats the purpose of automatic build file generation. Also, not all plugins or gazelle language extensions might work without an index.

Running `gazelle` in a repository should almost always be doable by running on just the files changed since a recent run. If a user is only modifying a small project, their gazelle invocation should only use and care about that part of the repository. Otherwise, `gazelle` is not truly incremental.

Gazelle Convention Checker
At Uber, we've solved this problem using a gazelle `ConventionChecker` interface, which is patched onto our `gazelle` binary. This convention checker works in the following manner.

A language extension can implement a `Convention` interface.

If an import path follows the conventions expected by the `Resolver` of the interface, then the convention checker will return `true`. Otherwise, the `ConventionChecker` will add the rule to a list of rules that violate the convention.

Then, on `Finish()`, the `ConventionChecker` will write any rules that violate the convention as `# gazelle:resolve` directives in the top-level BUILD.bazel file. The "automatic resolves" are added to the top-level file to speed up lookup, and are added in a special location to avoid confusion with other parts of this file. They are added under a header `"### AUTOMATIC RESOLVES ###"` at the bottom of the file.

Optionally, these could instead be kept in some other file passed at runtime, but since they are technically part of the build configuration, using the BUILD.bazel file makes the most sense.
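The flow described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the code from the linked PR: `ImportSpec`, the `CheckConvention` method name, `prefixConvention`, and the `example.com/repo` paths are all hypothetical stand-ins, and the real `Convention` interface may have a different shape. The convention assumed here is that an import `<prefix>/<dir>` resolves to the label `//<dir>:<last path element>`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ImportSpec is a simplified, hypothetical stand-in for the import
// description a resolver works with; it is defined here so the sketch
// compiles on its own.
type ImportSpec struct {
	Lang string // e.g. "go"
	Imp  string // e.g. "example.com/repo/foo/bar"
}

// Convention is an assumed shape for the interface a language extension
// would implement; the actual patch may differ.
type Convention interface {
	// CheckConvention reports whether label is exactly what the convention
	// would infer for imp, meaning no index entry is needed.
	CheckConvention(imp ImportSpec, label string) bool
}

// prefixConvention assumes "<prefix>/<dir>" maps to "//<dir>:<base of dir>".
type prefixConvention struct{ prefix string }

func (c prefixConvention) CheckConvention(imp ImportSpec, label string) bool {
	rel := strings.TrimPrefix(imp.Imp, c.prefix+"/")
	if rel == imp.Imp {
		return false // import is outside the prefix
	}
	base := rel[strings.LastIndex(rel, "/")+1:]
	return label == "//"+rel+":"+base
}

// ConventionChecker records every (import, label) pair that breaks the convention.
type ConventionChecker struct {
	conv       Convention
	violations map[ImportSpec]string
}

func NewConventionChecker(c Convention) *ConventionChecker {
	return &ConventionChecker{conv: c, violations: map[ImportSpec]string{}}
}

// Check returns true when the convention holds; otherwise it records a violation.
func (cc *ConventionChecker) Check(imp ImportSpec, label string) bool {
	if cc.conv.CheckConvention(imp, label) {
		return true
	}
	cc.violations[imp] = label
	return false
}

// Finish renders the violations as the text block that would be appended
// to the bottom of the top-level BUILD.bazel file.
func (cc *ConventionChecker) Finish() string {
	lines := make([]string, 0, len(cc.violations))
	for imp, label := range cc.violations {
		lines = append(lines, fmt.Sprintf("# gazelle:resolve %s %s %s", imp.Lang, imp.Imp, label))
	}
	sort.Strings(lines) // deterministic order keeps BUILD file diffs small
	return "### AUTOMATIC RESOLVES ###\n" + strings.Join(lines, "\n") + "\n"
}

func main() {
	cc := NewConventionChecker(prefixConvention{prefix: "example.com/repo"})
	cc.Check(ImportSpec{"go", "example.com/repo/foo/bar"}, "//foo/bar:bar")   // follows the convention
	cc.Check(ImportSpec{"go", "example.com/repo/legacy"}, "//old/legacy:lib") // violation, gets recorded
	fmt.Print(cc.Finish())
}
```

Only the second rule ends up in the rendered block, which is the point of the design: rules that follow the convention cost nothing to store, and only the exceptions are persisted.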
Why do this?
This has two clear benefits:
- `# gazelle:resolve` statements for convention violations. There can be 10,000s of convention violations, but they will still be loaded much faster than walking 10,000s of directories to load an index. Once loaded, they live in memory in an O(1) lookup map, so they can be easily retrieved.

The convention checker can also be enabled/disabled with a flag, so this optimization is optional for all users, and can be avoided if desired.
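To make the lookup-map benefit concrete, here is a minimal Go sketch of how the automatic resolves might be read back from the top-level BUILD.bazel content. The function names `loadAutoResolves` and `resolve`, the map key format, and the fallback convention (`<prefix>/<dir>` maps to `//<dir>:<last path element>`) are assumptions made for illustration; Gazelle's own directive parsing is not shown.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

const autoHeader = "### AUTOMATIC RESOLVES ###"

// loadAutoResolves scans BUILD file content once and returns a map from
// "<lang> <import>" to label for every
// "# gazelle:resolve <lang> <import> <label>" line found after the header.
func loadAutoResolves(build string) map[string]string {
	resolves := map[string]string{}
	inBlock := false
	sc := bufio.NewScanner(strings.NewReader(build))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == autoHeader {
			inBlock = true
			continue
		}
		if !inBlock {
			continue
		}
		fields := strings.Fields(line)
		// Expected fields: "#", "gazelle:resolve", lang, import, label.
		if len(fields) == 5 && fields[0] == "#" && fields[1] == "gazelle:resolve" {
			resolves[fields[2]+" "+fields[3]] = fields[4]
		}
	}
	return resolves
}

// resolve consults the violation map first (a constant-time lookup) and
// otherwise derives the label from the assumed convention. It assumes imp
// lives under prefix; imports outside the repo are not handled here.
func resolve(resolves map[string]string, lang, imp, prefix string) string {
	if label, ok := resolves[lang+" "+imp]; ok {
		return label
	}
	rel := strings.TrimPrefix(imp, prefix+"/")
	return "//" + rel + ":" + rel[strings.LastIndex(rel, "/")+1:]
}

func main() {
	build := `# gazelle:prefix example.com/repo

### AUTOMATIC RESOLVES ###
# gazelle:resolve go example.com/repo/legacy //old/legacy:lib
`
	r := loadAutoResolves(build)
	fmt.Println(resolve(r, "go", "example.com/repo/legacy", "example.com/repo"))  // recorded exception
	fmt.Println(resolve(r, "go", "example.com/repo/foo/bar", "example.com/repo")) // inferred by convention
}
```

Reading one file and filling a map is linear in the number of violations, whereas building the full index requires walking every directory in the repository.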