bazel-contrib / bazel-gazelle

Gazelle is a Bazel build file generator for Bazel projects. It natively supports Go and protobuf, and it may be extended to support new languages and custom rule sets.
Apache License 2.0

Proposal: Add optional gazelle `ConventionChecker` interface to avoid indexing. #1907

Open tyler-french opened 1 month ago

tyler-french commented 1 month ago

See implementation: https://github.com/bazelbuild/bazel-gazelle/pull/1900

Why Gazelle's index doesn't scale

Currently, gazelle fix-update runs with a configurable --index option. Certain extensions use the index to resolve imports within a repo. The ruleIndex is essentially a large structure containing every rule that has an associated Resolver.

If the index is enabled, the rule index must load and store information from the entire repository. If disabled, gazelle can rely (in most cases) on the # gazelle:prefix directive or -go_prefix flag to resolve dependencies.

There are two reasons this does not scale:

Running gazelle in a repository should almost always be possible on just the files changed since a recent run. If a user is modifying only a small project, their gazelle invocation should read and depend on only that part of the repository. Otherwise, gazelle is not truly incremental.

Gazelle Convention Checker

At Uber, we've solved this problem using a gazelle ConventionChecker interface, which is patched into our gazelle binary. This convention checker works as follows.

A language extension can implement a Convention interface.

// Convention should be implemented by language extensions in order to
// register language specific convention logic with the convention checker.
type Convention interface {
    // CheckConvention returns whether or not the rule information follows
    // a known convention.
    CheckConvention(c *config.Config, kind, imp, name, rel string) bool
}
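To make the interface more concrete, here is a minimal sketch of what a Go convention might look like. It is not taken from the linked PR. `Config` is a local stand-in for gazelle's `config.Config` so the example compiles on its own, and both the `GoPrefix` field and the rule itself (import path equals prefix plus directory, target name equals the directory's base name) are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"path"
)

// Config is a local stand-in for gazelle's config.Config so this sketch
// builds without the gazelle module. GoPrefix is a hypothetical field.
type Config struct {
	GoPrefix string
}

// Convention mirrors the proposed interface, using the stand-in Config.
type Convention interface {
	CheckConvention(c *Config, kind, imp, name, rel string) bool
}

// goConvention is a hypothetical convention: a go_library in directory rel
// is importable as <prefix>/<rel> and is named after the last path element.
type goConvention struct{}

func (goConvention) CheckConvention(c *Config, kind, imp, name, rel string) bool {
	// This sketch only knows a convention for go_library; every other
	// kind is reported as not following it.
	if kind != "go_library" {
		return false
	}
	return imp == path.Join(c.GoPrefix, rel) && name == path.Base(rel)
}

func main() {
	c := &Config{GoPrefix: "example.com/repo"}
	var conv Convention = goConvention{}

	// Follows the convention, so no index entry or directive is needed.
	fmt.Println(conv.CheckConvention(c, "go_library",
		"example.com/repo/pkg/foo", "foo", "pkg/foo")) // true

	// Same import path, but an unexpected target name.
	fmt.Println(conv.CheckConvention(c, "go_library",
		"example.com/repo/pkg/foo", "go_default_library", "pkg/foo")) // false
}
```

With a rule like this, a label can be derived from the import path alone, which is what lets resolution skip the index for conforming targets.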

If an import path follows the conventions expected by the language's Resolver, CheckConvention returns true. Otherwise, the ConventionChecker adds the rule to a list of rules that violate the convention.

Then, on Finish(), the ConventionChecker writes any rules that violate the convention as # gazelle:resolve directives in the top-level BUILD.bazel file. The "automatic resolves" are added to the top-level file to speed up lookup, and are placed in a dedicated location to avoid confusion with other parts of this file: under a header "### AUTOMATIC RESOLVES ###" at the bottom of the file.
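As a rough illustration of that final step, the sketch below renders a list of recorded violations as an "automatic resolves" section. The `resolveEntry` type and `renderAutomaticResolves` function are hypothetical names, not part of the linked implementation. The output uses gazelle's documented `# gazelle:resolve lang import-string label` form, and sorting the entries is an assumption made here to keep the generated section stable between runs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Header text as described in the proposal.
const autoResolveHeader = "### AUTOMATIC RESOLVES ###"

// resolveEntry is a hypothetical record of one rule that did not follow
// its language's convention.
type resolveEntry struct {
	Lang  string // language the import belongs to, e.g. "go"
	Imp   string // import string that could not be derived by convention
	Label string // label the import should resolve to
}

// renderAutomaticResolves returns the header followed by one
// "# gazelle:resolve" directive per entry, sorted by language and import.
func renderAutomaticResolves(entries []resolveEntry) string {
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]resolveEntry(nil), entries...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Lang != sorted[j].Lang {
			return sorted[i].Lang < sorted[j].Lang
		}
		return sorted[i].Imp < sorted[j].Imp
	})

	var b strings.Builder
	b.WriteString(autoResolveHeader + "\n")
	for _, e := range sorted {
		fmt.Fprintf(&b, "# gazelle:resolve %s %s %s\n", e.Lang, e.Imp, e.Label)
	}
	return b.String()
}

func main() {
	fmt.Print(renderAutomaticResolves([]resolveEntry{
		{Lang: "go", Imp: "example.com/repo/pkg/foo", Label: "//pkg/foo:go_default_library"},
		{Lang: "go", Imp: "example.com/legacy/bar", Label: "//third_party/bar:bar"},
	}))
}
```

A deterministic order like this would also make it easier for a CI job to rerun gazelle and check that the section is unchanged.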

Optionally, these could be added somewhere else from some other file passed at runtime, but since they are technically part of the build configuration, using the BUILD.bazel file makes the most sense.

Why do this?

This has two clear benefits:

The convention checker can also be enabled/disabled with a flag, so this optimization is optional for all users, and can be avoided if desired.

alex-torok commented 1 month ago

This was mentioned in my proposal for lazy indexing (#1891) as an alternative method. I think that this conventions system offers a better way to get faster incremental results, as it allows repos to more easily implement their own custom plugins to form their own conventions. The "automatic resolves" act as a versioned partial index, which solves the cache consistency issue that #1181 faces. Nice idea here!

I have a few questions:

  1. How do you keep the automatic resolves up-to-date? I'm guessing that a CI job runs gazelle with --use_conventions and asserts no changes are made to the index?
    • It seems like this would work well, as any change to a specific directory would assert that the automatic resolves for that directory are updated in the root-level "automatic resolves" section. Have there been any issues with merge conflicts?
  2. Can you provide some example descriptions of conventions you've implemented and found use for? The Convention interface is somewhat abstract from what an actual import->label convention is, so having an example or two would help.
    • Specifically, I'm curious about what conventions you use for python. The problem of mapping import statements to target labels without full indexing seems like it would require enforcing a strict style for import statements or considering all python targets as not adhering to any conventions. The latter essentially would make the conventions index have every python target in it.