Feature request: --show-rulekey

linzhp commented 5 years ago

Description of the problem / feature request:

Buck has the concept of rule keys, which we can obtain by running buck targets --show-rulekey //.... We'd love to see similar feature in Bazel.

Feature requests: what underlying problem are you trying to solve with this feature?

We need to see what targets are changed, thus need to rebuilt and test, from one revision to another. In order to keep master green at scale, we need to build and test diffs in parallel. Getting the rule key for each target help us decide what diffs are independent and safe to build and test in parallel.

vitarb commented 5 years ago

To give a little more context, this feature would be extremely useful and will help to produce a list of changed targets between two commits on CI allowing correctly scoped builds (instead of //...) and also allowing other pre-land optimizations. As previous discussion suggests there is no easy way to detect changes in the build graph when it comes to modifications in starlark files (WORKSPACE, BUILD, *.bzl, etc) Although it's possible to implement this logic outside of bazel, it would be prone to either false positives or false negatives and can also be rather slow.

For example I've built reasonably working prototype that:

Runs bazel query deps(//...), parses it's output and builds in-memory representation of the dependency graph.
Finds a list of changed files (using git) and for each: a) for source files - finds all reverse dependencies in the build graph (easy case). b) for starlark files - parses the file using bazel's starlark parser at both revisions, hashes all expressions and finds which ones changed.
Translates each changed call expression or assignment into bazel path that it can be referenced from (e.g. //src/foo:foo for BUILD files) and (@foo for WORKSPACE file) and find all reverse dependencies of that marking them as changed.
Native rules are just assignments in the starlark file and are not represented as rule-inputs in the build graph so they require special logic to handle.

Although it works fast and requires only one bazel query it may lead to possible false negatives:

It might be not able to handle some edge cases when there is no connection between starlark definitions and dependency graph (for example if you change go version defined in the go toolchain it won't detect any changes as toolchain info is not present in the dependency graph, work-around for that would be extracting go version into a variable and using it as a tag in the io_bazel_rules_go which comes as a dependency for all go rules)
Identifying if given starlark rule has changed requires handling various scenarios in the starlark syntax such as macros, assignments and call statements. It's relatively easy to miss some edge case.

We've also considered another approach, which relies on query of rbuildfiles for each changed starlark file but what makes it not acceptable is that any change to the WORKSPACE file would trigger pretty much full repo rebuild as all third-party dependencies are defined there. In addition to false positives it also has scalability issues as it needs to issues a separate query for each changed starlark file which runs ~0.5 sec, meaning that reasonably sized refactoring affecting 1000s of build files can be computing changed targets for 10s of minutes.

Are there any reasons preventing us from adding this feature into bazel core and printing checksum next to each rule in the dependency graph?

Globegitter commented 5 years ago

I agree imo that should be part of bazel core - @vitarb is there any chance you can share the prototype you have mentioned? As you may have seen we are interested in a performant solution to this and apart from including this into bazel core itself (which I hope will happen at some point) this imo seems like the most general version and least prone to errors, even if it still has some edge cases.

vitarb commented 5 years ago

I agree imo that should be part of bazel core - @vitarb is there any chance you can share the prototype you have mentioned? As you may have seen we are interested in a performant solution to this and apart from including this into bazel core itself (which I hope will happen at some point) this imo seems like the most general version and least prone to errors, even if it still has some edge cases.

@Globegitter, let me refine it a little bit and I will try to share something. Meanwhile one my colleagues suggested alternative implementation that computes target hashes by hashing workspace source files and bazel's query output which has all rule parameters in the proto format. This approach currently covers all major scenarios including changes to files in the repo, WORKSPACE changes, BUILD file changes and changes to native rules/starlark files. What I like about this approach is that it doesn't require processing starlark files and evaluating hashes for them, which is the most complex and error prone part of the solution suggested above.

On the other hand downsides are:

It requires at least two queries (on old and new revision)
Requires hashing of the entire source tree twice.
Changes to non-starlark rules can not be detected easily.

Since most of these issues can be either mitigated by proper cashing or other improvements it seems to be a pretty good approach overall and can lead to a concise and VCS independent solution.

At the same time having bazel do same would probably be more efficient.

lberki commented 5 years ago

Bouncing over to the Core people. However, given that rule keys are not how Bazel works (analysis phase caching relies on Java object identity alone and it's only actions that have hashes, which are a function of the contents of their input files), I wouldn't hold my breath.

purkhusid commented 5 years ago

@vitarb You don't happen to have anything that you can share? I've been looking for something that solves this problem but I've found no concrete solutions. Pants has a flag --changed-parent=<git hash here> that takes care of this but something similar is very much needed for Bazel

purkhusid commented 5 years ago

@lberki Would it be possible to get someone from the core team to comment on this issue? I'm wondering if there are some major technical hurdles that need to be solved to be able to add a digest for each target in either query or aquery.

aquery already has a action key in it's output. I wonder if it would be possible to add a digest per action that as well? I'm willing to take a stab at it if it's something that could possibly be accepted into Bazel.

lberki commented 5 years ago

/cc @ericfelly

I would love this to happen, but it requires a lot of starts to align

ericfelly commented 5 years ago

It's sounds like this is a general request for finding the affected targets given a change to the repo. Is that right? Is the rule key stuff an actual requirement, or just one possible approach?

@haxorz

haxorz commented 5 years ago

[@purkhusid ] a digest for each target in either query or aquery... aquery already has a action key in it's output

to elaborate on @lberki 's response, buck's https://buck.build/concept/rule_keys.html are transitive but bazel's keys are non-transitive.

the usefulness of transitive digests is, i presume, that you'd have this useful property for deciding when to rebuild test targets: If TRANSITIVE_DIGEST(//foo:bar) is v1 at source state A And TRANSITIVE_DIGEST(//foo:bar) is v2 at source state B Then v1 = v2 iff "//foo:bar doesn't need to be rebuilt at source state B"

[@ericfelly ] It's sounds like this is a general request for finding the affected targets given a change to the repo. Is that right? Is the rule key stuff an actual requirement, or just one possible approach?

yes, that's my assessment too.

purkhusid commented 5 years ago

@ericfelly @haxorz In my case it's to figure out what artifacts changed between two commits so that we can deploy only changed artifacts. So the rule key is just one way to do this, but I think there are multiple ways to achieve this.

I'm not familiar enough with Bazel internals to find the most idiomatic way to achieve this within bazel but I can suggest a few.

Include the digest for each action in aquery
- This way you could do a aquery at 2 revisions and compre each action in each target
Include the digest for for all actions in BEP
- This way you could just output the BEP events and compare at two revisions. But actions that are cached would also have to be shown in the output.
Add the digests to bazel dump
Add some way to dump the whole build graph including the digests
- Then you could just compare the graphs at 2 revisions

Some way to achieve this would fill a much needed gap for Bazel. In my case it would solve what deployment targets I should run and for large repos where bazel build //... is too slow it would provide a way to only run changed targets.

haxorz commented 5 years ago

@purkhusid but can you please confirm/deny the transitive part of my previous comment?

i can see two general approaches to a "determine affected targets at source version A" oracle that lives outside* of bazel:

(1) use specially-crafted repo-scale bazel query invocations at versions pred(A) and A, driven by the source diff between pred(A) and A, to determine all the affected targets at version A.

(2) have bazel dump a transitive hash of every target in the repo at versions pred(A) and A, then compare hashes pairwise. treat any target T whose hash has changed as affected at version A.

am i missing something, or have i concisely summarized the two approaches?

notes:

@vitarb's first comment describes an attempt to do (1).
a good implementation of (1) uses a non-transitive hash to compute "directly affected targets". the idea is to hash all of the target's attribute values and also its .bzl rule implementation -- this way comparisons are cheaper.
i am attempting to confirm that @purkhusid's et al. has some something like (2) in mind.

there are tradeoffs between these two general approaches. (2) unconditionally does repo-scale work, since it unconditionally computes and dumps a [transitive] hash of every target in the repo. contrast that with (1), which does basically no work for trivial changes. but the downside of (1) is the worst-case amount of work is perhaps higher since a repo-scale bazel query is perhaps more expensive than a mythical transitive hash dump of every target [eh, not necessarily].

* i say "lives outside bazel" because bazel's incrementality engine is ofc its own oracle but in @vitarb's first comment they say they don't want to unconditionally run bazel test //... at every source version. this is presumably because bazel doesn't scale infinitely.

purkhusid commented 5 years ago

@haxorz These 2 options do pretty much summarize it.

Option (2) does sound like the most user friendly way of doing this. All attempts at (1) that I've seen so far require that you jump through various hoops and usually end up being very complicated as can be seen here for example: https://groups.google.com/forum/#!msg/bazel-discuss/I9udqWIcEdI/iczVgWLOBQAJ

Going the (2) route would make this more native to bazel and a whole lot easier to do. But my guess is that creating transitive keys for each action/rule is a non-trivial task?

haxorz commented 5 years ago

@purkhusid But my guess is that creating transitive keys for each action/rule is a non-trivial task?

correct, both in terms of amount of code that would need to be written and also in terms of the runtime cost of that code. expanding on the latter, i think we'd not want this code to run by default (and when it runs we don't want to store the full results inside of the bazel server either); e.g. going with your aquery idea, we could add a flag --compute_and_dump_transitive_hashes to aquery.

robbertvanginkel commented 5 years ago

I work with @linzhp and @vitarb, for some context on why we use use the list of changed targets between revisions and how it is implemented:

Don't run bazel test //..., but a subset. Running less targets is useful while we're still working to get features like Build without the bytes and remote execution up and running.
Figure out artifacts that have changed between revisions that need to be published (containers, documentation, etc).
A service that merges requests into the master branch uses the changed targets (and graph) information to determine which changes don't intersect, and therefore can be merged independently.

There's some more ideas, such as figuring out which targets changed directly vs which ones changed because a dependency changed, and test directly affected targets with more expensive features like msan or a race detector etc.

Our current implementation is like the mentioned 2):

For each revision, run a bazel query --output=proto --order_output=full "//external:all-targets + deps(//...:all-targets)
Compute hashes for each target. The hash contains: target's attributes (filter a few that contain machine specific paths) and hashes of the target's inputs (so the hashes are transitive, which because of using --order_output=full is actually a pretty straightforward loop over the output proto). Hashes for file dependencies either come from disk or an input like git ls-tree --full-tree -r. The proto output contains a $rule_implementation_hash attribute, so it takes into account rule changes.
Diff the hashes as needed and write out lists/graphs of filtered changed targets. Depending on what use case the list would feed into, we'd filter by tags, target type, etc.

We chose this approach over 1) as it was more straightforward to implement and predict the performance and accuracy. The approach mentioned in 1) gets particularly tricky when dealing with changes to the workspace and rule implementations.

We ended up choosing query over aquery because a few reasons: aquery was still experimental at the time of implementation, performance of query being better than aquery and internal systems that needed to know about a graph of unconfigured targets. Although this has mostly been fine for us, we have run into some edge cases where valid bazel build graphs were not easily hash-able at the unconfigured target level. Specifically we had a dependency where target //a would depend on target //b for darwin, but target //b would depend on //a when configured for linux. While there was some debate and ideas for how to account for that, we ended up just fixing the "cycle" in the dependency. Going with an aquery based solution is likely the more correct approach.

I'd love to see this supported in bazel at some point. One caveat of our approach is that our list of targets is getting rather long and unfortunately bazel doesn't have support for invocation with argfiles, so we use a bit of a bazelrc workaround to feed the list into a build step (https://github.com/bazelbuild/bazel/issues/8609#issuecomment-501096217). Meanwhile I'll see if we can make the separate go code we use for the above open source.

ericfelly commented 5 years ago

Thank you for that context.

One relevant fact that I failed to mention is that we do have nascent / experimental code in place for top-down caching : https://github.com/bazelbuild/bazel/commit/c5c078cb60b2d11c8c17f993d9b10582ee984d5f

I imagine we could come up with some interface to expose the action sketches if this is the sort of thing that would facilitate your work here.

purkhusid commented 5 years ago

@robbertvanginkel It would be pretty awesome if your approach could be open sourced while Bazel does not have the tools needed available. We are in the early stages of our Bazel adoption and this is our biggest pain point at the moment. We would gladly help with making it more robust.

@ericfelly Is this some form of the transitive keys that @haxorz talked about?

ericfelly commented 4 years ago

Yes these are a form of transitive keys. We don't have an interface which exposes them directly. What would you like it to look like?

purkhusid commented 4 years ago

Is it possible to expose it in aquery via a flag?

ericfelly commented 4 years ago

@meisterT do you think the action sketches could be exposed in aquery? how difficult would that be?

meisterT commented 4 years ago

I am not yet familiar with action sketches. When are they computed? cc @joeleba who is picking up some aquery work

ericfelly commented 4 years ago

They are currently computed when you have top-down caching (experimental feature) enabled. See ActionSketchFunction.

What you could do is, when you run the analysis phase, launch the ActionSketchFunction for each action you come across. Then you could expose the action sketch in the output of aquery.

mmlac commented 4 years ago

We have a similar use-case. The question we want to answer is simply “what targets changed since the last master commit” to determine which need to be deployed.

What form factor that output has is not as important (diff a set of hashes between two git commits straight in bazel, a list of targets + hash as build artifact and then a manual diff, etc).

While hashing the deployment files works as a workaround, a query answering the question “what targets changed between commit A and B” seems very useful.

purkhusid commented 4 years ago

@robbertvanginkel @linzhp @vitarb Could you elaborate on what part of the query output you use to calculate the hash for each target? I'm taking a look at doing something similar but I'm not so sure what parts of the proto output I should be interested in.

purkhusid commented 4 years ago

@ericfelly @meisterT @joeleba Any chance we could get the action sketches exposed? I tried figuring it out myself but it seems like this requires pretty deep knowledge of the Bazel codebase to wire up.

robbertvanginkel commented 4 years ago

@purkhusid we create a hash per target based on all the attributes available, then using a merkle tree like approach we create a hash for each target based on the target's inputs. The "Test Selection" talk done by Benjamin Peterson from Dropbox at Bazelfconf 2019 describes a very similar approach: https://www.youtube.com/watch?v=9Dk7mtIm7_A.

I'm working on open-sourcing the code we have for this, taking a little bit of time to get that approved internally. Hopefully done in a week or two.

purkhusid commented 4 years ago

@robbertvanginkel Any news on the open sourcing?

robbertvanginkel commented 4 years ago

Thanks for the ping, still under review unfortunately.

purkhusid commented 4 years ago

@robbertvanginkel Has there been any movement on this?

purkhusid commented 4 years ago

@robbertvanginkel Any news?

robbertvanginkel commented 4 years ago

Sorry, no news. Unfortunately, given the current situation opensourceing internal tools isn't high on the priority list.

purkhusid commented 4 years ago

Understood, thanks for the update!

tokudu commented 4 years ago

@robbertvanginkel im curious how fast your algorithm is? And how big your target graph is.

I'm debating if we should implement something similar, but performance is an important consideration, since our build graph is massive.

linzhp commented 4 years ago

bazel query //... returns over 64k targets, which doesn't include about 1700 external repos. On a typical diff, it takes about 2 minutes to find all changed targets on those powerful cloud nodes.

How big is your build graph?

tokudu commented 4 years ago

less than that, which is good. Thanks!

robbertvanginkel commented 4 years ago

The main overhead we have is from the bazel query mentioned in https://github.com/bazelbuild/bazel/issues/7962#issuecomment-553042506. Especially with external repositories that might or might not be cached there can be quite a bit of variance. With the query complete, doing all the processing and hashing in go is usually quite fast and mostly dominated by hashing input files.

tinder-maxwellelliott commented 4 years ago

I feel like Google Search engine is pushing all of us to this very post to talk about how to fix something that Bazel should probably have 🤣

I have taken a stab at the SHA generation for the whole graph based on the Proto output (as described by @robbertvanginkel) in the following Gist. I did have some questions though about this approach

What attributes from the Rule should be part of the SHA we create? Right now I use all of them (except location), not sure if that is right

tinder-maxwellelliott commented 4 years ago

I feel like Google Search engine is pushing all of us to this very post to talk about how to fix something that Bazel should probably have 🤣

I have taken a stab at the SHA generation for the whole graph based on the Proto output (as described by @robbertvanginkel) in the following Gist. I did have some questions though about this approach

What attributes from the Rule should be part of the SHA we create? Right now I use all of them (except location), not sure if that is right

Looping back here one last time! The algorithm in this Gist. is working well in our CI systems have not seen it miss anything yet. Good luck to anyone else trying to implement this by hand.

purkhusid commented 4 years ago

@robbertvanginkel @linzhp Has there been any luck with open sourcing what your have at Uber? I took a stab at this myself but I have a feeling that I might have missed some edge cases: https://github.com/purkhusid/biff

rohansingh commented 4 years ago

@purkhusid I tried out biff just now and I'm impressed, it seems to work out of the box. I haven't thrown any weird edge cases at it but it seems to be working for most "normal" commits.

purkhusid commented 4 years ago

@purkhusid I tried out biff just now and I'm impressed, it seems to work out of the box. I haven't thrown any weird edge cases at it but it seems to be working for most "normal" commits.

@rohansingh Awesome! I've been meaning to put some more time into it and add some tests to validate that it does the right thing. If you have any improvements you would like to add to it then don't hesitate to send a PR/create an issue.

janakdr commented 4 years ago

Internally at Google, we use something more like @haxorz 's (1). For that reason, I think it's unlikely that we'll prioritize exposing such a transitive hash. However, the action sketch mentioned by @ericfelly seems like it could be a good option in concert with aquery. It's basically all open-sourced already, so PRs to integrate it seem reasonable, either from Google or other contributors.

tinder-maxwellelliott commented 4 years ago

I finally got around to open sourcing our target selection system, seen here https://github.com/Tinder/bazel-diff. It is a ready to go CLI system to allow you to perform target selection on massive codebases (handles massive Bazel query argument lists, and massive Bazel Protobuf's via the streamed-proto output option)

haxorz commented 4 years ago

@tinder-maxwellelliott Cool! I skimmed through your code and it LGTM!

Fyi: You would have been bit by https://github.com/bazelbuild/bazel/issues/12086 (specifically at https://github.com/Tinder/bazel-diff/blob/master/src/main/java/com/bazel-diff/BazelRule.java#L24), so I just wanted to make sure you're aware of that bug (and its fix!).

tinder-maxwellelliott commented 4 years ago

@tinder-maxwellelliott Cool! I skimmed through your code and it LGTM!

Fyi: You would have been bit by #12086 (specifically at https://github.com/Tinder/bazel-diff/blob/master/src/main/java/com/bazel-diff/BazelRule.java#L24), so I just wanted to make sure you're aware of that bug (and its fix!).

Basically we cannot rely on rule_implementation_hash until a new Bazel release? Unsure what action I can take right now in the repo to resolve this

haxorz commented 4 years ago

Yes, using rule_implementation_hash (aka skylark_environment_hash_code in build.proto-land) will give you false negatives wrt the "has this target been directly affected?" decision (using my lingo in https://github.com/bazelbuild/bazel/issues/7962#issuecomment-552945740). There's no action you can take in your code (I guess you could be super-duper conversative and pretend that all targets of bzl-defined rules are always "directly affected" but that seems bad). We were bitten by the bug internally at Google too :(

The bug occurs only in the situation described in the issue title. Maybe that doesn't happen in the codebase in which you use your bazel-diff tool? It certainly happened in @linzhp 's codebase (hence why they filed the issue), and it certainly happened in Google's internal codebase.

The bug was introduced in commit 9f2cab5 on 12 May. I don't know offhand which Bazel release first included that commit, nor do I know which release will first include the fix. I can look up that info for you if you want.

rohansingh commented 4 years ago

Bazel v3.3.0 is the first release that included the bug.

linzhp commented 4 years ago

Yes, the rule_implementation_hash issue is still affecting Uber's code base. Any time line when the fix can be released?

meisterT commented 4 years ago

cc @aiuto for the 3.5.1 patch release and @laurentlb for the 3.6.0 release

codesuki commented 3 years ago

To save people from digging, it seems the issue @haxorz mentioned is fixed in 3.7. https://github.com/bazelbuild/bazel/issues/12086#issuecomment-715659624

github-actions[bot] commented 1 year ago

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

bazelbuild / bazel

Feature request: --show-rulekey #7962

Description of the problem / feature request:

Feature requests: what underlying problem are you trying to solve with this feature?