linzhp opened this issue 5 years ago
To give a little more context: this feature would be extremely useful for producing a list of changed targets between two commits on CI, allowing correctly scoped builds (instead of //...) as well as other pre-land optimizations. As the previous discussion suggests, there is no easy way to detect changes in the build graph when it comes to modifications in Starlark files (WORKSPACE, BUILD, *.bzl, etc.). Although it's possible to implement this logic outside of Bazel, it would be prone to either false positives or false negatives and can also be rather slow.
For example, I've built a reasonably working prototype that runs `bazel query deps(//...)`, parses its output, and builds an in-memory representation of the dependency graph. Although it works fast and requires only one bazel query, it may lead to possible false negatives.
We've also considered another approach, which relies on an rbuildfiles query for each changed Starlark file, but what makes it unacceptable is that any change to the WORKSPACE file would trigger pretty much a full repo rebuild, as all third-party dependencies are defined there. In addition to false positives, it also has scalability issues: it needs to issue a separate query for each changed Starlark file, each taking ~0.5 sec, meaning that a reasonably sized refactoring affecting thousands of BUILD files can spend tens of minutes computing changed targets.
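For illustration only (a simplified sketch, not our actual code; the file names are made up, and rbuildfiles is a Sky Query function, hence the --universe_scope/--order_output flags), the per-file flow looks roughly like this:

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os/exec"
)

// affectedBuildFiles issues one query per changed Starlark file, as described
// above. The per-file query overhead is what makes this approach slow for
// large refactorings.
func affectedBuildFiles(changedFiles []string) (map[string]bool, error) {
	result := map[string]bool{}
	for _, f := range changedFiles {
		out, err := exec.Command("bazel", "query",
			"--universe_scope=//...", "--order_output=no",
			fmt.Sprintf("rbuildfiles(%s)", f)).Output()
		if err != nil {
			return nil, err
		}
		sc := bufio.NewScanner(bytes.NewReader(out))
		for sc.Scan() {
			result[sc.Text()] = true
		}
	}
	return result, nil
}

func main() {
	// Hypothetical list of changed Starlark files from the VCS diff.
	files, err := affectedBuildFiles([]string{"foo/defs.bzl", "bar/BUILD"})
	if err != nil {
		panic(err)
	}
	for f := range files {
		fmt.Println(f)
	}
}
```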
Are there any reasons preventing us from adding this feature into bazel core and printing checksum next to each rule in the dependency graph?
I agree, imo this should be part of Bazel core - @vitarb is there any chance you can share the prototype you have mentioned? As you may have seen, we are interested in a performant solution to this, and apart from including this in Bazel core itself (which I hope will happen at some point) this imo seems like the most general version and least prone to errors, even if it still has some edge cases.
@Globegitter, let me refine it a little bit and I will try to share something. Meanwhile, one of my colleagues suggested an alternative implementation that computes target hashes by hashing workspace source files and Bazel's query output, which has all rule parameters in the proto format. This approach currently covers all major scenarios, including changes to files in the repo, WORKSPACE changes, BUILD file changes, and changes to native rules/Starlark files. What I like about this approach is that it doesn't require processing Starlark files and evaluating hashes for them, which is the most complex and error-prone part of the solution suggested above.
On the other hand, there are downsides.
Since most of these issues can be either mitigated by proper caching or addressed by other improvements, it seems to be a pretty good approach overall and can lead to a concise and VCS-independent solution.
At the same time, having Bazel do the same would probably be more efficient.
Bouncing over to the Core people. However, given that rule keys are not how Bazel works (analysis phase caching relies on Java object identity alone and it's only actions that have hashes, which are a function of the contents of their input files), I wouldn't hold my breath.
@vitarb You don't happen to have anything that you can share? I've been looking for something that solves this problem but I've found no concrete solutions. Pants has a flag --changed-parent=<git hash here> that takes care of this, but something similar is very much needed for Bazel.
@lberki Would it be possible to get someone from the core team to comment on this issue? I'm wondering if there are some major technical hurdles that need to be solved to be able to add a digest for each target in either query or aquery.
aquery already has an action key in its output. I wonder if it would be possible to add a digest per action as well? I'm willing to take a stab at it if it's something that could possibly be accepted into Bazel.
/cc @ericfelly
I would love this to happen, but it requires a lot of stars to align
It sounds like this is a general request for finding the affected targets given a change to the repo. Is that right? Is the rule key stuff an actual requirement, or just one possible approach?
@haxorz
[@purkhusid ] a digest for each target in either query or aquery... aquery already has an action key in its output
to elaborate on @lberki 's response, buck's rule keys (https://buck.build/concept/rule_keys.html) are transitive but bazel's keys are non-transitive.
the usefulness of transitive digests is, i presume, that you'd have this useful property for deciding when to rebuild test targets:
- If TRANSITIVE_DIGEST(//foo:bar) is v1 at source state A
- And TRANSITIVE_DIGEST(//foo:bar) is v2 at source state B
- Then v1 = v2 iff "//foo:bar doesn't need to be rebuilt at source state B"
[@ericfelly ] It sounds like this is a general request for finding the affected targets given a change to the repo. Is that right? Is the rule key stuff an actual requirement, or just one possible approach?
yes, that's my assessment too.
@ericfelly @haxorz In my case it's to figure out what artifacts changed between two commits so that we can deploy only changed artifacts. So the rule key is just one way to do this, but I think there are multiple ways to achieve this.
I'm not familiar enough with Bazel internals to find the most idiomatic way to achieve this within Bazel, but I can suggest a few:
- Run `aquery` at two revisions and compare each action in each target
- `bazel dump`
Some way to achieve this would fill a much-needed gap for Bazel. In my case it would solve which deployment targets I should run, and for large repos where `bazel build //...` is too slow it would provide a way to only run changed targets.
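As a rough illustration of the aquery idea (a sketch, not a vetted implementation): dump the JSON output at the two revisions and diff the per-target action keys. The field names below (targets, actions, targetId, actionKey) are my reading of the --output=jsonproto format and may need adjusting, and the file names are made up:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Minimal subset of the aquery JSON output; field names are assumptions based
// on --output=jsonproto and may need adjusting for a given Bazel version.
type actionGraph struct {
	Targets []struct {
		ID    uint32 `json:"id"`
		Label string `json:"label"`
	} `json:"targets"`
	Actions []struct {
		TargetID  uint32 `json:"targetId"`
		ActionKey string `json:"actionKey"`
	} `json:"actions"`
}

// actionKeysByLabel flattens one aquery dump into label -> set of action keys.
func actionKeysByLabel(path string) (map[string]map[string]bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var g actionGraph
	if err := json.Unmarshal(data, &g); err != nil {
		return nil, err
	}
	labels := map[uint32]string{}
	for _, t := range g.Targets {
		labels[t.ID] = t.Label
	}
	keys := map[string]map[string]bool{}
	for _, a := range g.Actions {
		l := labels[a.TargetID]
		if keys[l] == nil {
			keys[l] = map[string]bool{}
		}
		keys[l][a.ActionKey] = true
	}
	return keys, nil
}

func main() {
	// Dumps from `bazel aquery --output=jsonproto 'deps(//...)'` at the two revisions.
	before, err := actionKeysByLabel("aquery_rev_a.json")
	if err != nil {
		panic(err)
	}
	after, err := actionKeysByLabel("aquery_rev_b.json")
	if err != nil {
		panic(err)
	}
	for label, ks := range after {
		for k := range ks {
			if !before[label][k] { // new or changed action key => target changed
				fmt.Println("changed:", label)
				break
			}
		}
	}
}
```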
@purkhusid but can you please confirm/deny the transitive part of my previous comment?
i can see two general approaches to a "determine affected targets at source version A" oracle that lives outside* of bazel:
(1) use specially-crafted repo-scale `bazel query` invocations at versions pred(A) and A, driven by the source diff between pred(A) and A, to determine all the affected targets at version A.
(2) have bazel dump a transitive hash of every target in the repo at versions pred(A) and A, then compare hashes pairwise. treat any target T whose hash has changed as affected at version A.
am i missing something, or have i concisely summarized the two approaches?
notes:
there are tradeoffs between these two general approaches. (2) unconditionally does repo-scale work, since it unconditionally computes and dumps a [transitive] hash of every target in the repo. contrast that with (1), which does basically no work for trivial changes. but the downside of (1) is the worst-case amount of work is perhaps higher, since a repo-scale `bazel query` is perhaps more expensive than a mythical transitive hash dump of every target [eh, not necessarily].
* i say "lives outside bazel" because bazel's incrementality engine is ofc its own oracle but in @vitarb's first comment they say they don't want to unconditionally run bazel test //...
at every source version. this is presumably because bazel
doesn't scale infinitely.
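fwiw, the consumer side of (2) would just be a pairwise diff. a rough go sketch, assuming each dump is a flat label-to-hash JSON map (file names made up):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// loadHashes reads a {"//label": "transitive-hash"} dump produced at one source version.
func loadHashes(path string) (map[string]string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var m map[string]string
	return m, json.Unmarshal(data, &m)
}

func main() {
	before, err := loadHashes("hashes_pred_A.json") // dump at version pred(A)
	if err != nil {
		panic(err)
	}
	after, err := loadHashes("hashes_A.json") // dump at version A
	if err != nil {
		panic(err)
	}
	// a target is affected at A if its hash changed or it is new.
	for label, h := range after {
		if before[label] != h {
			fmt.Println(label)
		}
	}
}
```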
@haxorz These 2 options do pretty much summarize it.
Option (2) does sound like the most user-friendly way of doing this. All attempts at (1) that I've seen so far require you to jump through various hoops and usually end up being very complicated, as can be seen here for example: https://groups.google.com/forum/#!msg/bazel-discuss/I9udqWIcEdI/iczVgWLOBQAJ
Going the (2) route would make this more native to bazel and a whole lot easier to do. But my guess is that creating transitive keys for each action/rule is a non-trivial task?
@purkhusid But my guess is that creating transitive keys for each action/rule is a non-trivial task?
correct, both in terms of amount of code that would need to be written and also in terms of the runtime cost of that code. expanding on the latter, i think we'd not want this code to run by default (and when it runs we don't want to store the full results inside of the bazel server either); e.g. going with your aquery idea, we could add a flag --compute_and_dump_transitive_hashes to aquery.
I work with @linzhp and @vitarb; for some context on why we use the list of changed targets between revisions and how it is implemented:
We don't run bazel test //..., but a subset. Running fewer targets is useful while we're still working to get features like Build without the Bytes and remote execution up and running. There are some more ideas, such as figuring out which targets changed directly vs. which ones changed because a dependency changed, and testing directly affected targets with more expensive features like msan or a race detector, etc.
Our current implementation is like the mentioned (2):
- Run `bazel query --output=proto --order_output=full "//external:all-targets + deps(//...:all-targets)"`.
- Compute a hash per target from the query output (with --order_output=full this is actually a pretty straightforward loop over the output proto). Hashes for file dependencies either come from disk or an input like `git ls-tree --full-tree -r`.
- The proto output contains a $rule_implementation_hash attribute, so it takes into account rule changes.

We chose this approach over (1) as it was more straightforward to implement and to predict the performance and accuracy. The approach mentioned in (1) gets particularly tricky when dealing with changes to the workspace and rule implementations.
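As a very rough sketch of the first half of this (repo-scale query, non-transitive per-rule hash) — not our actual tool; it uses --output=xml instead of the proto output above purely so it only needs the Go standard library, and it leaves out the dependency/Merkle step and the file-content hashing:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/xml"
	"fmt"
	"os/exec"
)

// rule mirrors the <rule> elements of `bazel query --output=xml`; InnerXML
// captures the serialized attributes, which feed the per-target hash.
type rule struct {
	Name     string `xml:"name,attr"`
	InnerXML string `xml:",innerxml"`
}

type queryResult struct {
	Rules []rule `xml:"rule"`
}

// directHashes computes a non-transitive hash per rule from its serialized
// attributes. Dependency hashes and file-content hashes (from disk or
// `git ls-tree`) still need to be folded in on top of this.
func directHashes(universe string) (map[string]string, error) {
	out, err := exec.Command("bazel", "query", "--output=xml", universe).Output()
	if err != nil {
		return nil, err
	}
	var res queryResult
	if err := xml.Unmarshal(out, &res); err != nil {
		return nil, err
	}
	hashes := make(map[string]string, len(res.Rules))
	for _, r := range res.Rules {
		sum := sha256.Sum256([]byte(r.InnerXML))
		hashes[r.Name] = hex.EncodeToString(sum[:])
	}
	return hashes, nil
}

func main() {
	hashes, err := directHashes("deps(//...)")
	if err != nil {
		panic(err)
	}
	for label, h := range hashes {
		fmt.Println(label, h)
	}
}
```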
We ended up choosing query over aquery for a few reasons: aquery was still experimental at the time of implementation, the performance of query was better than aquery, and internal systems needed to know about a graph of unconfigured targets. Although this has mostly been fine for us, we have run into some edge cases where valid Bazel build graphs were not easily hashable at the unconfigured target level. Specifically, we had a dependency where target //a would depend on target //b for darwin, but target //b would depend on //a when configured for linux. While there was some debate and ideas for how to account for that, we ended up just fixing the "cycle" in the dependency. Going with an aquery-based solution is likely the more correct approach.
I'd love to see this supported in bazel at some point. One caveat of our approach is that our list of targets is getting rather long and unfortunately bazel doesn't have support for invocation with argfiles, so we use a bit of a bazelrc workaround to feed the list into a build step (https://github.com/bazelbuild/bazel/issues/8609#issuecomment-501096217). Meanwhile I'll see if we can make the separate go code we use for the above open source.
Thank you for that context.
One relevant fact that I failed to mention is that we do have nascent/experimental code in place for top-down caching: https://github.com/bazelbuild/bazel/commit/c5c078cb60b2d11c8c17f993d9b10582ee984d5f
I imagine we could come up with some interface to expose the action sketches if this is the sort of thing that would facilitate your work here.
@robbertvanginkel It would be pretty awesome if your approach could be open sourced while Bazel does not have the tools needed available. We are in the early stages of our Bazel adoption and this is our biggest pain point at the moment. We would gladly help with making it more robust.
@ericfelly Is this some form of the transitive keys that @haxorz talked about?
Yes these are a form of transitive keys. We don't have an interface which exposes them directly. What would you like it to look like?
Is it possible to expose it in aquery via a flag?
@meisterT do you think the action sketches could be exposed in aquery? how difficult would that be?
I am not yet familiar with action sketches. When are they computed? cc @joeleba who is picking up some aquery work
They are currently computed when you have top-down caching (experimental feature) enabled. See ActionSketchFunction.
What you could do is, when you run the analysis phase, launch the ActionSketchFunction for each action you come across. Then you could expose the action sketch in the output of aquery.
We have a similar use-case. The question we want to answer is simply “what targets changed since the last master commit” to determine which need to be deployed.
What form factor that output has is not as important (diff a set of hashes between two git commits straight in bazel, a list of targets + hash as build artifact and then a manual diff, etc).
While hashing the deployment files works as a workaround, a query answering the question “what targets changed between commit A and B” seems very useful.
@robbertvanginkel @linzhp @vitarb Could you elaborate on what part of the query output you use to calculate the hash for each target? I'm taking a look at doing something similar but I'm not so sure what parts of the proto output I should be interested in.
@ericfelly @meisterT @joeleba Any chance we could get the action sketches exposed? I tried figuring it out myself but it seems like this requires pretty deep knowledge of the Bazel codebase to wire up.
@purkhusid we create a hash per target based on all the attributes available, then, using a Merkle-tree-like approach, we create a hash for each target based on the target's inputs. The "Test Selection" talk given by Benjamin Peterson from Dropbox at BazelCon 2019 describes a very similar approach: https://www.youtube.com/watch?v=9Dk7mtIm7_A.
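To make the Merkle-tree-like step concrete, here is a small hypothetical sketch (not our actual code, nor Dropbox's): combine each target's own hash with the transitive hashes of its direct dependencies, memoized over the DAG. The directHash and deps maps are assumed to come from the query output described earlier:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// transitiveHash folds a target's own hash together with the transitive
// hashes of its direct dependencies. memo avoids recomputing shared
// subgraphs; dep hashes are combined in sorted order so the result doesn't
// depend on map iteration order. Assumes the dependency graph is acyclic.
func transitiveHash(label string, directHash map[string]string, deps map[string][]string, memo map[string]string) string {
	if h, ok := memo[label]; ok {
		return h
	}
	hasher := sha256.New()
	hasher.Write([]byte(directHash[label]))

	depLabels := append([]string(nil), deps[label]...)
	sort.Strings(depLabels)
	for _, d := range depLabels {
		hasher.Write([]byte(transitiveHash(d, directHash, deps, memo)))
	}

	h := hex.EncodeToString(hasher.Sum(nil))
	memo[label] = h
	return h
}

func main() {
	// Toy graph: //app depends on //lib.
	directHash := map[string]string{"//app": "aaa", "//lib": "bbb"}
	deps := map[string][]string{"//app": {"//lib"}}
	memo := map[string]string{}
	fmt.Println(transitiveHash("//app", directHash, deps, memo))
}
```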
I'm working on open-sourcing the code we have for this, taking a little bit of time to get that approved internally. Hopefully done in a week or two.
@robbertvanginkel Any news on the open sourcing?
Thanks for the ping, still under review unfortunately.
@robbertvanginkel Has there been any movement on this?
@robbertvanginkel Any news?
Sorry, no news. Unfortunately, given the current situation, open-sourcing internal tools isn't high on the priority list.
Understood, thanks for the update!
@robbertvanginkel I'm curious how fast your algorithm is, and how big your target graph is.
I'm debating whether we should implement something similar, but performance is an important consideration, since our build graph is massive.
bazel query //... returns over 64k targets, which doesn't include about 1700 external repos. On a typical diff, it takes about 2 minutes to find all changed targets on those powerful cloud nodes.
How big is your build graph?
less than that, which is good. Thanks!
The main overhead we have is from the bazel query mentioned in https://github.com/bazelbuild/bazel/issues/7962#issuecomment-553042506. Especially with external repositories, which might or might not be cached, there can be quite a bit of variance. With the query complete, doing all the processing and hashing in Go is usually quite fast and mostly dominated by hashing input files.
I feel like the Google search engine is pushing all of us to this very post to talk about how to fix something that Bazel should probably have 🤣
I have taken a stab at the SHA generation for the whole graph based on the proto output (as described by @robbertvanginkel) in the following Gist. I did have some questions though about this approach:
- What attributes from the Rule should be part of the SHA we create? Right now I use all of them (except location), not sure if that is right.
Looping back here one last time! The algorithm in this Gist is working well in our CI systems; we have not seen it miss anything yet. Good luck to anyone else trying to implement this by hand.
@robbertvanginkel @linzhp Has there been any luck with open sourcing what you have at Uber? I took a stab at this myself but I have a feeling that I might have missed some edge cases: https://github.com/purkhusid/biff
@purkhusid I tried out biff just now and I'm impressed, it seems to work out of the box. I haven't thrown any weird edge cases at it but it seems to be working for most "normal" commits.
@rohansingh Awesome! I've been meaning to put some more time into it and add some tests to validate that it does the right thing. If you have any improvements you would like to add to it then don't hesitate to send a PR/create an issue.
Internally at Google, we use something more like @haxorz 's (1). For that reason, I think it's unlikely that we'll prioritize exposing such a transitive hash. However, the action sketch mentioned by @ericfelly seems like it could be a good option in concert with aquery. It's basically all open-sourced already, so PRs to integrate it seem reasonable, either from Google or other contributors.
I finally got around to open sourcing our target selection system, seen here: https://github.com/Tinder/bazel-diff. It is a ready-to-go CLI tool that lets you perform target selection on massive codebases (it handles massive Bazel query argument lists and massive Bazel query protos via the streamed_proto output option).
@tinder-maxwellelliott Cool! I skimmed through your code and it LGTM!
Fyi: you would have been bitten by https://github.com/bazelbuild/bazel/issues/12086 (specifically at https://github.com/Tinder/bazel-diff/blob/master/src/main/java/com/bazel-diff/BazelRule.java#L24), so I just wanted to make sure you're aware of that bug (and its fix!).
Basically we cannot rely on rule_implementation_hash until a new Bazel release? Unsure what action I can take right now in the repo to resolve this.
Yes, using rule_implementation_hash (aka skylark_environment_hash_code in build.proto-land) will give you false negatives wrt the "has this target been directly affected?" decision (using my lingo in https://github.com/bazelbuild/bazel/issues/7962#issuecomment-552945740). There's no action you can take in your code (I guess you could be super-duper conservative and pretend that all targets of bzl-defined rules are always "directly affected", but that seems bad). We were bitten by the bug internally at Google too :(
The bug occurs only in the situation described in the issue title. Maybe that doesn't happen in the codebase in which you use your bazel-diff tool? It certainly happened in @linzhp 's codebase (hence why they filed the issue), and it certainly happened in Google's internal codebase.
The bug was introduced in commit 9f2cab5 on 12 May. I don't know offhand which Bazel release first included that commit, nor do I know which release will first include the fix. I can look up that info for you if you want.
Bazel v3.3.0 is the first release that included the bug.
Yes, the rule_implementation_hash issue is still affecting Uber's code base. Any timeline for when the fix can be released?
cc @aiuto for the 3.5.1 patch release and @laurentlb for the 3.6.0 release
To save people from digging, it seems the issue @haxorz mentioned is fixed in 3.7. https://github.com/bazelbuild/bazel/issues/12086#issuecomment-715659624
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.
Description of the problem / feature request:
Buck has the concept of rule keys, which we can obtain by running `buck targets --show-rulekey //...`. We'd love to see a similar feature in Bazel.
Feature requests: what underlying problem are you trying to solve with this feature?
We need to see which targets changed, and thus need to be rebuilt and tested, from one revision to another. In order to keep master green at scale, we need to build and test diffs in parallel. Getting the rule key for each target helps us decide which diffs are independent and safe to build and test in parallel.