bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
23.13k stars 4.05k forks source link

Querying rdeps of BUILD file don't yield targets that BUILD file contains #1831

Closed kavehv closed 8 years ago

kavehv commented 8 years ago

We use an rdeps query to bazel to get a list of targets to rebuild based on a set of files that have changed. bazel query --keep_going 'kind(rule, rdeps(..., set(%s))) where %s is a set of files determined to have changed through some VCS query.

If that set only includes a bazel BUILD file, bazel's rdeps query returns no targets, not even a genrule that's in the BUILD file that changed.

Changed files: [u'realtime/BUILD'] Running builds and tests: []

This seems broken as the targets defined by a BUILD file should depend on that BUILD file, shouldn't they?

damienmg commented 8 years ago

Yes they should but I have no idea what is wrong, pulling in @laszlocsomor to investigate.

damienmg commented 8 years ago

@laszlocsomor: if you can just diagnose the issue and maybe adapt the priority, that would be great :)

laszlocsomor commented 8 years ago

This is working as intended.

Bazel query queries over the target graph. The nodes of this graph are the targets, that is, everything that has a label (rules, input files, output files). There's an edge from target A to target B if A is a rule and one of its attributes (e.g. deps) includes B, or A is an output file generated by B.

In other words, if a rule "foo" has srcs=["bar"] then rdeps(<universe>, bar) returns "foo". rdeps fails to return the BUILD file because the BUILD file, being just another target (input file) in the package, is not listed in the rule's srcs/deps/whatever.

If you had a rule like this:

genrule(
    name = "foo1",
    srcs = ["BUILD", "bar"],
    outs = ["foo1.out"],
    cmd = "...",
)

filegroup(
    name = "foo2",
    srcs = ["baz"],
)

then rdeps(//foo, //foo:BUILD) would return foo1 and foo1.out, but not foo2 because BUILD is not a dependency of foo2.

@kavehv : It sounds like you're trying to reimplement caching that's already done by Bazel. If you keep rebuilding everything in your workspace, Bazel will only rebuild the stuff that has changed, so there's no need to tell it exactly what to rebuild. In any case you can assume that every rule and output file target in the package of the BUILD file depends on the BUILD file itself.

kavehv commented 8 years ago

@laszlocsomor What if the only file you changed is the BUILD file? Does it not stand to reason that all the rules defined by that BUILD file should be re-executed/built to confirm that change hasn't broken anything?

I guess what I'm asking is shouldn't a rule defined in a BUILD file implicitly be dependent on that file?

laszlocsomor commented 8 years ago

@kavehv: Yes it should, and it in fact is, but bazel query cannot tell you that, because the graph where Bazel keeps track of this information is not the one that bazel query operates over.

We have another graph inside of Bazel, implicitly defined by Skyframe, which keeps track of all these dependencies. So if you only change the BUILD file //foo:BUILD by editing an attribute of rule "bar", then bazel build //foo:all will rebuild only those targets that were affected by this change, i.e. "bar" itself and any rules that depend on it, but not those that don't depend on it.

laszlocsomor commented 8 years ago

Unfortunately we have no machinery (yet?) to query over the Skyframe graph.

kavehv commented 8 years ago

@laszlocsomor So I guess the next question is why can't we query over that graph? Why is there a separate graph for the query?

laszlocsomor commented 8 years ago

oops, answered just as you asked :)

laszlocsomor commented 8 years ago

What problem are you trying to solve by querying the rdeps?

kavehv commented 8 years ago

Basically, what we're trying to do is get a list of things to build from the set of files that have changed in a git commit. The reason we do it this way is to have bazel build the minimum set of rules it needs to build to accomplish that task in a workspace that is on a generic build machine. That workspace may have switched from any other branch, so there may be other things to build too, and bazel will figure that out, but we only want to command the minimum set of things we care about to keep the build time to a minimum.

laszlocsomor commented 8 years ago

I see, that sounds like a reasonable use case to me.

Then it seems safe to assume that any change to a BUILD file should automatically invalidate all targets in that package. Bazel query itself couldn't say anything finer grained than that.

Does that help?

kavehv commented 8 years ago

Yes, that's the functionality I'm looking for.

laszlocsomor commented 8 years ago

Then I believe the best way forward is to add some logic to the script invoking bazel to look at the delta since the last checkout, and if there are any BUILD files in there then run get the list of targets in that package (bazel query "kind(rule, //path/to/package:*)"), consider all of them invalid, plus take the rdeps of the rest of the changed files, and that'll be the set of targets to rebuild.

Globegitter commented 6 years ago

@laszlocsomor @kavehv Has there been any update on this? I am looking for exactly the same functionality.

laszlocsomor commented 6 years ago

Hey @Globegitter , I'm sorry about the long silence. No, there haven't been any updates.

IIUC, what @kavehv wanted to know was, if I change a file, what targets would be affected by it? And indeed this can be more than just the targets that explicitly reference this file in one of their attributes: for example the BUILD file affects all targets declared within it, and changing a header file in your system includes directory would have an effect on all cc_* rules and their rdeps.

From Bazel's perspective, this translates to knowing the rdeps of an input file, not over the target graph but over the Skyframe graph. I believe in Skyframe that means rdeps of FileStateValues, with the results filtered for TargetValues.

There are numerous people far better qualified to comment on the viability of that than myself, so let me ping the first few people that come to mind: @janakdr, @michajlo, and @haxorz.

janakdr commented 6 years ago

Is https://docs.bazel.build/versions/master/query.html#rbuildfiles helpful?

janakdr commented 6 years ago

I should say that you'll still need functionality on your side to, if a BUILD target changes, regard all targets in the package as affected. The siblings operator may be useful there.

kavehv commented 6 years ago

I don't think this problem has been resolved yet. We still don't have an avenue to easily query for changes to those BUILD files. Furthermore, I came across another example of this when updating our WORKSPACE file to point to a new tag of Google's re2 repo. Every dependency on this re2 target should have been specified by my rdeps query to bazel, but it was not.

janakdr commented 6 years ago

What were your results using the rbuildfiles operator?

kavehv commented 6 years ago

Perhaps I'm just not reading the right part of the documentation, but I cannot seem to concoct a query using rbuildfiles that doesn't result in a syntax error. An example in the documentation would be supremely helpful.

janakdr commented 6 years ago

Did you read the section on Sky Query?

haxorz commented 6 years ago

the examples in bazel's own tests (https://github.com/bazelbuild/bazel/blob/a2d2615362c65be98629b39ce39754a325ed1c42/src/test/shell/integration/bazel_query_test.sh#L219-L379) might be useful to you.

kavehv commented 6 years ago

Yes, and again, an example would be useful here. Unless I misunderstand you and the documentation, I don't see how I could query with --universal_scope="//"?

$ bazel --bazelrc=devtools/bazel.rc query --universe_scope="//" --order_output=no 'buildfiles(rbuildfiles(WORKSPACE))' ERROR: Skipping '//': the empty string is not a valid target ERROR: Skipping '//': the empty string is not a valid target //external:WORKSPACE //devtools:BUILD //realtime:BUILD //devtools:ccs_build.bzl ERROR: Evaluation of query "buildfiles(rbuildfiles('WORKSPACE'))" failed due to BUILD file errors

janakdr commented 6 years ago

"//" isn't a valid target pattern. If you want to query your whole workspace, "//..." is a valid pattern. "//...:*" will include things like output files (non-rules).

kavehv commented 6 years ago

Thanks, using that I can get a list of rules with

bazel --bazelrc=devtools/bazel.rc query --universe_scope="//..." --order_output=no 'kind(rule, siblings(buildfiles(rbuildfiles(WORKSPACE))))

However, when I try to expand it to multiple BUILD files it fails: bazel --bazelrc=devtools/bazel.rc query --universe_scope="//..." --order_output=no 'kind(rule, siblings(buildfiles(rbuildfiles(WORKSPACE + foo/BUILD))))

janakdr commented 6 years ago

Try separating your arguments with commas.

kavehv commented 6 years ago

Aha...this worked, thanks. It's not optimal, but it's at least doable. I still think that this is something rdeps should provide when pointed to any BUILD file.

cpoole commented 4 years ago

I agree, a build file modification should definitely by considered a dependency of the targets it defines.

ulfjack commented 4 years ago

Do you realize that making such a change to query is not enough to calculate the affected targets?

A BUILD file change can cause four types of changes: (1) remove a target (2) modify a target in a significant way (3) modify a target in a way that does not actually affect it (e.g., by changing whitespace or formatting in the target definition in a way that has no semantic effect) (4) add a new target

Currently, BUILD file changes are ignored, and what you're proposing would result in covering cases (2), (3), and (4). In case (3), we'd actually prefer not being notified about the change, if possible (but maybe not a deal-breaker). For case (1), you need to do a before/after diff on the target graph, which query cannot do by design (it can only see the one version of the code it's running in). If you have to do that anyway, making that also cover (2) and (4) seems not too difficult.

Maybe there's a use case here for making the proposed change, but it's not immediately obvious to me.

moesy commented 1 year ago

Anyone know the recommended way to address this today? rdeps still return empty when the set of files contains only build and bzl files.