bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
23.21k stars 4.07k forks source link

Running rdeps / allrdeps queries without fetching all external dependencies #13190

Open friendly-pineapple opened 3 years ago

friendly-pineapple commented 3 years ago

Description of the problem / feature request:

Aloha!

Our team is using rdeps / allrdeps to query reverse dependencies of targets. For example, this lets us know that for a given change to a local target, what is the set of reverse dependencies we should also build and test in order to validate that change.

We noticed that when running the query like this:

bazel query --universe_scope=//local_reverse_dependencies/... --order_output=no "allrdeps(//updated_local_target:name)"

-- this command will fetch all external dependencies of //local_reverse_dependencies/... -- for example, all http_file files, all rules_jvm_external artifacts, all rules_docker images, etc.

But since any changes to //updated_local_target:name would not affect external dependencies, we're not sure why they need to be fetched.

Our ask is simply: Is there a way to execute this query without requiring all external dependencies of //local_reverse_dependencies/... to be fetched?

Thank you for your help!

What operating system are you running Bazel on?

macOS Catalina 10.15.4 (19E287)

What's the output of bazel info release?

>_ bazel info release
release 3.0.0

Have you found anything relevant by searching the web?

We haven't seen anything else related to this on GitHub issues or elsewhere.

friendly-pineapple commented 3 years ago

Aloha - quick ping on this :)

This is not a feature request or bug report, but we were wondering if someone has encountered this in the past and could offer guidance.

Thanks!!

jin commented 3 years ago

A better avenue for issues that aren't bugs or FRs will be StackOverflow, bazel-discuss or slack.bazel.build.

gregestren commented 3 years ago

Hi @friendly-pineapple , I tagged this with team-Performance, who has the best insights into query. Hopefully they can offer some more guidance.

friendly-pineapple commented 3 years ago

Great, thank you!!

friendly-pineapple commented 3 years ago

Hey, just wanted to bump this one more time and see if anyone from team-Performance could give us insight on this. Thanks!

alexander-scott commented 2 years ago

Bump, I would also be interested in knowing if this is possible or not. I noticed this flag here https://docs.bazel.build/versions/main/command-line-reference.html#flag--experimental_repository_disable_download which might be of help, but AFAIK it's not available on the version we're currently using (4.0) :(

meisterT commented 2 years ago

@Wyverald any ideas?

Wyverald commented 2 years ago

I'm not familiar with how query handles external deps (since it seems to go a different path than normal loading/analysis?). I imagine we'll eventually need to understand how it works, but no promises on the timeline.

ravenblackx commented 2 years ago

Bump, I would also be interested in knowing if this is possible or not. I noticed this flag here https://docs.bazel.build/versions/main/command-line-reference.html#flag--experimental_repository_disable_download which might be of help, but AFAIK it's not available on the version we're currently using (4.0) :(

Just an FYI (and for future people having this same problem), --experimental_repository_disable_download doesn't help, it just makes the query fail, Error in download_and_extract: java.io.IOException: Failed to download repo [whatever]: download is disabled.

I think the purpose of that flag is to allow the user to not build if a download is required (maybe for metered connections or something), but it stops immediately, it doesn't go on to do as much of what you asked for as it can without the download.

(This issue is quite frustrating for wanting to do an rdeps search when Go code is involved.)

uri-canva commented 2 years ago

We hit this issue with a query with depth of 1. At first I thought getting all the dependencies made sense because you can never know if //updated_local_target:name will appear in the transitive dependency closure via some target using the @//updated_local_target:name syntax to refer to the root workspace, but since it reproes with a depth of 1 I guess there's quite a big missing optimisation there.

uri-canva commented 2 years ago

Tested deps with a depth of 1 and it doesn't have the same issue, so it seems like the query engine should be able to handle that case for rdeps as well, at least in theory?

thefallentree commented 2 years ago

Ping? We are running into this issue on 5.2.0 as well, In our repo, we make use of a lot of external dependencies. And this issue makes running any query with rdeps() having to fetch all external repositories, which can be very slow. So it makes resolving a list of target affected by list of files very painful.

@meisterT any possibility to raise priority on this would be great!

meisterT commented 2 years ago

Both SkyQuery with --universe_scope=universe and rdeps(universe, x) first collect all transitive deps of universe and with that the external deps.

Assigning to @meteorcloudy to assess the priority of coming up with a solution here.

thefallentree commented 2 years ago

I can try hacking bazel during the weekend, If someone could give me some guidance on where to start, it would be great!

meteorcloudy commented 2 years ago

Both SkyQuery with --universe_scope=universe and rdeps(universe, x) first collect all transitive deps of universe and with that the external deps.

Is this necessary? Not sure this is some low hanging fruit we can harvest easily or it requires major changes. We still have more urgent query related bugs to fix when Bzlmod is enabled: https://github.com/bazelbuild/bazel/issues/15044, so I think this is at most a P2.

friendly-pineapple commented 2 years ago

This behavior renders rdeps unusable in any repository that has a sufficient amount of external dependencies. Since rdeps is recommended as a mechanism for granularly testing the components of a repository in response to a change list (example from bazelbuild), this prevents great difficulties.

meteorcloudy commented 2 years ago

@meisterT I think this is a really nice to have P2 feature request, but maybe you can assess how much work is needed to make it work?

Wyverald commented 2 years ago

@haxorz points out that the definition of --universe_scope states that it will load all transitive packages (documentation):

--universe_scope=<target_pattern1>,...,<target_patternN> tells query to preload the transitive closure of the target pattern specified by the target patterns, which can be both additive and subtractive. All queries are then evaluated in this "scope".

With that in mind, the behavior you observed is actually working as intended. This matches what @meisterT said above (https://github.com/bazelbuild/bazel/issues/13190#issuecomment-1158701551).

It's not very easy to create packages with dangling dependency links, and then perform queries on this broken graph. Plus, to safely avoid loading external repos, we'd need to first analyze the query and conclude that it only contains "rdeps"-like expressions (meaning that we could do without the transitive dep closure). This sounds way harder than we have cycles for.

Given that, I think P3 is a valid priority for this issue. If any community member would like to take a crack at this, we'd certainly appreciate it.

meteorcloudy commented 2 years ago

Thanks! @Wyverald

Since rdeps is recommended as a mechanism for granularly testing the components of a repository in response to a change list (example from bazelbuild), this prevents great difficulties.

If this is the need, maybe https://github.com/Tinder/bazel-diff could be helpful?

thefallentree commented 2 years ago

Thanks! @Wyverald

Since rdeps is recommended as a mechanism for granularly testing the components of a repository in response to a change list (example from bazelbuild), this prevents great difficulties.

If this is the need, maybe https://github.com/Tinder/bazel-diff could be helpful?

Thanks for the pointer, it looks like bazel-diff is doing forward dependencies queries, then does the intersect after receiving the full graph, instead of rdeps queries. see https://github.com/Tinder/bazel-diff/blob/06a75b83fa2cb05c52b9b93a8fa679cc2585ae7c/cli/src/main/kotlin/com/bazel_diff/bazel/BazelClient.kt#L41

I guess yes it could work, why don't we simply offer a similar solution in bazel itself? as least to my intuition, that's how I expected rdeps(... , scope) suppose to work.

uri-canva commented 2 years ago

@Wyverald This issue reproduces even when not using --universe_scope.

haxorz commented 2 years ago

@Wyverald This issue reproduces even when not using --universe_scope.

@uri-canva Please give a full repro, or at least a realistic commandline.

If you just simply remove the --universe_scope=//local_reverse_dependencies/... from bazel query --universe_scope=//local_reverse_dependencies/... --order_output=no "allrdeps(//updated_local_target:name)" Bazel ought to be giving an error complaining about allrdeps not being a query function (this is because SkyQuery mode wasn't enabled).

Are you perhaps doing something like bazel query "rdeps(//something/that/depends/on/something/in/an/external:repo, //updated_local_target:name)"?

uri-canva commented 2 years ago

Are you perhaps doing something like bazel query "rdeps(//something/that/depends/on/something/in/an/external:repo, //updated_local_target:name)"?

Yes, exactly.

haxorz commented 2 years ago

Okay, then the mechanical answer to the

But since any changes to //updated_local_target:name would not affect external dependencies, we're not sure why they need to be fetched.

in the original post is:

The semantics of the query expression rdeps(E1, E2) is to first evaluate E1 to S1 and E2 to S2, then compute the forward transitive closure of S1 as C, then to do a graph traversal over the reverse edges in C starting from the nodes in S2. The "first evaluate E1 to S1... then compute the forward transitive closure of S1 as C" part for E1 = //something/that/depends/on/something/in/an/external:repo necessarily involves doing external repo stuff. You may know that for your particular situation there is no rdep path from S2 to anything in S1 that uses external repo stuff, but Bazel doesn't know that a priori.

[This is just an elaboration of @meisterT's earlier comment about rdeps.]

Sounds like you're asking for a flag --dont_do_external_repo_stuff_i_know_what_im_doing_and_accept_all_unsoundness_consequences. This is basically --nofetch but with semantics "silently do nothing" instead of "loudly fail".

uri-canva commented 2 years ago

Ah yes, I understand, so what I'm looking for is a function that doesn't exist, which looks for reverse dependencies in S1 rather than C.

Thanks for clarifying!

kechak commented 2 years ago

Looks like a combination of --experimental_repository_disable_download and --keep_going is a workaround for this.

thefallentree commented 2 years ago

no, that’s not the workaround.

On Fri, Sep 9, 2022 at 03:52 Zakhar Kanafalski @.***> wrote:

Looks like a combination of --experimental_repository_disable_download and --keep_going is a workaround for this.

— Reply to this email directly, view it on GitHub https://github.com/bazelbuild/bazel/issues/13190#issuecomment-1241820026, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJSYEF2Z4VTVDJKQPTGI4TV5MJHJANCNFSM4Y4V2OGQ . You are receiving this because you commented.Message ID: @.***>