Closed ittaiz closed 1 year ago
cc @iirina @dbabkin
@ittaiz Could you point me to your code where you collect the jars and labels for strict deps?
This is for Scala import https://github.com/bazelbuild/rules_scala/blob/master/scala/scala_import.bzl Do you need also the “regular” Scala rules? There it’s a bit more complicated (though not very complicated)
No, I get the idea. Thanks! Just to be sure, JavaInfo(originating_label = ..., ...)
+ dep[JavaInfo].originating_label
would make this easier for you?
I think we have 3 issues here actually:
No, I get the idea. Thanks! Just to be sure, JavaInfo(originating_label = ..., ...) + dep[JavaInfo].originating_label would make this easier for you?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/bazel/issues/4584#issuecomment-364084226, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUIF8fiuTEsfNH75MQCv9NTy7cyYn9uks5tStrWgaJpZM4R5xhL .
Just to make sure- AFAIU if I “export” a dependency then when someone asks for its label they will actually get mine, right?
They should get the label of the rule where the provider was created. The "exports" providers will just be passed around to the depending targets, so the label will be from where they originated.
The additional needed feature for it is that the provider will keep and expose the transitive labels.
I assume you need it only from the compile-time transitive closure, right? I don't like that for almost everything exposed there is a "compile-time" and a "runtime" version. But I don't see any alternative, so I will just expose the labels for the targets needed at compile-time for now.
expose the transitive labels since currently rules Scala just pass a custom dict which java rules don’t.
A dict is different than a set of transitive labels. How will you use the transitive labels collected from JavaInfo in the sandwich?
Not sure these are the correct semantics- I tried reading about exports in the documentation and I think (but not sure) that the correct semantics are that if A
depends on B
, B
depends on C
and C
exports D
then given A
transitively uses D
it should get a warning to add C
to its direct dependencies.
How this needs to be translated to the specific mechanics of providers is a good question but I think we should first agree on the semantics. WDYT?
If you have A -> B -> C => D
(where =>
is exports
and ->
normal dependency) and you try to use D from A you would get a warning/error to add D
to the dependencies of A
.
In this situation:
A
/ \
C -- B
//
D
where A uses D, there would be no error/warning raised.
Why do you think I should get a warning about D
? It makes more sense to me that I should get a warning about C
since I think from C
s point of view D
should be somewhat hidden
A dict is different than a set of transitive labels. How will you use the transitive labels collected from JavaInfo in the sandwich?
Sorry missed your message. I might be wrong but I think we use a dict now since that's what the compiler plugin needs, right? It needs to know the label of a specific jar and hence we need a dict.
Why do you think I should get a warning about D ?
This is how the native rules are implemented.
It makes more sense to me that I should get a warning about C since I think from Cs point of view D should be somewhat hidden.
When it gets to compiling A
, the correlation between C
and D
is lost. At this point A
will see dependencies as jars only.
@iirina maybe we can improve on how the native rules are implemented? You can take a look at how we do it in rules_scala (easy to see in scala_import) so as to retain the semantics.
I've just finished off preliminary "jars_to_label" analysis skylark (using aspects) for Kotlin.
The current analysis is/(will be) doing its best to pick up a label at least a single hop away from java_import
so that the label should end up being a java_library
or a kt_jvm_import
. I think the way I implemented it(or what I intended) currently a target exporting a dep can also be preferred if it's higher up.
Just noticed what the native rules are doing -- preferring D
always. Doesn't this go against the standard conventions of wrapping a java_import
with a java_library
in a third_party package ? The current behaviour means the buildozer suggestions are sub-optimal and a user still has to fix the deps manually.
That’s exactly my point... Irina, Can we have this behavior flagged maybe? I really think the native rules are less correct On Fri, 2 Mar 2018 at 4:41 Hassan Syed notifications@github.com wrote:
I've just finished off preliminary "jars_to_label" analysis skylark (using aspects) for Kotlin.
The current analysis https://github.com/bazelbuild/rules_kotlin/pull/40/files#diff-512753d8bc41aa82f338a5360561b267 is/(will be) doing its best to pick up a label at least a single hop away from java_import so that the label should end up being a java_library or a kt_jvm_import. I think the way I implemented it(or what I intended) currently a target exporting a dep can also be preferred if it's higher up.
Just noticed what the native rules are doing -- preferring D always. Doesn't this go against the standard conventions of wrapping a java_import with a java_library in a third_party package ? The current behaviour means the buildozer suggestions are sub-optimal and a user still has to fix the deps manually.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/bazel/issues/4584#issuecomment-369802442, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUIF5IUw0oAR4GDAez4Y-7KwWvJoyQBks5taLFAgaJpZM4R5xhL .
@iirina wdyt?
Why do you think I should get a warning about D ? This is how the native rules are implemented.
This is also something that could be handled by the tool used to fix deps, instead of baking it into SJD.
We usually want the most precise deps for Java that provides the jar, instead of a target that export that dep, except e.g. if D is a private implementation detail that is always used with C.
the tool used to fix deps
I think we might have a terminological gap here. By the above do you mean buildozer as an example or the something else?
We usually want the most precise deps for Java that provides the jar, instead of a target that export that dep, except e.g. if D is a private implementation detail that is always used with C.
I thought export semantically meant exactly the above "D is a private implementation detail". BTW, I think maybe a more precise phrasing is that "D is a private implementation detail and C will always bring it with them"
do you mean buildozer
I mean the tool referenced by the ** You can use the following buildozer command:
message. Bazel uses buildozer, but the SJD side of this is factored so another tool could be used, e.g. something with more sophisticated handling of exports or visibility.
I thought export semantically meant exactly the above
That is one use-case for exports, not a definitions of its semantics.
@cushon I think that it will be very hard for a tool to reverse engineer the data and decide if it needs to add C
or D
. Maybe SJD needs to emit the offending graph.
@cgrushko for thoughts on visibility/exports handling and dependency management
SJD doesn't have a graph to emit. Bazel has knowledge of the graph, and provides access to it e.g. via bazel query. SJD is not the right place to put handling of visibility/exports for dependency management: we want that logic to be useable by other dependency management tools.
I think that it will be very hard for a tool to reverse engineer the data and decide if it needs to add C or D.
The C and D selection is quite easy if the labels are gathered via aspects. At the moment in my wip for kotlin I produce a list of candidates -- ordered by most precise label. Consider the example with an additional intermediary C'
exporting D
or C
.
A
/ \
C' \
/ \
C ---- B
//
D
In the Kotlin Wip I currently prefer C'
as C'
could be a neverlink
forwarding rule exporting a plugin and the plugins compile time deps. I don't know if this is too "clever". The aspect processing produces the jars2label ordered list (C', C , D)
and the rule then selects one of the labels for the jar.
Selecting "first hop away from external jar" or "most precise label exporting external jar" should be simple in an external tool if we also write out the jar2label map in the jdeps proto file.
For the buildozer commands the selection just needs to happen when the "javabuilder" or "kotlinbuilder" command line args are being prepared.
@ittaiz Another thought, I can see the point of selecting the leaf target in the workspace -- i.e., D
. What about pushing this to a refactoring / quick-fix in the ide.
The intellij plugin sync model already has a package to target mapping and it loads the jdeps file. Intelligence for composing build files should be added to the intellij plugin anyway. It is a bit of work.
ok continuing the design discussion that started in #4846 about an "API" for strict-deps labels passing between JVM rules. There are currently 3 options as far as I can tell:
I think I'm missing some context on the con of providers and I also think we need to hash out the exports
use-case since that feels to be the bone of contention here
First of all, there is little design discussion on these change anywhere - @tomlu, @cushon do you have a design document for the change that caused #4846 to share? It is absolutely not great to break non-ijar-generating use case. And it is not ok to break other JVM languages with unilateral change to JavaBuilder.
My guess is that Map -> Jar paths in providers is perceived to be problematic memory-wise: any rule in a transitive dependency chain will need to have a copy of the same map + some new entries. This map will need to be flattened on for every command line.
I guess we could have used a depset of pairs (string, jar) and flatten that instead - wdyt, @tomlu ?
My understanding is that we changed from propagating this information in providers/command lines to propagating them in the .jar files due to memory use concerns for our internal version (which has to contend with ginormous dependency graphs).
Instead of the above three alternatives, I think there are two problems:
And there is the separate problem of figuring out what to do with exports -- if we bake the label into the .jar files, we can't override the label of the actual rule emitting the .jar file with the label of the exporting rule, can we?
Let's continue the discussion on this thread: https://groups.google.com/d/msg/bazel-discuss/mt2llfwzmac/1in7UTq1AQAJ
I'll respond to everyone in this comment, then I'll follow up with proposals.
Responding to @ittaiz:
Pass a Map of Labels->Jar Paths in the providers.
Like others guessed that is O(N^2) on memory so sadly that show stops there. Square might not sound like a big deal but it can and does become gigabytes of resident heap :(
Pass a Map of Labels->Jar Paths in the command line
This is what we used to do for JavaBuilder as of one month ago. It would figure out the jar creator of each jar on the classpath and add that. (This is the reason you can't "redefine" the owner, whichever rule created the jar is the rule that is mentioned in the add_deps command.)
This is still O(N^2) in time. It more than doubles the size of JavaBuilder command lines and was measured to cost 5% build times on certain Google projects.
Bake the label into each jar (as a manifest attribute) and have each tool read it if it needs the label. 3.1 Pro- localizes the label "tagging" and is immune to intermediate rules not passing the providers (can this issue happen? not sure)
Pro: This is completely free until a strict deps violation actually occurs (then we need to actually open the jar), except of course the negligible cost of actually adding the label to the manifest when it's created.
3.2 Con- A jar has exactly one label and can't be redefined (again exports).
This was already the case with the previous solution, but yes, this is a property of this solution too.
Also needs to be reimplemented by tools creating jars if they circumvent ijar.
This is absolutely the big drawback.
Responding to @dslomov:
First of all, there is little design discussion on these change anywhere - @tomlu, @cushon do you have a design document for the change that caused #4846 to share?
There is no design doc (apart from email correspondence between me and @cushon) because it was believed to be an internal change, and a conceptually simple one at that. The thing we missed was use_ijar=False as an ingress point into the internal Java rule - JavaBuilder world.
It is absolutely not great to break non-ijar-generating use case. And it is not ok to break other JVM languages with unilateral change to JavaBuilder.
Breakages = not good. We'll do a small postmortem where we can discuss how we got here and how to not get here next time.
I guess we could have used a depset of pairs (string, jar) and flatten that instead - wdyt, @tomlu ?
That's still going to be O(N^2) and cost the same 5% wall time, maybe a bit more because we have to manage those depsets.
Responding to @lberki:
How to tell JavaBuilder the strict dependency information, in particular, the set of direct deps and the jar path -> label mapping for indirect deps
The set of direct deps is communicated today efficiently as the set of jars that are direct.
And there is the separate problem of figuring out what to do with exports -- if we bake the label into the .jar files, we can't override the label of the actual rule emitting the .jar file with the label of the exporting rule, can we?
That already didn't work, and I don't think it can work. You can have multiple paths from rules to a single jar (three different rules could export the same jar), which is the one true "owner"?
We can do something when there is a unique ingress point (like a scala_import), which is what I'll probably be proposing.
I guess we could have used a depset of pairs (string, jar) and flatten that instead - wdyt, @tomlu ?
That's still going to be O(N^2) and cost the same 5% wall time, maybe a bit more because we have to manage those depsets.
This is the bit I do not understand. depsets themselves would not cost O(N^2), right? So O(N^2) would be the cost of flattened command lines? But if command lines are lazy, no need to flatten them until the command is executed. Or are you saying that O(N^2) would be CPU time for flattening them at the moment the command will be executed? But there you already flattening the list of jars anyhow.
On Thu, Apr 12, 2018 at 2:13 AM, Dmitry Lomov notifications@github.com wrote:
I guess we could have used a depset of pairs (string, jar) and flatten that instead - wdyt, @tomlu https://github.com/tomlu ?
That's still going to be O(N^2) and cost the same 5% wall time, maybe a bit more because we have to manage those depsets.
This is the bit I do not understand. depsets themselves would not cost O(N^2), right? So O(N^2) would be the cost of flattened command lines? But if command lines are lazy, no need to flatten them until the command is executed. Or are you saying that O(N^2) would be CPU time for flattening them at the moment the command will be executed?. But there you already flatting the list of jars anyhow.
Your understanding is correct. So the cost is (A + B) * O(N^2), where A is the cost of executing the command (including when it is already cached), and B is the overhead of this additional information. We found that in the cached case B was 5% of total wall time. So in essence it's just about making the constant bigger/smaller, but when the constant is in front of a N^2 it makes a dent.
This was measured while we were really just reusing the depsets we already had, by grabbing owner information straight off the artifact owner. Managing another depset would increase B, and would increase memory use. Probably by some moderate amount in each case.
@cgrushko for thoughts on visibility/exports handling and dependency management
Jadep might be helpful here - you can tell it which fully-qualified symbol you're looking for, and it'll search around and add one after exports expansion and visibility filtering. The caveat is that its resolution strategies might not be a good fit for every situation that strict java deps errors appear in.
Can you clarify the O(N^2) @tomlu ? I am not seeing what is N^2 here? Are you assuming that each target depends on a constant fraction of the repo? That may be true, but it seems like it is an empircal fact you may have directly measured. It would actually be interesting do use bazel query to directly measure this. I can also imagine that practically each target depends on O(1) or O(log N) (or some sublinear) fraction of the repo.
Or maybe there is another source of the quadratic scaling? to_list on depset isn't quadratic is it? I guess it is linear in the number of merged sets and also linear in the total size of the reached set, so maybe you can argue those two parameters are linked and linearization is quadratic?
Are you assuming that each target depends on a constant fraction of the repo?
Java rules (discussed on this thread) use transitive classpaths, and the providers discussed deal with this classpath. This is transitive, so it usually has some superlinear relation to the number of targets N depending on the graph height. It's linear if you have N targets without edges between each other, but that's not very realistic.
Here's a contrived build graph:
A -> B -> C -> D
You could imagine that some providers would contain something like this:
D: ['D'] C: ['C', 'D'] B: ['B', 'C', 'D'] A: ['A', 'B', 'C', 'D']
Graph size: 4 nodes Total size: 10
This is clearly superlinear.
to_list on depset isn't quadratic is it
If you see list(depset) in a rule implementation you would strongly suspect that the rule will be square over your typical build graph. It isn't if the depset doesn't contain transitive information, but most depsets will (or why would you use a depset).
I guess this is another benefit of the scala rules not using transitive classpaths.
It is an interesting question what the structure of real build graphs tend to look like. I looked at this at Twitter while I was there. The monorepo at that time has something like O(10k) targets, but the depth of the graph was something like 50. So, the linked list graph you have is highly pessimistic I think in reality. Since large repos get that way from a lot of horizontal scaling I would imagine (many teams, not massively deep dependency graphs), it wouldn't surprise me if O(log N) wasn't the common depth of a build graph. It would be nice to have several example graphs to study (along with their shapes over time).
I'm not convinced this O(N^2) worst case, even for java, would actually be a practical barrier.
On Sun, Apr 29, 2018 at 3:41 PM, P. Oscar Boykin notifications@github.com wrote:
I guess this is another benefit of the scala rules not using transitive classpaths.
Well, add_deps doesn't work at all when you only pass directs, so that seems like an invalid point.
In fact, the entire SJD machinery becomes moot. In essence, by only passing directs, you (somewhat crudely) solve SJD by ensuring the user gets a full-on "no such symbol" compile error whenever anything is missing, without indication of how to fix it.
You vcoul
It is an interesting question what the structure of real build graphs tend
to look like. I looked at this at Twitter while I was there. The monorepo at that time has something like O(10k) targets, but the depth of the graph was something like 50. So, the linked list graph you have is highly pessimistic I think in reality.
I did that to illustrate the issue, not to claim all repos are structured in the worst case. Other more realistic shapes are also O(N^2), eg. binary trees.
Since large repos get that way from a lot of horizontal scaling I would
imagine (many teams, not massively deep dependency graphs), it wouldn't surprise me if O(log N) wasn't the common depth of a build graph. It would be nice to have several example graphs to study (along with their shapes over time).
I'm not convinced this O(N^2) worst case, even for java, would actually be a practical barrier.
The 5% total wall time cost was measured on a real application, so this isn't just theory.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/bazel/issues/4584#issuecomment-385276018, or mute the thread https://github.com/notifications/unsubscribe-auth/ABbnSghab846MCgb6NHpcl6evFkuijLuks5tthd-gaJpZM4R5xhL .
I am looking into adding stamp support so that import
rules can set the correct label for dep management tools. Currently just added it to kt_jvm_import
but I also contributed an alternative backend to bazel_deps and I would update that as well. Two issues I am seeing with this approach (this is still the perscribed approach right ?):
1) Stamping can only occur in specific packages (same repo , or somewhere in the package path). So it's not possible to kt_jvm_import
some jar in //third_party/BUILD
. See error below. This might be problematic. I just have to update bazel deps to support kt_jvm_import
and perhaps make kt_jvm_library
work without srcs.
2) JVM languages that cannot use stamp_ijar
have to use stamp_jar
? This creates a full copy of the source jar. Kotlin ijar support will likely begin with source -> ijar
and not bytecode -> ijar
so any kt_jvm_import
jar will get copied since we can't ijarify it. This is a lot of copying just to associate a single string with the jar.
error for point 1.
Output artifact 'external/io_bazel_rules_kotlin_org_jetbrains_kotlinx_kotlinx_coroutines_core/jar/io_bazel_rules_kotlin_org_jetbrains_kotlinx_kotlinx_coroutines_core-stamped.jar' not under package directory 'third_party' for target '//third_party:kotlinx_coroutines'
On Thu, Aug 16, 2018 at 12:33 PM, Hassan Syed notifications@github.com wrote:
I am looking into adding stamp support so that import rules can set the correct label for dep management tools. Currently just added it to kt_jvm_import but I also contributed an alternative backend to bazek_deps https://github.com/johnynek/bazel-deps/blob/master/src/scala/com/github/johnynek/bazel_deps/templates/jar_artifact_backend.bzl and I would update that as well. Two issues I am seeing with this approach (this is still the perscribed approach right ?):
1.
Stamping can only occur in specific packages (same repo , or somewhere in the package path). So it's not possible to kt_jvm_import some jar in //third_party/BUILD. See error below. This might be problematic. I just have to update bazel deps to support kt_jvm_import and perhaps make kt_jvm_library work without srcs.
Sounds like you're creating the artifact in the wrong directory?
1. 2.
JVM languages that cannot use stamp_ijar have to use stamp_jar ? This creates a full copy of the source jar. Kotlin ijar support will likely begin with source -> ijar and not bytecode -> ijar so any kt_jvm_import jar will get copied since we can't ijarify it. This is a lot of copying just to associate a single string with the jar.
You have to stamp the jar, it does create a full copy of the jar.
You can also update the tool to stamp the jar when it creates it.
Or you can eschew add_dep support and not stamp your jars.
1.
error for point 1.
Output artifact 'external/io_bazel_rules_kotlin_org_jetbrains_kotlinx_kotlinx_coroutines_core/jar/io_bazel_rules_kotlin_org_jetbrains_kotlinx_kotlinx_coroutines_core-stamped.jar' not under package directory 'third_party' for target '//third_party:kotlinx_coroutines'
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/bazel/issues/4584#issuecomment-413606724, or mute the thread https://github.com/notifications/unsubscribe-auth/ABbnSkkqEXYbRnjFhUwS1e5-_UPFdcGyks5uRZ7LgaJpZM4R5xhL .
Ok so i have tried a few different approaches. Pulling up the transitive depset or kt providers in the rules and then just creating a jar to label map for Just the kt_jvm_import rules seems to be a good solution till ijars are around.
I need a solution that ensures the kt_jvm_imports
have "Target-Label" set in the manifest for strict deps support. (I haven't checked if the scala jars from scala_import
have these set either).
The solution in my last entry just doesn't seem right.
Sounds like you're creating the artifact in the wrong directory?
@tomlu In this case I am using kt_jvm_import
on a filegroup created by bazel-deps
I have to do it in a separate directory. Adding proper Kotlin support to bazel-deps
would be one way to get past this but this is going to be a lot of work I can't get into.
If only a builder process / operation could query the Bazel process, One thing it could query would be the File.owner for a jar path. This would still be hermetic.
This could be done with A simple TCP based proto service (not GRPC because of the dependency footprint). The endpoint would be placed in an environment variable.
Could also be plain HTTP with proto and json req/rep. JDK 9 + has a decent client built in now.
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage
) if you think this issue is still relevant or you are interested in getting the issue resolved.
This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please reach out to the triage team (@bazelbuild/triage
). Thanks!
Description of the problem / feature request:
Strict deps labels are incorrect when using scala_import I suspect the problem is in the java rules and not scala_import since java_rules seems to perform their own heuristics which breaks if a different "actor" is in play and scala_import here is accidental. I might be incorrect in my analysis but a few months ago when I debugged it this is what I saw (don't have links to bazel code now, sorry :( )
Feature requests: what underlying problem are you trying to solve with this feature?
Trying to use java_library or java_common.compile with strict deps warnings
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
clone
https://github.com/ittaiz/bazel-strict-deps-labels-issue
runbazel build --strict_java_deps=error //...
see:expected at the very least
@com_google_guava_guava//jar
and actually I'd like it to be//:import
since the rule effectively exports that fileWhat operating system are you running Bazel on?
os x
What's the output of
bazel info release
?release 0.8.1-homebrew