Closed cgruber closed 3 years ago
Note, this is very urgent for Square, as we definitely use the internal mechanism, and some of how gradle handles it buy default (all things within a gradle module sharing freindship/kotlin-module) blocks our migration if we don't have at least the friend-tree option.
cc: @inez, @rogerhu
Also, I have a PR (cgruber/rules_kotlin#37) pending which implements the friend-tree option. I'd very much prefer to move that over here and just get it into the main repo if we can come to general agreement on approach quickly enough. I'm happy to adapt it to the friend-graph approach if it makes more sense.
I republished cgruber/rules_kotlin#37 as #216 on this repo, mostly for illustration purposes (though if we choose friend-tree approach here, it should work as an implementation, or a base for the other).
Also love to get @kevin1e100's thoughts.
@cgruber here's @kevin1e100's doc. https://docs.google.com/document/d/1TjS5jwWpMQLaPm90m1IeX7anyJL0eKzkmhyI-05bOM4/edit#
Commented. Possibly a bit verbosely. :) Hopefully we can rationalize the two visions.
I'd also love to hear more community comment. :/
Friendship is (presently) not transitive
To me this sounds like by design and is what I'd actually expect. Personally, I'd also align this with how the next Kotlin Gradle plugin will work. I can try to get more info on that.
I pinged @h0tk3y in the Kotlin Slack channel: https://kotlinlang.slack.com/archives/C0B8PUJGZ/p1572377672003300?thread_ts=1555962527.006200&cid=C0B8PUJGZ
I contributed the (ok)buck support for #1, that's fairly straightforward.
For #2 and #3, I think there's a lack of precedent on that front in kotlin's own first party tooling. By extension, you risk forking from whatever direction they take (what if multiple friend paths is for multiplatform rather than the use cases described here? etc etc)
@vRallev:
Friendship is (presently) not transitive
To me this sounds like by design and is what I'd actually expect. Personally, I'd also align this with how the next Kotlin Gradle plugin will work. I can try to get more info on that.
Hmm. Interesting. Non-transitive would be fine, but the transitivity is part of how I organize the logical module concept here, which constrains the friendship network to all being within the same sub-graph of the deps, and sharing one and only one root. If there's no transitivity, then that would be harder... well, maybe it is the case that we could handle it similar to "strict deps" just with a "strict friends" - so the friendship graph (logical module) is carried in the provider, but only directly declared friendships are included in the -Xfriends-path
flag.
@ZacSweers
I contributed the (ok)buck support for
#1
, that's fairly straightforward.For
#2
and#3
, I think there's a lack of precedent on that front in kotlin's own first party tooling. By extension, you risk forking from whatever direction they take (what if multiple friend paths is for multiplatform rather than the use cases described here? etc etc)
Unfortunately, the okbuck code was only straightforward because okbuck generated one build file for the whole gradle project, whereas we have multiple bazel packages per gradle modules, especially in the test code. So the feature has to be a bit more supple than it was requried to be in okbuck. (We totally were inspired by your okbuck work, btw, kudos!)
I think #2
and #3
are maybe less straightforward, but they are different application of an un-named #4
use case, which is:
That's probably a better use-case and it IS supported in gradle, today. But by accident, because of the way gradle chunks up the work, it ends up being in one compilation unit. But bazel is more granular, so achieving that behavior within a different build tool whose "unit of account" is different (target inside a package vs. module as the locus of the compilation unit) the constraints have to be different, to solve the same use-cases.
This is my comment on the google doc too - the very granularity difference is a big part of why we can't just replicate the gradle behavior. Achieving the same behavior as gradle (especially in a migrated project) may require a different application of -Xfriends-path
because the compilation units may be different, if one isn't using the approach OkBuck took, which we are not (inside square).
@vRallev - I added the "direct friends only" option as a variant of the "friends-graph" approach.
For reference in this discussion, there's this: https://youtrack.jetbrains.com/issue/KT-17630
It's not 100% on point, but it's an example where you end up, even in gradle-land, having an issue around how to propagate what's in and out of the module (in this case, the gradle integration's assumptions didn't cover instrumentation tests pulling from a .jar rather than the internal classes dir). This highlights that the implementation and design of hte build tool itself are critical to defining what a module is, so we can't take all the cues from the first-party gradle support, because it's making opinionated choices about gradle, not about kotlin.
The logical definition of a "module" from the kotlin perspective has got to be something defined by the build language, and in the case of bazel, the unit of account is too fine-grained to usefully be that definition. A sub-graph has to be the logical module, else the feature doesn't make any sense (in a bazel context at least). It can't (and shouldn't) be the WHOLE graph or the feature also doesn't make sense. But because bazel targets can be organized in an arbitrary set of shapes, within constraints, what constitutes a "module" for the purposes of internal
visibility is equally more arbitrary in bazel-style tooling.
If you can solve a problem by adding more flavors in the gradle build, then you solve the equivalent problem by adding more graph in a bazel build. So being able to define that sub-graph that is equivalent to "in the gradle module" is key, or the feature simply can't work right for bazel (as I see it)
I got an answer from JetBrains. The friendPaths
is transitive, meaning in a dependency chain of A -> B -> C
, if C is a friend of B and B is a friend of A, then C is a friend of A. It sounds to me that the friendship graph
proposal is the closest solution to this?
Note that, at this point, there's an implementation detail in the compiler (at least the Kotlin/JVM compiler) which requires that a module that is compiled against another module as a friend uses the same module name.
This is caused by the fact that the module names affect the mangled names for the internal declarations: they look like myInternalFunction$my_module_name
. So now, if a module has a different module name than its friends, the mangled internal names at the calls to the friends' internals won't match those in the friend modules, and the calls won't link at runtime, producing a NoSuchMethodError
.
This also has an implication that interferes with merging outputs of associated compilations (speaking in terms of the Kotlin Gradle plugin), such as packing them into a single JAR. Namely, they will all contain a META-INF/<module-name>.kotlin_module
file, and those would clash.
@h0tk3y - the current design preserves that, so if two things have two different modules, they are, by definition, not "friends". And this limits the proposal such that if you make two targets friends, then they share a module name (and the friend-graph approach leans on this)
@cgruber Yeah, that's fine, then. Note that, in the Kotlin Gradle plugin, this limitation will likely be lifted at some point (not before the compiler learns to compile internal calls to a different module), as having different module names for such compilations is quite reasonable.
As to this part of the proposal:
That is, more specifically, since the module definition is provide from a target that did not itself declare any friends, any node in the graph could only depend on nodes that were not part of a different graph of friends. Restated, a node could call another node friend if that node declared no friends, or its friends were all friends (transitively) with the same friend-root.
In the Kotlin Gradle plugin, it is not enforced this way between associate compilations. Instead, associate compilations form a disjoint set union, and adding an associateTo
-edge between any two compilations unites their sets in the DSU.
Not sure which approach is better for preserving sane friendship graphs, though.
Multiple Jars compiled with the same module name normally seem to "clash" if they're both put on the classpath of the compilation, since the .module_info file from the Jar listed first "wins" over the second one. (That file seems to allow the compiler to resolve top-level methods, among other things, it seems?) Does the compiler avoid this problem somehow when -Xfriend-paths is used, e.g., by merging the first Jar's module_info into the second? Note that even if this is handled correctly, downstream compilations would have to enforce a particular order on how these Jars appear on the classpath. This problem doesn't appear in Gradle, I'm guessing, since friends are only used for tests, but if friends are used for arbitrary compilations then it seems like downstream compilations could see surprising errors?
The .module_info
files only clash in file systems (e.g. when outputs of multiple compilations are merged in a directory or in a JAR). The compiler is absolutely fine with multiple classpath items containing the same META-INF/foo.kotlin_module
entry inside.
I'd like to start coming to a conclusion on this.
After some consultation with @restingbull (Corbin Smith) and @shs96c (Simon Stewart, in reference to the java_module rules) and a few others, what seems to emerge as a consensus is essentially the strict-variant of the friend-graph.
Framed another way, but semantically identical, it would work this way:
This approach:
Remaining open questions I can see:
The implementation I wrote is pretty trivially alterable to the above semantic. I could probably have an implementation worked up pretty soon, if we have reached consensus.
As to some of the issues @kevin1e100 raised in the Google doc @jin linked earlier in this thread, I think it's quite reasonable for google to internally constrain things more, if they want. I think this semantic closely matches Jetbrains' intentions around this, but the implementation could be narrowed either by macros or in google's internal rules, if that makes sense. I'm pretty concerned that internal google constraints (which rely on monorepo assumptions which don't hold in the rest of the world) don't overly constrain our solution here.
Hopefully we can come to a meeting of the minds here.
I would prefer to consider friends
as a kind of dependency.
I feel that having to place the label in both deps
and friends
could be potentially surprising. If a label exists in friends
and not deps
, what would it mean?
I would have imagined that was an error if we decide to have them in both places. But the same issue arises in reverse if you place them in friends-instead-of-deps, because now what does it mean if you place them in both? I don't think we want to be in a place where you can put it in friends and/or friends+deps and have both of those work. It's "too many ways to do a thing" so to speak. So whichever way we decide, we should make sure we allow only that way.
That said, the more I think about it, the main reason I think I'm unhappy with friends=="a dep" automatically is naming. There's deps, runtime_deps, but friends is a kotlin concept, and it's not mapped inherently to bazel. IIRC you have to have a friend's jar both on the classpath and in the -Xfriends-path
. I think if we had called it friend_deps (or friendly_deps, or something_deps) I would have felt less weird about deps+friends. Not sure how to resolve that feeling, because we've used friends= for a while, and that's the comments I got on my other PR (where I used single-friend, and therefore renamed to friend=), that we should stick with the existing attribute name.
I also wonder how having friends-be-deps will affect other tools, like unused_deps and the like. But then, it's probably the case that we'll have to alter those tools to be kotlin aware anyway.
If we drop the friend
concept, it becomes privileged_deps
, or module_deps
.
To quote: "APIs should be easy to use and hard to misuse." -- which I take to mean a user should have as few opportunities to receive an error as possible. The case of friends
and no deps
not having a semantic meaning, feels like a bad api.
As for tooling, if you aren't consuming providers or doing queries, (and therefore relying on attribute names) there are many, many, many cases that will bite you. runtime_deps, neverlink, j2objc_library, resource_jars to name a few. I don't believe tooling should be a concern for this, unless we can drop the rdeps call to find out who friends a library.
Just to throw another approach into this... I thought of a way to implement the c++ style friend declaration.
Given a implementation in //module_a:BUILD
kt_jvm_library(
name = "foo",
module_name = "example.bar"
deps = [], # some deps here
)
And a privileged library in //module_b:BUILD
:
kt_jvm_library(
name = "friend_of_foo",
module_name = "example.bar"
deps = [
"//example:foo"
],
)
This could be implemented like this:
kt_jvm_library(
name = "foo",
module_name = "example.bar"
privileged = [
"//module_b:friend_of_foo",
],
deps = [], # some deps here
)
With no changes to the downstream library.
The trick is to do this:
kt_jvm_library = rule(
...etc...
attrs = {
"privileged": attr.string_list(doc="targets that have internal access"),
}
)
And export on the KtInfo
:
KtInfo = provider(
fields = {
...etc...
"privileged": "a list of labels that have privileged access"
...etc...
}
By not resolving the label, and doing a simple string check in the rule when friend library is compiled, it sidesteps the issue of circular dependencies. It allows more chances of typos in the core library definition, but it centralizes the privileged information information.
We could also go whole Java on it -- by tying the the internal
to module name, we could simply declare any library that can access the library (e.g. limit it by visibility) with the module name to be a friend.
The string list is a nice trick. I'd be curious on your thoughts on instead using something like "friend_visibility" as mentioned in this doc; that is to say, use visibility
-like package group specs, which are a bit more stable when targets are renamed.
I feel it's a feature that "friend" deps are marked special in the target that has the privileged access, so I'm not sure getting rid of "friends" attribute is really an advantage. By forcing all friends to be deps (which could be done automatically in a macro), concerns around breaking tools/rules/aspects that hard-code attribute names should also be minimized, unless I'm missing something?
Aside from that, I'm still really unclear whether any of this will be "safe" in general. For instance, in a diamond dependency situation where b and c separately are friends of d, and a (transitively) depends on both b and c (but a is not a friend of either b or c), will that in general work fine? What about a being a friend of b but not of c, is that sensible? I'd feel a lot better about all this if module names of friends didn't have to match.
Re this comment, I'm still really unclear why friend relationships crossing workspaces should be a thing. As was mentioned I don't have a lot of investment in this question, but nonetheless I find it very weird to allow that, at least without something like friend_visibility
.
The cross-workspace use-case is for (frequent in open-source) cases where you have multiple artifacts that are separately deployed to maven, but which constitute a logical module (for internal purposes).
Part of the disconnect is that some organizations aren't in a monorepo at all, but in a poly-repo, where each "project" is pulled in as an external workspace. google monorepo assumptions, while a key driver for bazel, are not valid in the outside world. Just generally, the idea of a workspace as a boundary has different implications in different deployment situations.
As to the "Safety" - the core feature is simply not safe. It wasn't designed to be, precisely. The thing that makes it "safe-ish" is simply that we constrain all the internal things to one logical module (constrained by module name, in practice).
One way to think about this is this: are they all in the same module? If so, then they are permitted to see the internals (if they are direct dependants). They may (with Corbin's strict suggestion) not be able to unless they're directly listed as friends, but they are logically permitted. You activate that by declaring them friends. You don't HAVE to declare friends if you don't care to see internal members. They are still conceptually friends, it's just not active. I don't see an issue with the diamond, especially if we aren't using transitive friendship, which I think we're moving away from (that is, we're trending towards "declare what you use".
I agree with the concern about breaking scripts that look at deps if we have friends automatically be deps (because the friends/privileged deps won't be listed).
I'm still internalizing Corbin's idea of the c++ style. I think that doesn't track, insofar as it's the inverse of how the -Xfriends-path flag works - We'd have to be passing around all of the privilege lists so that at the compilation point, where you have to tell the compiler "what internal members can I see" (the meaning of that flag), then you would have to walk the deps tree to validate whether this target under compilation is in the privilege list of any targets. I guess that's not so bad, but it also will make our migration tooling kind of insane to write, and I'm pretty anxious about that.
My sense is that it ISN'T like bazel visibility. You can say who you are friends with. It's not a security feature, it's a code organization feature, and I"m a little nervous about re-conceiving it in the bazel world. :/
We could also go whole Java on it -- by tying the the internal to module name, we could simply declare any library that can access the library (e.g. limit it by visibility) with the module name to be a friend.
The problem here is to determine how to decide what the module is. We'd either have to have each target declare its module, or infer it. And it is by way of chains/networks of friends= that we (at least at present, and in my proposals) infer that.
As to privileged_deps - the term is wrong for this, I think - it's not that the deps are privileged - friends is an escalation of one's own privilege. module_deps makes more sense to me, because it doesn't imply the inverse relationship. But friends is the term of art in the kotlinc, and I think we should where possible keep to kotlin terminology. friend_deps seems worthwhile to me.
I'd be curious on your thoughts on instead using something like "friend_visibility" as mentioned in this doc; that is to say, use visibility-like package group specs, which are a bit more stable when targets are renamed.
I like it -- but it's a pretty heavyweight solution. On the other hand, if we move the module name off the targets and define it separately, it would offer enough benefit. The downside is that existing libraries would, effectively, not be in a module. Or just in a "closed" module. Implementation is a bit tricky, as noted. Pulling the sources in all the rules that depend on package rule is almost impossible in Starlark. The upside is solid validation and a very clear graph of who has internal access.
I agree with the concern about breaking scripts that look at deps if we have friends automatically be deps (because the friends/privileged deps won't be listed). As noted, if a script is looking only at "deps" it will break. It's simply a matter of time. A safe script would use query or walk the attributes looking for Targets and pulling providers.
I'm still unconvinced that an attribute that can be misconfigured (e.g. friends without deps is an error) is good practice. It clutters up the concept with no pay off -- while you can work around it with a macro, why? Take a step back and consider the end user: they really don't care that kotlinc takes friends as a separate argument. It's just one more hurdle that has to be coded around, and another annoying error when they simply want to get things done. Then, they write a macro. And add more things to the macro. Eventually a macro ecosystem grows around the rule that requires maintenance and documentation. We've seen this. And Starlark doesn't offer good tooling for refactoring. (Yet, but it is already a huge problem for repository management.)
We'd have to be passing around all of the privilege lists so that at the compilation point, where you have to tell the compiler "what internal members can I see" (the meaning of that flag), then you would have to walk the deps tree to validate whether this target under compilation is in the privilege list of any targets.
There is no walking the validation tree. When the compilation action happens, it's simply a matter of collecting the friend allowed sets of the direct dependencies and passing them to the friends flag. If the current library is in that set, allow internal access. The downside to this is simply the fact that typos are hard to surface without a full graph analysis -- you could offer internal access to //moon:crater and have everything work for the most part. Until //moon:critter tries to access internals.
The problem here is to determine how to decide what the module is. We'd either have to have each target declare its module, or infer it. And it is by way of chains/networks of friends= that we (at least at present, and in my proposals) infer that.
Yeah, it would require all targets to either declare the same module, infer it from the target location, or do something untoward with package
. Much like java, it's magical.
But friends is the term of art in the kotlinc, and I think we should where possible keep to kotlin terminology. friend_deps seems worthwhile to me.
Disagree. Kotlin uses the term internal
, and allows internal
access if a series of files are compiled together (e.g. module refers more to the Gradle concept than the language concept. There is no limit based on package name). While kotlinc uses friends, the Gradle feature is associated_to
. Friends is, as I read it, it not a Kotlin concept but a kotlinc concept. Therefore, using friends is leaking a compiler abstraction to the language. I'd like to avoid that. Kotlin has enough terminology without adding to it.
Quick update to this comment: it appears friends can have different module names now; when calling internal members, the generated bytecode uses the correctly mangled method name for the target module AFAICT. @h0tk3y does that sound right?
Any update on the status of this issue? We'd love to be able to at least use friends
with testonly
libraries.
To anyone who's watching, #465 takes this issue's approach (but renamed to "associates") and opens it up to any kt_jvm or kt_android rule type.
Specifically, associates= are transitive (within the subgraph of associated targets), and confer the ability to see associated targets' internal
members. These little associated subgraphs are a dag, with one top-level node from which all other nodes inherit a module name. This means that one can only associate with multiple targets which all share that subgraph's root (are associated together), and if you try to associate to separate associated subgraphs that is an error. (This is, presently, handled through the signal of the module name.
The associate-graph transitivity should be rendered moot once strict-deps are in kotlin, which means we aren't going to separately manage "strictness" of the associates= attribute.
kt_jvm_import does not have associates= yet. It can be the root of an associated module graph but can't participate in one. This is not in-scope since (a) you're not compiling them and (b) if you need to effectively get this, you can just set their module_names the same. If the ergonomics of that are frustrating, then we can take it up as a separate issue.
An associated target should only be listed in associates=
not in the deps. It will be included as a dep, but dual-specification is messy. It otherwise works fairly closely to what was described in this issue.
Problem
Kotlinc supports a visibility
internal
which implies that all code with access tointernal
members are part of the same logical module. Without modification,internal
is granted to all code contained in the compilation unit. However, there are several use-cases where code outside a compilation unit is to be appropriately granted access. Examples of this are:internal
members of the code under testinternal
conveniences of public API which has been separated into a different compilation unit for build-system constraint (such as bazel'svisibility=
mechanism.internal
implementation code such as internal constructors, etc., in order to create or otherwise configure them.kotlinc supports these by providing a
-Xfriend-paths
flag, whose content is a classpath (.jars and/or class-folders) whose presence in the list is given as a signal that *kotlinc should grant access to theirinternal
members during this compilations.rules_kotlin presently supports one specific and narrow use of
-Xfriend-paths
.kt_jvm_test()
users can declare afriends=
attribute pointing to a target containing the objects under test, and this will give the test access. A libraryB
which declaresA
asfriends=
shares its module definition (specifically,KtJvmInfo
provider takesA
'smodule
definition and uses it instead of generating its own, and usingkotlin_module=
andfriends=
attributes together is forbidden.Also, any target listed as
friends=
is added to the direct dependencies of that target.There are several problems with the present implementation, however, including the following:
friends=
mechanism is named as plural, but permits only a single friend target.C--friends-->B--friends-->A
will result inC
seeinginternal
inB
, andB
(during compilation) seeinginternal
inA
, butC
cannot seeinternal
inA
. This prevents having, say, three tests, a shared test lib withinternal
and code under test containinginternal
and all working together.-Xfriend-paths
Proposal
This issue proposes a reworking of the mechanism with one of two options on contraints.
To broaden the mechanism, we should move the
friends=
attribute from the test rule to the general common attributes, so that allkt_jvm_*
rules can make use of it. Further, rather than passing in existing classpaths (compilation classpath, the direct classpath, the transitive one, etc.) the friendship should be a depset passed along on theKtJvmInfo
provider, which can build a transitive closure of friendship separate from the compilation or general dependencies.The same behavior around
kotlin_module
attribute (and the metadata in the provider) would be preserved as it currently is. Declaringkotlin_module
andfriends
would be prohibited in the same rule definition.Without constraints beyond the module limitation, this would still be too broad, and possibly result in an arbitrary explosion of permissive access to
internals
. To limit this, one of two strategies should be chosen.Constraint options
Single-friend
friendship tree
In this variant, any
kt_jvm_*
target (binary, library, test) should be able to declare only a singlefriend=
. (To avoid ambiguity, the attribute should be named in the singular - the existingfriends=
on the test rule would be given a warning, since that is its present constraint anyway). The friendship would be transitive all the way up to the first dependency that does not declare friendship. Anything that declares itself within this friendship tree would be considered within the same module, obtain the same module name, see internals in any of its upstream that was part of the friendship, etc.One could therefore have the following friends groupings:
In the above system,
//src/blah/foo:foo
would define the root or hub of the friendship, and all the other targets would share its module definition (src_blah_foo_foo, by default)This would be sufficient to cover most approaches to the above listed use-cases. Though one could construct some counter-examples, these could be nearly always restructured into a tree of friendship.
example syntax
Multiple-friends constrained to module-boundary
friendship graph
In this variant, a rule invocation could declare any number of
friends=
targets, however, they would all share a module definition, and would have to form a directed graph, with a single ultimate parent node whose module defined the whole friendship graph. That is, more specifically, since the module definition is provide from a target that did not itself declare anyfriends
, any node in the graph could only depend on nodes that were not part of a different graph of friends. Restated, a node could call another node friend if that node declared no friends, or its friends were all friends (transitively) with the same friend-root.One could therefore have the following friends groupings:
In the above, the main distinction is that there exists a
foo-debug-wiring
which uses some runtime conditional logic to choose betweenfake
andimpl
wiring choices. An example of this situation would be an android app intended for debug use as well as running against instrumentation tests. With this setup, one wouldn't have to contort the system into a tree in order to declare friendship with different "descendants" offoo
.This could allow for more narrowly specified direct deps, which can have some performance improvements on build graphs, at scale.
example syntax
Multiple-friends constrained to module-boundary (direct friends only)
intransitive friendship graph
This option would also work very much like the friend graph, only the actual classpath sent to
-Xfriends-path
would not be the transitive friend graph, but only those directly specified friends. The same constraints would, however, apply, in that one could only friend targets in the same logical kotlin module, and that would be defined by roots that themselves don't declare friendship.Notes and open questions
Going with friend-trees is slightly limiting vs. the friend-graphs approach. However, loosening is easier than tightening the constraint later. Generally the author of this issue prefers going with the friend-tree approach until solid use-cases which it cannot satisfy are brought forward.
One question that should be asked is if "friend" is the naming we want. By default, I'd argue yes, because that's the language of kotlinc however it is not terribly descriptive. Another option would be
internal_access=
. One downside of that is that the plural and singular would be the same, which would make changing constraint options more difficult in the future. However, if the friend-graph option were selected in the first place, this wouldn't be an issue, since reducing from graph to tree is already hard.