bazelbuild / rules_scala

Scala rules for Bazel
Apache License 2.0
364 stars 275 forks source link

Strict scala deps and unused_deps #235

Open ittaiz opened 7 years ago

ittaiz commented 7 years ago

So, I met with @cgrushko a few days ago and apparently the way we (rules_scala) currently do strict deps (allowing a target to only use what it directly depends on) is very different from the way java rules do it and IMHO not as good.

Problem: We currently pass to scalac only the direct jars and let scalac emit an error to the user. These errors are quite often cryptic and unuseful for developers to be able to easily fix the problem. Moreover because of how the scala compiler works there are many cases where it needs transitive dependencies even if they aren't mentioned in the source code of the unit we're compiling[1]. This leads to having to add many transitive targets without any reasoning just to appease the compiler or worse to start using exports loosely around the codebase.

We suffered from the above transitive dependencies so much that we're just adding the whole transitive closure to the deps of a target. This is in the onboarding of a project so we're ok with it but this solution can help greatly minimize it.

Proposed solution: Pass direct and indirect jars to scalac's classpath (like java compilation does).
Add a scala compiler plugin which essentially will walk the AST after the type annotation phase, will stop at each 'type expression' and check that the jar from which the type was loaded is a direct_dependency. If not, it is an --indirect_dependency, and the plugin generates an error message. The message includes an 'add_dep' command to add the missing direct dep.

We will also support a lenient mode where this error is outputted as a warning. add_dep is a command which buildozer can take and apply it to your BUILD file. I think (not sure) that ibazel can pick up these warnings and automatically apply them to your BUILD file. Probably rebazel will be able to do this as well if they'll want to.

Value: Separation of concerns (scalac won't enforce graph concerns) Allows higher level messages and so call to actions Might allow finer grain modeling of when indirect dependencies are ok (for an example where scalac "messes" up but we might be able to improve see this repo)
Better cycle for developers since they can run with lenient mode and iterate quickly and fix the deps before pushing (or use ibazel/rebazel). The caveat is obviously the fact people can have a local green build which fails on CI.

Open questions: It's unclear how smart the compiler plugin can be given scalac's limitations and so unclear on how much noise will still be added to a target's deps.

Small implementation detail: To do this we'll need to pass to the compiler plugin a list of direct dependencies and a list of indirect dependencies where at least the latter needs to have both jars and the targets so that the error/warning message can be useful (add //src/main/foo:baz to your deps)

[1]Improve user experience for builds that use only direct dependencies

ittaiz commented 7 years ago

cc some people who are/were involved with thinking about this at Wix (@ors @natans @nadavwe @shvar @dkomanov)

johnynek commented 7 years ago

this seems really nice.

note related post on bazel blog for java https://blog.bazel.build/2017/06/28/sjd-unused_deps.html

The strategy here looks good to me. Some comments:

  1. can we get the reliability so this is the default, or will we need to be opt in for a while (I imagine the latter).
  2. should we have a version that just warns on unused dependencies first (kind of the reverse problem as this).
  3. I'm surprised the strict dependencies became a show-stopper for you. Our dependencies are probably over-broad, but it hasn't been debilitating.
ittaiz commented 7 years ago

The blog post and the issue are both products of Carmi and me meeting a few days ago :) Some of the wording in the issue are copied from the explanation of how the plugin works.

Re #1 I think having "dumb" strict mode via Bazel and not scalac should be easy. By "dumb" I mean we warn/error for every indirection. What might be more complicated is the "smarter" mode where we are able to identify which indirect usages are legitimate (compiler's limitations) and which aren't (you're just using that dep yourself)

Re #2 honestly that's not the problem I need to solve now. Many of our generated BUILD files have ~100 dependencies in their "deps". If I get strict mode with warnings then I'm able to add that into the migration phase and output minimal BUILD files. Unused is for longer down the road where you want to keep BUILD files cleaner and IINM the current convention in Google is to move the unused dependencies to runtime under the assumption something might need it.

Re #3 after I added around 30 transitive targets without understanding why the compiler is angry I decided to automate it. Note that it wasn't all of them. For this one repo I'd guess we're talking about several hundreds of noise dependencies. On Thu, 29 Jun 2017 at 5:06 P. Oscar Boykin notifications@github.com wrote:

this seems really nice.

note related post on bazel blog for java https://blog.bazel.build/2017/06/28/sjd-unused_deps.html

The strategy here looks good to me. Some comments:

  1. can we get the reliability so this is the default, or will we need to be opt in for a while (I imagine the latter).
  2. should we have a version that just warns on unused dependencies first (kind of the reverse problem as this).
  3. I'm surprised the strict dependencies became a show-stopper for you. Our dependencies are probably over-broad, but it hasn't been debilitating.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/235#issuecomment-311841849, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUIF0si-kg_1lWXezRPcevNm5oLGICOks5sIwaJgaJpZM4OHkDK .

ittaiz commented 7 years ago

@gkossakowski Since Oscar pulled you to the PR I thought I'd pull you to the issue and ask you if you have an intiuition or an idea on how "smart" will the plugin be able to be? What we really want to enable is to warn/error the user when their code uses a transitive dependency. This is to allow for upstream targets freedom in changing implementation decisions. I don't know how feasible this distinction (I use vs compiler needs for transitive reasons) is in the compiler/compiler-plugins.

Thanks!

ittaiz commented 7 years ago

@gkossakowski Do you have an intuition or an idea on how we can advance the "smart" plugin mode? What we really want to enable is to warn/error the user when their code uses a transitive dependency as opposed to warning the user for cases where the compiler needs this symbol lookup just for the its own needs. This is to allow for upstream targets freedom in changing implementation decisions.

Thanks!

gkossakowski commented 7 years ago

Yes, I mulled over this problem in the context of zinc's incremental compilation decision. In zinc, you want to approximate the dependencies scalac would ever need.

The short answer is: the dependencies introduced by reference do not require a transitive closure but the one introduced by inheritance does require a transitive closure. When a program mentions a symbol, e.g. val x: Foo then it introduces a reference dependency. When program mentions a dependency in the context of declaring a parent type for a class, e.g class X extends Y[Foo], then it introduces a dependency by inheritance.

You can capture both kinds of dependencies by traversing the trees and collecting the information in the right context. Zinc implements this: https://github.com/sbt/zinc/blob/1.x/internal/compiler-bridge/src/main/scala/xsbt/Dependency.scala#L326

ittaiz commented 7 years ago

Are you sure? I have an example which has a class A with a member B and B has a member C (which is a java interface) and you can't compile A without C. Tested with 2.11.7 and 2.12.1 at the time

ittaiz commented 7 years ago

Actually I think that my example above shows that there are more areas the plugin can improve in but what you wrote IIUC is that at least in that aspect (of inheritance) the plugin can be smarter and not warn/error. Did I understand you correctly?

gkossakowski commented 7 years ago

My comment is what should happen given Scala's typesystem design to shed light what directionally to expect. From there you can derive whether the behavior you're observing is a bug or a design choice you have to work with.

I just checked that what I said about dependencies by reference not forcing transitive dependencies is true for pure Scala:

[gkk@mbp ~/tmp/scala-tmp]$ echo "class A" > A.scala
[gkk@mbp ~/tmp/scala-tmp]$ echo "class B {
>     def a: A = new A
> }
> " > B.scala
[gkk@mbp ~/tmp/scala-tmp]$ echo "class C {
>     def b: B = new B
> }
> " > C.scala
[gkk@mbp ~/tmp/scala-tmp]$ mkdir outA outB outC
[gkk@mbp ~/tmp/scala-tmp]$ scalac -d outA A.scala
[gkk@mbp ~/tmp/scala-tmp]$ scalac -d outB -cp outA B.scala
[gkk@mbp ~/tmp/scala-tmp]$ scalac -d outC C.scala # skip -cp to show that classpaths are isolated
C.scala:2: error: not found: type B
    def b: B = new B
           ^
C.scala:2: error: not found: type B
    def b: B = new B
                   ^
two errors found
[gkk@mbp ~/tmp/scala-tmp]$ scalac -d outC -cp outB C.scala # the A class is not on the classpath

I'm not sure what's different about your example but I think it should work. Perhaps you can check whether you can reproduce the problem in pure Scala setting? I suspect that loading Java-originating symbols from a class file is overeager. It's a different code path than loading Scala symbols in the compiler.

promiseofcake commented 6 years ago

Would the use-case where a scala_test target is required to redeclare the external dependencies of it's scala_library fall into this issue -- or would that be something else?

I have built a contrived example here: https://github.com/promiseofcake/bazel-scala-test/pull/2 that runs with Travis to show the current issue.

I think it belongs here because for our java targets we have not had to perform similar adjustments.

ittaiz commented 6 years ago

Yes. This is under the purview of this issue.

promiseofcake commented 6 years ago

Regarding the newly documented strict-deps,

Note: Currently strict-deps is protected by a feature toggle but we're strongly considering making it the default behavior as java_* rules do.

If it were to become the default behavior, would we want to follow the proposed solution before making it mandated?

ittaiz commented 6 years ago

Hi Lucas, First I'd like to apologize for not responding previously, I somehow missed your reply. When you say "the proposed solution" you mean your proposed solution of "whitelisting" the scalatest/scala library dependencies? If so I agree we should. Can you open an issue about it? If you meant something else I'd appreciate the clarification On Thu, 9 Nov 2017 at 9:37 Lucas Kacher notifications@github.com wrote:

Regarding the newly documented strict-deps,

Note: Currently strict-deps is protected by a feature toggle but we're strongly considering making it the default behavior as java_* rules do.

If it were to become the default behavior, would we want to follow the proposed solution before making it mandated?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/235#issuecomment-343232004, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUIFwUNp2bp4KAQdctKA5os_ivpkzK3ks5s0zhHgaJpZM4OHkDK .

promiseofcake commented 6 years ago

No worries @ittaiz, the proposed solution I was referring to was:

Pass direct and indirect jars to scalac's classpath (like java compilation does). Add a scala compiler plugin which essentially will walk the AST after the type annotation phase, will stop at each 'type expression' and check that the jar from which the type was loaded is a direct_dependency. If not, it is an --indirect_dependency, and the plugin generates an error message. The message includes an 'add_dep' command to add the missing direct dep.

With the gist of it being that dependency resolution could perhaps mimic the way it works for java targets? Here is another example: https://github.com/promiseofcake/pubref-java-proto-scala/pull/1 which shows that exports from java_ targets aren't pulled in for scala_ rules, and I am fairly certain those targets are exported because the --strict_java_deps=ERROR does not error out for the java_ targets.

ittaiz commented 6 years ago

Ok, This proposal is implemented (minus the smart mode which we actually started working on yesterday). It's very possible that we have bugs like mandating the scala test jars or not supporting exports properly. The best solution IMHO is for other organizations to start using it and flush out problems which we'll fix. For each of those we should probably have an issue and a repro and we'll try to iron them out. On Thu, 9 Nov 2017 at 10:05 Lucas Kacher notifications@github.com wrote:

No worries @ittaiz https://github.com/ittaiz, the proposed solution I was referring to was:

Pass direct and indirect jars to scalac's classpath (like java compilation does). Add a scala compiler plugin which essentially will walk the AST after the type annotation phase, will stop at each 'type expression' and check that the jar from which the type was loaded is a direct_dependency. If not, it is an --indirect_dependency, and the plugin generates an error message. The message includes an 'add_dep' command to add the missing direct dep.

With the gist of it being that dependency resolution could perhaps mimic the way it works for java targets? Here is another example: promiseofcake/pubref-java-proto-scala#1 https://github.com/promiseofcake/pubref-java-proto-scala/pull/1 which shows that exports from java targets aren't pulled in for scala rules, and I am fairly certain those targets are exported because the --strict_javadeps=ERROR does not error out for the java targets.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/235#issuecomment-343240136, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUIF6bqKfXJADrCimr9DujLxqbXP4PJks5s0z70gaJpZM4OHkDK .

kamalmarhubi commented 6 years ago

Pass direct and indirect jars to scalac's classpath (like java compilation does).

Is this implemented? I'm running at current tip, and am getting errors that seem to indicate that it isn't: I get a term not found error that I can resolve by manually adding transitive dependencies to the target.

ittaiz commented 6 years ago

This is implemented. Are you running the build with --strict_java_deps flag? You can use error or warn as the relevant flags. This isn't the default behavior yet since we're still ironing out some kinks (like warning on scala-library). If you encounter issues I'd really appreciate opening a new issue and linking it to this one so we can have more scoped discussions.

kamalmarhubi commented 6 years ago

Ah, I didn't realize I had to pass that flag. That does indeed fix it, and allows me to remove the extra dependencies (or equivalently the exports on the direct dependencies). Thanks!

(There's some weirdness with the error messages mentioning unknown labels, but I'm pretty sure I saw that in another issue.)

ittaiz commented 6 years ago

Are you using scala_import for your dependencies? Currently the only rules that propagates labels correctly are scalaimport and scala* (library, binary, etc). If for example you have a scala_library (a) which depends on java_library (b) which in turn also depends on java_library (c) and a uses c without declaring it the labels will be off. We use only scala_library and scala_import so this specific thing is less of a blocker for us. I hope to tackle it in the next few months but it's not a blocker to making it the default since we're talking about info you also don't have today

On Sun, Dec 10, 2017 at 11:01 AM Kamal Marhubi notifications@github.com wrote:

Ah, I didn't realize I had to pass that flag. That does indeed fix it, and allows me to remove the extra dependencies (or equivalently the exports on the direct dependencies). Thanks!

(There's some weirdness with the error messages mentioning unknown labels, but I'm pretty sure I saw that in another issue.)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/235#issuecomment-350534394, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUIFyiIy5_TmdspvtqnrBH4sHaqsKT_ks5s-53ngaJpZM4OHkDK .

kamalmarhubi commented 6 years ago

@ittaiz

Are you using scala_import for your dependencies?

I'm experimenting with java_import_external from bazel_tools, so no. However it's possible to decipher the message in the meantime.

Thinking on how to solve this: it seems like it should be possible to apply an aspect to the deps of scala_* to collect this info. Is that the approach you were considering?

ittaiz commented 6 years ago

No actually. The current direction (which I Haven't gotten around to but you're welcome to) is to generalize java_import_external to a jvm_import_external which will accept either scala_import (and we will expose here as scala_import_external) and also java_import_external as today. See this discussion for agreement with @dslomov on this direction

jjudd commented 6 years ago

As far as I know, strict mode for rules_scala does not yet catch extra dependencies. @ittaiz discussed writing a compiler plugin to deal with this part.

I ran across this compiler plugin from the Scala Center https://github.com/scalacenter/classpath-shrinker which seems to do what we are interested in doing.

I played around with using the plugin to clean up some BUILD files that were auto generated using sbt -> pom -> Bazel: https://docs.bazel.build/versions/master/migrate-maven.html

I ran into an issue where all of the transitive dependencies are on the classpath at compile time, so if you have scala targets that depend on other scala targets, then the compiler plugin prints a bunch of warnings.

I'm curious if this plugin has been explored before or if any work has been done on detecting extra dependencies.

jjudd commented 6 years ago

Well, I feel silly. I didn't read closely enough. Looks like that plugin is already in rules_scala :D https://github.com/bazelbuild/rules_scala/blob/master/third_party/plugin/src/main/io/bazel/rulesscala/dependencyanalyzer/DependencyAnalyzer.scala

I'm still curious if the original plugin detected extra jars if rules_scala can do that. I'll dive in a bit more. Sorry to waste anyone's time.

jjudd commented 6 years ago

Okay, I quickly hacked together a really rough implementation of unused_deps in #438 that is bundled with strict_java_deps. I would like to get people's feedback before going any further.

Jamie5 commented 6 years ago

Will there be a way in the future to use both unused_deps and strict_deps at once? Currently readme suggests not

johnynek commented 6 years ago

It would be nice... that said, they work across purposes a bit.

At Stripe, we minimize the compiler classpath to reduce cache invalidation. So, we don't use strict_deps (which actually puts all your transitive deps on the classpath, but then errors if you use them). Instead, we just don't have them, and we get errors (although harder to diagnose) if you use them.

Really, I start to wonder if compiler plugins are the right path here. As long as scalac is not trying to do separate compilation, the experience will always be a bit second rate. I'd like to see separate compilation with minimal classpaths be a supported feature of scalac, but right now it is kind of tolerated, but not really strongly supported.

jjudd commented 6 years ago

We've been using both together at Lucid for a few months with Annex and it has worked quite well. Zinc's analysis is used to determine unneeded and unused deps and outputs nothing, warnings, or errors with buildozer commands depending on the configuration. Combined with ibazel's run_output and run_output_interactive, you can get an interactive N/y prompt to execute a buildozer command to remove or add a dependency when it violates strict/unused deps. This has made it pretty easy to maintain minimal classpaths.

ittaiz commented 6 years ago

I hope/plan to implement Twitter's solution which AFAIU goes something like this: Use a minimal classpath (like the default one in rules_scala today). Have something (ibazel?) that sees the compilation failures Knows how to extracts from it the class name (class name might be too simple) Sends it to an index of class names to labels which returns a response If it's one label it adds it to the relevant target (like the buildozer one)

I think the hard parts are the index (we'll probably have that ready in a couple of weeks) and extracting a class name from the cryptic compiler errors.

WDYT?

On Wed, Oct 3, 2018 at 11:00 PM James Judd notifications@github.com wrote:

We've been using both together at Lucid for a few months with Annex and it has worked quite well. Zinc's analysis is used to determine unneeded and unused deps and outputs nothing, warnings, or errors with buildozer commands depending on the configuration. Combined with ibazel's run_output and run_output_interactive, you can get an interactive N/y prompt to execute a buildozer command to remove or add a dependency when it violates strict/unused deps. This has made it pretty easy to maintain minimal classpaths.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/235#issuecomment-426779756, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUIF4C646e2OzYLgcXvrLYs6e9PDN-zks5uhRdhgaJpZM4OHkDK .

jjudd commented 6 years ago

There would need to be one index per scala_ target, right? The index would contain the mapping of class to label for the transitive closure of the current target? If it is global, it would be constantly churning and class name collisions could cause problems.

ittaiz commented 6 years ago

I don't think we're talking about the same thing. The index is of class-name to labels that it belongs to (meaning it was in the sources of that target) (multiple labels can declare the same FQN). If the FQN belongs to one label it's easy. Otherwise the user needs to decide.

On Thu, Oct 4, 2018 at 12:07 AM James Judd notifications@github.com wrote:

There would need to be one index per scala_ target, right? The index would contain the mapping of class to label for the transitive closure of the current target? If it is global, it would be constantly churning and class name collisions could cause problems.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/235#issuecomment-426803519, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUIF7KQeQimd1PTPKFCcXAj4-Hf9yRdks5uhScTgaJpZM4OHkDK .

Jamie5 commented 5 years ago

So while unused_deps and strict_deps do work a bit at cross purposes it still seems useful to have both. Because without strict_deps, you have to have dependencies in your classpath that don't really make sense (eg types you never use in any way yet a base package class uses). so you may well decide that the tradeoff in increased compile time in some cases is worthwhile. And unused_deps of course, because you don't want to have unnecessary dependencies regardless of what dependency model you are using.