Open johnynek opened 5 years ago
Is this even possible to do correctly in Scala? Scala package paths don't have to match the file path (and a single file can declare multiple packages), so every source file must be "used" to determine if it declares a package that was imported anywhere else.
Does Scala have any kind of "strict packages" mode that behaves more like Java?
we already have the unused dependency checker which we optionally use to error. We can use the same code and just emit the list of unused jars rather than error.
It isn’t about individual files inside a target, it is about the full target output which bazel is tracking. I don’t see why the above won’t work.
I haven’t read the above (and it’s exiciting) but I just need to re-iterate that from experiments I did inside Wix’s codebase the unused dependency checker was way way off often declaring used dependencies as unused.
On Thu, 27 Jun 2019 at 0:14 P. Oscar Boykin notifications@github.com wrote:
we already have the unused dependency checker which we optionally use to error. We can use the same code and just emit the list of unused jars rather than error.
It isn’t about individual files inside a target, it is about the full target output which bazel is tracking. I don’t see why the above won’t work.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/777?email_source=notifications&email_token=AAKQQF5NU4347RZAAGUG2E3P4PL4LA5CNFSM4H3NLIGKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYU2Z4I#issuecomment-506047729, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKQQFYIIRAU65TJMURVRQ3P4PL4LANCNFSM4H3NLIGA .
It would be great to reproduce those. We run it turned on in almost all targets but there are a few cases where it is wrong.
With more reproductions of misfiring cases we should be able to fix the issues.
You'll have the same problem at the jar level, since a jar can contain any package. You have to open the jar to know if the package you want is in there. If it was unused before, there is no guarantee it is still unused.
Contrived example ahead: https://gist.github.com/JaredNeil/a41e24ec9f1038140c36c705dae8ab58
bazel build :collision
will warn about :no
being unused, because it is, but the build will pass.
warning: Target '//:no' is specified as a dependency to //:collision but isn't used, please remove it from the deps.
You can use the following buildozer command:
buildozer 'remove deps //:no' //:collision
one warning found
Target //:collision up-to-date:
bazel-bin/collision.jar
Then switch the package in no.scala
from no
to yes
and bazel build :collision
again.
Now you'll get a compile error:
collision.scala:6: error: type mismatch;
found : Int(3)
required: String
def x = 3
^
one error found
one error found
java.lang.RuntimeException: Build failed
at io.bazel.rulesscala.scalac.ScalacProcessor.compileScalaSources(ScalacProcessor.java:244)
at io.bazel.rulesscala.scalac.ScalacProcessor.processRequest(ScalacProcessor.java:69)
at io.bazel.rulesscala.worker.GenericWorker.run(GenericWorker.java:114)
at io.bazel.rulesscala.scalac.ScalaCInvoker.main(ScalaCInvoker.java:41)
Target //:collision failed to build
But if you had produced an "unused files list" that contained no.jar
during the first build, no.jar
would have been ignored, there would be no changes, and the rebuild would have been skipped. Thus, a clean build would have a compile error, but an incremental build would "pass" because it wouldn't run.
Okay. I see your objection. I believe this applies to the feature itself in bazel. Your point is that changing the contents of an unused file can make it either used or an error.
So, it is only a kind of heuristic that works in some common cases.
This seems like a good point. I’m not sure anyone raised it. It certainly applies to Java jars or zip files as well. Maybe C++ uses multiple outputs to side step this.
I guess one approach would be to output a directory of class files, but still your point about generating collisions holds.
For Scala (and Java) the solution is to create a new action to list classpath classes and check for duplicates. This is a fast operation.
Then add that action as a dummy input to the compilation action.
So a situation like you describe @JaredNeil would produce a failure before the compilation step. But a change in an unused class still produces a cache hit for the compilation action.
However, I think something significant hasn't been mentioned.
Anyone using strict deps doesn't have any unused inputs. As is, this would be an alternative to strict deps, not a performance improvement on top of it.
If you wanted to improve performance, ideally, inputs would be need to be more granular than they already are. You can't use classfiles because the inputs/outputs have to be known at analysis time, and I assume there's no special treatment of directories.
You could use heuristics to produce multiple JAR files. Examples:
The key to produce relatively stable relationships between sources and compilation outputs, so that a change to a source file changes a minimal number of compilation outputs (which are future compilation inputs).
The primary obstacle -- I believe -- is actually wildcard imports. They can shadow and produce usage patterns that don't work well with input discovery.
To be correct, you need one of
see this thread: https://groups.google.com/forum/#!topic/bazel-dev/oFRdGdrm8DM
here is the design doc: https://github.com/emmanuel-p/proposals/blob/b207bba34e0d5a47be478087c41396764abdff29/designs/2019-04-04-starlark-dependencies-pruning.md
it is implemented now: https://groups.google.com/d/msg/bazel-dev/oFRdGdrm8DM/k0s1vwvlCAAJ
The idea is simple: produce a list of unused files by an action. The design doc is pretty unclear about the format of the file. I assume one line per file? This commit seems to be confirming that guess: https://github.com/bazelbuild/bazel/commit/57363813d3c72129c61c35a2b03c3033b62cb49d#diff-b604907e2f234f2839600aabf9dc4498R139
Looks like a UTF-8 file, one line per file name, empty lines are ignored, lines are
.trim
ed.