document cases where ijar + scalac is suboptimal

johnynek commented 7 years ago

ijar https://github.com/bazelbuild/bazel/blob/master/third_party/ijar/README.txt is what bazel uses to create "interface jars" from compiled jars. By hashing those interfaces we see if we need to recompile.

The challenge with scala is that scalac adds a ScalaSignature annotation to the class that includes extra information from the scala type system. Any change to that, will change the hash. It may be that not all of them are actually essential to compile potentially for private method, for instance, where other callers can't call the method, we may be able to strip the ScalaSignature.

The -Yskip-inline-info-attribute scalac option greatly improves things since without that almost every change will cause a recomputation. It seems that is a tradeoff that is worth it, but we have not benchmarked to see if that costs performance.

/cc @ianoc

johnynek commented 7 years ago

This could be useful as an example of parsing the scala annotation on each class: https://github.com/avityuk/scala-pickled-visualizer

It is not clear to me if the scala annotation is already minimal or if we could remove data about private methods and prevent changes. Might be interesting to make some test cases, run ijar and the above tool and see if things change when private[this] defs and vals change.

softprops commented 7 years ago

Hi @johnynek just learning about ijars which I think will be essential for avoiding invalidating scalac dependency chains. Running some experiements with scala_libary rules. I can see ijars getting created but it didn't look like they were actually used when I was testing adding/removing private methods to down stream dependencies which shouldn't have an affect on needing to recompile upstream users. Then I found this thread. I tried adding the -Yskip-inline-info-attribute suggestion and it didn't seem to help.

Here's of my experiement

scala_library(
    name = "foo",
    srcs = glob(
      [
        "foo/src/main/scala/**/*.scala",
        "foo/src/main/java/**/*.java"
      ]
    ),
    scalacopts = ["-Yskip-inline-info-attribute"]
)

scala_library(
    name = "bar",
    srcs = glob(
      [
        "bar/src/main/scala/**/*.scala",
        "bar/src/main/java/**/*.java"
      ]
    ),
    deps = [":foo"],
    scalacopts = ["-Yskip-inline-info-attribute"]
)

I'm testing adding and removing private methods in package foo then bazel building bar. I expected that with the ijar machinery in place bar wouldn't require a rebuild since it doesn't/cant reference foo's private parts.

Do you have an examples you can point to help demonstrate how to make this work?

softprops commented 7 years ago

I also tried the private[this] on vals, which i understand forces scalac not to synthesize a method interface for a private field, to no avail.

johnynek commented 7 years ago

I don't know that it will work for that case right now (sadly).

Try changing logic inside of a method and see if it recompiles.

I think the issue is that the scala signature, a serialized attribute on the class, has information about it still.

The other issue might be only private[this] can work.

Like I said, we don't know enough yet. :/ On Sat, Feb 11, 2017 at 11:23 doug tangren notifications@github.com wrote:

Hi @johnynek https://github.com/johnynek just learning about ijars which I think will be essential for avoiding invalidating scalac dependency chains. Running some experiements with scala_libary rules. I can see ijars getting created but it didn't look like they were actually used when I was testing adding/removing private methods to down stream dependencies which shouldn't have an affect on needing to recompile upstream users. Then I found this thread. I tried adding the -Yskip-inline-info-attribute suggestion and it didn't seem to help.

Here's of my experiement

scala_library( name = "foo", srcs = glob( [ "foo/src/main/scala/*/.scala", "foo/src/main/java/*/.java" ] ), scalacopts = ["-Yskip-inline-info-attribute"] )

scala_library( name = "bar", srcs = glob( [ "bar/src/main/scala/*/.scala", "bar/src/main/java/*/.java" ] ), deps = [":foo"], scalacopts = ["-Yskip-inline-info-attribute"] )

I'm testing adding and removing private methods in package foo then bazel building bar. I expected that with the ijar machinery in place bar wouldn't require a rebuild since it doesn't/cant reference foo's private parts.

Do you have an examples you can point to help demonstrate how to make this work?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279176703, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEJdqBgv4VOjafH9F74t_xGgNh5Ox6fks5rbia5gaJpZM4Lnafl .

johnynek commented 7 years ago

We may need to parse and rewrite the scalasignature to only include non private information. I've looked at that a bit, but I have not found such a clear explanation of how it is serialized, nor what might be safe to remove. On Sat, Feb 11, 2017 at 11:25 P. Oscar Boykin oscar.boykin@gmail.com wrote:

I don't know that it will work for that case right now (sadly).

Try changing logic inside of a method and see if it recompiles.

I think the issue is that the scala signature, a serialized attribute on the class, has information about it still.

The other issue might be only private[this] can work.

Like I said, we don't know enough yet. :/ On Sat, Feb 11, 2017 at 11:23 doug tangren notifications@github.com wrote:

Hi @johnynek https://github.com/johnynek just learning about ijars which I think will be essential for avoiding invalidating scalac dependency chains. Running some experiements with scala_libary rules. I can see ijars getting created but it didn't look like they were actually used when I was testing adding/removing private methods to down stream dependencies which shouldn't have an affect on needing to recompile upstream users. Then I found this thread. I tried adding the -Yskip-inline-info-attribute suggestion and it didn't seem to help.

Here's of my experiement

scala_library( name = "foo", srcs = glob( [ "foo/src/main/scala/*/.scala", "foo/src/main/java/*/.java" ] ), scalacopts = ["-Yskip-inline-info-attribute"] )

scala_library( name = "bar", srcs = glob( [ "bar/src/main/scala/*/.scala", "bar/src/main/java/*/.java" ] ), deps = [":foo"], scalacopts = ["-Yskip-inline-info-attribute"] )

I'm testing adding and removing private methods in package foo then bazel building bar. I expected that with the ijar machinery in place bar wouldn't require a rebuild since it doesn't/cant reference foo's private parts.

Do you have an examples you can point to help demonstrate how to make this work?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279176703, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEJdqBgv4VOjafH9F74t_xGgNh5Ox6fks5rbia5gaJpZM4Lnafl .

softprops commented 7 years ago

ok cool I'll watch this issue and post back if I find updates. Just a final note because Im still newish to bazel and what happens when I run bazel build ...

I confirmed that I got different md5 checksums on my bazel-out/local-fastbuild/bin/foo_ijar.jar did change when I added/removed new private members and so forth. So I don't think its an issue with scala_rules in general, it does the right thing when ijars change. Maybe just a recipe for dealing with scalasignatures we need to find.

johnynek commented 7 years ago

It might be interesting for us to actually look at the contents and see the difference. Then we could validate if it is a scalasignature issue or not. On Sat, Feb 11, 2017 at 11:33 doug tangren notifications@github.com wrote:

ok cool I'll watch this issue and post back if I find updates. Just a final note because Im still newish to bazel and what happens when I run bazel build ...

I confirmed that I got different md5 checksums on my bazel-out/local-fastbuild/bin/foo_ijar.jar did change when I added/removed new private members and so forth. So I don't think its an issue with scala_rules in general, it does the right thing when ijars change. Maybe just a recipe for dealing with scalasignatures we need to find.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279177353, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEJdpCwDv6bBypCMigsevyq3y5lVS25ks5rbikdgaJpZM4Lnafl .

softprops commented 7 years ago

checksum of the ijar stays consistent when I modify the contents of the private methods but that still triggers some action which looks like a recompile (not positive) because it takes about 6 seconds . Still new to bazel so I'm not sure what that could be.

johnynek commented 7 years ago

Cc @ianoc

It could be something sloppy in the rules somewhere. You can set and option to show the compile time for scalac (Ian can remind us) so we can be sure if it is the compiler, or something else (maybe the jar process is somehow running no matter what?) On Sat, Feb 11, 2017 at 11:41 doug tangren notifications@github.com wrote:

checksum of the ijar stays consistent when I modify the contents of the private methods but that still triggers some action which looks like a recompile (not positive) because it takes about 6 seconds . Still new to bazel so I'm not sure what that could be.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279177813, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEJdo4jFBLnhDdO_Clt6pS6NYHBiznvks5rbir1gaJpZM4Lnafl .

ianoc commented 7 years ago

Six seconds seems like a long time, if it's not private code is it possible for you to put a public code/ repo up somewhere as a test case ?

Generally so far changing the signature of methods be there private or public I think triggers recompiles. However changing the content of methods if we stripped the in-line information out generally seems to avoid the recompile. If we have a test case where that doesn't happen I would be very interested to look at it, since we probably hit this and just don't know it

On Sat, Feb 11, 2017 at 1:44 PM P. Oscar Boykin notifications@github.com wrote:

Cc @ianoc

It could be something sloppy in the rules somewhere. You can set and option to show the compile time for scalac (Ian can remind us) so we can be sure if it is the compiler, or something else (maybe the jar process is somehow running no matter what?) On Sat, Feb 11, 2017 at 11:41 doug tangren notifications@github.com wrote:

checksum of the ijar stays consistent when I modify the contents of the private methods but that still triggers some action which looks like a recompile (not positive) because it takes about 6 seconds . Still new to bazel so I'm not sure what that could be.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279177813 , or mute the thread < https://github.com/notifications/unsubscribe-auth/AAEJdo4jFBLnhDdO_Clt6pS6NYHBiznvks5rbir1gaJpZM4Lnafl

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279177978, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbQvDo3tVCuXgTygezW3g-kAWrtNBYmks5rbiujgaJpZM4Lnafl .

-- Sent from Gmail Mobile

softprops commented 7 years ago

Six seconds seems like a long time, if it's not private code is it possible for you to put a public code/ repo up somewhere as a test case ?

Testing with an private company best-practices repo to show our team some examples of how to build their mixed scala/java projects with bazel. I'll try to put up a simple public example sometime this weekend.

johnynek commented 7 years ago

Okay, I did some digging.

In the ijar, the private methods don't appear (neither private nor private[this]) in the .class bytecode, but the scala signature does change relative to not having the method at all. Scalap can show this:

st-oscar1:compare oscar$ scalap -private scala.test.ScalaLibResources
package scala.test
object ScalaLibResources extends scala.AnyRef {
  def this() = { /* compiled code */ }
  private def foo: scala.Int = { /* compiled code */ }
  def getGreetings(): scala.List[scala.Predef.String] = { /* compiled code */ }
  def baz: scala.Int = { /* compiled code */ }
  def getFarewells: scala.List[scala.Predef.String] = { /* compiled code */ }
  def getData: scala.List[scala.Predef.String] = { /* compiled code */ }
  private def get(s: scala.Predef.String): scala.List[scala.Predef.String] = { /* compiled code */ }
}

st-oscar1:compare oscar$ scalap -private scala_no_private.test.ScalaLibResources
package scala.test
object ScalaLibResources extends scala.AnyRef {
  def this() = { /* compiled code */ }
  def getGreetings(): scala.List[scala.Predef.String] = { /* compiled code */ }
  def baz: scala.Int = { /* compiled code */ }
  def getFarewells: scala.List[scala.Predef.String] = { /* compiled code */ }
  def getData: scala.List[scala.Predef.String] = { /* compiled code */ }
  private def get(s: scala.Predef.String): scala.List[scala.Predef.String] = { /* compiled code */ }
}

So, scala embeds information about private methods into the scala signature.

That means to improve on matters, we need a scala-specific ijar (or probably better, extension to the bazel ijar that parses the scalasignature, removes any info about private methods, and puts it back). Here is a page that shows some demos of parsing it: http://blog.vityuk.com/2010/12/how-scala-compiler-stores-meta.html

@gkossakowski do you have any input you can share with us here? Should this be hard or easy? It seems like is should be straightforward enough.

johnynek commented 7 years ago

cc @adriaanm any suggestions on how to produce an interface jar for scala (a jar of bytecode that only has the non-private interfaces, and none of the code). Using bazel's ijar produces an interface, but the scalasig is left untouched, so the hash changes, so adding a private method causes a recompilation.

johnynek commented 7 years ago

@softprops are you using the --strategy=Scalac=worker option? I doubt it if it is taking 6 seconds on a trivial change. You can add that to your .bazelrc like:

build --strategy=Scalac=worker

johnynek commented 7 years ago

I wonder if it could be as easy just removing any item with the private flag set: https://github.com/avityuk/scala-pickled-visualizer/blob/master/src/PickledVisualizer.scala#L377

ianoc commented 7 years ago

For public it includes method length I think ? May want to do something about that also if we are doing some custom stuff. Chunking method sizes into blocks maybe ? (Though I'm not sure what the compiler can do inlining wise if it just has the ijar )

On Sat, Feb 11, 2017 at 4:01 PM P. Oscar Boykin notifications@github.com wrote:

I wonder if it could be as easy just removing any item with the private flag set:

https://github.com/avityuk/scala-pickled-visualizer/blob/master/src/PickledVisualizer.scala#L377

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279185258, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbQvCo98Zk0xmUksw7MKTOnSNU_8dBYks5rbkvMgaJpZM4Lnafl .

-- Sent from Gmail Mobile

johnynek commented 7 years ago

I imagine, the compiler can't inline across package boundaries if we take this approach (so you have to rely on the JIT in such cases). That may be a tradeoff worth making if the JIT usually does well enough and maybe a lot of inlining is from the std-lib or from within the package.

ianoc commented 7 years ago

Today, it can't work either can it? Right now we compile against the ijar across targets right? On Sat, Feb 11, 2017 at 4:05 PM P. Oscar Boykin notifications@github.com wrote:

I imagine, the compiler can't inline across package boundaries if we take this approach (so you have to rely on the JIT in such cases). That may be a tradeoff worth making if the JIT usually does well enough and maybe a lot of inlining is from the std-lib or from within the package.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279185465, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbQvNLuVji0tX_IsMMImWfwiTF7dDdOks5rbky8gaJpZM4Lnafl .

-- Sent from Gmail Mobile

johnynek commented 7 years ago

We don't ijar the standard library or compiler (or anything with macros), and of course, within a target, you are not using an ijar. But other than that, yeah, we are probably killing the ability for scalac to inline.

ianoc commented 7 years ago

So from today's experience do we think there is any losses by just stripping all of this info ? We didn't do it always before since we had the notion the inline might matter, but I can't see why now. Might as well aggressively drop.

On Sat, Feb 11, 2017 at 4:18 PM Ian O'Connell ian@ianoconnell.com wrote:

Yeah, so I guess I mean the Delta by stripping most of this information is probably nothing in losses I think ? Since the compiler can't use it to inline anyway.

On Sat, Feb 11, 2017 at 4:17 PM P. Oscar Boykin notifications@github.com wrote:

We don't ijar the standard library or compiler (or anything with macros), but other than that, yeah, we are probably killing the ability for scalac to inline.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279186083, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbQvC_0M87qIM2jHG1gxBLaqG37QXzCks5rbk97gaJpZM4Lnafl .

-- Sent from Gmail Mobile

johnynek commented 7 years ago

I think you can't compile scala without it. For instance, the implicit labels are probably the biggest things, but other aspects of the type-system are in there. Dropping everything, I think, would be pretty bad news.

ianoc commented 7 years ago

Yeah we need the extra type information but hopefully that's relatively static.

On Sat, Feb 11, 2017 at 4:23 PM P. Oscar Boykin notifications@github.com wrote:

I think you can't compile scala without it. For instance, the implicit labels are probably the biggest things, but other aspects of the type-system are in there. Dropping everything, I think, would be pretty bad news.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279186392, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbQvFoyAI9AyrAZlWz0L230fBg3vouqks5rblEYgaJpZM4Lnafl .

-- Sent from Gmail Mobile

softprops commented 7 years ago

Wasn't using any nondefault flags. I wonder if there's and knowledge that can be minded from sbts analyzing compiler. It seems a bit more knowledgeable about how to detect scala public api changes

softprops commented 7 years ago

https://github.com/sbt/zinc/tree/1.0/internal/zinc-classfile/src/main/scala/sbt/internal/inc/classfile maybe

johnynek commented 7 years ago

I wonder if sbt really does this. My understanding is that it takes another approach: automatically build a fine training dependency graph. Bazel takes the approach of humans fixing targets and then trying to minimize rebuilding whole targets.

Of course it would be great if we can learn from sbt/zinc. On Sat, Feb 11, 2017 at 15:24 doug tangren notifications@github.com wrote:

https://github.com/sbt/zinc/tree/1.0/internal/zinc-classfile/src/main/scala/sbt/internal/inc/classfile maybe

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279189073, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEJdix_N2eMIlmJru1ww4LoltZuu4uvks5rbl9pgaJpZM4Lnafl .

ianoc commented 7 years ago

So I guess to maybe capture for myself having thought about this a bit, and see if we are on the same page, i think there are 3 major bits around this ijar stuff:

1) Does our current usage of iJar have any performance degradation. None we have noticed, but i don't think the code base we use bazel for at stripe is maybe a super performance sensitive one(we also don't that I know of have any benchmarks). We could probably do with some benchmarks that with/without JIT'd code does:

object A { @inline def foo(a: X): Bar }

perform differently if its within the same bazel target or in a different one? Right now if we are compiling against the ijar i believe we cannot make use of that inline. Though I don't know if that matters.

2) If (1) matters, we probably need to do some surgery to actually unwind part of the ijar semantics such that inline methods are included in ijars rather than all method bodies being stripped? (looking at privacy here will matter too). I don't think the scala compiler inlines otherwise unless one uses -optimise. So if this stuff matters do we make it an option to allow scala inlining, explicit or implicit across packages or do we say the rules only support JVM inlining? 3) If (1) doesn't matter, then i think we can aim to basically minify the scala signature for correctness, that is strip anything not really required for building. I think we've been slow to touch it before because of fears of something like (1) , but if its not an issue then sufficiently large code bases compiling with the rules along with whatever test cases we can make is a pretty good validation we haven't messed it up. (Given our liberal usage of types/implicits/macros i'm not too worried about us missing some gremlins here).

Thoughts?

On Sat, Feb 11, 2017 at 5:28 PM, P. Oscar Boykin notifications@github.com wrote:

I wonder if sbt really does this. My understanding is that it takes another approach: automatically build a fine training dependency graph. Bazel takes the approach of humans fixing targets and then trying to minimize rebuilding whole targets.

Of course it would be great if we can learn from sbt/zinc. On Sat, Feb 11, 2017 at 15:24 doug tangren notifications@github.com wrote:

https://github.com/sbt/zinc/tree/1.0/internal/zinc- classfile/src/main/scala/sbt/internal/inc/classfile maybe

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/ 125#issuecomment-279189073, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEJdix_ N2eMIlmJru1ww4LoltZuu4uvks5rbl9pgaJpZM4Lnafl .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279189237, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbQvPEJM3YcQmhCx-orThk4wcBCl2Xwks5rbmBSgaJpZM4Lnafl .

johnynek commented 7 years ago

Yeah, Ian, I think that is a good summary.

In any case, we need to improve ijar or implement a scala specific ijar to improve the current situation (which, as you note seems fairly tolerable at Stripe at the moment).

One thing that occurs to me is that maybe this work could be done directly on the scala compiler: it could emit interface classes as it compiles. That could be useful for everyone and it would keep the code colacated with the actual signature parsing and writing code.

We could probably get this merged in mainline scala or typelevel if it was done well.

If there was a scala linker, we could defer the question about inlining until linking. On Sat, Feb 11, 2017 at 16:24 ianoc notifications@github.com wrote:

So I guess to maybe capture for myself having thought about this a bit, and see if we are on the same page, i think there are 3 major bits around this ijar stuff:

1) Does our current usage of iJar have any performance degradation. None we have noticed, but i don't think the code base we use bazel for at stripe is maybe a super performance sensitive one(we also don't that I know of have any benchmarks). We could probably do with some benchmarks that with/without JIT'd code does:

object A { @inline def foo(a: X): Bar }

perform differently if its within the same bazel target or in a different one? Right now if we are compiling against the ijar i believe we cannot make use of that inline. Though I don't know if that matters.

2) If (1) matters, we probably need to do some surgery to actually unwind part of the ijar semantics such that inline methods are included in ijars rather than all method bodies being stripped? (looking at privacy here will matter too). I don't think the scala compiler inlines otherwise unless one uses -optimise. So if this stuff matters do we make it an option to allow scala inlining, explicit or implicit across packages or do we say the rules only support JVM inlining? 3) If (1) doesn't matter, then i think we can aim to basically minify the scala signature for correctness, that is strip anything not really required for building. I think we've been slow to touch it before because of fears of something like (1) , but if its not an issue then sufficiently large code bases compiling with the rules along with whatever test cases we can make is a pretty good validation we haven't messed it up. (Given our liberal usage of types/implicits/macros i'm not too worried about us missing some gremlins here).

Thoughts?

On Sat, Feb 11, 2017 at 5:28 PM, P. Oscar Boykin <notifications@github.com

wrote:

I wonder if sbt really does this. My understanding is that it takes another approach: automatically build a fine training dependency graph. Bazel takes the approach of humans fixing targets and then trying to minimize rebuilding whole targets.

Of course it would be great if we can learn from sbt/zinc. On Sat, Feb 11, 2017 at 15:24 doug tangren notifications@github.com wrote:

https://github.com/sbt/zinc/tree/1.0/internal/zinc- classfile/src/main/scala/sbt/internal/inc/classfile maybe

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/ 125#issuecomment-279189073, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEJdix_ N2eMIlmJru1ww4LoltZuu4uvks5rbl9pgaJpZM4Lnafl .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279189237 , or mute the thread < https://github.com/notifications/unsubscribe-auth/AAbQvPEJM3YcQmhCx-orThk4wcBCl2Xwks5rbmBSgaJpZM4Lnafl

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279191528, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEJdlT26R5Fjdbc6SVaSOyv0j-h52pBks5rbm1fgaJpZM4Lnafl .

softprops commented 7 years ago

Do you think that would be probable those changes would be landable in the 2.11.x line of scala? I'm targeting bazel as a viable alternative to sbt for a monorepo for the potential improved performance and correctness without changing existing scala 2.11 runtime dependencies. My goal is the prove this out by dark building in ci and drawing comparisons to our existing sbt build.

johnynek commented 7 years ago

Doug, by the way, we find bazel very usable as is. He worker mode is critical, and i do think what we are talking about here is a second order effect. Most changes, I think, are to method bodies, or public APIs.

With bazel you never need to clean, something sbt does not really offer (for instance I bet your CI does a clean build, not a cached build, but bazel can totally run a CI without cleans and you get some big performance wins from that often).

We are just discussing a small set of changes we won't need to recompile: adding/removing private methods.

I do think landing an option to 2.11 mainline is possible to have it emit interface classes, but this is purely guessing based on the kinds of changes I have seen accepted (@adriaanm may have other opinions).

Alternatively, you could imagine an option where it does not emit any scalasig info for private methods. I'm not sure why you need that, but maybe someone else can think of a reason. It seems like only the current compilation unit needs to know about private. On Sat, Feb 11, 2017 at 16:56 doug tangren notifications@github.com wrote:

Do you think that would be probable those changes would be landable in the 2.11.x line of scala? I'm targeting bazel as a viable alternative to sbt for a monorepo for the potential improved performance and correctness without changing existing scala 2.11 runtime dependencies. My goal is the prove this out by dark building in ci and drawing comparisons to our existing sbt build.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279192976, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEJdtUWTZDQmK15YJ7R733nFGxLW790ks5rbnT4gaJpZM4Lnafl .

softprops commented 7 years ago

Ok I talked to my team and there wasn't really anything private in here so I just flipped the GH private repo flag. Here's the repo I'm testing with https://github.com/meetup/blt-best-bazel. There are two packages foo and bar. Bar doesn't do anything interesting. It just exists so I can test bazel's behavior when I build it after changing fiddling with non public members in the foo package.

I'm fiddling with those here

here's how the dance currently goes ( on osx )

$ bazel build //bar
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.build/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Found 1 target...
Target //bar:bar up-to-date:
  bazel-bin/bar/bar.jar
INFO: Elapsed time: 3.436s, Critical Path: 3.34s

# no change (its fast of course!)
$ bazel build //bar
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.build/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Found 1 target...
Target //bar:bar up-to-date:
  bazel-bin/bar/bar.jar
INFO: Elapsed time: 0.079s, Critical Path: 0.00s

# fiddle with private members of foo package
$ bazel build //bar
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.build/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Found 1 target...
Target //bar:bar up-to-date:
  bazel-bin/bar/bar.jar
INFO: Elapsed time: 6.038s, Critical Path: 5.96s

I reread some of the comments above last night and an getting a better sense of the scala-specifc problem. The public interfaces indeed doesn't change so ijar is doing what it advertises. By scala encodes enriched type interface ( including private interfaces ) metadata as bytearray in a @ScalaSig annotation attached to the public interfaces which invalidates the cache.

Some options on the table are to some how edit that bytearray, maybe with PickBuffers write methods and deserialize those definitions and store that in the ijar. This would be very cool. ScalaSig tampering. If there were a lib for that I'd call it forgery!

The other option is to get scalac to do all the work up front and not use ijar and use a scalaspecific form that uses public interfaces scalac may emit for us.

Let me know if my understanding is on the right path. This is a very interesting problem but one I think more scala shops will appreciate a good solution for when adopting bazel

johnynek commented 7 years ago

Yes. This seems like a good summary.

One pattern we use is to define a repo-local set of the scala rules (that call the standard one) where we can change some default options (like compiler plugins and compiler flags): in local_tools/scala/scala.bzl

load("@io_bazel_rules_scala//scala:scala.bzl",
     uppstream_lib = "scala_library",
     uppstream_macro = "scala_macro_library",
     uppstream_bin = "scala_binary",
     uppstream_test = "scala_test")

_default_scalac_opts = [
    "-Yskip-inline-info-attribute",  # without this minor changes almost always trigger recomputations
    "-Ywarn-dead-code",
    "-Ywarn-unused-import",  # macros can cause this to kill the compiler, see: https://github.com/scala/pickling/issues/370
    "-Ywarn-value-discard",
    "-Xmax-classfile-name", "128",  # Linux laptops don't like long file names
    "-Xlint",
    "-Xfuture",
    "-Xfatal-warnings",  # sometimes disabled due to https://issues.scala-lang.org/browse/SI-9673 on stubs
    "-deprecation",
    "-feature",
    "-unchecked",
    "-Yno-stub-warning",  # This is a custom flag we added (see WORKSPACE)
]

# We use the linter: https://github.com/HairyFotr/linter
# This is disabled by default since it has a large performance overhead.
_plugins = []  # "@org_psywerx_hairyfotr_linter_2_11//jar:file"]

def scala_library(name, srcs = [], deps = [], runtime_deps = [], data = [], resources = [],
                  scalacopts = _default_scalac_opts, jvm_flags = [], main_class = "", exports = [], visibility = None):
    uppstream_lib(name = name, srcs = srcs, deps = deps, runtime_deps = runtime_deps,
                  plugins = _plugins,
                  resources = resources, scalacopts = scalacopts,
                  jvm_flags = jvm_flags, main_class = main_class, exports = exports, visibility = visibility,
                  print_compile_time=True)

etc....

This way, we can put skip inline attribute in one place.

Also, I can't stress enough what a big performance difference the worker strategy makes.

softprops commented 7 years ago

Oh nice!

johnynek commented 7 years ago

I wonder if this is as easy as right here: https://github.com/scala/scala/blob/2.11.x/src/compiler/scala/tools/nsc/symtab/classfile/Pickler.scala#L45

adding a check that the tree.symbol.hasFlag(PRIVATE) and if it does, skip it. I might try that tomorrow and see if that can solve the issue.

ianoc commented 7 years ago

BTW -- I checked out that repo...

./bazel build bar/...

:

uncomment // private val privateField: Int = 1 in foo/src/main/scala/com/meetup/blt/foo/BestScala.scala

INFO: Reading 'startup' options from
/var/folders/pd/2j4md8hj5_x01529sxvr8_gr0000gn/T//bazel_bazel_rc_1486931405:
--watchfs
INFO: Found 1 target...
Target //bar:bar up-to-date:
  bazel-bin/bar/bar.jar
INFO: Elapsed time: 0.915s, Critical Path: 0.84s

I basically can't get a recompile time above 1 second

This is using the worker strategy, and watchfs on my osx laptop.

With bazel clean/expunge and such building the whole repo looks like:

INFO: Reading 'startup' options from /var/folders/pd/2j4md8hj5_x01529sxvr8_gr0000gn/T//bazel_bazel_rc_1486931601: --watchfs . INFO: Found 3 targets... INFO: From Building external/io_bazel_rules_scala/src/java/io/bazel/rulesscala/scalac/scalac.jar (2 source files) [for host]: Note: external/io_bazel_rules_scala/src/java/io/bazel/rulesscala/scalac/CompileOptions.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. INFO: Elapsed time: 16.320s, Critical Path: 9.21s

On Sun, Feb 12, 2017 at 12:25 PM, P. Oscar Boykin notifications@github.com wrote:

I wonder if this is as easy as right here: https://github.com/scala/scala/blob/2.11.x/src/compiler/scala/tools/nsc/ symtab/classfile/Pickler.scala#L45

adding a check that the tree.symbol.hasFlag(PRIVATE) and if it does, skip it. I might try that tomorrow and see if that can solve the issue.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_scala/issues/125#issuecomment-279245774, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbQvLnakx2cLYPeIoxoKsG0qMXYrTOzks5rb2qfgaJpZM4Lnafl .

softprops commented 7 years ago

I might try that tomorrow and see if that can solve the issue.

Awesome. You folks are seriously the best.

softprops commented 7 years ago

Last note for the weekend. --strategy=Scalac=worker indeed makes a huge difference! Thanks @johnynek

$ bazel build  --strategy=Scalac=worker //bar
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.build/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Found 1 target...
INFO: From scala //foo:foo:
Compiler runtime: 250ms.
INFO: From scala //bar:bar:
Compiler runtime: 162ms.
Target //bar:bar up-to-date:
  bazel-bin/bar/bar.jar
INFO: Elapsed time: 1.215s, Critical Path: 1.11s

gkk-stripe commented 7 years ago

Hi guys, Chiming in on your discussions of ScalaSig and private. Sometimes you need a private member to be part of your interface. E.g. private var in a trait has to be kept in the interface, see for details: https://github.com/sbt/sbt/issues/2155 (the discussion is about sbt but exactly the same issue would pop up if bazel's ijar excluded private var defined in a trait from the interface) I think, except for private var in a trait, you're safe to skip all private members in interface.

Scalac produces a number of synthetic members (e.g. accessors for nested classes, holders for default arguments) that are public but should not be included in the interface.

The issues you're wresting are the same sbt/zinc needed to solve. I encourage for you to borrow ideas and code from: https://github.com/sbt/zinc/blob/1.0/internal/compiler-bridge/src/main/scala/xsbt/ExtractAPI.scala

Zinc solves two problems for incremental compilation:

Dependency graph extraction
Interface extraction

Bazel has a different approach to 1. but 2. is shared between bazel and zinc as problem to tackle. Trying to solve 2. from scratch by surgery to ScalaSig is doomed to be an epic amount of work.

Lastly, there have discussion about integrating ExtractAPI back to scalac because that code is very tied to internals of scalac and we had a feeling that other tools than just zinc could make use of the info produced by ExtractAPI. This discussion confirms that impression. I think it's realistic to integrate ExctractAPI into scalac in 2.12.x as an experimental feature and have it stable in 2.13. Scala 2.11 is long in maintenance mode so nobody from Scala team wants to see any new development there.

gkk-stripe commented 7 years ago

As a bonus, I'm throwing a fun issue that you would have to solve yourself if you go down of manually tweaking scalaSig: https://github.com/sbt/sbt/issues/823 There're more such subtle issues that I can't recall off the top of head.

johnynek commented 7 years ago

Thanks for replying in detail @gkk-stripe.

I saw this comment: https://github.com/sbt/sbt/pull/2441#issuecomment-181896698

Is that still your understanding? I wonder if the conservative approach might be to just not write private defs into the scalasig in the first place. That seems like it will cover most of the cases that matter.

gkk-stripe commented 7 years ago

Scalasig is used for recreating faithfully the state of a symbol across compiler runs (so it supports separate compilation). Consider an example like this:

// A.scala
class A

// B.scala
class B extends A

Scalasig pickles all information about A so you get the same results from either running:

scalac A.scala B.scala

or

scalac A.scala && scalac B.scala

Skipping even private defs would break that property. Scalasig serves many clients that would break in subtle ways:

scala reflection: you wouldn't see private members of a class/trait
implicit search gets affected because private implicit defs are still considered to be a member but only later get excluded for not being accessible. In other words: membership and accessibility are orthogonal concerns in Scala
Scala inliner used to rely on looking up private members for inlining (I'm not sure if it still does it in Scala 2.12, though)

In all these cases you would get different results from either a clean build or an incremental build. This would break the main selling point of bazel that you don't need ever to clean manually.

johnynek commented 7 years ago

So, @gkk-stripe and I spoke offline.

The TL;DR is:

currently, bazel is correct, and fairly fast, but invalidates the cache when there is ANY scala signature change (adding/removing any method, changing any type, etc...).
as Grzegorz says, private in scala is pretty subtle (for instance, in 2.11 at least, he says a private can shadow a non-private, making a name inaccessible, so removing a private does not give the same compilation).
a way forward would be to see about adding a scalac option to emit interface files. The contract of the interface file is that should be the minimal file against which we can compile and get the same output as compiling against a class/jar representation of the same dependency. The interface should be suitable for hashing as a key into a cache (i.e. it should be deterministic: if two interfaces are the same, they should be bit-for-bit identical). It would be nice for these interface files to be independent of platform (i.e. in principle work for scaja-js, scala-native).

We are thinking of a SIP proposal for item 2 above. We will update this issue with links to any discussions.

softprops commented 7 years ago

Sounds grrrreat

SethTisue commented 7 years ago

a way forward would be to see about adding a scalac option to emit interface files.

see also http://docs.scala-lang.org/sips/pending/binary-compatibility.html. this was just discussed at the SIP meeting today and seems likely to move forward in some form. cc @darkdimius @jvican

bazelbuild / rules_scala

document cases where ijar + scalac is suboptimal #125