Open smocherla-brex opened 3 weeks ago
@smocherla-brex Bazel is able to persist your workers between builds negating the performance overhead of having to spin them up for each build or even each action. Are you by chance killing your workers after each build?
I guess multiplex workers would help with this problem (If I remember it was enabled but reverted due to some corruption issues?).
rules_kotlin still supports multiplex workers and you can enable them for your project to see if they work. They won't solve the slow startup times being reported here since Bazel is still having to spin up a worker, but it may reduce them some since Bazel isn't having to spin up as many workers.
Multiplex workers are gated by this flag: https://github.com/bazelbuild/rules_kotlin/blob/7dcb7f94f3f367110d75a3ea4464ae4e4cbbf8f0/kotlin/internal/toolchains.bzl#L215-L218
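For anyone wanting to try this, a minimal sketch of a custom toolchain with multiplex workers turned on might look like the following. It assumes the gate in the linked lines is the experimental_multiplex_workers attribute exposed through define_kt_toolchain, and that the rules are available under the repository name rules_kotlin; the target name kotlin_toolchain is hypothetical.

```starlark
# BUILD.bazel at the workspace root (sketch only).
# Assumption: the repository is named "rules_kotlin"; older setups may use
# "io_bazel_rules_kotlin" instead.
load("@rules_kotlin//kotlin:core.bzl", "define_kt_toolchain")

define_kt_toolchain(
    # Hypothetical target name, chosen for illustration.
    name = "kotlin_toolchain",
    # Assumption: this attribute is the flag referenced in toolchains.bzl.
    experimental_multiplex_workers = True,
)
```

The toolchain would then need to be registered, for example with register_toolchains("//:kotlin_toolchain") in WORKSPACE or MODULE.bazel, before builds pick it up.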
I was also wondering if using GraalVM native image binaries for these workers instead of raw Java binaries would help. Recently Bazel/rules_java has adopted it https://github.com/bazelbuild/rules_java/pull/151 with some benefits. We're looking to try with GraalVM binaries for the compiler workers but was wondering if that's a direction you'd be open to accepting (if it actually brings the benefits with cold start).
I haven't tried building rules_kotlin with GraalVM yet but we did test it out internally against some other internal rules. GraalVM isn't a free drop in replacement for Java and from what we found it does take some work to get things compiling with it. Getting the entire Kotlin compiler along with KSP and Kapt seems like a pretty challenging task.
There are also some other features that we'd like to test out like incremental Kotlin compilation (similar to Gradle) which is something we can only realistically do if we have a persistent worker that can define the shared disk cache for the Kotlin compiler.
We'd definitely be open to evaluating a GraalVM solution if you are able to get things compiling and executing.
Bazel is able to persist your workers between builds negating the performance overhead of having to spin them up for each build or even each action. Are you by chance killing your workers after each build?
We actually do have to kill the workers after each build in CI, sadly (with --worker_quit_after_build and --worker_max_instances), because of https://github.com/bazelbuild/bazel/issues/12165, and we've observed that we frequently run into OOMs because the workers use an enormous amount of memory. This has less of an effect in local development as we don't have --worker_quit_after_build there, but the memory usage problem persists. We haven't yet upgraded to Bazel 7, which has some of the flags for worker GC management, but if we do enable them, I would expect the cold start times to be more problematic there.
rules_kotlin still supports multiplex workers and you can enable them for your project to see if they work. They won't solve the slow startup times being reported here since Bazel is still having to spin up a worker, but it may reduce them some since Bazel isn't having to spin up as many workers.
Thanks for the pointer - somehow missed it, I will try this out and check.
I haven't tried building rules_kotlin with GraalVM yet but we did test it out internally against some other internal rules. GraalVM isn't a free drop in replacement for Java and from what we found it does take some work to get things compiling with it. Getting the entire Kotlin compiler along with KSP and Kapt seems like a pretty challenging task.
There are also some other features that we'd like to test out like incremental Kotlin compilation (similar to Gradle) which is something we can only realistically do if we have a persistent worker that can define the shared disk cache for the Kotlin compiler.
I was looking at the Kotlin builder code to see how we could plug in incremental compilation based on this, but managing a cache that Bazel is unaware of seemed like a bigger effort (I'm mostly concerned about non-reproducibility issues we could run into, but I'm also not a Kotlin expert :)). Good to know it's on the radar though.
We'd definitely be open to evaluating a GraalVM solution if you are able to get things compiling and executing.
Good to know, and also thanks for sharing your experience with rules_graalvm. I'll try out some experiments on our end and will follow up with an update if I can get it compiling.
Even for local builds, without --worker_quit_after_build, I do notice this. After adding --worker_verbose, it seems like Bazel's garbage collection of workers does result in workers being cleaned up during a build with many actions, without us doing it ourselves:
INFO: Destroying KotlinCompile worker (id 11)
INFO: Destroying KotlinCompile worker (id 1)
INFO: Destroying KotlinKapt worker (id 6)
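To get a rough sense of how often this happens per worker type, the --worker_verbose output can be tallied with a small script. This is only a sketch: it assumes the destroy messages always follow the format of the three lines above, and the function name count_destroyed_workers is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "INFO: Destroying KotlinCompile worker (id 11)".
# Assumption: the message format is exactly the one seen in the log excerpt.
DESTROY_RE = re.compile(r"Destroying (?P<mnemonic>\w+) worker \(id (?P<id>\d+)\)")


def count_destroyed_workers(log_lines):
    """Return a Counter mapping worker mnemonic to number of destroy events."""
    counts = Counter()
    for line in log_lines:
        match = DESTROY_RE.search(line)
        if match:
            counts[match.group("mnemonic")] += 1
    return counts


# Example usage with the lines from the excerpt above.
sample_log = [
    "INFO: Destroying KotlinCompile worker (id 11)",
    "INFO: Destroying KotlinCompile worker (id 1)",
    "INFO: Destroying KotlinKapt worker (id 6)",
]
for mnemonic, destroyed in count_destroyed_workers(sample_log).most_common():
    print(f"{mnemonic}: {destroyed}")
```

A high count for a single mnemonic within one build would suggest that workers are being evicted and restarted mid-build, which is where the cold start cost would show up repeatedly.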
kotlinc has been an interesting performer with remote execution -- as a persistent worker it's slightly slower than javac.
But, well, the first compilation is slooooooow until the first GC pass. I suspect it leans very heavily on the JIT. We've found that too much RAM can be as problematic as too little, and it's happiest with 2-3 CPUs. I haven't had as much time as I'd like to try and tune it, of course. I suggest tuning the GC/allocations for the worker until you see improvement.
K2 will, of course, change the whole game.
I observed that cold start on the persistent workers can play a part in slow builds. An example with Kapt (using --define kt_timings=1), and with a warm persistent worker: around half the time. Also an example with KotlinCompile cold start and warm start.
I guess multiplex workers would help with this problem (If I remember it was enabled but reverted due to some corruption issues?). I was also wondering if using GraalVM native image binaries for these workers instead of raw Java binaries would help. Recently Bazel/rules_java has adopted it https://github.com/bazelbuild/rules_java/pull/151 with some benefits. We're looking to try with GraalVM binaries for the compiler workers but was wondering if that's a direction you'd be open to accepting (if it actually brings the benefits with cold start).