bazelbuild / rules_scala

Scala rules for Bazel
Apache License 2.0

Protobuf compatibility findings and suggestions #1647

Open mbland opened 1 week ago

mbland commented 1 week ago

As a side quest to #1482, I've been experimenting with updating the Protobuf library beyond version 21.7, released 2022-09-29. I anticipate complications after the Bzlmod update lands if users have later Protobuf versions in their MODULE.bazel dependency graph.

I've discovered the following details, which we may need to communicate to users through release notes and/or other documentation. Here are the combinations of various dependencies that work with one another. The first row is the current combination.

Protobuf   abseil-cpp   Bazel 6         ScalaPB         Scala
v21.7      20220623.1                   0.9.0           All
v21.7      20220623.1                   0.9.8           All
v21.7      20220623.1                   0.11.17         >= 2.12
v25.5      20240722.0   with --cxxopt   0.9.8           All
v25.5      20240722.0   with --cxxopt   0.11.17         >= 2.12
>= v28.2   20240722.0   with --cxxopt   1.0.0-alpha.1   >= 2.12
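
For scripting against these findings, the table can be encoded as a small lookup. This is only a sketch: the function name scalapb_for_protobuf is hypothetical, and it covers just the versions named in this issue (v21.7, v25.5, v26 and v27, v28.2, v28.3). Anything else is reported as untested rather than guessed.

```shell
# Hypothetical helper: prints the ScalaPB versions found to work with a
# given Protobuf version, per the table and headings in this issue.
scalapb_for_protobuf() {
  version="${1#v}"  # accept both "v25.5" and "25.5"
  case "$version" in
    21.7)      echo "0.9.0 0.9.8 0.11.17" ;;
    25.5)      echo "0.9.8 0.11.17" ;;
    26.*|27.*) echo "none" ;;
    28.2|28.3) echo "1.0.0-alpha.1" ;;
    *)         echo "untested" ;;
  esac
}
```

For example, scalapb_for_protobuf v25.5 prints "0.9.8 0.11.17", and scalapb_for_protobuf 27.1 prints "none".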

Suggestions

I'd suggest updating the .bazelversion to 7.4.0 and bumping Protobuf to v25.5. My reasoning is that these changes would bring rules_scala closer to recent Bazel and Protobuf updates, without potentially breaking existing users.

If we decide to build with Bazel 7 by default instead of Bazel 6, the Protobuf recompilation sensitivity won't be as much of an issue. In my experience, after merging #1619, #1620, and #1622, I haven't found any other Bazel 6 and 7 incompatibilities outside of building newer Protobuf versions.

It would be nice to bump to ScalaPB 1.0.0-alpha.1 to use the latest Protobuf v28.3. However, it seems like that might be too big a bump for existing users right now.

Takeaways

ScalaPB and the scala_proto_aspect are sensitive to the Protobuf version

The scala_proto_library rule uses an aspect that delegates to the protoc-bridge generator framework to generate code. protoc-bridge itself isn't very sensitive, but the scalapb.ScalaPbCodeGenerator implementations to which it delegates are very sensitive to the Protobuf library version.

See the protoc-bridge explanation section below for details on why this is the case.

The Protobuf Bazel module remaining at compatibility_level 1 is a problem

All current versions of the Protobuf Bazel module, from before 21.7 and up to the current 29.0-rc2.bcr.1, have a compatibility_level of 1. This means that all modules in the Bzlmod/MODULE.bazel dependency graph will resolve to a single Protobuf version, regardless of its major version number.

As a result, rules_scala locks scala_proto users to the maximum Protobuf version that it supports (or to a minimum of v28.2 with ScalaPB 1.0.0-alpha.1). This is already true when using WORKSPACE, and it will remain true under Bzlmod until a Protobuf module release uses a different compatibility_level.

Protobuf >= v22 requires C++14 or greater, --cxxopt flags in Bazel 6

Protobuf v22 dropped C++11 support, requiring C++14 or greater. This corresponds to the update in abseil-cpp from 20220623.2, the last to support C++11, to 20230125.0, the first to require C++14.

This is not a problem for Bazel 7. Bazel 6, however, does not use a C++14 or greater toolchain out of the box.

To build these more recent versions of abseil-cpp and Protobuf under Bazel 6, I've added these .bazelrc flags from bazelbuild/bazel#20785 in commit 03d4e20 from mbland/rules_scala:

build:linux --cxxopt=-std=c++17
build:linux --host_cxxopt=-std=c++17
build:macos --cxxopt=-std=c++17
build:macos --host_cxxopt=-std=c++17
build:windows --cxxopt=/std=c++17
build:windows --host_cxxopt=/std=c++17
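
Because these lines are scoped to the linux, macos, and windows configs, Bazel only applies them when that config is selected, either explicitly with --config or automatically with --enable_platform_specific_config. A minimal sketch for selecting the config by hand, assuming the .bazelrc doesn't already enable --enable_platform_specific_config (the function name bazel_os_config is hypothetical):

```shell
# Hypothetical helper: maps `uname -s` output (or an explicit argument)
# to the .bazelrc config name used by the flags above.
bazel_os_config() {
  os="${1:-$(uname -s)}"
  case "$os" in
    Linux)                echo "linux" ;;
    Darwin)               echo "macos" ;;
    MINGW*|MSYS*|CYGWIN*) echo "windows" ;;
    *) echo "no config for: $os" >&2; return 1 ;;
  esac
}
```

Usage would look like: bazel build --config="$(bazel_os_config)" //...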

--cxxopt flags in Bazel 6 cause Protobuf to rebuild more frequently

Protobuf is already notoriously prone to recompilation.

With these flags, the recompilation became so bad that I thought test_scalafmt from test_cross_build.sh was hanging. It was actually recompiling Protobuf on every bazel run invocation. (I've since filed #1646 to resolve this.)

Protobuf v25.5 is the highest version compatible with ScalaPB 0.9.8, 0.11.17

I was able to confirm this by iterating through different configurations of the test_scala_version test case from test_version.sh. I used the RULES_SCALA_TEST_ONLY environment variable from #1646 and the test_version.sh changes from commit ff1a079 from mbland/rules_scala to do this fairly easily.

There is no combination that works with Protobuf v26 or v27

I was able to confirm this using the same methodology...

No Protobuf version < v28.2 works with ScalaPB 1.0.0-alpha.1

...as I was able to confirm this as well.

Working branches

I was able to reach the above conclusions by building up changes in the following branches:

protoc-bridge explanation

scala_proto_library uses scala_proto_aspect, which uses the scripts.ScalaPBWorker class from src/scala/scripts/ScalaPBWorker.scala, which calls ProtocBridge.runWithGenerators. This function ultimately runs generator classes in separate threads via Futures. It then runs protoc with --plugin flags pointing at shell wrappers communicating with these generator Futures over a local pipe.

The ScalaPB jars required by the generators (not so much the protoc-bridge framework itself) must be compatible with the current Protobuf library version. Not just with the protobuf-java Maven artifact, which can be as new as you like, but with the build's actual Protobuf repository that provides protoc. Otherwise, ScalaPB code will throw exceptions like java.lang.NoSuchMethodError or java.lang.IllegalAccessError.

For example, from the #1624 commit message:

Exception in thread "main" java.lang.NoSuchMethodError:
  'boolean com.google.protobuf.GeneratedMessageV3.isStringEmpty(java.lang.Object)'

From the #1630 commit message:

--jvm_extra_protobuf_generator_out: java.lang.NoSuchMethodError:
  'java.lang.Object com.google.protobuf.DescriptorProtos$MessageOptions.getExtension(com.google.protobuf.GeneratedMessage$GeneratedExtension)'
    at scalapb.compiler.DescriptorImplicits$ExtendedMessageDescriptor.messageOptions(DescriptorImplicits.scala:532)

From the #1637 commit message:

--jvm_extra_protobuf_generator_out: java.lang.IllegalAccessError:
  class scalapb.options.Scalapb$ScalaPbOptions tried to access method
  'com.google.protobuf.LazyStringArrayList
    com.google.protobuf.LazyStringArrayList.emptyList()'
  (scalapb.options.Scalapb$ScalaPbOptions and
   com.google.protobuf.LazyStringArrayList are in unnamed module
   of loader 'app')

From commit 6f702a6 from mbland/rules_scala, which I plan to submit when bumping to ScalaPB 0.11.17:

--scala_out: java.lang.NoSuchMethodError:
  'void com.google.protobuf.Descriptors$FileDescriptor.internalBuildGeneratedFileFrom(java.lang.String[],
    com.google.protobuf.Descriptors$FileDescriptor[],
    com.google.protobuf.Descriptors$FileDescriptor$InternalDescriptorAssigner)'
      at scalapb.options.compiler.Scalapb.<clinit>(Scalapb.java:10592)
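
One way to investigate a NoSuchMethodError like those above is to ask javap whether the Protobuf classes on the classpath actually declare the missing method. The following is only a sketch: the function name declares_method and the jar path in the usage line are hypothetical. It matches on the method name alone, not the full signature.

```shell
# Hypothetical helper: reads `javap` output on stdin and succeeds if it
# contains a declaration of the method named by the first argument.
declares_method() {
  # A declaration looks like "<modifiers> <type> <name>(<params>);", so
  # look for the name preceded by whitespace and followed by "(".
  grep -Eq "[[:space:]]$1\("
}
```

Usage, with a placeholder jar path: javap -classpath path/to/protobuf-java.jar com.google.protobuf.GeneratedMessageV3 | declares_method isStringEmpty && echo present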

The changes from #1637 and the last commit above catch ScalaPB generator exceptions and return the stack trace as a proper error response. This is how I collected the stack traces above. Without these changes, the build hangs on uncaught exceptions, since the scripts.ScalaPBWorker will wait forever for a dead generator to respond.

Here's an example of a scripts.ScalaPBWorker command generated when building scala_proto_library targets in //test/proto/..., with long path prefixes and many jar paths elided:

.../scalapb_worker.runfiles/io_bazel_rules_scala/../remotejdk11_.../bin/java
  -classpath
    .../io_bazel_rules_scala/src/scala/scripts/scalapb_worker.jar:
    .../io_bazel_rules_scala/src/scala/scripts/scalapb_worker_lib.jar:
    [ ...protobuf, core Scala, and ScalaPB jars... ]
  -DGEN_jvm_extra_protobuf_generator=
   scalarules.test.extra_protobuf_generator.ExtraProtobufGenerator
  -DGEN_scala=scripts.ScalaPbCodeGenerator
  -DPROTOC=bazel-out/.../bin/external/com_google_protobuf/protoc
  -DJARS=
    .../bin/test/src/main/scala/scalarules/test/extra_protobuf_generator/extra_protobuf_generator.jar:
    [ ...protobuf, core Scala, and ScalaPB jars... ]
  scripts.ScalaPBWorker --persistent_worker

The scripts.ScalaPBWorker class calling ProtocBridge.runWithGenerators launches protoc processes like this, with $VARDIR representing the generated /var path:

bazel-out/.../bin/external/com_google_protobuf/protoc
  --plugin=protoc-gen-scala=$VARDIR/protocbridge11068409970088580527
  --plugin=protoc-gen-jvm_extra_protobuf_generator=
    $VARDIR/protocbridge7908074065015031242
  --jvm_extra_protobuf_generator_out
    bazel-out/.../bin/test/proto/
    test2_jvm_extra_protobuf_generator_scalapb.srcjar
  --jvm_extra_protobuf_generator_opt grpc,single_line_to_proto_string
  --scala_out bazel-out/.../bin/test/proto/test2_scala_scalapb.srcjar
  --scala_opt grpc,single_line_to_proto_string
  --descriptor_set_in
    bazel-out/.../bin/test/proto2/test-descriptor-set.proto.bin:
    bazel-out/.../bin/test/proto/test2-descriptor-set.proto.bin
  test/proto/test2.proto

Notice the --plugin= flags above, which specify scripts generated by protoc-bridge that look like this:

#!/bin/sh
set -e
cat /dev/stdin > "$VARDIR/protopipe-14656650241271353812/input"
cat "$VARDIR/protopipe-14656650241271353812/output"
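
The handshake performed by these wrapper scripts can be tried in isolation. The sketch below is not protoc-bridge code: it uses a background subshell as a stand-in for a generator Future, and the function name demo_pipe_plugin and the "generated:" prefix are made up for illustration. It also shows why a generator that dies before writing its output leaves the final cat blocked forever.

```shell
# Hypothetical demo of the named-pipe handshake used by the plugin scripts.
demo_pipe_plugin() {
  dir="$(mktemp -d)"
  mkfifo "$dir/input" "$dir/output"

  # Stand-in for a generator: read the whole request, then write a response.
  (
    request="$(cat "$dir/input")"
    printf 'generated:%s' "$request" > "$dir/output"
  ) &

  # Stand-in for the wrapper script run by protoc: forward stdin to the
  # input pipe, then block until the generator writes to the output pipe.
  # (The real script reads /dev/stdin explicitly; plain cat is equivalent.)
  printf '%s' "$1" | ( cat > "$dir/input"; cat "$dir/output" )

  wait
  rm -rf "$dir"
}
```

Running demo_pipe_plugin hello prints "generated:hello".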

This command will show the scripts.ScalaPBWorker parent process, all the plugin script child processes, and the cat "$VARDIR/protopipe-.../output" grandchild processes if they're still running:

ps -ef | grep -E '[p]roto(cbridge|pipe)'

Without the aforementioned changes, you have to kill the build with CTRL-C when it hangs. bazel shutdown will then clean up the processes that haven't yet become zombified, but any zombified processes must be cleaned up with:

ps -ef | grep -E '[p]roto(cbridge|pipe)' | awk '{print $2}' | xargs kill

mbland commented 4 days ago

I just ran a small experiment to confirm my understanding of how potential Bzlmod users could accommodate both our Protobuf dependency and theirs when they rely on a version that isn't currently compatible with rules_scala. The mechanism relies on:

I created a new repo consisting of only the following three files:

.bazelversion (for Bazelisk):

7.4.1

MODULE.bazel:

module(name = "multiple-module-repo-experiment")

bazel_dep(
    name = "protobuf",
    version = "21.7",
    repo_name = "com_google_protobuf",
)

bazel_dep(
    name = "protobuf",
    version = "28.3",
)

multiple_version_override(
    module_name = "protobuf",
    versions = ["21.7", "28.3"],
)

BUILD:

load("@com_google_protobuf//:protobuf.bzl", "cc_proto_library")
load("@protobuf//bazel:proto_library.bzl", "proto_library")

This builds successfully and produces the following repos:

$ ls -ld "$(bazel info output_base)"/external/protobuf*
drwxr-xr-x  54 user  group  1728 Nov 19 14:18 .../external/protobuf~v21.7/
drwxr-xr-x  66 user  group  2112 Nov 19 14:18 .../external/protobuf~v28.3/

So we'd need to put in the release notes or documentation something like the following:


I also confirmed that multiple_version_override would be required even if the compatibility_level of the Protobuf module versions were different. I extended my experiment to try it with rules_erlang. I added the following to MODULE.bazel:

bazel_dep(
    name = "rules_erlang",
    version = "2.5.2",
    repo_name = "rules_erlang_v2",
)

bazel_dep(
    name = "rules_erlang",
    version = "3.16.0",
)

and the following to BUILD:

load("@rules_erlang_v2//:erlang_app.bzl", "test_erlang_app")
load("@rules_erlang//:erlang_app.bzl", "erlang_app")

Without multiple_version_override, the build failed with:

$ bazel build //:all

ERROR: Error computing the main repository mapping:
  <root> depends on rules_erlang@3.16.0 with compatibility level 3,
  but <root> depends on rules_erlang@2.5.2 with compatibility level 2
  which is different

Then I added the following to MODULE.bazel, and it worked:

multiple_version_override(
    module_name = "rules_erlang",
    versions = ["2.5.2", "3.16.0"],
)

And we can see the different repo versions in this case, too:

$ ls -ld "$(bazel info output_base)"/external/rules_erlang*

drwxr-xr-x  34 user  group  1088 Nov 19 16:07 .../external/rules_erlang~v2.5.2/
drwxr-xr-x  60 user  group  1920 Nov 19 16:07 .../external/rules_erlang~v3.16.0/

At least in the rules_erlang case, where the compatibility_levels were different, we got an error message up front pointing at the problem. Without it, we potentially have to decipher a cryptic build breakage and diagnose the root cause, as I demonstrated in the original issue description.
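
To catch this situation before it turns into a build breakage, a MODULE.bazel file can be scanned for modules that appear in more than one bazel_dep without a matching multiple_version_override. This is a rough sketch, not a Starlark parser: the function name modules_missing_override is hypothetical, and it assumes the one-attribute-per-line formatting used in the examples above.

```shell
# Hypothetical lint: prints each module declared by more than one bazel_dep
# that has no multiple_version_override in the given MODULE.bazel file.
modules_missing_override() {
  awk -F'"' '
    /^bazel_dep\(/                 { block = "dep" }
    /^multiple_version_override\(/ { block = "override" }
    /^\)/                          { block = "" }
    # Anchoring on leading whitespace keeps repo_name from matching name.
    block == "dep"      && /^[ \t]*name[ \t]*=/        { deps[$2]++ }
    block == "override" && /^[ \t]*module_name[ \t]*=/ { overridden[$2] = 1 }
    END {
      for (name in deps)
        if (deps[name] > 1 && !(name in overridden)) print name
    }
  ' "$1"
}
```

Run against the rules_erlang example before the override was added, modules_missing_override MODULE.bazel prints "rules_erlang". After the override is added, it prints nothing.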

mbland commented 1 day ago

Just filed scalapb/ScalaPB#1771 to see about applying fixes upstream to avoid scalapb.ScalaPbCodeGenerator hangs.