bazeltools / bazel-deps

Generate bazel dependencies for maven artifacts
MIT License
249 stars 121 forks source link

Duplicate targets when mixing Scala and Java dependencies #299

Open salvalcantara opened 3 years ago

salvalcantara commented 3 years ago

There seems to be a bug caused by mixing Scala and Java dependencies. The tool seems to use the same target name for both, and thus produces duplicate targets (one for Scala and one for Java). In particular, I'm hitting this problem in a very simple Flink/Scala project where the dependencies look like this:

dependencies:
  org.apache.flink:
    flink:
      lang: scala
      version: "1.11.2"
      modules: [clients, scala, streaming-scala]
    flink-connector-kafka:
      lang: java
      version: "0.10.2"
    flink-test-utils:
      lang: java
      version: "0.10.2"

FYI, the full code is available here: https://github.com/salvalcantara/bazel-flink-scala.

When I try to build the project, I'm getting this error:

Starting local Bazel server and connecting to it...
ERROR: Traceback (most recent call last):
        File "/Users/salvalcantara/Projects/me/bazel-flink-scala/WORKSPACE", line
44, column 25, in <toplevel>
                build_external_workspace(name = "vendor")
        File
"/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl",
line 258, column 91, in build_external_workspace
                return build_external_workspace_from_opts(name = name, target_configs =
list_target_data(), separator = list_target_data_separator(), build_header =
build_header())
        File
"/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl",
line 251, column 40, in list_target_data
                "vendor/org/apache/flink:flink_clients":
["lang||||||scala:2.12.11","name||||||//vendor/org/apache/flink:flink_clients","visibility||||||//visibility:public","kind||||||import","deps|||L|||","jars|||L|||//external:jar/org/apache/flink/flink_clients_2_12","sources|||L|||","exports|||L|||","runtimeDeps|||L|||//vendor/commons_cli:commons_cli|||//vendor/org/slf4j:slf4j_api|||//vendor/org/apache/flink:force_shading|||//vendor/com/google/code/findbugs:jsr305|||//vendor/org/apache/flink:flink_streaming_java_2_12|||//vendor/org/apache/flink:flink_core|||//vendor/org/apache/flink:flink_java|||//vendor/org/apache/flink:flink_runtime_2_12|||//vendor/org/apache/flink:flink_optimizer_2_12","processorClasses|||L|||","generatesApi|||B|||false","licenses|||L|||","generateNeverlink|||B|||false"],
Error: dictionary expression has duplicate key:
"vendor/org/apache/flink:flink_clients"
ERROR: error loading package 'external': Package 'external' contains errors
INFO: Elapsed time: 3.644s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)

If you look at the dict lines that are reported as conflicting, you'll see the duplicate "vendor/org/apache/flink:flink_clients" target:

 "vendor/org/apache/flink:flink_clients": ["lang||||||java","name||||||//vendor/org/apache/flink:flink_clients", ...],
 "vendor/org/apache/flink:flink_clients": ["lang||||||scala:2.12.11","name||||||//vendor/org/apache/flink:flink_clients", ...],

Maybe it's possible to customize the target templates to suffix targets with the language? Something like:

_JAVA_LIBRARY_TEMPLATE = """
java_library(
  name = "{name}_java",
..."""

_SCALA_IMPORT_TEMPLATE = """
scala_import(
    name = "{name}_scala",
..."""

In any case, I would really appreciate having this issue addressed, might it be a bug or not.

johnynek commented 3 years ago

huh...

I don't think we've considered this case. Due to the _version approach of scala jar publishing there could be two different jars which collide if you chop off the version information from the name.

I think a good fix would be to:

  1. make an explicit check for this in the code and give a good error message before writing the graph out.
  2. require the user to somehow disambiguate in their yaml config file

If we automatically disambiguate, we would change the output if we happen to add a dependency that causes a collision, but that could happen late in the life of a big repo, which would be a pain.

to rename, we could something like:

  1. add field to the record: "alias" which allows you to rename a single item (if there are no modules).
  2. if you do use modules then the modules can either be strings, or a map with one item modules: [{"client": "client_java"}] or something like that...

I can coach you on what files need to change to get this done, but I probably can't get to this very soon personally. cc @ianoc who may or may not have any other thoughts on this.

salvalcantara commented 3 years ago

Sounds good @johnynek ... I might be able to help if you point me on the right direction (don't promise anything, though).

johnynek commented 3 years ago

So, the key things would be:

  1. add support for representing the data in the DepsModel file: https://github.com/johnynek/bazel-deps/blob/3bed1b71bf94a62b566d6d98f119f2708b28c367/src/scala/com/github/johnynek/bazel_deps/DepsModel.scala#L601
  2. check for duplicates after language normalization I think in this method: https://github.com/johnynek/bazel-deps/blob/3bed1b71bf94a62b566d6d98f119f2708b28c367/src/scala/com/github/johnynek/bazel_deps/Writer.scala#L117

I hope that's a good starting point.

johnynek commented 3 years ago

BTW: you might be able to make two PRs: the second one: error when there is a duplicate with a nice error, might be the easier of the two and would have helped you figure out (though not remedy) the issue.