bazelbuild / rules_closure

Closure rules for Bazel
https://developers.google.com/closure
Apache License 2.0

Incompatibility between maven_jar and java_import_external rules generated directory structures #200

Closed davido closed 7 years ago

davido commented 7 years ago

For non-trivial projects, some of the rules_closure project dependencies can already be consumed with the native or a custom maven_jar rule.

However, this leads to an incompatibility in how the directory structures are generated:

# generated by java_import_external()
$ cd external/com_google_dagger && ls -latr
total 72
-rw-r--r--   1 davido users   122 May  1 22:13 WORKSPACE
-rw-r--r--   1 davido users 36939 May  1 22:13 dagger-2.9.jar
-rwxr-xr-x   1 davido users   431 May  1 22:13 BUILD
drwxr-xr-x   2 davido users  4096 May  1 22:18 .
drwxr-xr-x 231 davido users 20480 May  1 22:21 ..

Compare this to the directory structure generated by the maven_jar rule:

# generated by maven_jar()
$ cd external/args4j && ls -lRatr

-rw-r--r--   1 davido users   100 May  1 22:21 WORKSPACE
drwxr-xr-x   2 davido users  4096 May  1 22:21 jar

./jar:
total 144
-rw-r--r-- 4 davido users 74703 Dec 13 22:31 args4j-2.0.26.jar
-rw-r--r-- 4 davido users 53487 Dec 13 22:31 args4j-2.0.26-src.jar
-rw-r--r-- 1 davido users   421 May  1 22:21 BUILD

As a consequence, the dependencies cannot be found, because the maven_jar rule does not generate a BUILD file in the package root location, e.g.:

$ bazel build gerrit
ERROR: /home/davido/.cache/bazel/_bazel_davido/27a001f4182820ef315d8d2d4f1edafe/external/com_google_closure_stylesheets/BUILD:7:1: no such package '@args4j//': BUILD file not found on package path and referenced by '@com_google_closure_stylesheets//:com_google_closure_stylesheets'.
ERROR: Analysis of target '//:gerrit' failed; build aborted.
INFO: Elapsed time: 2.781s

For example, I need to apply this diff [1] to convince Bazel 0.4.5 to build the Gerrit Code Review project against rules_closure from master.

davido commented 7 years ago

This incompatibility was introduced in a47cc063d39b3958f31b4c5a992741020fd9ee3e. The workaround is for projects that depend on rules_closure to consume their external dependencies with the rules_closure-specific java_import_external rule.

This workaround seems to be quite intrusive, as it forces projects that depend on rules_closure to rewrite parts of their build tool chain from "@foo//jar" to "@foo". Moreover, for non-trivial projects like Gerrit Code Review, which consumes hundreds of external dependencies and has hundreds of its own plugins that consume the Gerrit plugin API (and thus transitively depend on Gerrit's dependencies), it is very challenging when a single external dependency (rules_closure in this specific case) forces the project itself to replace substantial parts of its build tool chain by switching from maven_jar to java_import_external.

Moreover, in the case of custom maven_jar rule implementations, some features are lost, like the aggressive caching introduced by the Gerrit Code Review maven_jar bazlet [1], which is a backport of Buck's maven_jar bucklet implementation [2].

IOW, it would be great if a47cc063d39b3958f31b4c5a992741020fd9ee3e could be rewritten so that it remains compatible with the maven_jar directory layout.

jart commented 7 years ago

java_import_external isn't rules_closure-specific. All it does is generate a BUILD file containing a java_import rule with the same name as the repository; @foo is the shortened version of @foo//:foo. The difference is that it also sets the deps attribute, so you don't need a third_party directory where java_library rules express the dependency relationships using exports and runtime_deps.
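For illustration, the generated external/com_google_dagger/BUILD would look roughly like this (a sketch; the exact attributes depend on how the repository was declared):

# Sketch of a BUILD file as generated by java_import_external.
# The target name matches the repository name, so @com_google_dagger
# resolves to @com_google_dagger//:com_google_dagger.
licenses(["notice"])  # Apache 2.0

java_import(
    name = "com_google_dagger",
    jars = ["dagger-2.9.jar"],
    deps = ["@javax_inject"],  # direct dependencies are expressed here
    visibility = ["//visibility:public"],
)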

It's not possible for us to rewrite rules_closure, Nomulus, TensorFlow, etc. to have Bazel invoke the Maven command from within Bazel. That would require everyone to install Maven before building. It would most likely be quadratic across repositories. And it would not be able to take advantage of the Bazel downloader, which is necessary in order to meet the reliability requirements of projects like TensorFlow.

davido commented 7 years ago

It's not possible for us to rewrite rules_closure, Nomulus, TensorFlow, etc. to have Bazel invoke the Maven command from within Bazel.

I'm not asking to rewrite java_import_external using Maven. What I'm saying is that it could generate the BUILD file in a jar directory, like maven_jar does, so that @foo//:foo would become @foo//jar. In fact, rules_closure was only very recently changed from maven_jar to java_import_external in https://github.com/bazelbuild/rules_closure/commit/a47cc063d39b3958f31b4c5a992741020fd9ee3e, to support multiple URLs. Before that commit the native maven_jar rule was used, and it was compatible with the way Gerrit consumes external dependencies.

davido commented 7 years ago

BTW, the README.md is stale and still claims that a dependency can be overridden with:

load("@io_bazel_rules_closure//closure:defs.bzl", "closure_repositories")
closure_repositories(omit_guava=True)

maven_jar(
    name = "guava",
    artifact = "...",
    sha1 = "...",
)

For one, guava was replaced with the canonical artifact name; for another, and this is what this issue is all about, it wouldn't be compatible with the native or a custom maven_jar rule. The way the code is organized now, a dependency cannot be overridden with maven_jar, only with rules_closure's own java_import_external.

Real-world rules_closure integration example: [1].

load("@io_bazel_rules_closure//closure/private:java_import_external.bzl",
     "java_import_external")

#maven_jar(
#    name = "javax_inject",
#    artifact = "javax.inject:javax.inject:1",
#    sha1 = "6975da39a7040257bd51d21a231b76c915872d38",
#)

closure_repositories(
    omit_javax_inject=True
)

java_import_external(
    name = "javax_inject",
    jar_urls = [
        "http://bazel-mirror.storage.googleapis.com/repo1.maven.org/maven2/javax/inject/javax.inject/1/javax.inject-1.jar",
        "http://repo1.maven.org/maven2/javax/inject/javax.inject/1/javax.inject-1.jar",
        "http://maven.ibiblio.org/maven2/javax/inject/javax.inject/1/javax.inject-1.jar",
    ],
    jar_sha256 = "91c77044a50c481636c32d916fd89c9118a72195390452c81065080f957de7ff",
    licenses = ["notice"],  # Apache 2.0
)

I would expect to be able to just say:

maven_jar(
    name = "javax_inject",
    artifact = "javax.inject:javax.inject:1",
    sha1 = "6975da39a7040257bd51d21a231b76c915872d38",
)

closure_repositories(
    omit_javax_inject=True
)

and this would just work.

[1] https://gerrit-review.googlesource.com/#/c/105310/4/WORKSPACE@14

jart commented 7 years ago

I like your thinking on backwards compatibility. I can modify java_import_external to generate a jar/BUILD file with an alias() rule pointing to the new target. I'll do that right now. However, please be advised that the alias target is not going to have exactly the same behavior, because it will carry things like dependency relationship information. I'm not sure if that will cause problems, but take it into consideration.
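For reference, the generated jar/BUILD would look roughly like this (a sketch, using javax_inject as an example repository):

# Sketch of the jar/BUILD file generated for backwards compatibility,
# so the old @javax_inject//jar label keeps working and forwards
# to the root target.
alias(
    name = "jar",
    actual = "@javax_inject//:javax_inject",
)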

As for the README, right now it's written for the 0.4 release series and not HEAD. But we're not using HEAD because a Google project like Gerrit is going to want to stay in sync with the internal repo.

jart commented 7 years ago

Change now mailed out internally. It will be approved by your friend Kasper and then pushed to this repository within a few days probably.

davido commented 7 years ago

@jart Thank you.

jart commented 7 years ago

Change now submitted. Here's the config for the latest Closure Rules:

http_archive(
    name = "io_bazel_rules_closure",
    sha256 = "af1f5a31b8306faed9d09a38c8e2c1d6afc4c4a2dada3b5de11cceae8c7f4596",
    strip_prefix = "rules_closure-f68d4b5a55c04ee50a3196590dce1ca8e7dbf438",
    urls = [
        "http://bazel-mirror.storage.googleapis.com/github.com/bazelbuild/rules_closure/archive/f68d4b5a55c04ee50a3196590dce1ca8e7dbf438.tar.gz",  # 2017-05-05
        "https://github.com/bazelbuild/rules_closure/archive/f68d4b5a55c04ee50a3196590dce1ca8e7dbf438.tar.gz",
    ],
)

Let me know if there are any other adjustments you need me to make.

davido commented 7 years ago

If I read f68d4b5a55c04ee50a3196590dce1ca8e7dbf438 correctly, it allows clients to consume an external dependency through java_import_external without adjusting the path from @foo//jar to @foo.

However, f68d4b5a55c04ee50a3196590dce1ca8e7dbf438 still does not fix the incompatibility between maven_jar and java_import_external, because when the external dependencies are consumed with maven_jar, they are exposed only (!) under the @foo//jar path, whereas rules_closure still expects them to show up in its build process under the @foo path.

I see two ways to resolve this incompatibility:

I went ahead and extended our custom maven_jar rule with the first approach suggested above. I've sent this CL [1] to Kasper and the other Gerrit build tool chain reviewers.

[1] allows the Gerrit project to avoid using java_import_external and still reuse common dependencies with rules_closure:

[...]
closure_repositories(
    omit_aopalliance = True,
    omit_args4j = True,
    omit_javax_inject = True,
)

[...]
maven_jar(
    name = "javax_inject",
    artifact = "javax.inject:javax.inject:1",
    sha1 = "6975da39a7040257bd51d21a231b76c915872d38",
)
[...]

Needless to say, other maven_jar incarnations, most notably the native and the Skylark ones, are not affected by this fix and would still fail to interoperate with rules_closure.

It is worth noting, though, that the transitive dependencies feature of java_import_external (by which I mean passing a deps list to it) is another serious problem: maven_jar is not aware of such dependencies, which makes the interoperability problem between maven_jar and java_import_external really hard to solve.

That is why, even with [1] in place, the Gerrit project can currently only reuse 3 external dependencies:

Another complication is the different naming conventions between Gerrit and rules_closure: simple names vs. canonical artifact names. For example, jsr305 cannot be reused:

[1] https://gerrit-review.googlesource.com/105970

jart commented 7 years ago

You had me super confused when you were talking about maven_jar, because I think of native.maven_jar, but it seems Gerrit has its own Skylark maven_jar.

It's impossible to be fully compatible with native.maven_jar for the precise reason you concluded. Definitions written using java_import_external only specify direct dependencies, just like java_library and java_import do. Even if the naming issues were worked out, the dependency graph still wouldn't exist.
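For example, a definition with its direct dependencies expressed would look roughly like this (a sketch; the dagger URL is illustrative and the sha256 is elided):

# Sketch: java_import_external records only *direct* dependencies via deps,
# exactly like java_import and java_library do.
java_import_external(
    name = "com_google_dagger",
    jar_urls = [
        "http://repo1.maven.org/maven2/com/google/dagger/dagger/2.9/dagger-2.9.jar",
    ],
    jar_sha256 = "...",
    licenses = ["notice"],  # Apache 2.0
    deps = ["@javax_inject"],
)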

Since you're adding a root directory alias to Gerrit's maven_jar, the only difference now between that and java_import_external is that you're using a Python downloader script from your Buck days, because Buck never had a downloader.

Here's my sales pitch for java_import_external. You get to use the most advanced downloader that exists. The cost is a slightly more verbose config. In exchange you gain the following advantages:

Those are all things that curl currently isn't doing for you. Out of curiosity, are you using something like Squid, and if not, how often does your CI system flake due to download errors? You can trade that in for lightning fast performance and >99.9% reliability.

Literally nothing is better at downloading files than Bazel.

hanwen commented 7 years ago

The custom maven_jar rule was added to have a cross-project cache for downloaded files. It's possible this is no longer needed.

davido commented 7 years ago

Actually, it's more than that:

[...]
maven_jar = repository_rule(
    attrs = {
        "artifact": attr.string(mandatory = True),
        "sha1": attr.string(),
        "src_sha1": attr.string(),
        "_download_script": attr.label(default = Label("//tools:download_file.py")),
        "repository": attr.string(default = MAVEN_CENTRAL),
        "attach_source": attr.bool(default = True),
        "unsign": attr.bool(default = False),
        "deps": attr.string_list(),
        "exports": attr.string_list(),
        "exclude": attr.string_list(),
    },
    local = True,
    implementation = _maven_jar_impl,
)

Besides the cross-project and cross-clone cache, it also offers the following feature:

# TODO(davido): Remove exclusion of file system provider when this issue is fixed:
# https://issues.apache.org/jira/browse/SSHD-736
maven_jar(
    name = "sshd",
    artifact = "org.apache.sshd:sshd-core:1.4.0",
    exclude = ["META-INF/services/java.nio.file.spi.FileSystemProvider"],
    sha1 = "c8f3d7457fc9979d1b9ec319f0229b89793c8e56",
)

Still, @jart's main point in the last comment was to replace the curl-based downloading with Bazel's own download function. I think that should be doable. We would still need to preserve the current features that our custom version of maven_jar is offering.
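A minimal sketch of what that could look like inside the rule implementation, assuming the rule switches from sha1 to sha256 (Bazel's downloader verifies sha256); the attribute names and URL construction below are illustrative, not Gerrit's actual code:

# Sketch: using Bazel's built-in downloader instead of the curl/Python script.
def _maven_jar_impl(ctx):
    group, artifact, version = ctx.attr.artifact.split(":")
    jar_name = "%s-%s.jar" % (artifact, version)
    url = "/".join([
        ctx.attr.repository,
        group.replace(".", "/"),
        artifact,
        version,
        jar_name,
    ])
    # Download and verify the jar into the jar/ directory, like maven_jar does.
    ctx.download(url, "jar/" + jar_name, ctx.attr.sha256)
    # Generate the jar/BUILD file so @foo//jar keeps working.
    ctx.file("jar/BUILD", "\n".join([
        "java_import(",
        "    name = 'jar',",
        "    jars = ['%s']," % jar_name,
        "    visibility = ['//visibility:public'],",
        ")",
    ]))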

jart commented 7 years ago

That is correct. Using redundant URLs with a GCS bucket you control means 99.9% reliability both for yourselves and your users, with zero Squid config, so long as you don't mind paying for the bandwidth of your CI hitting the GCS bucket. It's a very tiny cost compared to the SWE hours lost when builds break.

@davido java_import_external supports neverlink and srcjar, just like java_import. Our internal tool (soon to be external) also generates configs for it. As for $HOME/.m2, I'm assuming that's being used as a cache. Consider the new downloader to be a zero-configuration alternative.

davido commented 7 years ago

So long as you don't mind paying for the bandwidth of your CI hitting the GCS bucket. It's a very tiny cost compared to SWE hours when builds break.

The most frequent CI breakages we have seen in the past were due to GitHub outages. Our Polymer tool chain depends on 100% GitHub availability. AFAICT, there is no mirror NPM repository for that on a GCS bucket.

Our internal tool (soon to be external) also generates configs for it.

Cool, and thanks for pointing that out. Can you ping me when you release that tool? I would like to have a look.

As for $HOME/.m2 I'm assuming that's being used as a cache.

Actually, we are using it to short-circuit upgrading dependent libraries during the Gerrit development process. Gerrit has more than 150 dependent libraries, and the upgrade process is often not trivial. For example, I regularly build libraries from head, like gwtorm, JGit, Gitiles, Prolog Cafe, SSHD, and GWT, to name a few, to prepare an upgrade. The process is: fetch the library from head, build it (and patch it if needed), publish the built/patched artifact to $HOME/.m2 using mvn install, and tweak the Gerrit build process to consume the patched library from the local Maven repository by saying:

maven_jar(
    name = 'gwtorm',
    artifact = 'gwtorm:gwtorm:42',
    repository = MAVEN_LOCAL,
)

This process is even documented in the upstream dev-bazel documentation.

It is worth noting, though, that at least for JGit, which is a project under the Eclipse Foundation and thus must be built and released using the Maven toolchain, we have added a Bazel-driven build and support alternatively consuming the locally built JGit artifacts from source by using Bazel's local_repository rule. In an ideal world, all dependencies would be built with Bazel so that we could consume them from source and avoid the local Maven polishing step in the first place.
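For illustration, consuming a locally built JGit checkout from source would look roughly like this in the WORKSPACE (a sketch; the repository name and path are illustrative and machine-specific):

# Sketch: point Bazel at a local JGit checkout (which has its own WORKSPACE)
# instead of consuming a released Maven artifact.
local_repository(
    name = "jgit",
    path = "/home/davido/projects/jgit",
)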