bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

java_library() needs provided_deps attribute #1402

Closed spearce closed 2 years ago

spearce commented 8 years ago

As discussed in https://gerrit-review.googlesource.com/77946 the java_library() rule needs to support a "provided_deps" or "compile_deps" attribute that does not include the provided libraries into the transitive dependency closure when the rule is used in another rule. E.g.:

java_binary(name = "app", deps=[":lib"])
java_library(name = "lib", deps="[//third_party:foo"], provided_deps=["//third_party:ssl"])

When building app, it includes //third_party:foo but not //third_party:ssl. However, lib is able to see //third_party:ssl symbols, and its sources can compile against them.

ulfjack commented 8 years ago

One problem with that is that the entire transitive closure of your binary has to do that consistently. If any library has a normal dep on it, it will be linked in after all.

With neverlink, it's easier to have consistency, because the annotation is on the rule, not the edge.

Both proposals (neverlink and compile_deps) break down if you sometimes want to link a library and sometimes you don't want to link it, for example, because you have multiple binaries that you want to deploy to different environments that provide different libraries out of the box. Both proposals encourage duplicating the dependency graph in that case, which is very costly.

Unfortunately, it's not clear from the feature request why you'd want to have such a feature and how you'd use it, making it harder for us to come up with a workable proposal.

Another possible solution would be to use neverlink in combination with select. That'd allow you to say on the build command-line whether you want a certain library linked into a binary or not. However, the downside is that you can't build two libraries differently in the same build (at least, not right now), though we're working on that.
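For illustration only, here is one way that command-line switch could be expressed today. This is a sketch, not the proposal itself: it uses a select on runtime_deps rather than neverlink, and the config_setting name and --define flag are made up.

config_setting(
    name = "ssl_provided",
    values = {"define": "ssl_provided=true"},
)

java_binary(
    name = "app",
    srcs = ["App.java"],
    main_class = "com.example.App",
    deps = [":lib"],
    # Link //third_party:ssl only when the environment does not already provide it.
    runtime_deps = select({
        ":ssl_provided": [],
        "//conditions:default": ["//third_party:ssl"],
    }),
)

Building with --define=ssl_provided=true then omits the SSL library; without the flag it is linked in as usual.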

Yet another solution is the deploy_env attribute on java_binary that we have internally. It's specified on the java_binary rule, and it allows performing classpath subtraction at the binary level. I.e., it takes the usual runtime classpath, also computes a runtime classpath for the things referenced through the deploy_env attribute, and then builds a binary that only contains the entries on the first classpath that aren't on the second.

That avoids most of the problems above - you can have multiple binaries with different environments in the same build, and it's guaranteed to be consistent for each binary.

The downsides are that you have to specify it on each binary, which could be expensive (or easy to forget) if you have lots of binaries that want to use the same deploy_env, and it could potentially also slow down builds. Also, you have to avoid any manual munging of classpath jars - if you post-process parts of the classpath with a genrule or by piping it through a java_binary deploy_jar, then the classpath entries don't match and the subtraction doesn't do what you might have expected.

We could easily add the deploy_env attribute to java_binary in Bazel, since all the code already exists. I'm not sure why this specific attribute was left out when we open sourced the Java rules.
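For reference, a minimal sketch of what that could look like if the attribute were exposed in Bazel; the target and label names here are made up for the example:

java_binary(
    name = "server_env",
    main_class = "ignored.Main",  # placeholder; only this target's runtime classpath matters
    runtime_deps = ["//third_party:servlet-api"],
)

java_binary(
    name = "app",
    srcs = ["App.java"],
    main_class = "com.example.App",
    deps = [":lib", "//third_party:servlet-api"],
    # app_deploy.jar = app's runtime classpath minus server_env's runtime classpath
    deploy_env = [":server_env"],
)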

spearce commented 8 years ago

For reference https://github.com/facebook/buck/issues/63 is the Buck feature request.

This is actually a very important feature for some Java "applications". Stepping back from a lot of the discussion around Gerrit, look at a Java servlet "application". When writing a servlet to be deployed in a servlet container, my sources must compile against the servlet spec JAR, but the application must not include that JAR in its WEB-INF/lib directory.

To compile and deploy a Java servlet, you must use a compile_deps/provided_deps type of strategy in the build system to ensure this particular dependency is visible at compile time for some sources, but is not pulled into the transitive dependency graph when assembling all the other libraries the application needs to run.
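For concreteness, the neverlink mechanism mentioned above can already approximate this for the servlet case. A minimal sketch, with made-up labels and a made-up jar path:

java_import(
    name = "servlet_api_provided",
    jars = ["javax.servlet-api-3.1.0.jar"],
    neverlink = 1,  # available at compile time, never placed on the runtime classpath
)

java_library(
    name = "my_servlet",
    srcs = ["MyServlet.java"],
    deps = [":servlet_api_provided"],
)

Targets that depend on :my_servlet can still compile, but the servlet API jar is not pulled into the runtime closure, so it stays out of WEB-INF/lib.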

Cryptography libraries also can fall into this category. E.g. in Gerrit we "work with" BouncyCastle if the end-user downloads and installs it next to Gerrit. If it is not present, we gracefully degrade and support a reduced set of functionality that is still useful. For simplicity in development, some source classes are compiled against BouncyCastle APIs, but we break the dependency graph to prevent BC from going into the release binary.

I guess both of those cases might be workable with deploy_env telling the java_binary rule that the environment provides those things, and therefore that they should not be included in the transitive closure output. A downside of this approach is that it forces us to construct a java_binary for something that maybe should be a java_library instead. E.g. when we package Gerrit, we don't shade everything into a single flat JAR; we collect the dependencies into unique JAR names under WEB-INF/lib and let the container deal with a pile of (mostly original) JARs on the classpath, rather than a single flattened JAR.

It's also awkward for Gerrit plugin developers to specify the Gerrit API they are building against twice: once in the java_library as a dep, and again in their java_binary as a deploy_env to subtract out the very thing they had to add as a dependency. It's somewhat cleaner to add the dep just once in the library target as a known part of the runtime environment and not have the build system force you to document it twice.

If you look at the rest of the Java build tools space, I think every other system either has this, or has had to add it after the fact, because there are just too many corner cases where you need to clip the transitive dependency edge.

ulfjack commented 8 years ago

java_binary does not generate a single flattened JAR by default; it only does so if you request the _deploy.jar implicit output. However, it doesn't generate a WEB-INF/lib layout, though I have seen Skylark rules that do.

For the gerrit plugin use case, (something like) neverlink seems like a better solution. You're suggesting to write something like this:

java_library(name = "my_plugin", srcs = ["Foo.java"], compile_deps = ["gerrit-api"])

With neverlink, you'd instead write something like this:

java_library(name = "my_plugin", srcs = ["Foo.java"], deps = ["gerrit-api"])

And the gerrit-api target would specify neverlink (though note that neverlink currently only applies to a specific library, not to all libraries reachable through that library).

This seems better because a) you can't accidentally get it wrong, and b) it's easier to automate with tools.

Don't get me wrong, I'm not against adding more mechanisms to the Java rules to support such use cases, I'm just pointing out the advantages and disadvantages of different approaches. I don't think copying what someone else did is necessarily the right approach, but it's also not necessarily wrong. I prefer to make decisions based on technical arguments, not on the say-so from someone else.

spearce commented 8 years ago

So if we write:

java_library(name="gerrit-api", neverlink=1, deps=["//third-party:servlet-api", ":other-lib"])

and servlet-api and other-lib have neverlink=0, then a "my_plugin" that depends on gerrit-api does not include gerrit-api, but does include servlet-api and other-lib?

I do see the value in neverlink for this plugin-api case. It's certainly easier for the my_plugin author.

damienmg commented 8 years ago

OK. We discussed offline, and that sounds reasonable to cover that use case. Reprioritized accordingly. Hopefully we will get the manpower to do it sooner.

arjantop commented 6 years ago

Is this still going to be implemented?

hmemcpy commented 5 years ago

I'd like to add a few more use cases where we need to be able to exclude specific targets from the final deployable jar. For example, we have a situation where we need to use our library with different runtimes, e.g. Hadoop. Some of those environments need a specific Jackson binary, others don't. This is why the neverlink solution doesn't work for us: we need it excluded only sometimes.

Reading this thread, deploy_env may be a good solution; unfortunately, as @ulfjack mentioned, it's an internal feature with no documentation to speak of.

Our options right now are to either create a genrule that cracks open the produced deployable and manually removes the packages we'd like to exclude, or use @johnynek's jarjar wrapper and its zap rule for this purpose. This works, but feels non-Bazelish, as we no longer deal at the label/target level and have to manually provide a list of packages to remove.
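For reference, a sketch of that genrule workaround, assuming zip is available on the machine running the action; the labels and the com/quux/* package pattern are made up for the example:

genrule(
    name = "stripped_deploy_jar",
    srcs = ["//foo/bar:mybinary_deploy.jar"],
    outs = ["mybinary_stripped.jar"],
    # Copy the deploy jar, make it writable, then delete the unwanted entries in place.
    cmd = "cp $< $@ && chmod u+w $@ && zip -q -d $@ 'com/quux/*'",
)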

Another option is to create another top-level rule that performs this classpath subtraction manually and then delegates to the SingleJar utility to handle the rest. So instead of creating our own rule, e.g.

bazel_assembly(
    name = "my-output-jar",
    for_target = "//foo/bar/mybinary",
    exclude = ["//com/quux/dependency"],
)

it would be really useful to have the ability to specify an exclude_for_deployment list of targets at the java_binary level.

cc @ittaiz

ittaiz commented 5 years ago

I think that, in addition to the style issue (non-Bazelish), there are two major challenges with using the solutions @hmemcpy describes:

  1. People usually filter out jars which exist in the runtime, not classes/packages. Choosing the available approach means people need to "translate" between the paradigms.
  2. If someone upgrades a dependency and the dependency changes or adds packages, those might slip in, making upgrades more brittle.

@ulfjack, Wdyt about the exclude_for_deployment attribute?


ulfjack commented 5 years ago

@ittaiz how's "exclude_for_deployment" different from the existing "deploy_env" attribute? It looks like the attribute isn't defined in Bazel, but the code is otherwise all there. I'm not sure who decided to exclude "deploy_env" from Bazel, but I generally think it's a mistake to introduce divergence between the rules.

ittaiz commented 5 years ago

It may be enough. I couldn't find it in Bazel. Can I give it a list of labels and then have the contents of the related jars filtered from the deploy jar? If so, that's exactly what we want and we'll be happy to use it.


ulfjack commented 5 years ago

It's filtering by jar file, it does not unpack jars and filter their contents.

ittaiz commented 5 years ago

I was unclear. Does the attribute receive a list of labels? Assuming it does, any idea what the diff would be to get it into Bazel open source?


ulfjack commented 5 years ago

See here: https://github.com/ulfjack/bazel/commit/146e52c28a71d49d56a5f1d832b6c16b6d9dcec1

I haven't tried this (not even to compile), but my earlier code digging says it should just work.

hmemcpy commented 5 years ago

Hi @ulfjack,

I'm trying to investigate whether deploy_env can suit our needs, but I can't seem to make it work. I've applied the change your commit described, but it seems to be failing to find the classpath providers here: https://github.com/bazelbuild/bazel/blob/4a2002043ed3907223a403e8b8fc66975e516fd8/src/main/java/com/google/devtools/build/lib/rules/java/JavaBinary.java#L121-L122

This returns an empty list.

Is there anything I'm missing? How can I configure deploy_env to have JavaRuntimeClasspathProvider on it?

ulfjack commented 5 years ago

Are you referencing a java_binary in deploy_env?

hmemcpy commented 5 years ago

Oh wow, my bad! Yes, it works with java_binary targets! I'll report back whether it suits our needs. Thanks!

hmemcpy commented 5 years ago

OK, just to make sure I'm doing this correctly. Suppose I have a binary target like this:

java_binary(
    name = "test",
    srcs = ["Program.java"],
    main_class = "test.Program",
    runtime_deps = ["//test/lib1:lib1", "//test/lib2:lib2"],
    ...
)

My goal is to produce a deployable without the reference to lib2. I was able to achieve it by adding another java_binary which just references lib2, and adding this new binary in deploy_env:

java_binary(
    name = "test_deploy",
    main_class = "test.prog1.Program",
    runtime_deps = ["//test/lib2:lib2"]
)

and adding deploy_env = [":test_deploy"] to the original java_binary target.
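Putting the two snippets together, the original binary would end up looking something like this:

java_binary(
    name = "test",
    srcs = ["Program.java"],
    main_class = "test.Program",
    runtime_deps = ["//test/lib1:lib1", "//test/lib2:lib2"],
    deploy_env = [":test_deploy"],
)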

After building test_deploy.jar, I have a jar that only contains the lib1 files!

However, it seems this will subtract all the transitive dependencies of lib2 as well, am I correct to assume that?

ulfjack commented 5 years ago

Yes, that's right, unless you pass through a non-Java rule, like so:

java_binary(name="foo", runtime_deps=["baz","original"], deploy_env=["bar"])
java_binary(name="bar", deps=["baz"])
java_import(name="baz", jars=["copy.jar"])
genrule(cmd = "cp original.jar copy.jar") <--- renames the jar file
java_library(name = "original")

In this case, original.jar ends up on the classpath, even though it is in the transitive closure of the deploy_env.

statik commented 5 years ago

I'm excited to find this issue; this is exactly what I need in order to use Bazel to build jars that are plugins for a keycloak.org authentication server.

hmemcpy commented 5 years ago

@ulfjack the deploy_env feature could really help us in migrating difficult Maven projects. Would you make this available in Bazel?

rdnetto4 commented 5 years ago

Note that another common use case for provided scope is OSGi boundaries - given some library A, a library B loaded within the same classloader should have a regular dependency on it, whereas a library C loaded in another classloader that OSGi-imports classes from A should have a provided-scope dependency on it.

s50600822 commented 5 years ago

One problem with that is that the entire transitive closure of your binary has to do that consistently. If there is any library that has a normal deps on that, it will be linked in after all.

Hi, I'm trying to see how to add this. An example is if you move Guava into Bazel (https://github.com/google/guava/issues/2850) under a WORKSPACE that contains other packages. Guava itself (guava-*.jar) is an OSGi bundle. Imagine there are several other packages in the workspace that are OSGi bundles, and they traditionally depend on Guava as "provided". If we say "neverlink" here (and do the OSGi instructions in the MANIFEST somehow), we won't be able to have any other Bazel package that acts as an application/platform, includes Guava as a compile dependency, and does OSGi exports for every other package in the WORKSPACE. I think this is a very common use case for OSGi projects.

Though I'm not totally sure whether this should be handled by java_library or whether there should be a separate osgi_bundle rule.

s50600822 commented 5 years ago

Hi

I wonder if I misunderstood something regarding the scope semantics of "deps". Below is what I saw when I tried to compile: somehow "deps" is already acting like "provided". java-maven-lib came out without guava or java-blah-lib inside the package. Similarly, java-blah-lib was built without guava or common-io inside it. [screenshot of the decompiled jar contents, 2018-10-23]

thundergolfer commented 5 years ago

@s50600822 Did you build with the _deploy.jar suffix?

s50600822 commented 5 years ago

@s50600822 Did you build with the _deploy.jar suffix?

I don't think so. The jar name above (in the decompiler) also seems to say that, though it's been a while. If you meant https://docs.bazel.build/versions/master/tutorial/java.html#package-a-java-target-for-deployment, I wouldn't have thought of it, since Java application deployment is usually not covered by the build tool (the logic isn't trivial). For example, in Maven the "deploy" semantic means adding the artifact to a remote repository (https://maven.apache.org/plugins/maven-deploy-plugin/). I think Bazel's strength is incremental build behavior, and I was trying to use it to that extent.

thundergolfer commented 5 years ago

Well, if you don't create a 'fat jar' with the _deploy.jar target suffix, then you won't get your dependencies inside your jar.
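As a usage example (with a hypothetical //some/pkg:prog java_binary target), the self-contained jar has to be requested explicitly:

bazel build //some/pkg:prog_deploy.jar

The plain //some/pkg:prog output jar contains only the target's own classes; runtime dependencies are merged in only when the _deploy.jar implicit output is built.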

gergelyfabian commented 4 years ago

Does deploy_env help in a case where you have a diamond dependency problem?

A depends on B and C, and B depends on C (possibly through a longer transitive chain). I'd like to use deploy_env to remove B and its transitive dependencies from the *_deploy.jar, but not C (because A depends on C directly).
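A sketch of that shape in BUILD terms (all names made up for the example):

java_library(name = "C", srcs = ["C.java"])
java_library(name = "B", srcs = ["B.java"], deps = [":C"])

java_binary(
    name = "provided_env",
    main_class = "ignored.Main",  # placeholder; only the runtime classpath is used
    runtime_deps = [":B"],
)

java_binary(
    name = "A",
    srcs = ["A.java"],
    main_class = "com.example.A",
    deps = [":B", ":C"],  # A also depends on C directly
    deploy_env = [":provided_env"],  # subtracts B's whole runtime closure, including C
)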

As far as I remember, regular Maven supported this case by marking B as provided but using C as a normal dependency. From what I can see testing deploy_env, in this case it's not possible to use C from A. Is this a bug that I should report, or is it expected?

Example:

Building software that depends on scala.util.parsing (a direct dependency) on a Hadoop platform. If I start including elements of the Hadoop platform in the java_binary that I use as deploy_env, then I can no longer use scala.util.parsing in my software (it compiles, but I cannot run it, because scala.util.parsing gets cut out - it's frequently a direct or transitive dependency of elements of the Hadoop platform).

My gut feeling is that if I have a direct dependency, it should not be cut out by deploy_env.

comius commented 2 years ago

java_binary's deploy_env attribute should be used for this use case.