Fully-featured Java dependency management

ryandm commented 8 years ago

Here's a proposed scheme for handling 3rd-party dependencies in Buck, starting from the perspective of Java dependency management. The main goal is a complete, sustainable solution for the dependency management needs of Buck Java projects.

Looking for feedback on the tradeoffs involved. CC @Coneko @shs96c @mikekap @davido @bolinfest @bestander.

Java dependency management requirements

Java code that's built with Buck should use exactly one version of every third-party dependency, so most of the time dependency management is irrelevant. However, it becomes critical when importing third-party dependencies into the source tree.

For Buck projects, here are some Java dependency management requirements observed in practice when importing libraries from Maven repositories:

Resolution: Given a list of direct dependencies, first resolve transitive dependencies, then write corresponding prebuilt_jar rules and inter-dependencies into the source tree at third-party/java/**/BUCK
Optionals: For example, a Buck rule for a direct dependency should never transitively pull in slf4j-simple, log4j-over-slf4j, and other "logging implementation" JARs. Only the final java_binary should choose what to use.
Relocations: For example, org.apache.commons:commons-io has been relocated to commons-io:commons-io. Everything should still work as expected when relocations come into play.
Banned Dependencies: Assert that certain known-to-be-bad dependencies have not been pulled in, either directly or transitively.
Banned SNAPSHOTs: Assert that -SNAPSHOT dependencies (which are mutable) have not been pulled in, either directly or transitively.
Require Sane Versions: Ensure that if a dependency appears multiple times in the graph, the latest version of that dependency is imported. Possibly, also assert that the versions appearing throughout the graph don't span multiple major-version numbers.
Source JARs: Sources for all transitive dependencies must be available in your IDE so you can click into the source code during coding and debugging.
Exclusions: In rare cases where something we want to depend on declares faulty or unnecessary dependencies, we use exclusions as an escape hatch. Exclusions are almost never sane, but sometimes pragmatically helpful as a stopgap or a last resort.
Duplicate Classes: Assert that no class appears in more than one of the libraries the project uses, directly or transitively.
Allow remote_file instead of prebuilt_jar: While prebuilt_jar should likely be the default way to import libraries, in some contexts remote_file (with download.in_build = true in the Buck config) is preferable.
Documentation and Usability: The mental model for "how dependencies work" needs to be thoroughly documented, along with "how to get things done" in practical situations. We also need tools to visualize what happens during dependency resolution.
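Several of these requirements are simple assertions over a resolved dependency graph. As an illustration only, the sketch below checks Banned SNAPSHOTs, Banned Dependencies and Require Sane Versions, assuming a hypothetical input of (groupId, artifactId, version) tuples, one per occurrence in the graph. Its version ordering is a naive numeric comparison, not Maven's full version-ordering rules.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical input: every occurrence of a dependency in the resolved graph,
# as (groupId, artifactId, version) tuples.
Coord = Tuple[str, str, str]


def version_key(version: str) -> Tuple[int, ...]:
    """Naive numeric ordering ("19.0" -> (19, 0)); NOT Maven's full rules."""
    release = version.split("-")[0]  # drop qualifiers such as -SNAPSHOT
    return tuple(int(p) if p.isdigit() else 0 for p in release.split("."))


def check_graph(occurrences: List[Coord],
                banned: Set[str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (version chosen per group:artifact, list of violations)."""
    errors: List[str] = []
    seen: Dict[str, Set[str]] = defaultdict(set)
    for group, artifact, version in occurrences:
        key = f"{group}:{artifact}"
        if version.endswith("-SNAPSHOT"):  # Banned SNAPSHOTs
            errors.append(f"banned SNAPSHOT: {key}:{version}")
        if key in banned:  # Banned Dependencies
            errors.append(f"banned dependency: {key}")
        seen[key].add(version)

    chosen: Dict[str, str] = {}
    for key, versions in seen.items():
        # Require Sane Versions: import the latest version found in the graph...
        chosen[key] = max(versions, key=version_key)
        # ...and optionally flag graphs that span several major versions.
        majors = sorted({version_key(v)[0] for v in versions})
        if len(majors) > 1:
            errors.append(f"{key} spans major versions {majors}")
    return chosen, errors
```

The explicit "latest wins" step matters because Maven's default conflict mediation is "nearest definition wins," which can silently pick an older version.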

Proposal: Reuse package management

The dependency management requirements, above, lead to a choice:

Reuse existing package management and mitigate known problems.
Reinvent package management and solve known problems.

I propose reusing existing package management:

Main downside: a second tool (e.g. mvnw or gradlew) is needed for dependencies
Main upside: existing and future work is offloaded so we won't have to deal with it

When choosing to reuse existing package management, our only task is to define and document a sane mapping from the package manager's model onto Buck's model. The upside becomes compelling, I think, when evaluating this decision as a potential general pattern to apply to Buck integration of all package managers across all languages.

The following section roughly sketches what user-facing docs might look like under the "reuse" approach, with Maven as the package manager. (Maven is just an example; I expect the general approach to work similarly with Gradle or Ivy.)

Example Docs: Importing dependencies via Maven POM

Run these commands to get a template POM whose contents can be edited to specify your dependencies. The template POM's comments describe how each section in the POM affects the Buck rules that'll be generated.

$ cd "$(buck root)"
$ mkdir -p third-party/java
$ cd third-party/java
$ mvn archetype:generate                  \
  -DarchetypeGroupId=com.buckbuild        \
  -DarchetypeArtifactId=third-party-java  \
  -DarchetypeVersion=1.0                  \
  -DgroupId=com.example.yourGroupId

Modify the resulting pom.xml in your editor or IDE. For example, to add the Guava library:

<project>
  ...
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>19.0</version>
    </dependency>
    ...
  </dependencies>
  ...
</project>

Every time the POM is edited, an explicit command must be run to regenerate the third-party/java build files.

$ cd third-party/java
$ mvn verify buck:regenerate-build-files

The resulting file at third-party/java/com.google.guava/BUCK will make the Guava library available to your project. Elsewhere in your project, you can depend on Guava like this:

java_binary(
  name = 'example-rule-that-depends-on-guava',
  deps = [
    '//third-party/java/com.google.guava:guava',
  ],
)
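For a sense of what the regeneration step might write into third-party/java/com.google.guava/BUCK, here is a hypothetical renderer that emits one prebuilt_jar per resolved artifact. It can optionally back that rule with a remote_file when in-build downloads (download.in_build = true) are preferred. The `Artifact` record, the `render_rule` helper and the on-disk layout are assumptions for illustration, not the output of any existing plugin.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Artifact:
    """Hypothetical output of the resolution step for one artifact."""
    group: str
    artifact: str
    version: str
    sha1: str
    deps: List[str] = field(default_factory=list)  # Buck targets of resolved deps


def render_rule(a: Artifact, use_remote_file: bool = False) -> str:
    """Render the BUCK rules for one resolved artifact (illustrative only)."""
    jar = f"{a.artifact}-{a.version}.jar"
    lines: List[str] = []
    source: Optional[str]
    if use_remote_file:
        # Fetched at build time; requires in-build downloads to be enabled.
        path = f"{a.group.replace('.', '/')}/{a.artifact}/{a.version}/{jar}"
        lines += [
            "remote_file(",
            f"  name = '{a.artifact}-jar',",
            f"  url = 'https://repo1.maven.org/maven2/{path}',",
            f"  sha1 = '{a.sha1}',",
            f"  out = '{jar}',",
            ")",
            "",
        ]
        binary = f"':{a.artifact}-jar'"
        source = None  # a second remote_file for -sources.jar is omitted here
    else:
        # JARs assumed to be copied next to the generated BUCK file.
        binary = f"'{jar}'"
        source = f"'{a.artifact}-{a.version}-sources.jar'"
    lines += ["prebuilt_jar(", f"  name = '{a.artifact}',", f"  binary_jar = {binary},"]
    if source:
        lines.append(f"  source_jar = {source},")
    lines += [f"  deps = {a.deps!r},", "  visibility = ['PUBLIC'],", ")"]
    return "\n".join(lines) + "\n"
```

Naming the rule after the artifactId inside a per-group directory is what makes the target read as `//third-party/java/com.google.guava:guava`.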

Next steps

I've investigated this issue enough to understand how Maven's package management can be sanely mapped to Buck rules to meet the requirements described above. I haven't described that mapping here because we first need a higher-level discussion about the tradeoffs of the overall approach.

Once those tradeoffs are understood, the next step is to refine and code the logic needed to fully meet Buck's Java dependency management requirements.

mikekap commented 8 years ago

Sounds like a great approach. In case you haven't seen it, Buck already has a Maven importer that can import pom.xml files (though probably not in a 100% foolproof way). You can run it via buck run maven-importer in the Buck repo. The source code is at https://github.com/facebook/buck/blob/master/src/com/facebook/buck/maven/Resolver.java. You may want to mention how far that is from what you'd like to see.

ryandm commented 8 years ago

Good point; I missed that the importer could read POM files. From an initial look at the importer:

Done: Resolution, Relocations, Source JARs, Exclusions
Not yet done: Optionals, remote_file, Banned SNAPSHOTs, Banned Dependencies, Require Sane Versions, Duplicate Classes, Documentation and Usability

By running mvn verify buck:regenerate-build-files (or something comparable using Gradle or Ivy) instead of buck run maven-importer, I think we gain:

natural use of already-built plugins for Banned SNAPSHOTs, Banned Dependencies, Require Sane Versions, Duplicate Classes (and other stuff that didn't make the short list)
an established mental model (pom.xml) that's fully functional (e.g. <parent>, <dependencyManagement>, <properties>)
pre-existing documentation / Stack Overflow questions / user familiarity

The real win I see is the careful scoping of our problem by saying "the only thing we support is writing Buck rules for a well-defined subset of this package manager's model." Simultaneously we get to check off the list of requirements by cleanly delegating to (and placing responsibility on) Maven/Gradle/Ivy for the long tail of feature requests and usability issues.

Coneko commented 8 years ago

cc @kageiit, @grumpyjames

kageiit commented 8 years ago

Thanks for tagging @Coneko

@ryandm @mikekap We have built a dependency resolution/importing mechanism, and a mapping onto Buck's build model, using Gradle in https://github.com/OkBuilds/OkBuck

As @mikekap mentioned, we chose this approach because Gradle's dependency declaration model has all the features listed in the requirements. It also has resolution-strategy hooks for advanced dependency conflict management, etc. In general, other systems solved this problem years ago, and we wanted to take advantage of what's already built rather than reinvent the wheel.

This was natural for us because the OkBuck Gradle plugin converts Java and Android projects into Buck projects. The workflow for Gradle dependency management, and how it maps onto Buck's model, is as follows:

Dependencies are defined in the dependencies block of build.gradle files:

dependencies {
    compile fileTree(dir: 'libs', include: ['*.jar'])
    compile(name:'rxscreenshotdetector-release', ext:'aar')
    compile 'io.reactivex:rxjava:1.1.0'
    compile ('io.reactivex:rxandroid:1.1.0') {
        exclude module: 'rxjava'
    }
    compile 'com.tbruyelle.rxpermissions:rxpermissions:0.5.2@aar'

    compile 'com.android.support:multidex:1.0.1'
    compile 'com.jakewharton:butterknife:8.4.0'
    apt 'com.jakewharton:butterknife-compiler:8.4.0'
}

The representation is very succinct and highly usable/readable.

A Gradle task is run that resolves the various configurations, forcing Gradle to do its dependency resolution and to download and cache the jars/aars and source jars in its local cache.
The final resolved versions of the dependencies are copied into a project-local dependency cache directory. The Gradle task generates a BUCK file in that cache directory that maps all the artifacts to prebuilt_jar and android_prebuilt_aar rules with meaningful names.
These rules can now be used in other BUCK files as usual:

deps = [
        '//.okbuck/cache:com.android.support.animated-vector-drawable-24.1.1.aar',
        '//.okbuck/cache:com.android.support.appcompat-v7-24.1.1.aar',
        '//.okbuck/cache:com.android.support.support-annotations-24.1.1.jar',
        '//.okbuck/cache:com.android.support.support-v4-24.1.1.aar',
        '//.okbuck/cache:com.android.support.support-vector-drawable-24.1.1.aar',
    ]
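Judging only from the example targets above, the cache rule names appear to follow a group.artifact-version.extension pattern. A hypothetical helper reproducing that pattern might look like this; it is inferred from the listed names, not taken from OkBuck's source.

```python
def okbuck_style_rule_name(coordinate: str, ext: str = "jar") -> str:
    """Turn 'group:artifact:version[@ext]' into 'group.artifact-version.ext'.

    Pattern inferred from the example targets; not OkBuck's actual code.
    """
    if "@" in coordinate:  # Gradle's '@aar' suffix selects the extension
        coordinate, ext = coordinate.split("@", 1)
    group, artifact, version = coordinate.split(":")
    return f"{group}.{artifact}-{version}.{ext}"


def cache_target(coordinate: str, ext: str = "jar") -> str:
    """Full Buck target in the project-local cache directory."""
    return f"//.okbuck/cache:{okbuck_style_rule_name(coordinate, ext)}"
```

Encoding the version in the rule name means a version bump changes every dependent's deps list, which keeps upgrades visible in review.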

We have gone a step further and created a wrapper script around Buck that detects changes to build.gradle files and automatically invokes the dependency fetch and resolution mechanism before delegating the build/test commands back to Buck, i.e. something like:

# Change a dependency in build.gradle
./buckw build target
# buckw invokes the gradle task, then proceeds with buck

We are able to do the wrapper script because OkBuck also automatically generates the BUCK files for all java/android gradle projects.
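The change detection in such a wrapper could be as simple as hashing every build.gradle file and re-running the Gradle task only when the combined digest differs from the one recorded on the previous run. This is a minimal sketch. The stamp-file location and helper names are assumptions, and it is not OkBuck's actual buckw logic.

```python
import hashlib
from pathlib import Path


def gradle_files_digest(root: Path) -> str:
    """Hash the paths and contents of all build.gradle files, in stable order."""
    h = hashlib.sha256()
    for path in sorted(root.rglob("build.gradle")):
        h.update(path.relative_to(root).as_posix().encode())
        h.update(path.read_bytes())
    return h.hexdigest()


def needs_dependency_refresh(root: Path, stamp: Path) -> bool:
    """True if build.gradle files changed since the stamp was last written."""
    previous = stamp.read_text().strip() if stamp.exists() else None
    return gradle_files_digest(root) != previous


def record_refresh(root: Path, stamp: Path) -> None:
    """Call after the Gradle resolution task has succeeded."""
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_text(gradle_files_digest(root))
```

A wrapper would call `needs_dependency_refresh`, run the Gradle task if needed, call `record_refresh` only on success, and then hand the original arguments to buck.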

For dependency management alone, though, I think it would be easy to reuse what we have already built in OkBuck with Gradle and tweak it for any additional needs.

davido commented 8 years ago

I'm not sure I want to interact with third-party build toolchains like Maven and Gradle (we already need Ant to compile Buck). I do agree that dependency management in Buck should be improved. But before introducing something new, I would concentrate on fixing what is already there: remote_file and buck fetch. They have known issues [1], basically the same ones as maven_jar in Bazel [2]:

A Buck version upgrade invalidates the cache
rm -rf buck-out removes the world
Cloning the same project again into a different location (e.g. for a stable branch) and rebuilding re-downloads the whole world
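One way to get all three properties is to keep downloads in a per-user cache outside buck-out, keyed by the artifact's expected SHA-1 rather than by Buck version or checkout path. The sketch below is minimal: the cache root (e.g. somewhere under ~/.cache) and the helper names are hypothetical, and the actual download is passed in as a callable.

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha1_of(path: Path) -> str:
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def fetch_cached(sha1: str, download: Callable[[Path], None], cache_root: Path) -> Path:
    """Return a cached file with the given SHA-1, downloading it at most once.

    The cache lives outside buck-out and is keyed only by content hash, so it
    survives Buck upgrades and `rm -rf buck-out`, and is shared by all clones.
    """
    target = cache_root / sha1[:2] / sha1
    if target.exists():
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_suffix(".part")
    download(tmp)
    actual = sha1_of(tmp)
    if actual != sha1:
        tmp.unlink()
        raise ValueError(f"checksum mismatch: expected {sha1}, got {actual}")
    tmp.replace(target)  # rename into place only after verification
    return target
```

Because the key is the checksum declared in the build file, two checkouts pinning the same artifact share one download, and a tampered or truncated download is never promoted into the cache.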

FWIW, the Bazel team is re-implementing the native maven_jar rule as a dependency-aware Skylark rule [3], using Maven's maven-dependency-plugin behind the scenes.

[1] #602
[2] https://github.com/bazelbuild/bazel/issues/1752
[3] https://bazel-review.googlesource.com/5770

kageiit commented 8 years ago

All the features you listed, like dependency jar caching, are currently available in Gradle (and probably also in Maven). Gradle itself is run through its wrapper, which downloads and installs Gradle if it's not available, so it's not really a hard dependency for the end user. It would all happen behind the scenes.

What's your rationale for not wanting to interact with third-party build toolchains? I don't see any solid arguments against it, just that you want to reimplement it in Buck. That carries more risk for a small part of the system that is only used for dependency management.

Buck already relies on several third-party tools itself. How would this be any different?

davido commented 8 years ago

Whats your rationale for not wanting to interact with third party build tool chains?

My point is that you would need to know and interact with yet another toolchain. For other needs, such as transitively downloading non-Java artifacts like bower_components, you would need yet another toolchain (Node, Bower, ...).

Blaze (Bazel), Buck and Pants all use Python for writing build files. What more do you need? My expectation is that I can just write:

  maven_jar(
    name = 'foo',
    artifact = 'bar:baz:qux-1.0',
    sha1_bin = '42',
    sha1_src = '43',
    transitive = True # per default False, obviously
  )

Done. Once downloaded on my local machine (which of course should respect proxies so it works in enterprise environments), the artifact should survive a Buck upgrade and rm -rf buck-out, and it should be reused when the same project is cloned multiple times on the same machine.

Is it too much to expect that, for the requirements above, I should not need to install, configure, or even interact with (by which I mean edit) the files of any other build toolchain, like build.xml, pom.xml and build.gradle?

One important thing to notice here: the reason Google and Facebook still haven't shipped remote_file/maven_jar and buck fetch/bazel fetch implementations that aren't broken by design is that they use monorepos with all third-party dependencies checked in. So they just don't have this problem. That is not WAI. I'm not adding any dependencies to my repo, and that's why I rely on sane dependency management. (I would even agree to resolve the transitive dependency chain for my deps on my own.)
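Resolving a maven_jar declaration like the one above to downloads is mostly string manipulation over the standard Maven repository layout. The binary and the -sources JAR live side by side under group/path/artifact/version/. The sketch below uses a hypothetical helper that pairs each URL with the checksum it must match. Maven Central is assumed as the default repository.

```python
from typing import Dict, Tuple

MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_jar_urls(artifact: str, sha1_bin: str, sha1_src: str,
                   repository: str = MAVEN_CENTRAL) -> Dict[str, Tuple[str, str]]:
    """Map 'group:artifact:version' to {kind: (url, expected_sha1)}."""
    group, name, version = artifact.split(":")
    base = f"{repository}/{group.replace('.', '/')}/{name}/{version}/{name}-{version}"
    return {
        "bin": (f"{base}.jar", sha1_bin),
        "src": (f"{base}-sources.jar", sha1_src),
    }
```

Keeping the expected checksum next to each URL means the download can be verified before it is cached or used.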

FWIW, the best-known maven_jar Bucklet implementation in native Buck that provides all the features mentioned above, except handling transitive dependencies, is the one in the Gerrit build toolchain [1].

[1] https://github.com/davido/bucklets/blob/master/maven_jar.bucklet

ryandm commented 8 years ago

The way I see it, the upside of interacting with a third-party toolchain is getting to steal that mental model (and ecosystem) for free. The downside is it's only free for someone already familiar with thinking in terms of Maven/Gradle/Ivy for dependencies.

The goal here is a complete, sustainable solution for Java dependency management needs.

Let's do that without adding a new toolchain, if we can. But so far, I don't see how we can.
Tackling #602 is clearly a requirement for this goal; @davido thanks for re-raising that issue.

OkBuck's approach to dependencies is in a similar vein, except of course that the description up top calls for materializing BUCK files into the tree and then using vanilla buck. Other than feature completeness, the main outcome I'd like to see is a clear recommendation in Buck docs for how to import Java 3rd-party libraries.

(still gotta check out the bazel/maven_jar/skylark stuff)

kageiit commented 8 years ago

Not handling transitive dependencies is a huge downside for any decent dependency management system. Gradle built its integrations on top of Maven (these are all written in Java, well maintained, and have been stable for many years).

For other needs, like transitively download non Java artifacts, like say bower_components you would need yet another toolchain (Node, bower, ...).

I do not see this as a downside at all. I would argue it is better that way, because developers in each ecosystem are already most familiar with its tools and would find it easy to adopt Buck without having to learn or migrate to a new dependency management model. Also, this issue deals purely with Java dependency management.

maven_jar(
  name = 'foo',
  artifact = 'bar:baz:qux-1.0',
  sha1_bin = '42',
  sha1_src = '43',
  transitive = True # per default False, obviously
)

There is no need to actually define a build.gradle file to take advantage of Gradle for this. The artifact rule you described above can be translated internally into a model that Gradle/Maven understands, and their core logic can do the rest.
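To illustrate that translation, rule attributes could be rendered into the <dependencies> section of a generated POM, which Maven (or Gradle) then resolves. The sketch below uses Python's xml.etree.ElementTree. The input dict shape and the `exclusions` key are hypothetical. The element names (dependency, groupId, artifactId, version, exclusions, exclusion) are the standard POM ones.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def dependencies_xml(rules: List[Dict]) -> str:
    """Render maven_jar-like declarations as a POM <dependencies> element."""
    deps = ET.Element("dependencies")
    for rule in rules:
        group, artifact, version = rule["artifact"].split(":")
        dep = ET.SubElement(deps, "dependency")
        ET.SubElement(dep, "groupId").text = group
        ET.SubElement(dep, "artifactId").text = artifact
        ET.SubElement(dep, "version").text = version
        excluded = rule.get("exclusions", [])  # hypothetical 'group:artifact' list
        if excluded:
            exclusions = ET.SubElement(dep, "exclusions")
            for coord in excluded:
                ex_group, ex_artifact = coord.split(":")
                exclusion = ET.SubElement(exclusions, "exclusion")
                ET.SubElement(exclusion, "groupId").text = ex_group
                ET.SubElement(exclusion, "artifactId").text = ex_artifact
    return ET.tostring(deps, encoding="unicode")
```

The user only edits the Buck-side declaration. The XML is an intermediate artifact that the resolver consumes.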

Once downloaded on my local machine (of course this should respect proxies to work in enterprise environment) the artifact should survive Buck upgrade, rm -rf buck-out and downloaded artifact should be re-used when the same projects is cloned multiple times on the same machine.

These are already well-solved problems under all the constraints you just described, and Buck can benefit a lot by taking advantage of them.

Is this too much to expect, that for the requirements above i would like not need to install, configure, not to mention interact (by that I mean edit) with any other build toolchains, like build.xml, pom.xml and build.gradle.

As I mentioned before, you do not need to. The mapping can be done internally, but then again you would still need to learn whatever approach is supported at least once. To drive real-world adoption of Buck among more developers, it helps to adhere to existing standards (unless they are completely broken); otherwise Buck just ends up as a closed system that is costly to learn and buy into.

[Image: standards]

except handling transitive dependencies

I think this makes it not very useful, as it is almost the same as a vanilla buck fetch with some scripting to figure out where the jar/aar lives on Maven Central. Not to mention, it has no real logic to handle complex cases, like excluding selected artifacts from transitive dependencies, that the likes of Maven/Gradle support. When working outside a monorepo, these sorts of features are absolutely necessary for projects that use dependency management.
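To make "excluding selected artifacts from transitive dependencies" concrete: an exclusion declared on a dependency prunes that artifact only from the subtree reached through that dependency. Below is a minimal breadth-first sketch over a hypothetical in-memory graph keyed by group:artifact. Versions and conflict resolution are deliberately left out. It mirrors the rxandroid/rxjava exclusion in the Gradle example earlier in the thread.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical graph: each artifact maps to its declared dependencies, and each
# declared dependency carries the exclusions attached to that edge.
Graph = Dict[str, List[Tuple[str, FrozenSet[str]]]]


def resolve(roots: List[Tuple[str, FrozenSet[str]]], graph: Graph) -> Set[str]:
    """Return every artifact reachable from roots, honouring exclusions.

    Exclusions accumulate along a path: anything excluded on an edge is
    pruned from the whole subtree below that edge, but not elsewhere.
    """
    resolved: Set[str] = set()
    seen: Set[Tuple[str, FrozenSet[str]]] = set()
    queue = deque(roots)
    while queue:
        artifact, excluded = queue.popleft()
        if (artifact, excluded) in seen:
            continue
        seen.add((artifact, excluded))
        resolved.add(artifact)
        for child, child_exclusions in graph.get(artifact, []):
            if child not in excluded:
                queue.append((child, excluded | child_exclusions))
    return resolved
```

Tracking the exclusion set per path, rather than globally, is what lets rxjava still be pulled in when it is also declared directly, as in that build.gradle.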

davido commented 8 years ago

There is no need to actually define a build.gradle file to take advantage of gradle for this. The artifact rule you described above can be translated internally to a model gradle/maven understand and use their core logic to do the rest.

I agree. I wouldn't care what Buck uses behind the scenes, as long as I don't have to deal with any specifics of third-party toolchains, such as editing (or even seeing) pom.xml and build.gradle.

except handling transitive dependencies

I think this makes it not very useful as it is almost similar to a vanilla buck fetch with some scripting to figure out where the jar/aar lives on maven central.

Yes. That's why I said that dependency management should be improved in Buck itself.

ryandm commented 8 years ago

Suppose we add a transitive = True flag. How do we satisfy Buck's core need for reproducibility? From Buck's perspective, the only way to make the guarantee is to commit something (e.g. a checksum) into the source tree. Unless I'm missing a creative escape hatch, any kind of transitivity implies running a separate command to resolve dependencies before the actual build.
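One way to read that requirement is as a split into two steps. A separate resolve command writes a lock file, committed to the tree, that pins every transitive artifact to an exact version and checksum. The build then only consumes and verifies that file. The sketch below shows this split with a hypothetical JSON lock-file format and hypothetical helper names.

```python
import hashlib
import json
from typing import Dict, List


def write_lock(resolved: List[Dict[str, str]]) -> str:
    """Resolve step: serialize pinned artifacts deterministically (sorted)."""
    entries = sorted(resolved, key=lambda e: e["coordinate"])
    return json.dumps({"artifacts": entries}, indent=2, sort_keys=True) + "\n"


def verify_artifact(lock_text: str, coordinate: str, content: bytes) -> None:
    """Build step: refuse any artifact whose bytes don't match the lock."""
    pinned = {e["coordinate"]: e["sha1"] for e in json.loads(lock_text)["artifacts"]}
    if coordinate not in pinned:
        raise KeyError(f"{coordinate} is not in the lock file; re-run the resolve step")
    actual = hashlib.sha1(content).hexdigest()
    if actual != pinned[coordinate]:
        raise ValueError(f"{coordinate}: expected sha1 {pinned[coordinate]}, got {actual}")
```

Sorting the entries keeps the committed file stable across resolver runs, so diffs show only real dependency changes.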

Hiding Maven/Gradle/Ivy as an implementation detail is tempting... whether this can be done without creating a leaky abstraction is a judgment call. Based on evaluating this for Maven (with prototyping) I currently think that such an abstraction will either leak, or effectively reinvent dependency management. The biggest issue is how to cleanly surface all failures.

What should Buck implement/recommend as the current approach for managing Java dependencies?

Is there an approach that's more complete+supportable than mvn/gradle plugins that write BUCK files for resolved dependencies?

Coneko commented 8 years ago

It seems to me there isn't a big advantage to hiding Maven: many of the features would map very specifically onto Maven features, and package managers for other languages might implement those features differently, or might not have them at all.

Supporting everything Maven supports would just mean all the information that can be specified in a Maven file could be specified in Buck's build files and then be used to generate the Maven file. I think that's the approach Bazel takes.

I don't see that as hiding any implementation details; it's just being able to write the information in Python syntax rather than XML.

davido commented 7 years ago

FWIW, transitive dependency support was added to Bazel by these bazlets [1]. The underlying implementation uses Gradle.

[1] https://github.com/pubref/rules_maven

ryandm commented 6 years ago

(Closing as likely beyond the scope of buck per se.)
