4sh / datamaintain

One tool to maintain all your database schemas!
Apache License 2.0

Gradle rework #218

Closed CLOVIS-AI closed 1 year ago

CLOVIS-AI commented 1 year ago

This PR is a full rework of the Gradle configuration, with the goal of improved maintainability (by reducing duplication) and performance (by enabling modern caching features).

This PR is split into multiple phases.

Phase 0: analysis

Before this PR, Datamaintain used various techniques to share common configuration between projects.

buildSrc: The buildSrc folder is a special Gradle folder in which source files are compiled, then made available to all build.gradle files. Because it is common to all Gradle projects, it cannot import any plugins. This makes attempting to configure plugins from it complicated: the type-safe implementation(...) function must be replaced by the non-typesafe "implementation"(...) string syntax, the publishing block from maven-publish must be accessed as project.configure("publishing"), etc.
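As a rough illustration of this limitation, here is a sketch of what a shared helper in buildSrc tends to look like. The file name, the function name and the dependency coordinates are hypothetical and are not taken from Datamaintain; the sketch also assumes that buildSrc applies the kotlin-dsl plugin.

```kotlin
// buildSrc/src/main/kotlin/SharedConfiguration.kt (hypothetical file)
import org.gradle.api.Project
import org.gradle.api.publish.PublishingExtension
import org.gradle.kotlin.dsl.configure
import org.gradle.kotlin.dsl.dependencies

// Hypothetical helper that each project's build script would call.
fun Project.configureSharedDefaults() {
    dependencies {
        // No generated `implementation(...)` accessor exists here, so the
        // configuration is looked up by name when the build is configured.
        "implementation"("org.example:some-library:1.0.0") // placeholder coordinates
    }

    // The extension is looked up by type rather than through the generated
    // `publishing { }` accessor. This fails at configuration time if the
    // maven-publish plugin has not been applied to the project.
    configure<PublishingExtension> {
        // Publications would be declared here.
    }
}
```

Mistakes such as a typo in the configuration name are only reported when the build runs, not when the helper is compiled.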

Additionally, because the buildSrc directory is imported into build scripts, if it changes, all build scripts must be recompiled. This makes it a poor place to store version numbers: if any of them change (including by switching to another branch), Gradle must start everything from scratch.

buildScripts: Datamaintain had a buildScripts folder, which has no special meaning to Gradle. It contained a .kts file which was compiled and imported into the build scripts. This has the same downsides as buildSrc, with an additional one: since it is a custom file, it is not recognized by IDEA (no code highlighting, etc).

allprojects and subprojects: The allprojects block configures a project and all its sub-projects; the subprojects block configures only the sub-projects. In Datamaintain, they were only used in the root project. The main downside is that this is implicit: it makes reading build scripts more difficult, as any parent build script can alter a project's configuration. Like buildSrc, a project cannot configure plugins which are not applied to itself; so if the root project doesn't have the Kotlin plugin applied, it cannot configure it, even if all its subprojects do.
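To make the implicitness concrete, here is a minimal sketch of a root build script using both blocks. The group name and the test configuration are placeholders, not Datamaintain's actual settings.

```kotlin
// Root build.gradle.kts (illustrative only)
allprojects {
    // Applies to the root project and to every subproject.
    group = "org.example" // placeholder
    repositories {
        mavenCentral()
    }
}

subprojects {
    // Applies to every subproject, but not to the root project.
    // Nothing in a subproject's own build script reveals that this ran.
    tasks.withType<Test>().configureEach {
        useJUnitPlatform()
    }
}
```

Someone reading only a subproject's build script has no way to know that its repositories and test settings come from here.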

Using allprojects and subprojects in any project other than the root has undefined behavior when Configuration on Demand is enabled. However, no such usage existed in Datamaintain.

Phase 1: simplification

First, we update Gradle and Kotlin. The techniques used in this PR are quite recent, and have only been stabilized in the past few years.

The first phase of this PR consists of eliminating all the problems above by inlining all configuration options. Of course, this makes the build scripts much larger, as they now duplicate each other, but it removes all impacts they have on each other, making future factorization easier.

Previously, the root project configured all projects' Kotlin plugin to be compatible with Java 8. However, the Java plugin (responsible for running the tests, publishing, etc) had its default configuration of Java 11. This is replaced by explicitly using JVM toolchains, a Gradle feature which allows automatic downloading of the requested JDK version. @Lysoun told me to set it to Java 17.
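A minimal sketch of what declaring the toolchain can look like in a Kotlin/JVM project. It assumes the Kotlin plugin version is resolved elsewhere (for example through plugin management), and that the Kotlin Gradle plugin is recent enough to provide jvmToolchain(Int).

```kotlin
// build.gradle.kts of a Kotlin/JVM project (illustrative only)
plugins {
    // Assumption: the plugin version is provided elsewhere in the build.
    kotlin("jvm")
}

kotlin {
    // Compile, test and run with a Java 17 toolchain,
    // independently of the JDK used to launch Gradle.
    jvmToolchain(17)
}
```

Depending on the Gradle version, automatic download of a missing JDK may additionally require a toolchain resolver plugin to be declared in the settings file.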

Finally, we move all dependencies to the Version Catalog, a TOML file which lists all dependencies (versions + Maven coordinates). Because this file is a simple TOML file, it doesn't require recompilation of the build scripts when edited.
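As a sketch of how a build script consumes the catalog: the alias and coordinates below are hypothetical, and the catalog is assumed to live at Gradle's default location, gradle/libs.versions.toml.

```kotlin
// build.gradle.kts (illustrative only)
//
// Assumes gradle/libs.versions.toml contains hypothetical entries such as:
//
//   [versions]
//   example = "1.0.0"
//
//   [libraries]
//   example-core = { module = "org.example:example-core", version.ref = "example" }

plugins {
    `java-library`
}

dependencies {
    // Type-safe accessor generated from the `example-core` alias.
    implementation(libs.example.core)
}
```

Bumping the version only touches the TOML file; the build script itself stays unchanged.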

Phase 2: factorization

Precompiled script plugins are a Gradle feature which allows writing plugins with the same syntax as build scripts. Most often, they are used to factor out the default configuration, which earned them the name "convention plugins". To my knowledge, there is no standard place to put them. I personally prefer the convention of putting them in gradle/conventions.

Because they are proper plugins, they can depend on other plugins, and thus access their full configuration. However, they still have the downside of requiring a full recompilation of all build scripts when they are changed. Since all versions are stored in the version catalog, which doesn't have this problem, the convention plugins themselves rarely change, so this shouldn't have a large impact.

Because they are proper plugins, each project explicitly declares which conventions it follows. This makes it extremely clear what can and cannot impact the configuration, and makes it easy to figure out the configuration differences between projects:

plugins {
    id("datamaintain.conventions.kotlin")
    id("datamaintain.conventions.driver")
}

It is immediately clear that this project only uses the default configuration declared by the kotlin and driver conventions.
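For context, here is a sketch of what such a convention plugin could contain. The file location and its content are assumptions made for illustration, not the actual plugin from this PR. It assumes the gradle/conventions build applies the kotlin-dsl plugin, depends on the Kotlin Gradle plugin, and is included from the settings file.

```kotlin
// gradle/conventions/src/main/kotlin/datamaintain.conventions.kotlin.gradle.kts
// (hypothetical location and content)
//
// With no package declaration, the plugin ID is derived from the file name:
// datamaintain.conventions.kotlin

plugins {
    // No version here: it comes from the dependencies of the conventions build.
    kotlin("jvm")
}

kotlin {
    // Because the Kotlin plugin is applied above, its type-safe
    // accessors are available, unlike in buildSrc helper functions.
    jvmToolchain(17)
}
```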

Phase 3: performance

The Build Cache allows Gradle to store task results outside the build directory (in ~/.gradle/caches). Without it, because the build directory is overwritten each time the user switches branches, everything needs to be recompiled after each switch. Thanks to the build cache, Gradle can reuse compilation results from other branches. It is also useful when a user clones the project multiple times: each clone benefits from the compilations already done in the others.

By default, Gradle executes all build scripts before starting executing tasks. However, sometimes, this is not necessary: running ./gradlew :modules:cli:run does not require configuring the :modules:driver-jdbc project, since it is not a dependency. Configuration on Demand allows Gradle to only configure projects that participate in the build. The main downside is that it becomes impossible (or worse, undefined behavior) to access the configuration of another project, since it may not be available yet. Thanks to convention plugins, however, each project declares exactly which configuration it needs, so there is no need for projects to configure each other.

Finally, tasks declared with task() or tasks.create are configured even if they are not executed. Tasks declared with tasks.register are only configured if they are part of the task graph.
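The difference can be observed with a small sketch like the following (the task names are hypothetical):

```kotlin
// build.gradle.kts (illustrative only)

// Eager: the configuration block runs on every build,
// whether or not the task is executed.
tasks.create("eagerExample") {
    println("Configuring eagerExample")
    doLast { println("Running eagerExample") }
}

// Lazy: the configuration block runs only if the task
// ends up in the task graph (or something else requests it).
tasks.register("lazyExample") {
    println("Configuring lazyExample")
    doLast { println("Running lazyExample") }
}
```

Running an unrelated task such as ./gradlew help should print the "Configuring eagerExample" line but not the "Configuring lazyExample" one, as long as nothing else in the build forces the lazy task to be created.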

Possible future work

Samples: This PR doesn't factorize the configuration of the samples. Maybe in the future they could be converted to included builds? However, those are not well supported by IDEA at the moment.

Configuration Cache: The configuration cache allows Gradle to store the results of the configuration for a given execution. For example, after running ./gradlew test, another execution of ./gradlew test entirely skips the configuration phase, and starts executing tests immediately. On my machine, this would save ~0.5s for each Gradle execution, so a ~25% speedup. However, the Git Palantir plugin is not compatible at the moment.

Closes #212, #211 and #183.

cc @Lysoun

Lysoun commented 1 year ago

By default, Gradle executes all build scripts before starting executing tasks. However, sometimes, this is not necessary: running ./gradlew :modules:cli:run does not require configuring the :modules:driver-jdbc project, since it is not a dependency.

I'm hoping that this is just a wrong example because the cli does use the jdbc driver.