Split building from publishing in Gradle tasks (separate build and push)

GoogleContainerTools / jib

🏗 Build container images for your Java applications.

Apache License 2.0

13.68k stars 1.44k forks source link

Split building from publishing in Gradle tasks (separate build and push) #1756

Open remmeier opened 5 years ago

remmeier commented 5 years ago

Description of the issue:

jib, jibBuildTar and jibDockerBuild are all standalone tasks doing both the building and publishing. Expectation would rather be that those three tasks share a common tasks that takes care of the building while the three tasks then do the "publishing" to the desire target.

Expected behavior:

A shared build task should take care of:

downloading base layer
preparting the layers in the cache directory
computing the image id and digest

Further the shared build task should have regular Gradle up-to-date checking to improve build performance, limiting the work of "jib" to a minimal in many cases.

*Motivation

Motiviation from the change comes from two sides.

we would like to be able to follow Gradle conventions. Most notable have proper up-to-date checking.
we work with image id and digests to tag our images (and reuse in some further places). This in turn forces us to do both a jibBuildTar followed by jib to first get hold onto the imageId and digest.

Complicated it the second one that we seem to have an issue with reproducablity so that jibBuildTar and jib do not create the same images in the build. Mostly tracked that down, but not fully yet. Having the structure as above one the one side would make it easier to track such issues down (with Gradle up-to-date checking) and on the other side would avoid such redudant code execution in the first place.

loosebazooka commented 5 years ago

So there are some issues here:

jib doesn't usually build the full image locally, it builds what it needs and send its up to the registry. Doing jibBuildTar first wouldn't be as fast/efficient.
jibBuildTar has an issue that is being fixed for UP-TO-DATE in #1754
Since the targets of jibDockerBuild and jib are remote, we can't actually mark them as @Output without doing some expensive operations to determine those outputs. We also can't guess from previous builds if the remote registry/docker-daemon actually has the image (as they are mutable - the user can delete their images).
I think we actually output the imageId (or digest) to the filesystem during build. But, you shouldn't need to tag by digest, since you can use the alternate reference mechanism gcr.io/my-project/my-image@sha256:ad3d3......
Finally, we share much of our code with our maven plugin in a library called jib-core. Since we delegate a lot of the build to that mechanism, so tailing to gradle isn't exactly our primary concern at the moment. However we are always open to making things better, it's just not clear to me yet how to do this.

I am still curious that jibBuildTar and jib do not create the same images in the build, any help tracking down that issue would be great, we'll also check it out on our side.

remmeier commented 5 years ago

we cannot use digests since they are hard-wired to the repository, so different repositoryies have different digest but the same imageId. And one cannot query a registry by imageId, so we add the imageId as tag and make use of it. Not overly awesome, but the underlying registry design seems to enforce it. and we really need to be able to adress an image by something unique across different (internal/corporate) repositories.

jib mixes local building and publishing in the same task. in terms of gradle conventions and features like up-to-date checking it should rather not do that. then one can also add @Output annotations. I assume all three jibTask build up the layer cache and compute the imageId? this sounds like a good thing to add to a common, cacheable task.

the different imageids come from the broken up-to-date checking. was just to make a point that we would like to avoid duplicate computation where then things may go wrong. in this case we just need the ImageId without pushing or building a Tar.

loosebazooka commented 5 years ago

jib mixes local building and publishing in the same task

Unfortunately this is by design. We consider the jib task a build task with single output on the registry. Caching is an implementation detail.

I'm not saying it's bad idea to split this into two stepsm but we would have to completely redesign our lower level library to do this, and it's not currently in our plans to do that.

However, it sounds like what you really want is the ability to tag using imageId? That is something we might be able to bake directly into the system (I'm not saying we will though, just that we could explore this option)

jib {
  to {
    image = "gcr.io/my-project/my-image:latest"
    tags = ["jib.computed.image_id"]
  }
}

or something?

remmeier commented 5 years ago

oh, ok :-( well we would need the ImageId in advance before publishing. For example we make use of them to bake them into our Helm charts. The Helm charts themselves have a "nice" version number, wheras the images themselves are only addressed by imageId. This to lock the deployment down, give reproducablity and incremental updates of changed parts. jibBuildTar works in that regard, but quite inefficient.

There is an alternative to all this which might also be useful for some other things in jib: digest all the inputs and the plugin configuration. We did use that in the past in other (non-jib) projects. Having the final digest/imageId would have been the prefered flavor, being a bit more robust and really ensures reproudablity. But with that many issues in the area it would be the next best thing. Something that we can implement on our own or maybe something interesting to be provided by jib. A further benefit of the input digest approach would be super fast up-to-date checking of the "jib" task: It can compute the input digest and check the repository for its existence. If so, it does not need to do anything.

we work with larger projects in a mono-repo having 10+ docker images/deployments. So this kind of issues can be quite a pain point in terms of performance. On the other hand once configured together with all what jib provides quite great to work with.

loosebazooka commented 5 years ago

Can you explain why you need them before publishing? Jib putting an image on the registry shouldn't preclude you from modifying the helm charts before pushing them (them = helm), should it?

I guess what I'm curious about is how you're injecting the information into helm right now (I'm very lightly familiar with helm, so the more details the better)? Could you just have some kind of helm transformer task that runs after jib?

remmeier commented 5 years ago

Helm packages are basically are just compressed files with a default yaml-based configuration file and a bunch of template files. Upon installation one can specify further yaml files to overrule the default configuration. The templates then get evaluated based on those values and the result applied to Kubernetes.

So what we do during the build is to package the imageId into the default yaml of the Helm package (using https://github.com/rmee/gradle-plugins/tree/master/helm):

tag: 20464f3033fd3f2321b27a7159af17179fe2446b0ca7529f1fc54c84701cf62c

And make use of it in the template:

 initContainers:
  - name: {{ .Release.Name }}-init-database
     image: '{{ .Values.image.registry }}/management:{{ .Values.tag }}'
     args: ["prepareDatabase"]

So users of the Helm chart to not have to deal with versions of images anymore, it is directly baked into the chart. Installation is as simple as a "helm install myChart...". A (bigger) application can then be made of dozens of Helm charts using countless images. Using digests and ImageIds for the versions allow for reproducablity, no changes in the images lead to exactly the same Helm chart, which in turn allows to skip deployment: vital for larger systems and in this case happening automatically.

yes we could start mixing publishing/building up in more places, so build&publish image => package helm => deploy helm. But it is really not the Gradle way of doing things. We also like to completely build and verify our project before start publishing anything. So after the build there are further verification steps. And we have further deployment-related images that include the Helm charts to make matters worse.

Transforming the Helm charts is doable in principal, but also goes against Gradle in the up-to-date checking area (outputs of tasks must be distinct). So things would get a bit ugly by having to write them to a new place.

So currently the input digest-based approach seems like an approach to follow up from my perspective. I guess many people will work with larger multi-project setups and deploying many services. Getting things properly working can be challenging. So support in this area (and improvements to the layering for multi-project setups with shared base images/dependencies) would be quite helpful to see.

remmeier commented 5 years ago

remmeier commented 4 years ago

the "help wanted" label has been added. I can contribute one or the other fix. Is there an idea how it could be best addressed? Some of the possiblities:

task to build the image / "warm the cache" shared by jib, jibBuildTar, jibDockerBuild (if feasible)
seperate task to push tags
allow computed tags like tags = ["jib.computed.image_id"]
property to add imageId as tag automatically (maybe have built-in default strategies like by version, by image id, by timestamp,...).

currently we make use of both jibBuildTar and jib, but it adds 45 seconds to every (larger) build, forever for the impatient :-) Alternatively we would switch to computing a digest on the sources at the cost of being less robust (due to the possiblity of forgetting about something).

loosebazooka commented 4 years ago

@remmeier I think there are some deeper issues to explore if we want to do what you are suggesting, and I hesistate to recommend any option without some larger reviews of the design of jib.

One option though is to use another tool from google: crane (https://github.com/google/go-containerregistry#crane). This lets you manipulate registry images directly.

do a jib task build
retag the image using crane (wrapped in a gradle task?) [crane tag docs]
- crane tag <my full image ref:build-tag> <imageid-tag>

jerrylawson360 commented 4 years ago

We resolved this type of chicken-and-egg helm chart/docker image by explicitly setting the docker image in the Chart.yaml when creating the final helm tgz file. The helm chart templates pick up the docker image via the templates/deployment.yaml.

For example, we would have a Chart.yaml file in our git repo that looks like:

apiVersion: v1
appVersion: ${appVersion}
description: ${description}
name: ${name}
version: ${chartVersion}

We have a gradle step that does "filtering", ie., variable replacement to generate the "real" Chart.yaml, prior to building up the tgz helm package. "appVersion", "description", "name" and "chartVersion" are gradle project properties used in the filtering to generate the final Chart.yaml. (Alternatively, you could use the --app-version option when running helm package in your build process)

The project.appVersion is used to define the jib.to.image, eg,

jib {
    to {
         image = "${myregistry}/${imagename}:${appVersion}"
    }
}

The templates/deployment.yaml that is included in the final tgz package declares the imagePullPolicy and image as such:

                  imagePullPolicy: {{ .Values.pullPolicy }}
                  image: "{{ .Values.repository }}:{{ .Chart.AppVersion }}"

helm install respects the ".Chart.AppVersion" as the "appVersion" string found in the Chart.yaml of the packaged tgz.

This strategy allows us to publish a Docker image using jib, and explicitly call out the correct Docker image when running helm install, both of which use the same project.appVersion gradle property.

There were no changes or special customizations in jib to make this work and I would consider it a 100% generalized and portable solution.

Note: we also use org.unbroken-dome.helm gradle plugin to package the helm tgz file, which is how we're able to use project.appVersion in the gradle context when building the helm tgz package.

woj-tek commented 1 year ago

I just ran into, I think, similar problem. I started building one project with plain mvn clean install and it built OK (uses buildTar goal). I tried adding multi-platform and one solution was to use build goal. However, it forces pushing.

From local development perspective, running maven install should at the best build the image and install it to local "repository" (docker daemon?) and doing maven deploy should do the former and also push to remote repository. Having build pushing image to remote is quite counterintuitive and usually fails if we try to build public project without access to (their) repository.

(this is somewhat related to https://github.com/GoogleContainerTools/jib/issues/2743#issuecomment-1474392231)

SgtSilvio commented 8 months ago

Might be a little off-topic: The Gradle OCI Plugin (https://github.com/SgtSilvio/gradle-oci) actually splits building from publishing/using an image. Might be interesting for those looking at a different approach.