docker / buildx

Docker CLI plugin for extended build capabilities with BuildKit
Apache License 2.0

Proposal: Generate metadata from Git ref and CIs events #728

Open crazy-max opened 3 years ago

crazy-max commented 3 years ago

This proposal aims to make tag and label generation as transparent as it is with metadata-action, but handled directly in the CLI for the build/bake commands through new flags:

Another possibility would be to extend the --tag flag to handle that, or to make it usable only through bake to avoid many flags.

We can create a separate library (or not) that extracts a Git metadata object and metadata from CI providers such as GitHub Actions, Travis CI, CircleCI, Jenkins, and so on. Git metadata would be used as a fallback if no CI provider is found. Flag values would then be passed to this library.

--meta-tag

To generate metadata according to user needs, several types could be used as key-value pair attributes:

--meta-tag=

If --meta-tag is defined but empty, the following entries will be used:

--meta-tag=type=schedule

Will be used on a schedule event for CI providers that can handle it, like Travis CI and GitHub Actions.

pattern is a specially crafted attribute that would support Go templates with the following expressions:

| Pattern | Output |
| --- | --- |
| nightly | nightly |
| {{ date 'YYYYMMDD' }} | 20210813 |
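
To make the pattern table above concrete, here is a minimal sketch of how such a pattern could be expanded. The function name renderSchedulePattern is hypothetical, and the set of supported date tokens (YYYY, MM, DD, evaluated in UTC) is an assumption; the proposal does not specify either.

```javascript
// Hypothetical helper, not part of buildx: expands {{ date '<format>' }}
// expressions in a schedule pattern.
// Assumption: only the YYYY, MM and DD tokens are supported, in UTC.
function renderSchedulePattern(pattern, now = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  const tokens = {
    YYYY: String(now.getUTCFullYear()),
    MM: pad(now.getUTCMonth() + 1), // getUTCMonth() is zero-based
    DD: pad(now.getUTCDate()),
  };
  // Replace every {{ date '...' }} expression; other text is left untouched.
  return pattern.replace(
    /\{\{\s*date\s+'([^']*)'\s*\}\}/g,
    (_, format) => format.replace(/YYYY|MM|DD/g, (token) => tokens[token])
  );
}
```

A plain pattern such as nightly passes through unchanged, since it contains no expression to expand.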

--meta-tag=type=semver

Will be used on a push tag event and requires a valid semver Git tag, but you can also use a custom value through the value attribute.

The pattern attribute supports Go templates with the following expressions:

| Git tag | Pattern | Output |
| --- | --- | --- |
| v1.2.3 | {{ .Raw }} | v1.2.3 |
| v1.2.3 | {{ .Version }} | 1.2.3 |
| v1.2.3 | {{ .Major }}.{{ .Minor }} | 1.2 |
| v1.2.3 | v{{ .Major }} | v1 |
| v1.2.3 | {{ .Minor }} | 2 |
| v1.2.3 | {{ .Patch }} | 3 |
| v2.0.8-beta.67 | {{ .Raw }} | 2.0.8-beta.67* |
| v2.0.8-beta.67 | {{ .Version }} | 2.0.8-beta.67 |
| v2.0.8-beta.67 | {{ .Major }}.{{ .Minor }} | 2.0.8-beta.67* |

*Pre-releases (rc, beta, alpha) will only extend {{ .Version }} as a tag because they are updated frequently, and contain many breaking changes that are (by the author's design) not yet fit for public consumption.
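
As a rough illustration of the semver table and its pre-release footnote, the rendering could look like the sketch below. renderSemverPattern is a hypothetical name, the regular expression is a simplified semver matcher (no build metadata), and the tiny placeholder substitution only imitates the Go template syntax shown above; it is not a Go template engine.

```javascript
// Hypothetical helper, not part of buildx: renders a pattern such as
// "{{ .Major }}.{{ .Minor }}" against a semver Git tag.
// Assumption: simplified semver parsing (optional "v" prefix, no build metadata).
function renderSemverPattern(gitTag, pattern) {
  const match = /^v?(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$/.exec(gitTag);
  if (!match) {
    throw new Error(`${gitTag} is not a valid semver tag`);
  }
  const [, major, minor, patch, prerelease] = match;
  const version =
    `${major}.${minor}.${patch}` + (prerelease ? `-${prerelease}` : "");

  // Per the footnote: a pre-release always resolves to the full version,
  // whatever the pattern is.
  if (prerelease) {
    return version;
  }

  const fields = new Map([
    ["Raw", gitTag],
    ["Version", version],
    ["Major", major],
    ["Minor", minor],
    ["Patch", patch],
  ]);
  // Unknown placeholders are left as-is rather than silently dropped.
  return pattern.replace(/\{\{\s*\.(\w+)\s*\}\}/g, (placeholder, name) =>
    fields.has(name) ? fields.get(name) : placeholder
  );
}
```

With this sketch, every row for v2.0.8-beta.67 yields 2.0.8-beta.67, matching the table as written.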

--meta-tag=type=ref

This type handles Git ref for the following events:

| Event | Ref | Output |
| --- | --- | --- |
| pull_request | refs/pull/2/merge | pr-2 |
| push | refs/heads/master | master |
| push | refs/heads/my/branch | my-branch |
| push tag | refs/tags/v1.2.3 | v1.2.3 |
| push tag | refs/tags/v2.0.8-beta.67 | v2.0.8-beta.67 |
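
The mapping in the table above can be sketched as a small function. refToTag is a hypothetical name; the only rules it implements are the ones visible in the table (pr- prefix for pull requests, slashes in branch names replaced by dashes), and anything beyond that is an assumption.

```javascript
// Hypothetical helper, not part of buildx: maps a Git ref to a Docker tag
// following the rows of the table above.
function refToTag(ref) {
  // refs/pull/<number>/merge -> pr-<number>
  const pullRequest = /^refs\/pull\/(\d+)\/merge$/.exec(ref);
  if (pullRequest) {
    return `pr-${pullRequest[1]}`;
  }
  // Branches and tags: strip the prefix and replace "/" (not allowed in a
  // Docker tag) with "-".
  for (const prefix of ["refs/heads/", "refs/tags/"]) {
    if (ref.startsWith(prefix)) {
      return ref.slice(prefix.length).replace(/\//g, "-");
    }
  }
  throw new Error(`unsupported ref: ${ref}`);
}
```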

--meta-tag=type=sha

Outputs the short Git commit SHA (or the long one if specified) as a Docker tag, like sha-ad132f5.
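
A minimal sketch of this type, assuming a 7-character short SHA (as in the sha-ad132f5 example) and a hypothetical long option; neither the function name nor the option name comes from the proposal.

```javascript
// Hypothetical helper, not part of buildx: builds a "sha-" tag from a commit.
// Assumptions: the short form is the first 7 hex characters, and the long
// form is requested through a made-up `long` option.
function shaTag(commitSha, { long = false } = {}) {
  if (!/^[0-9a-f]{40}$/.test(commitSha)) {
    throw new Error(`expected a full 40-character commit SHA, got: ${commitSha}`);
  }
  return `sha-${long ? commitSha : commitSha.slice(0, 7)}`;
}
```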

--meta-flavor

--meta-flavor defines a global behavior for --meta-tag:

latest tag will be generated by default (auto mode) for:

--meta-oci

Generate OCI Image Format Specification labels:

{
    "org.opencontainers.image.title": "Hello-World",
    "org.opencontainers.image.description": "This your first repo!",
    "org.opencontainers.image.url": "https://github.com/octocat/Hello-World",
    "org.opencontainers.image.source": "https://github.com/octocat/Hello-World",
    "org.opencontainers.image.version": "1.2.3",
    "org.opencontainers.image.created": "2020-01-10T00:30:00.000Z",
    "org.opencontainers.image.revision": "90dd6032fac8bda1b6c4436a2e65de27961ed071",
    "org.opencontainers.image.licenses": "MIT"
}

If some are not suitable, the user can override them with the --label flag.
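
The override behaviour could be as simple as a merge where user-supplied labels win. In the sketch below, buildLabels and the shape of the meta object are hypothetical; only the label keys come from the example above.

```javascript
// Hypothetical helper, not part of buildx: generates the OCI labels listed
// above from a metadata object, then lets user labels (as passed with
// --label) take precedence.
// Assumption: `meta` is a plain object with the fields used below.
function buildLabels(meta, userLabels = {}) {
  const generated = {
    "org.opencontainers.image.title": meta.title,
    "org.opencontainers.image.description": meta.description,
    "org.opencontainers.image.url": meta.url,
    "org.opencontainers.image.source": meta.url,
    "org.opencontainers.image.version": meta.version,
    "org.opencontainers.image.created": meta.created,
    "org.opencontainers.image.revision": meta.revision,
    "org.opencontainers.image.licenses": meta.licenses,
  };
  // Later spread wins, so any key present in userLabels replaces the
  // generated value.
  return { ...generated, ...userLabels };
}
```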

cc @tonistiigi @thaJeztah @chris-crone

tonistiigi commented 3 years ago

Some general questions to figure out before we get too deep on the flags:

If we are going with the hardcoded flags I think we need to compress it somehow to a single flag.

type=schedule

I assume this works by looking for hardcoded ENV in the environment. Does it work with inline script block in actions?

type=ref,event=branch

Is this all evaluated by us from the source, or is some of it expected to come from CI env vars? Is all of this also expected to work when building from a local source (no git URL) that contains a .git directory?

meta-oci

Where do title/description come from?

crazy-max commented 3 years ago

@tonistiigi

  • How is this implemented? Where is the logic that understands how to read these values out of the git repository? Having access to actual git seems like a requirement as these requests are quite complex. Is it hardcoded in buildx, buildkit, or running in a container that contains git (meaning limitations for offline usage if it is loaded automatically)?

In my mind values would be requested by the client (buildx) to generate the appropriate tags and labels and invoke the build later.

  • Maybe bake is enough for this and flags are not needed. What does it look like in bake? Something like metatags(tag, config)? Again, is it hardcoded, passed to buildkit, or does bake first make a separate request to figure out the tags/labels and then invoke the actual build later?

If we are going with the hardcoded flags I think we need to compress it somehow to a single flag.

Yes, I would also like a single compressed flag for that purpose. --tag could be extended and also used in bake to keep feature parity with the current build command.

target "build" {
  tags = [
    "type=schedule",
    "type=ref,event=tag"
  ]
}
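
For illustration, entries like the ones in the target above could be split into attributes with something like the sketch below. parseTagEntry is a hypothetical name, and the naive split on commas is an assumption: it would break on values that themselves contain commas, so a real implementation would need proper CSV-style parsing.

```javascript
// Hypothetical helper, not part of buildx: parses "type=ref,event=tag" into
// { type: "ref", event: "tag" }.
// Assumption: values never contain commas (naive split).
function parseTagEntry(entry) {
  const attrs = {};
  for (const field of entry.split(",")) {
    // Split on the first "=" only, so values may contain "=" themselves.
    const index = field.indexOf("=");
    if (index <= 0) {
      throw new Error(`invalid attribute: ${field}`);
    }
    attrs[field.slice(0, index).trim()] = field.slice(index + 1).trim();
  }
  return attrs;
}
```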

type=schedule

I assume this works by looking for hardcoded ENV in the environment. Does it work with inline script block in actions?

Yes, the majority of the requested values come from environment variables:

GITHUB_EVENT_NAME can tell if the workflow is scheduled and, in this case, generate a "nightly" tag, but some CIs don't have this env var available.
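
A rough sketch of that detection, assuming GitHub Actions reports the value schedule in GITHUB_EVENT_NAME and Travis CI reports cron in TRAVIS_EVENT_TYPE. The function name isScheduledBuild is hypothetical, and other providers would need their own checks or a fallback.

```javascript
// Hypothetical helper, not part of buildx: tells whether the current build
// was triggered by a schedule, based on CI environment variables.
// Assumptions: GITHUB_EVENT_NAME === "schedule" on GitHub Actions and
// TRAVIS_EVENT_TYPE === "cron" on Travis CI.
function isScheduledBuild(env = process.env) {
  if (env.GITHUB_EVENT_NAME !== undefined) {
    return env.GITHUB_EVENT_NAME === "schedule";
  }
  if (env.TRAVIS_EVENT_TYPE !== undefined) {
    return env.TRAVIS_EVENT_TYPE === "cron";
  }
  // No known CI variable available: assume a regular build.
  return false;
}
```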

type=ref,event=branch

Is this all evaluated by us from the source, or is some of it expected to come from CI env vars? Is all of this also expected to work when building from a local source (no git URL) that contains a .git directory?

We could evaluate from Git as a fallback if CI env vars are not suitable/available. As for local sources, that's a good question. I think we could read the working tree and create a "snapshot" tag like {{ .Tag }}-SNAPSHOT-{{ .ShaShort }}.

meta-oci

Where do title/description come from?

title and description come from the CI event payload. With GitHub Actions, they are retrieved from GITHUB_EVENT_PATH, which is the path of the file with the complete webhook event payload, like /github/workflow/event.json, but this might not be available for all CIs, unfortunately.
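
As a sketch of the GitHub Actions case, the payload file could be read like this. readRepoInfo is a hypothetical name, and the use of repository.name and repository.description from the payload is an assumption about which fields would back the title and description labels.

```javascript
const fs = require("node:fs");

// Hypothetical helper, not part of buildx: reads title/description from the
// webhook event payload pointed to by GITHUB_EVENT_PATH.
// Assumption: the payload has a `repository` object with `name` and
// `description` fields; when the file is missing, nothing is returned.
function readRepoInfo(env = process.env) {
  const eventPath = env.GITHUB_EVENT_PATH;
  if (!eventPath || !fs.existsSync(eventPath)) {
    return {};
  }
  const payload = JSON.parse(fs.readFileSync(eventPath, "utf8"));
  const repository = payload.repository || {};
  return { title: repository.name, description: repository.description };
}
```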


I will improve this proposal with concrete use cases in the next few days.

errordeveloper commented 2 years ago

I think the functionality that this proposal sets out is very useful overall, but I do have a few thoughts about the UX. I do also side with @tonistiigi's concern regarding access to git.

I have implemented a very opinionated solution in imagine, and it currently makes a very specific assumption that builds are done from git, which isn't something that buildx or buildkit can currently assume without fragmenting the UX. What imagine does is also not configurable; by design it implements a single tagging model.

From the UX perspective, I don't think adding new mini-language flags would be helpful. I find that quite a few of the buildx flags (like bake --set, build --output, or build --secret) are already way too complicated and need some rethinking; the flags proposed here are much more complicated and will add significant cognitive overhead.

To me, a more appealing option would be for bake to provide a syntax that supports custom modules, where one would simply load an existing tag generator module or create their own. I am not entirely sure if this is something that should be done in HCL; I think this is another reason for bake to offer a general-purpose language.

On the other hand, as the author of imagine, I'd advocate for leaving tasks like this for the ecosystem to implement. Perhaps keeping the scope of bake more constrained is better, and higher-level wrappers are something that we should encourage. E.g. a wrapper in some scripting language that generates bake files and allows custom logic would be quite interesting, or something like imagine, which aims for a very particular model.
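
To give an idea of what such a wrapper could look like, here is a minimal sketch that emits a bake file in JSON form. generateBakeFile is a hypothetical name, the single target called default is an assumption, and the tag list is expected to come from whatever custom logic the wrapper implements.

```javascript
// Hypothetical wrapper sketch, not an existing tool: produces the content of
// a JSON bake file with one target whose tags are computed by custom logic.
// Assumption: a single target named "default" is enough for the example.
function generateBakeFile(image, tags) {
  const bake = {
    target: {
      default: {
        // Fully qualify each generated tag with the image name.
        tags: tags.map((tag) => `${image}:${tag}`),
      },
    },
  };
  return JSON.stringify(bake, null, 2);
}
```

The output could then be written to a file and passed to docker buildx bake -f, leaving all tagging opinions in the wrapper rather than in bake itself.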

felipecrs commented 2 years ago

I am not sure if I agree. Obviously, one could have different opinions than others, but having good defaults is always good, and better yet if it is extensible.

Both good defaults and extensibility are covered by https://github.com/docker/metadata-action, and this proposal is only to bring it to the general public (i.e. not only people using GitHub Actions).

Docker BuildKit already has some Git support as well, like docker build https://github.com/docker/buildx.git. Of course, that is different from relying on being called from inside the git repository so that some kinds of tags and labels can be identified, but no matter what, this tag and label generation is not meant to be enabled by default, and if users explicitly enable it, they must be aware of the requirements.

errordeveloper commented 2 years ago

@felipecrs my point is regarding where and how. I do agree that the functionality needs to be served, but I don't think the proposed flags make it very user-friendly. In my view, a structured config syntax or a scripting language would be much better.

felipecrs commented 2 years ago

If all these are presented as CLI flags only, I agree 100% with you. But I'm assuming all of them will have equivalent keys in the bake configuration file. If so, HCL will provide us some sort of scripting, although very limited.

That's exactly why I developed docker-meta, which features:

docker buildx bake -f <(docker-meta)

I built it for my company's internal use, so far, and that's why there is only one preset available (gerrit). But I often have things like:

// docker-meta.config.js

const repo = "my-registry/my-repo";

const cicdImage = `${repo}/cicd`;
const devcontainerImage = `${repo}/cicd-devcontainer`;
const ansibleImage = `${repo}/cicd-ansible`;

function getCacheEntries(image) {
  const entries = [];
  if (process.env.GERRIT_CHANGE_NUMBER) {
    entries.push(`${image}:gcr-${process.env.GERRIT_CHANGE_NUMBER}`);
  }

  if (process.env.GERRIT_BRANCH) {
    entries.push(`${image}:${process.env.GERRIT_BRANCH}`);
  }
  return entries;
}

module.exports = {
  preset: "gerrit",
  "tag-version": false,
  latest: true,
  groups: {
    default: {
      targets: ["cicd"],
    },
  },
  targets: {
    cicd: {
      images: [cicdImage],
      target: "cicd",
      "cache-to": ["type=inline"],
      "cache-from": getCacheEntries(cicdImage),
    },
    devcontainer: {
      images: [devcontainerImage],
      target: "devcontainer",
      "cache-to": ["type=inline"],
      "cache-from": getCacheEntries(devcontainerImage),
    },
    ansible: {
      images: [
        ansibleImage,
        "my-old-tag/ansible",
      ],
      target: "ansible",
      secret: [`id=id_rsa,src=${process.env.RSA_CREDENTIALS_FILE}`],
      "cache-to": ["type=inline"],
      "cache-from": getCacheEntries(ansibleImage),
    },
  },
};

Which would otherwise be too complicated (if not impossible) to do in HCL. That's to say that scripting is a much-needed feature, just like it is for make with its Makefile. I just don't know how that can be done in a clear way (e.g. HCL may be one, Go-template for a Helm-like approach may be another).

PS: regardless, scripting should be off-topic for this issue. We should probably create another to talk about it.

errordeveloper commented 2 years ago

I guess, in my view, we need to decide what the scope of bake is: should it be a little less than what it is now and allow higher-level tools to define more features (like @felipecrs' docker-meta shown above, though perhaps we could provide some examples and a convention for a few popular languages), or should bake be extended and expose e.g. Python or JavaScript syntax instead of HCL (or even offer some pluggable hybrid)? Once that is decided, we can discuss how features like tagging are exposed to the user in a way that is simple and powerful, yet leaves enough room for users to express their own opinions. There are other aspects that bake doesn't address right now and that need some customisable interface, e.g. rebuild logic based on image presence in the registry, which in imagine is implemented by assuming tags represent a unique version of the build inputs (which imagine makes a best effort to ensure, with some assumptions about build reproducibility). However, imagine is all about a particular set of opinions that may not work for everyone.

errordeveloper commented 2 years ago

...should it be a little less than what it is now and allow higher level tools to define more features

Just wanted to clarify: in this case I am implying that bake would be reduced to support just plain JSON manifests without any of the HCL extensions, perhaps even without inheritance, i.e. it would only provide a very basic way of expressing a set of build instructions for more than one image that get built in one go. The reasoning, of course, would be to make it very simple and push anything beyond trivial into another layer. The current HCL front-end could then be converted into a separate piece, if folks are interested. Of course, as I said, there is another route, where bake gains more and more features, but I don't see how that could be done with HCL (I'm happy to elaborate with plenty of examples where Terraform hits the wall due to the nature of HCL).

Of course, this discussion deserves a new thread, but it is inevitably related to features like what @crazy-max has proposed here; it's the type of feature that extends the scope significantly. We could simply consider this purely on the basis that bake's behaviour would become reliant on external systems of record like git, or include heuristics based on environment variables and date and time, which are the sort of factors that bake currently stays away from (to the best of my knowledge).