giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273
Apache License 2.0

Retagger improvements #3408

Open piontec opened 4 months ago

piontec commented 4 months ago

As part of our registries efforts, we wanted to improve retagger as well. This ticket captures more details about that task.

How retagger currently works

Retagger is a Go CLI and a huge CircleCI build pipeline. In general, there are two modes, executed in the CI pipeline one after the other:

  1. Regular images - images copied from a source registry to our registries. They are listed in skopeo-*.yaml files. They are synced in the following way:
    • retagger filter is invoked for each skopeo-*.yaml file. Internally, this retagger command works as follows:
      • Invokes skopeo sync with the single skopeo-*.yaml file in dry-run mode. This lists all the tags available for all the images in the source repository (so it "expands" semver ranges and regular-expression tag patterns into specific tags).
      • For each image, queries hard coded upstream registries (quay, old azure and aliyun) for existing tags.
      • Computes a set of missing tags for each image and saves it into a file called skopeo-*.yaml.filtered
    • retagger sync is run for that *.filtered file to do actual synchronization.
    • This filter-then-sync workflow comes, AFAIR, from two sources. The main one is that some time ago it was impossible to run sync on semver ranges, so we had to discover and list the images before we could use skopeo for sync. There was also an argument about better performance, but I'm not sure it still holds now that semver ranges are supported in skopeo.
  2. Customized images - images mutated from the upstream source and then uploaded to target registries. The idea is that if someone wants an upstream image with some minor change (like adding USER <UID>), retagger can do this as well: it builds the image, then uploads it to a target registry. It is important to note that the majority of "custom images" actually only rename the image. The main reason seems to be to avoid confusion about what the image is: if we replicate bitnami/postgresql into giantswarm/postgres, we no longer know which build of postgres it is, so we rename the image to giantswarm/bitnami-postgres instead (this is no longer needed, see "Current problems" below). The remaining customized images actually mutate the source image. The build works like this, in a loop for each image in customized-images.yaml:
    • retagger creates a Dockerfile that has FROM <src image> set using the customized-images.yaml entry, then runs docker build + tag and docker push on it.
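
The core of the filter step described above is a set difference between the tags expanded from the source and the tags already present in the target registries. A minimal sketch in Python, not retagger's actual Go code: the function name `missing_tags`, the registry keys, and the rule "missing if absent from at least one target" are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def missing_tags(
    source_tags: Iterable[str],
    target_tags: Mapping[str, Iterable[str]],
) -> list[str]:
    """Return source tags that are absent from at least one target registry.

    source_tags: tags expanded from the source repository (hypothetical input,
                 standing in for the output of the dry-run listing).
    target_tags: tags already present, keyed by target registry name.
    """
    wanted = set(source_tags)
    missing: set[str] = set()
    for existing in target_tags.values():
        # Anything wanted but not yet in this registry still needs syncing.
        missing |= wanted - set(existing)
    # Plain lexicographic sort, not semver ordering.
    return sorted(missing)


if __name__ == "__main__":
    expanded = ["15.1", "15.2", "15.3"]
    present = {"quay": ["15.1", "15.2"], "azure": ["15.1"]}
    print(missing_tags(expanded, present))  # ['15.2', '15.3']
```

The result corresponds to what would be written into the *.filtered file for one image.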

Current problems

  1. retagger synchronizes all the images into the target repository giantswarm/*, so the original repository name is lost, e.g. the postgresql/postgresql image from Docker Hub becomes giantswarm/postgresql. This was necessary when we were replicating images from Docker Hub to our own Docker Hub account, as all of Docker Hub is available under a single domain. Unfortunately, this creates a problem: by looking at the images in a registry, it is impossible to tell whether a specific image is really custom and built by Giant Swarm, or a verbatim copy from upstream.
  2. retagger is building images. This is a problem on a complexity level (a lot has to be set up and working for retagger to do that), and also on a logical level (retagger is supposed to synchronize images, not build them). It is also super important to note that right now this "customized images" support is the only real difference between retagger and plain upstream skopeo. If we stop doing this, we don't need retagger at all. Also, this build process makes it harder to prepare images for the security extensions we have in mind (signing and clear ownership).
  3. There's effectively no ownership of images, as files are organized by the source registry and this structure is hard-coded. We need to be able to change the file organization to per-team, so we know who needs specific images and why.
  4. It's not configurable. Upstream registries are just hard-coded.

Desired state

  1. We want to decommission retagger and replace it with vanilla upstream skopeo.
  2. The CircleCI configuration is huge, complex and costs quite a lot of money. We want to be able to run the synchronization tool on our own cluster.
  3. If something's an image build, we want to extract it into a separate repository, where it runs a normal image build procedure (has to use our architect-orb for consistent build process). New retagger will really only synchronize images.
  4. We want to have data about which team needs and owns a retagged image.
  5. We want to be able to easily configure target registries to synchronize to.
  6. We want to be able to tell if an image is copied from upstream or is really a Giant Swarm specific image, so we want to use proper repositories and their names (so the postgresql/postgresql image from Docker Hub synchronized to gsoci.azurecr.io becomes gsoci.azurecr.io/postgresql/postgresql and not gsoci.azurecr.io/giantswarm/postgresql).
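
To make points 5 and 6 concrete, here is a small Python sketch of the naming rule: keep the upstream repository path and only swap the registry host, with the list of target registries passed in as configuration. The function name `target_ref`, the `legacy` flag, the tag `15`, and the second registry `registry.example.com` are hypothetical; the sketch also assumes source references are fully qualified (`host/repository/name[:tag]`).

```python
from __future__ import annotations


def target_ref(source_ref: str, target_registry: str, legacy: bool = False) -> str:
    """Map a fully qualified source image reference to its target reference.

    legacy=False keeps the upstream repository path (the desired state).
    legacy=True flattens everything into giantswarm/* (the current behaviour).
    """
    # Drop the source registry host, keep "repository/name[:tag]".
    _, _, path = source_ref.partition("/")
    if legacy:
        # Only the last path component survives, so the origin is lost.
        path = "giantswarm/" + path.rsplit("/", 1)[-1]
    return f"{target_registry}/{path}"


if __name__ == "__main__":
    src = "docker.io/postgresql/postgresql:15"
    # Configurable list of targets instead of hard-coded registries.
    targets = ["gsoci.azurecr.io", "registry.example.com"]
    for registry in targets:
        print(target_ref(src, registry))
    print(target_ref(src, "gsoci.azurecr.io", legacy=True))
```

With the desired mapping, anything under `giantswarm/` in a target registry can be assumed to be a genuine Giant Swarm build rather than a verbatim upstream copy.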

Migration plan

Tasks

- [ ] https://github.com/giantswarm/roadmap/issues/3425
- [ ] https://github.com/giantswarm/roadmap/issues/3059
- [ ] https://github.com/giantswarm/roadmap/issues/3426
- [ ] https://github.com/giantswarm/roadmap/issues/3427
- [ ] Check whether there are Docker Hub mirrors that we can use as a source for Docker Hub images without the risk of exceeding the pull-limit quota of the not-super-enterprise Docker account.
- [ ] https://github.com/giantswarm/roadmap/issues/3424
- [ ] https://github.com/giantswarm/giantswarm/issues/30501

uvegla commented 1 month ago

The git tag prefix / mono repo support was released as of:

The POC can be found at: https://github.com/giantswarm/laszlo-monorepo

This only solves custom image builds. There are a bunch of images in the customised images yaml file that are repo renames only. They are there because everything else uses skopeo sync, which does not support renames (see: https://github.com/containers/skopeo/issues/1998). The custom image build uses skopeo copy, which does support renames. We have to figure out something for those.
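
One possible direction for the rename-only entries, sketched in Python under assumptions: keep an explicit source-to-target mapping and emit one `skopeo copy` invocation per entry using the `docker://` transport. The `renames` mapping, the tag `15`, and the helper name `copy_command` are hypothetical, and the sketch only prints the commands rather than running them.

```python
from __future__ import annotations

import shlex


def copy_command(src: str, dst: str) -> list[str]:
    """Build the argv for copying one image under a new name with skopeo copy."""
    return ["skopeo", "copy", f"docker://{src}", f"docker://{dst}"]


if __name__ == "__main__":
    # Hypothetical rename-only entries: source reference -> renamed target.
    renames = {
        "docker.io/bitnami/postgresql:15": "gsoci.azurecr.io/giantswarm/bitnami-postgres:15",
    }
    for src, dst in renames.items():
        # Print a shell-safe command line; a real job would execute it instead.
        print(shlex.join(copy_command(src, dst)))
```

Unlike skopeo sync, this needs the concrete tags to be listed up front, so semver ranges would still have to be expanded by some other step.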

stone-z commented 1 month ago

  1. We want to be able to tell if an image is copied from upstream or is really a Giant Swarm specific image, so we want to use proper repositories and their names (so the postgresql/postgresql image from Docker Hub synchronized to gsoci.azurecr.io becomes gsoci.azurecr.io/postgresql/postgresql and not gsoci.azurecr.io/giantswarm/postgresql).

This will require some wider communication, since there are customer image policies that rely on using the giantswarm organization.

It is also important to check whether this assumption can be safely held for other registries (China, zot, etc.) and whether that will have consequences (e.g. could zot contain a mix of retagged and direct upstream images?)