cloudfoundry-incubator / kubecf

Cloud Foundry on Kubernetes
Apache License 2.0

spike: investigate solutions for concourse pipelines to download images less often #1564

viovanov closed this issue 4 years ago

viovanov commented 4 years ago

Is your feature request related to a problem? Please describe.

The new restrictive download policies from Docker Hub are causing problems.

Describe the solution you'd like

Describe alternatives you've considered

#1563

viccuad commented 4 years ago

Constraints

  1. concourse.suse.dev has internal (on ECP VMs) and external workers.
  2. We should also move the publishing of the images we build in Concourse away from Docker Hub. If we had a mirror, we could set registry_mirror for all docker-image resources; per https://github.com/concourse/docker-image-resource#source-configuration, it only takes effect for images whose repository has no registry host defined (i.e. those coming from Docker Hub).
  3. Changes may need to be backported to older versions of the pipelines on other branches, though we may get away without it, since those branches normally don't receive commits.

Approaches considered so far

  1. A general-purpose HTTP(S) proxy such as squid-cache.org. We would need two, one for the internal and one for the external workers.
  2. Set up a container image cache by spawning and configuring a container pull-through registry somewhere, and:
    1. Point the Concourse workers to it as default.
      • How does one configure Garden for that? Garden == runc
      • Should we move out of Garden and into Docker, etc? Last time we tried, it was not fun.
    2. Set registry_mirror in all docker-image resources. Per https://github.com/concourse/docker-image-resource#source-configuration, it only takes effect for images whose repository has no registry host defined (i.e. those coming from Docker Hub).
  3. Use a different registry for the images, be it ghcr.io, a local registry, etc. This means changing and re-flying all affected pipelines, substituting the registry in all docker-image and registry-image resources, e.g.:
    resource_types:
      - name: <resource>
        type: docker-image
        source:
          repository: <repository>/<org>/<image name>
          tag: <tag>

    It also means republishing the images to the specified new registry.
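For reference, option 2ii above would amount to adding registry_mirror to each docker-image resource. A minimal sketch, with a placeholder mirror URL (no such mirror is deployed yet):

```yaml
# Sketch of option 2ii: point an existing docker-image resource at a
# pull-through cache. The mirror URL below is a placeholder, not a real
# deployment.
resources:
  - name: <resource>
    type: docker-image
    source:
      repository: <org>/<image name>   # no registry host, so the mirror applies
      tag: <tag>
      registry_mirror: https://registry-mirror.example.com
```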

Possible registry mirrors:

Questions

Aren't all the images cached by Garden on the worker nodes anyway?

Outcome

So far, I think we should go with either 2ii or 3. Since both require re-flying the pipelines, we might as well go with 3 and avoid deploying our own registry mirror.

mook-as commented 4 years ago

We might be able to do approach (2) by forking the docker-image resource and making it automatically use a pull-through cache; that way we only need to add one block redefining the docker-image resource type, instead of modifying every resource. Not sure that's worth it (moving off Docker Hub is still the better solution), but maybe this will help catch the things we miss.

If we can do that on a node level, of course, that's probably better.
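The fork idea above could be sketched as a single resource_types entry that shadows the built-in docker-image type; the fork's image name is hypothetical:

```yaml
# Sketch of the fork idea: override the built-in docker-image resource type
# once per pipeline, so every docker-image resource in it uses the fork that
# knows about the pull-through cache. The fork's repository is a placeholder.
resource_types:
  - name: docker-image
    type: registry-image
    privileged: true   # docker-image resources need privileged containers
    source:
      repository: <our-registry>/docker-image-resource-with-mirror  # placeholder
```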

jandubois commented 4 years ago

I don't understand why we are creating a spike and a science project instead of just moving all our images to a different registry, e.g. ghcr.io, and being done with it (#1557). Our CI is currently broken and should be unblocked as soon as possible. Updating the registry location (and adding some credentials) should be a lot less work than setting up caching registries, and it will also help those who deploy kubecf outside our controlled environments.

All these caching-registry schemes just hope to avoid the problem. It will always be possible to exceed the limit due to bugs, new stemcell releases, and builds of different release branches. 100 images per 6 hours is not that much. And remember that each image will expire after 7 days regardless of usage.
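For concreteness, the registry move (#1557) would amount to edits like this in each affected pipeline; the image path and the credential variable names are placeholders, not existing configuration:

```yaml
# Sketch of moving a resource off Docker Hub to ghcr.io. The image path and
# the ((...)) credential variables are placeholders.
resources:
  - name: <resource>
    type: registry-image
    source:
      repository: ghcr.io/<org>/<image name>
      tag: <tag>
      username: ((ghcr-username))   # needed for private images
      password: ((ghcr-token))
```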

viccuad commented 4 years ago

I think Jan is correct with his comment. Closing this card as I consider the spike finished.