cloudfoundry-incubator / kubecf

Cloud Foundry on Kubernetes
Apache License 2.0

spike: investigate solutions for concourse pipelines to download images less often #1564

viovanov closed this issue 4 years ago

viovanov commented 4 years ago

Is your feature request related to a problem? Please describe.

The new restrictive download policies from Docker Hub are causing problems.

Describe the solution you'd like

Describe alternatives you've considered

#1563

viccuad commented 4 years ago

Constraints

  1. concourse.suse.dev has internal (on ECP VMs) and external workers.
  2. We should also move the publishing of the images we build in Concourse away from Docker Hub. If we had a mirror, we could set registry_mirror for all docker-image resources; per https://github.com/concourse/docker-image-resource#source-configuration, it only takes effect for images whose repository has no registry host defined (i.e. those coming from Docker Hub).
  3. Changes may need to be backported to older versions of the pipelines on other branches, though we may get away without it, since those branches normally don't receive commits.

Approaches considered so far

  1. A general-purpose HTTP(S) proxy such as squid-cache.org. We would need two, one for the internal and one for the external workers.
  2. Set up a container image cache by spawning and configuring a container pull-through registry somewhere, and:
    1. Point the Concourse workers to it as default.
      • How does one configure Garden for that? Garden == runc
      • Should we move out of Garden and into Docker, etc? Last time we tried, it was not fun.
    2. Set registry_mirror in all docker-image resources. Per https://github.com/concourse/docker-image-resource#source-configuration, it only takes effect for images whose repository has no registry host defined (i.e. those coming from Docker Hub).
  3. Use a different registry for the images, be it ghcr.io, a local registry, etc. This means changing and re-flying all affected pipelines, substituting the registry in all docker-image and registry-image resources, e.g.:
    resource_types:
      - name: <resource>
        type: docker-image
        source:
          repository: <repository>/<org>/<image name>
          tag: <tag>

    It also means republishing the images to the specified new registry.
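For reference, option 2ii above would amount to adding registry_mirror to each docker-image resource. A minimal sketch, with a placeholder mirror URL (no such mirror is deployed yet):

```yaml
# Sketch of option 2ii: point an existing docker-image resource at a
# pull-through cache. The mirror URL below is a placeholder, not a real
# deployment.
resources:
  - name: <resource>
    type: docker-image
    source:
      repository: <org>/<image name>   # no registry host, so the mirror applies
      tag: <tag>
      registry_mirror: https://registry-mirror.example.com
```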

Possible registry mirrors:

Questions

Aren't all the images cached by Garden on the worker nodes anyway?

Outcome

So far, I think we should go with either 2ii or 3. Since both require re-flying the pipelines, we might as well go with 3 and avoid deploying our own registry mirror.

mook-as commented 4 years ago

We might be able to do approach (2) by forking the docker-image resource and making it automatically use a pull-through cache; that way we only need to add one block redefining the docker-image resource type, instead of modifying every resource. Not sure that's worth it (moving off Docker Hub is still the better solution), but maybe this will help catch the things we miss.

If we can do that on a node level, of course, that's probably better.
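The fork idea above could be sketched as a single resource_types entry that shadows the built-in docker-image type; the fork's image name is hypothetical:

```yaml
# Sketch of the fork idea: override the built-in docker-image resource type
# once per pipeline, so every docker-image resource in it uses the fork that
# knows about the pull-through cache. The fork's repository is a placeholder.
resource_types:
  - name: docker-image
    type: registry-image
    privileged: true   # docker-image resources need privileged containers
    source:
      repository: <our-registry>/docker-image-resource-with-mirror  # placeholder
```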

jandubois commented 4 years ago

I don't understand why we are creating a spike and a science project instead of just moving all our images to a different registry, e.g. ghcr.io, and being done with it (#1557). Our CI is currently broken and should be unblocked as soon as possible. Updating the registry location (and adding some credentials) should be a lot less work than setting up caching registries, and it will also help those who deploy kubecf outside our controlled environments.

All these caching-registry schemes just hope to avoid the problem. It will always be possible to exceed the limit due to bugs, new stemcell releases, and builds of different release branches. 100 images per 6 hours is not that much. And remember that each image will expire after 7 days regardless of usage.
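For concreteness, the registry move (#1557) would amount to edits like this in each affected pipeline; the image path and the credential variable names are placeholders, not existing configuration:

```yaml
# Sketch of moving a resource off Docker Hub to ghcr.io. The image path and
# the ((...)) credential variables are placeholders.
resources:
  - name: <resource>
    type: registry-image
    source:
      repository: ghcr.io/<org>/<image name>
      tag: <tag>
      username: ((ghcr-username))   # needed for private images
      password: ((ghcr-token))
```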

viccuad commented 4 years ago

I think Jan is correct with his comment. Closing this card as I consider the spike finished.