[EKS] [request]: managed node groups need support for defining registry mirrors

mmerickel commented 3 years ago

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request What do you want us to build?

Add built-in support or documentation for defining registry mirrors for managed node groups.

Which service(s) is this request for? This could be Fargate, ECS, EKS, ECR

EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.

The default for most public Helm charts is to source images from Docker Hub (docker.io), and for experimentation we'd like to avoid putting too much process around that. However, pulling directly from Docker Hub without authentication triggers rate limits very quickly.

Are you currently working around this issue? How are you currently solving this problem?

The rate limiting issues with docker hub are a huge problem for organizations and the current standard approaches are to:

Re-host all images on a self-managed ECR registry. Lots of maintenance / process is required to make this work.
Use Kyverno or other solutions to sync image pull secrets to every namespace and possibly also rewrite image specs to point at a mirror. This requires exposing secrets for the mirror and forcing things into user-space where they shouldn't really need to deal with the problem, especially in experimental phases of development.
Wait for something like https://kubernetes.io/docs/tasks/kubelet-credential-provider/kubelet-credential-provider/ to standardize so that nodes can authenticate with Docker Hub itself at the node level instead of mirroring/caching images. This still has limitations, and it would be preferable to specify a mirror to avoid rate limits altogether.

Additional context Anything else we should know?

We'd really like to have integrated support between the coming ECR pull-through caching support https://github.com/aws/containers-roadmap/issues/939 and managed node groups to automatically authenticate with one that the cluster can use.

Attachments If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

mmerickel commented 2 years ago

FWIW the recommendation from AWS support at this time is to define a launch template on the managed node group and write /etc/docker/daemon.json yourself with the config you need. Of course EKS also writes its own content to that file so you'll have to take that into account as well.
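
As a rough sketch of "taking that into account", the launch template's user data could merge a mirror into whatever daemon.json already exists rather than overwriting it. The example below is only an illustration: `registry-mirrors` is the Docker daemon's key for mirrors, but the mirror URL, the sample existing content, and the helper name `merge_registry_mirrors` are assumptions, not anything AWS support provided.

```python
import json


def merge_registry_mirrors(existing_json: str, mirrors: list[str]) -> str:
    """Return daemon.json text with `mirrors` added under "registry-mirrors".

    Keys already present in the file (for example, ones written by the node
    bootstrap) are preserved; duplicate mirror entries are skipped.
    """
    config = json.loads(existing_json) if existing_json.strip() else {}
    merged = list(config.get("registry-mirrors", []))
    for mirror in mirrors:
        if mirror not in merged:
            merged.append(mirror)
    config["registry-mirrors"] = merged
    return json.dumps(config, indent=2) + "\n"


# Hypothetical content standing in for what EKS already wrote to the file.
existing = '{"bridge": "none", "log-driver": "json-file"}'

# Hypothetical mirror URL; substitute the address of your own mirror.
updated = merge_registry_mirrors(existing, ["https://mirror.example.internal"])
print(updated)
```

On a real node the result would be written back to /etc/docker/daemon.json and the Docker daemon restarted. When that has to happen relative to the EKS bootstrap is not covered in this thread.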

zatricky commented 1 year ago

Note that newer EKS clusters use containerd, so you'd need to make the changes elsewhere: a) add appropriate content to /etc/containerd/certs.d/docker.io/hosts.toml pointing registry-1.docker.io at the registry mirror; b) if the mirror requires auth, add appropriate imagePullSecrets references to every namespace (or to every pod individually).

See https://github.com/containerd/containerd/blob/main/docs/hosts.md
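
For illustration, the hosts.toml content described above could be generated like this. It is a minimal sketch following the format in the linked hosts.md (`server`, a `[host."..."]` table, and `capabilities`). The mirror URL and the helper name `render_hosts_toml` are assumptions.

```python
def render_hosts_toml(upstream: str, mirror: str) -> str:
    """Render a containerd hosts.toml that tries `mirror` before `upstream`."""
    return (
        f'server = "{upstream}"\n'
        "\n"
        f'[host."{mirror}"]\n'
        '  capabilities = ["pull", "resolve"]\n'
    )


# Path taken from the comment above; the file is printed here, not written.
HOSTS_PATH = "/etc/containerd/certs.d/docker.io/hosts.toml"

# Hypothetical mirror URL; substitute the address of your own mirror.
content = render_hosts_toml(
    "https://registry-1.docker.io",
    "https://mirror.example.internal",
)
print(f"# would be written to {HOSTS_PATH}")
print(content)
```

Per the linked hosts.md, containerd only reads this directory when the registry `config_path` in its own configuration points at /etc/containerd/certs.d. Whether a given EKS AMI already sets that is not stated in this thread and would need checking on the node.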

benjimin commented 6 months ago

This also needs to be configurable for EKS Fargate.

There are various common reasons for wanting a Kubernetes cluster to use a cache or mirror for container images, particularly for performance (launch pods on new hosts faster while scaling), security (embargo upstream changes for vetting), and accommodating upstream usage limits. It is advantageous if this can be configured transparently (e.g., via containerd settings) instead of needing to individually customise every pod-spec in the entire infra-code-base to explicitly refer to the local registry mirror (for example, so that the image cache solution could be re-engineered independently from application deployments, and applications can be deployed with less customisation into multiple environments such as clusters in different accounts/regions with different private ECR pull-through-cache addresses). It's currently more difficult to achieve such transparency in clusters mixing both EC2 nodes and Fargate. (A mutating admission controller is another option but requires far more complexity to set up. Kustomisation may be an alternative in some cases but is less convenient if deploying Helm charts.)

aws / containers-roadmap

[EKS] [request]: managed node groups need support for defining registry mirrors #1475
