aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[ECR] [Remote Docker Repositories]: Pull through cache #939

Closed wayne-folkes closed 3 years ago

wayne-folkes commented 4 years ago


Tell us about your request I would like to be able to store docker images that are usually hosted on third party registries in ECR.

Which service(s) is this request for? ECR

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? Our organization would like to be able to avoid being affected if/when those registries go down by having a copy of certain images cached in ECR. Today if Quay.io or some other public registry goes down we may not be able to scale up a cluster.

Some secondary benefits would be being able to limit which images can be used and also saving on network costs as we would not need every service pulling images from the internet when they can be pulled from ECR via private link.

I imagine this would work similarly to CodeArtifact, where you can have the service pull libraries from upstream as needed.

Are you currently working around this issue? How are you currently solving this problem? Today we have to pull a list of images from our many k8s clusters and run a Codebuild job to pull those images and push them into ECR.
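
For illustration, a rough sketch of what that sync job does (the account ID, region, "mirror" prefix, and images.txt are placeholders; it assumes the target ECR repositories already exist):

#!/usr/bin/env bash
# Mirror a list of upstream images into a private ECR registry.
set -euo pipefail

REGISTRY="111122223333.dkr.ecr.us-east-1.amazonaws.com"

aws ecr get-login-password --region us-east-1 |
  docker login --username AWS --password-stdin "$REGISTRY"

while read -r image; do                   # e.g. "quay.io/prometheus/prometheus:v2.33.3"
  target="$REGISTRY/mirror/${image#*/}"   # strip the upstream host, keep repo:tag
  docker pull "$image"
  docker tag "$image" "$target"
  docker push "$target"
done < images.txt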


DJMatus23 commented 4 years ago

We use Artifactory to perform this role, and it has many security, availability and performance benefits.

wayne-folkes commented 4 years ago

@DJMatus23 We are using Artifactory today but would like to get out of having to run that ourselves.

stijndehaes commented 4 years ago

With the recent changes to Docker Hub around rate limiting, this has become more important than ever:

https://docs.docker.com/docker-hub/download-rate-limit/

pgarbe commented 4 years ago

Would love to see this feature in ECR, or even better have ECR integrated into CodeArtifact (which already supports pull-through caching)

I built a CDK Construct that syncs specific images from Docker Hub to ECR (https://github.com/pgarbe/cdk-ecr-sync). Might be useful until this feature is implemented.

omieomye commented 4 years ago

Thanks for raising this issue. We're looking at what this will take to implement. A couple of questions for the community:

  1. How do you see authentication working between an ECR registry and an upstream one? Is authentication even necessary for your use case?
  2. Is it more important to cache a publicly shared image or your organization's images in ECR?

connorearl commented 4 years ago

For our use case, all our private images are in ECR, so upstream authentication wouldn't be needed. The main reason we want this is to let us restrict our cluster to pull from a single source and keep only approved images in ECR.

Also, with the new Docker Hub limits and the interruptions to quay.io a few months ago, being able to cache public images is important for availability.

wayne-folkes commented 4 years ago

Our use case is similar. Our private images are in Artifactory with plans to move them to ECR. The images we are thinking about are public images that do not require auth to pull.

Being able to cache public images is more important.

dresnick-sf commented 4 years ago

Same for us too. We'd like this feature to ensure we can get public images. No need for authentication with an upstream registry.

RichiCoder1 commented 4 years ago

Same as above. Caching Docker/Quay/GitHub. About the only auth'd images we might cache are from other ECRs.

jkroepke commented 4 years ago

Please use reactions instead of writing "same here" comments repeating the original post.

We're getting tons of emails.

blockjon commented 4 years ago

According to the Docker Hub announcement, I think most of us are now in big trouble: there's only one month left before we go from unlimited Docker Hub pulls to just ~16 pulls per hour, slated to start on November 1, 2020. That is a big problem, mostly for our automation (CI/CD), which assumes Docker Hub image pulls just work. I am encouraged that an AWS ECR employee said they would look into solutions, but for all intents and purposes everybody in this situation is now scrambling, and we should not wait and hope AWS solves this anytime soon. Even if they roll out a full solution in the next few days, we would still need time to adapt to using it. I am now going to look into how to solve this with Artifactory.

jkroepke commented 4 years ago

I don't know if ECR will help here. I guess the outgoing IPs of ECR will always be at the limit.

I did some research for my company, investigating how to set up a caching registry for our CI and our Kubernetes platform (not EKS).

I decided to buy a $5/month Docker Hub user, which has no rate limit. I don't know if it's cheaper than a caching ECR, but it may be cheaper and easier than setting up a highly available caching Docker registry. Since a caching registry would be affected by the rate limits too (no matter whether it uses its own IP or a shared IP from ECR), it may not help.

eugenestarchenko commented 4 years ago

Would love to see this feature in ECR, or even better have ECR integrated into CodeArtifact (which already supports pull-through caching)

I built a CDK Construct that syncs specific images from Docker Hub to ECR (pgarbe/cdk-ecr-sync). Might be useful until this feature is implemented.

Google is transforming its Container Registry into https://cloud.google.com/artifact-registry. Note: "Artifact Registry is currently in beta. As the evolution of Container Registry, it supports multiple artifact formats, regional repositories, and more granular access control. After it becomes generally available, Artifact Registry will replace Container Registry."

It feels like something similar should happen with AWS CodeArtifact and AWS ECR in the near future. Good time to ask AWS about the roadmap for that, given all these multi-account setups and Docker Hub limits.

sandinmyjoints commented 3 years ago

I don't know if ECR will help here. I guess the outgoing IPs of ECR will always be at the limit.

Presumably, AWS could afford a Docker subscription 😄

DJMatus23 commented 3 years ago

This feels like a win-win for AWS: allow people to cache docker layers somewhere and charge them the S3 costs plus a small pull fee. AWS could then effectively de-dupe those layers, so they're not storing a million copies of the base alpine layer or whatever is popular.

Every time someone pulls through a new layer, AWS has to go get it, and that'll mean an arrangement with each of the upstream providers. But once they've got it, they can decide how long it stays around and at what point it's cheaper to download it again versus just storing it.

This gives the protection people will gladly pay for, and the bonus that there should be a noticeable improvement in container spin-up times too.

Ramneekkhurana commented 3 years ago

I think having a pass-through cache would be a super useful feature. We host the golden source of all our images on Artifactory today, and we would love a setup between ECR and Artifactory such that if an image is not available in ECR, ECR goes to Artifactory, pulls the image from there, and caches it. Artifactory should act as the remote backend, with or without authentication, and ECR should act as the local cache for EKS clusters.

xtermi2 commented 3 years ago

Another alternative would be an AWS-managed Nexus service. We use Nexus internally to cache remote Docker repositories like Docker Hub. Would love to see this in AWS, like the managed Prometheus or Grafana.

uname223 commented 3 years ago

Hi @omieomye, I think a pass-through registry would be beneficial for a few reasons.

At the moment, this can be partially accomplished with the Amazon ECR Public Gallery, but even Amazon fails to publish its images in its own public repositories. See for example Corretto, where the image is now seven months behind: https://github.com/corretto/corretto-docker/issues/47.

StoyanIvanovI commented 3 years ago

We use the proxy-caching functionality in Harbor for Docker Hub (and others); however, we would like a backup option for when Harbor is unavailable or under maintenance. It would also greatly simplify our setup, so we don't need to publish Dockerfiles that are literally a single FROM line.

hobti01 commented 3 years ago

  1. How do you see authentication working between an ECR registry and an upstream one? Is authentication even necessary for your use case?

For us, authentication is critical and required.

  2. Is it more important to cache a publicly shared image or your organization's images in ECR?

For us, our organization's private images are the most important. I would imagine configuring a proxy cache with optional username/password authentication, allowing both public and private registries.

speller commented 3 years ago

Please implement a regular pass-through cache as a first stage. This is to eliminate the Dockerhub rate limit.

Then, a caching proxy to cache requests to private/public ECR.

Then, a caching proxy to external private repos.

mbaitelman commented 3 years ago

https://aws.amazon.com/blogs/aws/announcing-pull-through-cache-repositories-for-amazon-elastic-container-registry/

svyatoslavmo commented 3 years ago

Is there any ETA for Dockerhub pull through?

skeggse commented 3 years ago

If I'm reading that announcement right, it's available today. Perhaps I'm mistaken?

luhn commented 3 years ago

Doesn't seem like it.

...with support for upstream repositories hosted on Amazon Elastic Container Registry Public and Quay.io.

Vlaaaaaaad commented 3 years ago

From a Senior Product Manager, on Twitter:

Thanks for sharing the launch! Pull through cache supports ECR Public and Quay.io images right now, but we have another announcement coming out later today for Docker Hub images :)

CloudFormation is also coming very soon

It's re:Invent, I'd wait for the end of the week as you can always be surprised and have your plans ruined by yet another announcement 😅

srrengar commented 3 years ago

Hey that Senior Product Manager is me :)

An official announcement is incoming: Docker Official Images are coming to ECR Public - https://gallery.ecr.aws/docker/. These images can be cached into your ECR private repositories using pull through cache.
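
For anyone trying this out, a minimal sketch with the AWS CLI (the account ID, region, and repository prefixes are arbitrary placeholders):

# Create pull through cache rules for the two supported upstreams.
aws ecr create-pull-through-cache-rule \
  --ecr-repository-prefix ecr-public \
  --upstream-registry-url public.ecr.aws

aws ecr create-pull-through-cache-rule \
  --ecr-repository-prefix quay \
  --upstream-registry-url quay.io

# Authenticate, then pull a Docker Official Image through the cache.
aws ecr get-login-password --region us-east-1 |
  docker login --username AWS --password-stdin 111122223333.dkr.ecr.us-east-1.amazonaws.com

docker pull 111122223333.dkr.ecr.us-east-1.amazonaws.com/ecr-public/docker/library/python:latest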

mbaitelman commented 3 years ago

Oof, didn't notice that limitation before sharing.

nhinds commented 3 years ago

The pull through cache docs indicate internet access will be required even with ECR VPC endpoints, at least for the first time an image is pulled:

When an image is pulled using a pull through cache rule for the first time, if you've configured Amazon ECR to use an interface VPC endpoint using AWS PrivateLink then you need to create a public subnet in the same VPC, with a NAT gateway, and then route all outbound traffic to the internet from their private subnet to the NAT gateway in order for the pull to work. Subsequent image pulls don't require this.

The requirement for an unrestricted route to the internet makes this a no-go for places which require private subnets with no (or limited) internet egress. Will this limitation be removed in the future as part of this issue, or is there a separate issue tracking access to these repositories from subnets with no internet access?
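
For context, the setup the quoted docs require looks roughly like this (a sketch; all IDs are placeholders, and the subnet/VPC layout will vary):

# Create a NAT gateway in a public subnet and route the private subnet's
# outbound traffic through it, per the quoted docs.
aws ec2 allocate-address --domain vpc            # returns eipalloc-...
aws ec2 create-nat-gateway \
  --subnet-id subnet-PUBLIC \
  --allocation-id eipalloc-EXAMPLE               # returns nat-...
aws ec2 create-route \
  --route-table-id rtb-PRIVATE \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-EXAMPLE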

archoversight commented 3 years ago

@srrengar will there be support for using this directly in the Docker configuration, like GCP makes available?

See https://cloud.google.com/container-registry/docs/pulling-cached-images

eperdeme commented 3 years ago

The pull through cache docs indicate internet access will be required even with ECR VPC endpoints, at least for the first time an image is pulled:

When an image is pulled using a pull through cache rule for the first time, if you've configured Amazon ECR to use an interface VPC endpoint using AWS PrivateLink then you need to create a public subnet in the same VPC, with a NAT gateway, and then route all outbound traffic to the internet from their private subnet to the NAT gateway in order for the pull to work. Subsequent image pulls don't require this.

The requirement for an unrestricted route to the internet makes this a no-go for places which require private subnets with no (or limited) internet egress. Will this limitation be removed in the future as part of this issue, or is there a separate issue tracking access to these repositories from subnets with no internet access?

I wonder what does the pull, then. Does it redirect the instance that initiated the pull somehow? Wonder if it'll work through a corporate proxy. It would be good to have a flow diagram of the traffic.

mmerickel commented 3 years ago

I wonder what does the pull, then. Does it redirect the instance that initiated the pull somehow? Wonder if it'll work through a corporate proxy. It would be good to have a flow diagram of the traffic.

The goal of the feature is to either:

  1. update all your dockerfiles etc to point at the pull-through-cache and use that instead of docker hub
  2. configure your docker daemon.json to use the pull-through-cache as a registry mirror via something like:

{
  "registry-mirrors": ["https://public.ecr.aws/docker/"]
}

(note this doesn't work right now, not sure what the correct URL to use is here, but something like this is the goal)

The next question of course will be how to configure EKS to use the mirror. I'm unaware of a pattern with a managed node group to configure a mirror on the docker daemon. Does anyone have a solution to that?

nhinds commented 3 years ago

I wonder what does the pull, then

In my testing with an EC2 instance, instead of requesting the layers from the prod-region-starport-layer-bucket S3 bucket my Docker daemon tries to download a layer from https://d2glxqk2uabbnd.cloudfront.net for an ECR Public repository, and https://quayio-production-s3.s3.amazonaws.com for a Quay repository. These seem to be the raw upstream layer locations, not something controlled by the ECR private repository, so it looks like if a tag isn't already cached the ECR repository just tells the client to go fetch the layers from the upstream by itself. This should work with egress proxies as long as the Docker daemon is configured to go through that proxy as would be required for pulling directly from quay.io or ECR Public - though of course if you're willing to do this, there's not much of a reason to configure a private pull-through cache.

Interestingly, even if the initial layer fetch fails (due to e.g. no internet egress, a proxy rule preventing access, etc.), the new image tag still eventually gets populated into the ECR repository cache. This means that even if the first pull fails, the second pull from the ECR repository will download the layers from ECR directly as long as you wait long enough for ECR to pull the image.

A viable short-term workaround might be to accept that the first image pull is going to fail, and just pull twice. Container orchestration systems like ECS and k8s already retry failed image pulls, so a lot of people might get this behavior for "free". I'd still like to see an option that doesn't require clients to have internet egress for the first pull though.
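
A rough sketch of that workaround (the image and wait time are placeholders):

# First pull may fail in a no-egress subnet while ECR populates the cache
# in the background; retry until the cached copy is available.
IMAGE=111122223333.dkr.ecr.us-east-1.amazonaws.com/quay/prometheus/prometheus:v2.33.3
until docker pull "$IMAGE"; do
  echo "waiting for ECR to populate the pull through cache..."
  sleep 30
done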

gauthamsam commented 3 years ago

Yes, your observations about the first-time pull behavior are correct. When an image is pulled for the first time, the call to fetch the layers will be routed to the corresponding external upstream registry's layer location (the download URL). Any subsequent pulls that happen after the image is cached locally in your private registry will not result in any egress calls from the container instance doing the pulls, since the synchronization against the upstream registries is done by ECR in the background.

If you don't have an egress connection to the internet, the first-time pulls will time out, but the images will still be copied to your private registry in the background. So any pulls you do after the images are copied will work with your current VPC settings. You may also do that first-time pull from a client that has internet access, such as your CI/CD system that already downloads public images, so that the images are already there by the time your container orchestrator needs to pull them.

We do have plans to optimize this experience further in the future by having an ECR endpoint stream the layers directly from the upstream registries. Please feel free to open an issue for it too.

JacobWeyer commented 3 years ago

I wonder what does the pull, then. Does it redirect the instance that initiated the pull somehow? Wonder if it'll work through a corporate proxy. It would be good to have a flow diagram of the traffic.

The goal of the feature is to either:

  1. update all your dockerfiles etc to point at the pull-through-cache and use that instead of docker hub
  2. configure your docker daemon.json to use the pull-through-cache as a registry mirror via something like:
{
  "registry-mirrors": ["https://public.ecr.aws/docker/"]
}

The next question of course will be how to configure EKS to use the mirror. I'm unaware of a pattern with a managed node group to configure a mirror on the docker daemon. Does anyone have a solution to that?

Can we do this by adding it to the userdata in the node launch template and updating the node host?
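
Something like this, perhaps (an untested sketch; it assumes an Amazon Linux 2 EKS AMI with the Docker runtime, overwrites any existing daemon.json, and the cluster name and mirror URL are placeholders/open questions):

#!/bin/bash
# Untested: write a registry mirror config before bootstrapping the node.
# Overwrites any existing daemon.json; merge instead if your AMI ships one.
cat > /etc/docker/daemon.json <<'EOF'
{
  "registry-mirrors": ["https://public.ecr.aws"]
}
EOF
systemctl restart docker

# ...then the usual EKS bootstrap, e.g.:
/etc/eks/bootstrap.sh my-cluster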

archoversight commented 3 years ago

  2. configure your docker daemon.json to use the pull-through-cache as a registry mirror via something like:

{
  "registry-mirrors": ["https://public.ecr.aws/docker/"]
}

Do note that registry-mirrors may not contain extra paths, so this example snippet does not work.

jhovell commented 3 years ago

Am I missing something, or are "unofficial" Docker Hub images still not supported? The announcement says Docker Official Images are being mirrored in ECR Public, but it seems the others are not...

m1keil commented 3 years ago

From Twitter:

Other images are under consideration for the roadmap next year, along with authenticated private registries.

So if you were hoping for this to save you from paying Docker, sadly no(t yet)

JacobWeyer commented 3 years ago

I'm fine with paying Docker; I just want the images my organization uses copied to ECR for security, reliability, and redeployment purposes. Not being able to use Docker Hub as an upstream for pull through cache is pretty disappointing, especially because the images now live at a different path within ECR's public gallery. On top of that, AWS as an organization is terrible about putting its own images into its own public ECR.

FROM docker/library/python:latest vs FROM python:latest feels like it kind of defeats the point.

srrengar commented 3 years ago

Hey folks,

Yes, at this time pull through cache can reliably and anonymously access any public image on ECR Public and Quay.io on a customer's behalf. This includes Docker Official Images, which are now also hosted on ECR Public. This will let you cache the most popular public images, or AWS images, into your own ECR registry for security, reliability, and redeployment purposes, as you mentioned @JacobWeyer. Most AWS images are hosted on ECR Public, though there may be some exceptions, which will eventually be there. Feel free to DM me any specific examples on Twitter (@SravanR), and I can also explain how performance differs when you change your Dockerfile to pull from ECR or ECR Public instead.

I've also created a new issue here for authenticated registries. ECR pull through cache would use these credentials to pull images from private registries or registries that require authentication to access higher pull limits. https://github.com/aws/containers-roadmap/issues/1584

More information on our design and plans can be found here too - https://github.com/aws/containers-roadmap/issues/1581#issuecomment-983856594

Hope this helps!

woodhull commented 3 years ago

Yeah, this is frustrating in the way it's only halfway there! Just realized that we still need Docker Hub for some circleci published images that are optimized for CI environments.

https://hub.docker.com/u/circleci

Not trying to avoid the Docker Hub subscription, but interested in staying 100% on the AWS network to maximize performance.

dserodio commented 2 years ago

Yeah, this is frustrating in the way it's only halfway there! Just realized that we still need Docker Hub for some circleci published images that are optimized for CI environments.

hub.docker.com/u/circleci

Not trying to avoid the Docker Hub subscription, but interested in staying 100% on the AWS network to maximize performance.

@woodhull slightly off-topic, but you're using legacy images. CircleCI recommends migrating to cimg/ images.

thpham commented 2 years ago

Hello @srrengar,

this feature has just been released and announced here.

But it only supports public ECR or Quay.io upstream image repositories...

Is there any plan for enhancements to support private repositories like Docker Hub, Red Hat, etc. that require authentication for pulling cached images?

Legion2 commented 2 years ago

@thpham see #1584

dserodio commented 2 years ago

What about support for "vendor official" (as opposed to "Docker Official") images like grafana/grafana or prom/prometheus? Is there a GitHub issue for this already?

coultn commented 2 years ago

These two are on ECR Public already (e.g. https://gallery.ecr.aws/bitnami/prometheus, https://gallery.ecr.aws/bitnami/grafana) published through verified ECR Public third parties. Please keep telling us which images you need and we will work with the community and our partners to get them on ECR Public if they aren't there already.

dserodio commented 2 years ago

Thanks! While I have nothing against Bitnami in particular, I'd rather use "official" images (maintained by the same maintainers as the open source project) if possible.

joebowbeer commented 2 years ago

Adding to what @dserodio wrote, the bitnami images are not the same, and are sometimes different in unexpected ways. Using prometheus as an example, the most recent bitnami image in the public gallery is 2.33.1 with a reported size of 111 MB, while the most recent prom/prometheus on Docker Hub is 2.33.3 at ~70 MB.

By the way, it's great to see the library/node images (incl. alpine) in the public gallery.

johanneswuerbach commented 2 years ago

Is there any documentation on whether pull through caching is usable cross-account? We have a centrally managed ECR registry and would like other accounts to be able to pull new tags.

By clicking through the UI I found the IAM action ecr:BatchImportUpstreamImage, but granting it doesn't work, and the action is not documented at https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonelasticcontainerregistry.html. Is there a way to allow other accounts to pull new image tags, or can this only be done by a role in the ECR-owning account?

srrengar commented 2 years ago

@johanneswuerbach other accounts can pull new images the same way they do today, as long as they have ecr:BatchImportUpstreamImage and ecr:CreateRepository (if caching a new image altogether). We are working on updating the documentation; in the meantime, can you reach out to support if that's not working?
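
For reference, a hypothetical sketch of granting those actions cross-account via a private registry permissions policy (the account IDs and repository prefix are placeholders; check the ECR pull through cache docs for the exact resource scoping):

# Grant another account the two actions mentioned above on the cache prefix.
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCrossAccountPullThroughCache",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": ["ecr:CreateRepository", "ecr:BatchImportUpstreamImage"],
      "Resource": "arn:aws:ecr:us-east-1:999999999999:repository/ecr-public/*"
    }
  ]
}
EOF
aws ecr put-registry-policy --policy-text file://policy.json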