aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[ECR Public] [request]: Add VPC (PrivateLink) support for public repo push and pull #1160

Open omieomye opened 3 years ago

omieomye commented 3 years ago

Community Note

Tell us about your request
What do you want us to build?

Add VPC endpoints for Amazon ECR Public.

Which service(s) is this request for?

ECR Public

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.

Push and pull images from ECR Public without leaving the Amazon network.

Are you currently working around this issue?
How are you currently solving this problem?

We pull from an ECR Public repo into an ECR private repo and then use the ECR private VPC endpoints. However, anonymous pulls cannot be used, and we also have to configure IAM policies for access.
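For the mirroring workaround above, the IAM side usually amounts to granting pull access on the private mirror repositories. A minimal sketch, assuming identity-based IAM policies, a hypothetical mirror repository named `mirror/amazonlinux`, and the placeholder account `123456789012`; the action names are the standard ECR pull actions, but check them against your own access model (repository policies, endpoint policies) before relying on it:

```python
import json


def ecr_pull_policy(account_id: str, region: str, repo_names: list[str]) -> dict:
    """Build an IAM identity policy allowing image pulls from specific private ECR repos."""
    repo_arns = [
        f"arn:aws:ecr:{region}:{account_id}:repository/{name}" for name in repo_names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # GetAuthorizationToken is not scoped to a repository, so it needs "*".
                "Sid": "AllowRegistryLogin",
                "Effect": "Allow",
                "Action": "ecr:GetAuthorizationToken",
                "Resource": "*",
            },
            {
                # Manifest and layer reads used by `docker pull`, limited to the mirror repos.
                "Sid": "AllowPullFromMirrorRepos",
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                ],
                "Resource": repo_arns,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder account, region and repository name.
    policy = ecr_pull_policy("123456789012", "us-east-1", ["mirror/amazonlinux"])
    print(json.dumps(policy, indent=2))
```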

Additional context
Anything else we should know?

Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

psoares commented 3 years ago

This is particularly annoying for building private EKS clusters that use spot instances and don't have internet access. They need https://gallery.ecr.aws/aws-ec2/aws-node-termination-handler.

santhoshratala commented 3 years ago

This is becoming a real drag when building and running a private EKS cluster. I'm trying to set up the EFS CSI storage driver. Some of its images reference private ECR repos, as in the following link (which I'm able to pull from using VPC endpoints), but other images reference public.ecr.aws, and my cluster, being in a private VPC network, can't reach that.

https://docs.aws.amazon.com/eks/latest/userguide/add-ons-images.html

The most frustrating part is that I can't find any documentation from AWS on how to reference these images within a private EKS cluster. Right now I have no option but to pull these images from the public ECR repo and push them to a private repo (which isn't ECR!).

rene84 commented 3 years ago

My use case is to run a CloudWatch agent sidecar on an ECS cluster in a private VPC. Here too, there is no way to pull directly from the public AWS ECR, which is what I would like to do.

Gooygeek commented 3 years ago

It's a bit of a pain, but a workaround (found in the AWS docs) is to build or upload the image to your own private ECR repo. You would miss out on the latest version if you didn't do it regularly, and it would count towards your quota/bill. Personally, I pulled the official latest image to my local machine, pushed it to a private ECR repo, changed the task definition to use the private image, and it works. Here is the section of the docs on building the image yourself: https://docs.aws.amazon.com/xray/latest/devguide/xray-daemon-ecs.html#xray-daemon-ecs-build
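The manual mirror described above boils down to a handful of AWS CLI and Docker commands. The sketch below only builds the command strings (it runs nothing); the account ID, region and target repository name are placeholders, and `public.ecr.aws/xray/aws-xray-daemon:latest` is just an example source reference. Re-running the same commands on a schedule is one way to limit the "miss out on the latest version" problem.

```python
def mirror_commands(
    source_image: str, account_id: str, region: str, target_repo: str, tag: str
) -> list[str]:
    """Return shell commands that copy one public image into a private ECR repo."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    target = f"{registry}/{target_repo}:{tag}"
    return [
        # Create the private repository (errors if it already exists; skip on later runs).
        f"aws ecr create-repository --repository-name {target_repo} --region {region}",
        # Authenticate Docker against the private registry.
        f"aws ecr get-login-password --region {region}"
        f" | docker login --username AWS --password-stdin {registry}",
        # Copy the image: pull from the public registry, retag, push privately.
        f"docker pull {source_image}",
        f"docker tag {source_image} {target}",
        f"docker push {target}",
    ]


if __name__ == "__main__":
    # Placeholder values; paste the printed commands into a shell with internet access.
    for cmd in mirror_commands(
        "public.ecr.aws/xray/aws-xray-daemon:latest",
        "123456789012",
        "us-east-1",
        "mirror/aws-xray-daemon",
        "latest",
    ):
        print(cmd)
```

The task definition then references the pushed `...dkr.ecr.<region>.amazonaws.com/<repo>:<tag>` URI instead of the public one.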

Vlaaaaaaad commented 2 years ago

With the lack of a VPC Endpoint for ECR Public, the image pulling also seems to only happen through NAT Gateways. Even though ECR Public is likely using S3 as storage, the S3 VPC Endpoint does not seem to be used when pulling an image. For high-scale environments, this cost adds up.

I would also like to see the VPC Endpoint have stronger IAM restrictions through Endpoint Policies, like only allowing pulls from certain repositories. Today, I can only allow pulling images from any and all ECR Public repos, which is an added security risk.

rmak-cpi commented 2 years ago

I have tried the workaround of setting up a pull-through cache from my private ECR (exposed with VPC endpoint) to public ECR repositories and it seems to be working reasonably well.
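With a pull-through cache rule whose upstream is ECR Public, an image is pulled by rewriting its `public.ecr.aws/...` reference onto the private registry under the rule's repository prefix. A minimal sketch, assuming a rule created with something like `aws ecr create-pull-through-cache-rule --ecr-repository-prefix ecr-public --upstream-registry-url public.ecr.aws`; the `ecr-public` prefix, account ID and region are placeholders:

```python
UPSTREAM_REGISTRY = "public.ecr.aws"


def to_pull_through_uri(
    public_image: str, account_id: str, region: str, prefix: str = "ecr-public"
) -> str:
    """Map a public.ecr.aws image reference onto a private pull-through cache URI."""
    head = UPSTREAM_REGISTRY + "/"
    if not public_image.startswith(head):
        raise ValueError(f"not an ECR Public image reference: {public_image}")
    # Keep the repository path and tag/digest; only the registry host changes,
    # plus the cache rule's repository prefix in front of the path.
    path = public_image[len(head):]
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{prefix}/{path}"


if __name__ == "__main__":
    print(to_pull_through_uri(
        "public.ecr.aws/amazonlinux/amazonlinux:latest", "123456789012", "us-east-1"
    ))
```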

jbg commented 2 years ago

@rmak-cpi Is that from a private cluster with no internet access? The ECR pull-through cache docs suggest that it should fail the first time you try to pull if the private subnet doesn't have a route to a NAT gateway:

When an image is pulled using a pull through cache rule for the first time, if you've configured Amazon ECR to use an interface VPC endpoint using AWS PrivateLink then you need to create a public subnet in the same VPC, with a NAT gateway, and then route all outbound traffic to the internet from their private subnet to the NAT gateway in order for the pull to work. Subsequent image pulls don't require this.

We're using ecr-mirror as a reasonably tidy workaround.

rmak-cpi commented 2 years ago

@jbg, I guess the doc is pretty clear on that point, so it is likely that I have an opening somewhere in the supposedly private network (I do have an outbound firewall). Let me double-check, but I guess it could be a public ECR vs. quay.io thing.

LeeS-NW commented 2 years ago

Adding interest in this feature request.

My use case is pretty simple: pull the public Amazon Linux image (public.ecr.aws/amazonlinux/amazonlinux) into a Fargate ECS service so that Amazon MWAA (Airflow) can submit tasks to it, without going over the NATGW (no outbound internet access in the security group). I've tried with the ecr.api and ecr.dkr interface endpoints, and I have the S3 gateway endpoint allowing access to the starport layer bucket, but without success.
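For reference, the endpoints described above are the ones ECR private needs (and therefore what a pull-through cache pull goes through); none of them serves `public.ecr.aws` itself, which is the gap this issue is about. A sketch that lists the endpoint service names for a region and builds an S3 gateway endpoint policy scoped to the regional starport layer bucket, following the bucket naming used in the ECR VPC endpoint documentation (the region is a placeholder):

```python
import json


def ecr_private_endpoints(region: str) -> dict[str, list[str]]:
    """Endpoint service names needed to pull from ECR private inside a VPC."""
    return {
        # Interface endpoints: ECR API calls and the Docker registry protocol.
        "interface": [f"com.amazonaws.{region}.ecr.api", f"com.amazonaws.{region}.ecr.dkr"],
        # Gateway endpoint: image layers are served from S3.
        "gateway": [f"com.amazonaws.{region}.s3"],
    }


def starport_s3_policy(region: str) -> dict:
    """S3 gateway endpoint policy allowing only reads from the regional ECR layer bucket."""
    return {
        "Statement": [
            {
                "Sid": "Access-to-specific-bucket-only",
                "Principal": "*",
                "Action": ["s3:GetObject"],
                "Effect": "Allow",
                "Resource": [f"arn:aws:s3:::prod-{region}-starport-layer-bucket/*"],
            }
        ]
    }


if __name__ == "__main__":
    print(ecr_private_endpoints("eu-west-1"))
    print(json.dumps(starport_s3_policy("eu-west-1"), indent=2))
```

A policy this narrow blocks any other S3 traffic through the same gateway endpoint, so in a shared VPC it would normally be merged into a broader policy.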

I've successfully got the pull-through cache approach working, thanks @rmak-cpi for pointing it out in this issue! I thought I'd propose adding this to the documentation for VPC endpoints to hopefully save somebody else some hassle whilst this is still in the backlog as a feature request.

jlbutler commented 1 year ago

Hi all. We are doing some planning and will discuss this. I also wanted to share that we're considering support for private clusters in pull-through cache (PTC), which may be a better pattern for pulling public images into a private cluster.

Is there much interest here for image push, or is it mostly for pull?

dhalperi commented 1 year ago

Pull is by far the bigger use case for us.

jlbutler commented 1 year ago

We have a pretty good understanding of PTC support for private networks, and that meets the need for pulls. We have aspects of the other comments in other work we have in flight, so I'll close this one out soon. Thanks everyone!

dc2tom commented 1 year ago

Hi, before this one closes, can I please get some clarity about PTC and image pulls? We stopped using PTC because of our use of the Renovate bot: it runs the "list-tags" operation on the ECR repo to see whether updates are available to offer.

When we run the bot against ECR Public directly, it works as expected, but with the private ECR repo configured for PTC, the available tags never seem to update unless someone first pulls a new tag from upstream manually, so that the tag then exists in the private repo.

The docs for PTC state "When a cached image is pulled through the Amazon ECR private registry URI, Amazon ECR checks the remote repository up to once per 24 hours to verify whether the cached image is the latest version. This timer is based off the last pull of the cached image." This is fine, but if the repo is using immutable tags, you'll never discover new tags without checking the public repo first yourself.
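The gap described above can be narrowed by doing the upstream comparison yourself: list the tags upstream, list the tags already in the private cache repository, and pull the missing ones through the cache URI so they appear for Renovate. A stdlib-only sketch; how the two tag lists are fetched is left out and assumed (for example the registry's `tags/list` endpoint upstream and `aws ecr list-images` privately), and `upstream_check_due` only models the documented 24-hour rule rather than reproducing ECR's internal logic:

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

# Documented PTC behaviour: a cached image is re-checked against upstream
# at most once per 24 hours, timed from the last pull of that cached image.
CHECK_INTERVAL = timedelta(hours=24)


def upstream_check_due(last_pull: datetime, now: datetime) -> bool:
    """True if, under the 24-hour rule, a pull at `now` could trigger an upstream check."""
    return now - last_pull >= CHECK_INTERVAL


def tags_missing_from_cache(
    upstream_tags: Iterable[str], cached_tags: Iterable[str]
) -> list[str]:
    """Upstream tags that have never been pulled through the cache."""
    return sorted(set(upstream_tags) - set(cached_tags))


if __name__ == "__main__":
    upstream = ["2023.1", "2023.2", "2023.3"]  # hypothetical upstream tags
    cached = ["2023.1"]                        # hypothetical tags already cached
    # Each missing tag would need one pull through the cache URI to appear privately.
    print(tags_missing_from_cache(upstream, cached))
    now = datetime.now(timezone.utc)
    print(upstream_check_due(now - timedelta(hours=30), now))
```

`sorted()` gives lexical order, so a tag like `1.10` sorts before `1.9`; that is fine for deciding what to pull, but not for picking the newest version.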

Happy to raise this as a separate issue / request for PTC.

jbg commented 10 months ago

Pull-through cache is not a very good workaround due to the need to have public Internet access for the first pull and the fact that new tags don't get discovered. With ECR Public using CloudFront it's also difficult to set up a manual proxy unless you want to whitelist all of CloudFront.

fractos commented 6 months ago

Ugh. Use-case: Pull from an otherwise locked-down network.