aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[ECR] [request]: Cross Region Replication for Repositories #140

Closed RyPeck closed 3 years ago

RyPeck commented 5 years ago

Tell us about your request

Cross region replication for images and tags in ECR Repositories

Which service(s) is this request for?

ECR

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

We have containers deployed in multiple regions. We would like to rely on AWS to replicate the containers across regions, just like we do for S3 Objects in an S3 Bucket.

With this feature, we could easily use AWS PrivateLink for ECR in each region where we run containers in VPCs without internet access.

In the absence of this feature, we can build our own solution to copy images to multiple regions with each build or pay the costs (per GB and latency) involved in pulling images from a single region to another. This will also remove a single point of failure, ECR in a single region, from our current setup.

Are you currently working around this issue?

Currently pulling from a single region.

deleugpn commented 5 years ago

This would be an amazing feature. I have been trying to build a pipeline with cross-region deploy and ECR has been a blocker for me. It's a bit annoying to set up ECR in every desired region and have just one (the pipeline region) push to all regions before running a cross-region deploy.

ghost commented 5 years ago

@RyPeck is part of a team in my organization that we provide infrastructure/container services for. This would be a very useful feature.

max-rocket-internet commented 5 years ago

Cross region replication for images and tags in ECR Repositories

Yes or what would be even better is just a global ECR option. i.e. ecr.aws/my-org/my-image. And this would be transparently replicated in the same way a CDN is. This is how Google's GCR does it and it's much simpler.

ghost commented 5 years ago

Definitely want to see this getting worked on soon.

ECR is a single point of failure in the ECS design. Ideally there would be a global endpoint where images are pushed and pulled, with images cached at region level for the ECS / EKS / Fargate clusters closest to you, and with the ability to pull from different regions if there are any timeout issues, etc.

We are using Azure Container Registry just now as it has the feature.

jtoberon commented 5 years ago

Thanks for your input, everyone. Before I joined AWS, I worked on a team that had to build cross region replication on top of ECR, too!

I'm going to leave this as Proposed only because it's not on the top of our priority list yet. In the meantime, it would be useful to hear more about what folks want:

  1. When do you want to replicate an image? Is it sufficient to replicate all images, or do you want some control over which images are replicated?
  2. Do you want to replicate across accounts?
  3. What are the primary problems you're trying to solve with cross region replication? For example, you might be trying to speed up image pulls, reduce your data transfer bill, build another region for disaster recovery, etc.
  4. How much control do you really want to have over referring to an image? Today, ECR includes a "region" part in the registry URL: https://aws_account_id.dkr.ecr.region.amazonaws.com. Let's say we remove the region. Do you want DNS control over where this URL points to, or do you want ECR to figure out the best way to serve the request (knowing that this can have cost and performance implications)?

kadrach commented 5 years ago

Primarily speeding up image pulls of sometimes multi-gb sized images in ECS (and AWS Batch), half-way around the world.

(ECR as a pull-through cache would be nice!)

deleugpn commented 5 years ago

You people just made me realise I don't need this to achieve what I want. I have been using a constant to set the image of the Task Definition for so long I didn't notice I could use only one ECR across all regions. Yeah it will make Fargate download slower but for my use case that's not a big deal.

With that in mind, I think the best option for me would be if I could choose to not have the region in the ECR address and Amazon would load the nearest available copy from a static link. Like putting CloudFront in front of ECR. I wonder if that's possible without making my images public.

pmontanari commented 5 years ago

Hi @jtoberon,

My need for multi Region Replication is mainly for Disaster Recovery. Ideally we would remove the region from the registry URL and let ECR figure out the best way to serve the request.

jlambert121 commented 5 years ago

@jtoberon

rdpa commented 5 years ago

@jtoberon

  1. For my project's use case it would be sufficient to replicate all images.
  2. My project's use case does not require replicating across accounts.
  3. Many of the above reasons are applicable - speed up image pulls, reduce your data transfer bill, etc. My project uses active-active datacenters in multiple regions, but currently only pulls from and pushes to one "main" ecr region.
  4. Having ECR figure out the best way to serve the request would be ideal in active-active architectures, where if one ECR region goes down, ECR can simply figure out where to pull from. DNS control would also be nice in this scenario, but I'm not sure both are possible at the same time.

mbelang commented 5 years ago

@jtoberon

  1. Yes cross region would be a must
  2. Cross account is also a problem that we are facing

RyPeck commented 5 years ago

  1. When do you want to replicate an image? Is it sufficient to replicate all images, or do you want some control over which images are replicated?

When I push. I want to replicate all images.

  2. Do you want to replicate across accounts?

Possibly. Having cross account permissions work for replicated images would be a requirement.

  3. What are the primary problems you're trying to solve with cross region replication? For example, you might be trying to speed up image pulls, reduce your data transfer bill, build another region for disaster recovery, etc.

Cross-region image pulls are the primary motivator; replication will reduce the data transfer bill and speed up image pulls.

  4. How much control do you really want to have over referring to an image? Today, ECR includes a "region" part in the registry URL: https://aws_account_id.dkr.ecr.region.amazonaws.com. Let's say we remove the region. Do you want DNS control over where this URL points to, or do you want ECR to figure out the best way to serve the request (knowing that this can have cost and performance implications)?

Including a "region" part in the registry URL seems acceptable. This feels similar to spinning up S3 Buckets in a different region and setting up replication.

ajohnstone commented 5 years ago

  • When do you want to replicate an image? Is it sufficient to replicate all images, or do you want some control over which images are replicated?

Replication at the repository level would be more than sufficient.

  • Do you want to replicate across accounts?

Yes, unfortunately ECR is currently too limited due to IAM permissions not being granular enough to cover image/tags. As such, we cannot prevent images from being pulled from ECR if an individual image had a vulnerability or had not been marked as scanned. See use cases in #230

  • What are the primary problems you're trying to solve with cross region replication? For example, you might be trying to speed up image pulls, reduce your data transfer bill, build another region for disaster recovery, etc.

DR and isolation between regions. Vulnerability scanning and pulling images.

  • How much control do you really want to have over referring to an image? Today, ECR includes a "region" part in the registry URL: https://aws_account_id.dkr.ecr.region.amazonaws.com. Let's say we remove the region. Do you want DNS control over where this URL points to, or do you want ECR to figure out the best way to serve the request (knowing that this can have cost and performance implications)?

  1. A generic endpoint that is globally distributed and points to the nearest AWS-owned POP/region.
  2. VPC endpoints to ECR with the same generic endpoint. DNS would ideally be the same, except that it points to the interface endpoint. The VPC endpoint should support a policy.

israelp commented 5 years ago

@jtoberon For my project:

  1. Replicate on push. I would prefer to mark a repository for replication (starting with all images would be fine).
  2. No need to replicate across accounts. I'm using one account for images, and multiple other accounts pull the images. I need the repository permissions (repository metadata) to be replicated as well.
  3. I'm using VPC endpoints, and I don't want my Fargate cluster to go public.
  4. Yes, I would prefer that you provide private DNS for it, as with other VPC interface endpoints.

Thank you. IP

raghukumarc commented 5 years ago

We are looking at ECR cross-account cross-region replication for DR. I am sure most of the ECS users are building it in house for redundancy in case of Region failures or for DR.

Globegitter commented 5 years ago

For our use-case we only need cross-region replication within the same account, and only to regions we can control. The main use-cases would be to speed up image pulls, have further redundancy and especially to reduce our data transfer bill.

barooi commented 5 years ago

@jtoberon

We're also very much interested in a Replication solution.

Our current design, which we are implementing now, is described below.

We have a "CICD" AWS Account, where we build and push our dev builds to ECR repositories. A release consists of promoting (copying) the relevant images to repositories in a separate "Release" AWS Account. We define these repositories as "master" repos (for releases and dev builds).

Multiple AWS Accounts (DTAP) exist where we run our clusters; they define the same repositories ("slave" repos in this case) for each region we are active in. For instance, Production runs in 7 regions. We will implement a pull system that will check for the existence of the requested image version in the regional slave ECR. If it is not found, we pull the image from the relevant master repo to the slave repo.

Production, acceptance and test will only pull images from the Releases master, while our Development clusters will pull from both master sources.

Which brings us to your questions.

When do you want to replicate an image? Is it sufficient to replicate all images, or do you want some control over which images are replicated?

Ideally we want to define multiple master repositories for a slave repo. Replication only occurs when an image is not present and we do not need to control which images are replicated. It would be nice to have an ordering in place which source repo to query first for missing versions (release master > development master).

As such, the slave repos work as a caching proxy to one or more master repos.
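A minimal sketch of the lookup order described above: check the regional repo first, then each source repo in priority order. The function name and the `has_tag` callables are hypothetical placeholders for whatever actually queries each repository; this is not an ECR API.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a function that reports whether a repo holds a given tag.
HasTag = Callable[[str], bool]


def pick_source(
    tag: str,
    local_has: HasTag,
    sources: Sequence[Tuple[str, HasTag]],
) -> Optional[str]:
    """Return the name of the source repo to copy `tag` from.

    Returns None when the regional repo already holds the tag, so that
    nothing needs to be copied. Sources are tried in the order given,
    e.g. release before development.
    """
    if local_has(tag):
        return None  # cache hit
    for name, has_tag in sources:
        if has_tag(tag):
            return name
    raise LookupError(f"{tag!r} not found in any source repo")


# Illustrative data only.
release_tags = {"1.0"}
development_tags = {"1.0", "1.1-dev"}
sources = [
    ("release", lambda t: t in release_tags),
    ("development", lambda t: t in development_tags),
]
```

With this ordering, a tag present in both sources is always taken from the release repo, and a development-only tag falls through to the development repo.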

Do you want to replicate across accounts? Yes, we have multiple accounts as described.

What are the primary problems you're trying to solve with cross region replication? For example, you might be trying to speed up image pulls, reduce your data transfer bill, build another region for disaster recovery, etc. All of the above!

How much control do you really want to have over referring to an image? Today, ECR includes a "region" part in the registry URL: https://aws_account_id.dkr.ecr.region.amazonaws.com. Let's say we remove the region. Do you want DNS control over where this URL points to, or do you want ECR to figure out the best way to serve the request (knowing that this can have cost and performance implications)? I think region can be dropped, but even with region still present I think we can manage.

cloventt commented 5 years ago

@jtoberon

This feature is highly desirable for my use-case. Pushing to multiple regions at once is our current workaround, but it can significantly increase the time taken for our deployment pipeline to run.

mbelang commented 5 years ago

@jtoberon

  1. Yes cross region would be a must
  2. Cross account is also a problem that we are facing

I will clarify my comment from above.

  1. Still valid
  2. We would like to have cross account replication because we are currently pushing our images to a single ECR in 1 of our account and we are pulling those images from other accounts/regions.
  3. Mostly image pull speed, data transfer bills
  4. Removing the region from the DNS is a must for us as we use a single ECR for all images. I do not know if that would be possible, but removing the account from the DNS would also be nice, as users do not care about which account it sits in.

We decided to use that model to speed up our CI/CD pipeline. Pushing to a single ECR makes a lot of sense IMHO. We are also using tags to promote images from dev to prod, so there is no need to copy images to another registry for every environment. The caveat is transfer cost and slow pulls when, say, we push in ca-central-1 and pull from eu-central-1.

max-rocket-internet commented 5 years ago

I think we can summarise everything in this issue with one simple request: please just copy Google Cloud Container Registry 😃

mbelang commented 5 years ago

This is what I had in mind but was shy to say it 😂

algestam commented 5 years ago

Found this issue while looking over our DR plan and ensuring that our ECR images will be available in case of a region failure.

@jtoberon

  1. Whenever an image is pushed would be enough for our needs.
  2. Cross account replication would not be needed for our needs but I can definitely see a need for it in other projects.
  3. Primarily Disaster recovery, secondarily image pull speed
  4. Dropping the region from the URL would be nice. It would still be ok with having it in place though.

Until this has been implemented we will run our own solution to copy images to other regions.

bminahan73 commented 5 years ago

We have our registries in a centralized account. For our use case images do not need to be replicated cross-account but would definitely still need to be pulled from multiple accounts.

Cross-region replication would assist us in disaster recovery and speed up our build/deploy process quite a bit. Since we need our images in two distinct locations for our DR plan, we currently do two docker pushes, one to each of two distinct regions in the same account. This unnecessarily adds time to the push part of our pipeline, potentially a lot of time if it's a large image.

As for removing the region from the URL, our resources currently pull images based on the region they are deployed to. For example, AWS Batch definitions in us-east-2 pull images from us-east-2. There isn't really a reason for this other than potentially faster pull speeds within the same region (which does not always turn out to be the case).

If AWS handled this decision for us and resolved the URL based on response time or something similar, that would also be a huge improvement for DR. This would allow our services in us-east-2 to automatically pull images from us-west-2 if ECR in us-east-2 was experiencing issues, for example. Today that would be a manual change.

bminahan73 commented 5 years ago

Another headache is ensuring all image tags are in sync across regions, not just the images themselves. In our workflow many individuals can adjust tags on images for various business purposes. Keeping them in sync is currently a process issue: if you change a tag in one region, you have to do it in the other region too. We could automate this away but would love a managed solution.
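A rough sketch of what automating that tag check could look like, assuming you have already collected a tag-to-digest map for each region (for instance from ECR's DescribeImages output). The function name and the action labels are made up for illustration.

```python
from typing import Dict, List, Tuple


def tag_drift(primary: Dict[str, str], replica: Dict[str, str]) -> List[Tuple[str, str]]:
    """Compare two tag -> digest maps and list what the replica needs.

    Returns (tag, action) pairs where action is one of:
      "add"    - tag exists only in the primary region
      "move"   - tag exists in both but points at a different digest
      "remove" - tag exists only in the replica region
    """
    actions: List[Tuple[str, str]] = []
    for tag, digest in sorted(primary.items()):
        if tag not in replica:
            actions.append((tag, "add"))
        elif replica[tag] != digest:
            actions.append((tag, "move"))
    for tag in sorted(set(replica) - set(primary)):
        actions.append((tag, "remove"))
    return actions


# Illustrative digests only.
primary = {"latest": "sha256:bbb", "v1": "sha256:aaa"}
replica = {"latest": "sha256:aaa", "old": "sha256:000"}
drift = tag_drift(primary, replica)
```

An empty result means the two regions agree on every tag.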

ayush-sharma commented 5 years ago

We've just started using ECR as our container registry, and this is currently a blocker for us. Are there any existing solutions to back up ECR repos and images somewhere? My main concern is preventing accidental deletion (or deliberate deletion in the event of a credentials breach) and reducing latency when cloning from a different region.

amaydubey3 commented 4 years ago

I've been following this thread for the last 2 months and if someone is looking for alternative solutions to this, you might wanna do the following:

  1. Save the docker image. docker save <image> | gzip -c > <image>.tgz
  2. Upload the .tgz file to an S3 bucket. What I did in addition to this is apply cross-region replication to the bucket and that backs up the .tgz file in 2 different buckets in 2 different regions.
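The two steps above could be scripted roughly as follows. This is only a sketch: it assumes the docker CLI is on the PATH and that boto3 is installed and configured, and the helper names and the file-naming rule are made up for illustration. Cross-region replication on the bucket is configured separately in S3.

```python
import gzip
import re
import shutil
import subprocess


def archive_name(image: str) -> str:
    """Turn an image reference such as 'my/image:tag' into a flat file name."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", image) + ".tgz"


def save_image(image: str, out_path: str) -> None:
    """Rough equivalent of: docker save <image> | gzip -c > <out_path>"""
    proc = subprocess.Popen(["docker", "save", image], stdout=subprocess.PIPE)
    with gzip.open(out_path, "wb") as gz:
        shutil.copyfileobj(proc.stdout, gz)  # stream the tarball through gzip
    proc.stdout.close()
    if proc.wait() != 0:
        raise RuntimeError(f"docker save failed for {image}")


def upload(path: str, bucket: str, key: str) -> None:
    """Upload the archive to S3; the bucket name is supplied by the caller."""
    import boto3  # third-party, imported lazily so the rest works without it

    boto3.client("s3").upload_file(path, bucket, key)


if __name__ == "__main__":
    image = "my/image:tag"  # placeholder image reference
    path = archive_name(image)
    save_image(image, path)
    upload(path, "my-backup-bucket", path)  # placeholder bucket name
```

To restore, download the archive and load it with docker load.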

MartinEmrich commented 4 years ago

Hi! Just found this...

I would love to get rid of both the account ID and the region in the image URLs. We want to use the same images in multiple regions and/or accounts, and I don't want to have K8s Deployments for each differing only in the image URL.

Instead of having to use 123456789012.dkr.ecr.eu-central-1.amazonaws.com/my/image:tag, I'd prefer to be able to use e.g. dkr.amazonaws.com/my/image:tag, resolving automatically to the region and account my ECS/EKS/Fargate runs in, or to those of the logged-in IAM user.

Or even better: omit only one of region or account, like 123456789012.dkr.ecr.local.amazonaws.com (using the local region but e.g. your CICD account) or dkr.ecr.eu-central-1.amazonaws.com (Using your current account, but always one specific region).
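Until something like that exists, one way to avoid hand-maintaining per-region manifests is to rewrite the image reference at deploy time. The sketch below assumes the standard account.dkr.ecr.region.amazonaws.com registry host format of the commercial regions; the function name is hypothetical.

```python
import re
from typing import Optional

# Matches <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repo[:tag|@digest]>
ECR_REF = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)"
    r"\.amazonaws\.com/(?P<rest>.+)$"
)


def localize(ref: str, account: Optional[str] = None, region: Optional[str] = None) -> str:
    """Swap the account and/or region of an ECR image reference.

    References that do not look like ECR references are returned unchanged.
    """
    m = ECR_REF.match(ref)
    if m is None:
        return ref
    return "{}.dkr.ecr.{}.amazonaws.com/{}".format(
        account or m.group("account"),
        region or m.group("region"),
        m.group("rest"),
    )


ref = "123456789012.dkr.ecr.eu-central-1.amazonaws.com/my/image:tag"
local_ref = localize(ref, region="us-east-1")
```

This only changes the string; the image still has to exist in the target registry.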

barryib commented 4 years ago

FWIW, I found this https://github.com/aws-samples/amazon-ecr-cross-region-replication but I haven't tested it yet.

AvinashKrSharma commented 4 years ago

My use case: I am running an ECS cluster inside a VPC without public subnets. So, to access ECR, I use PrivateLinks. Now because ECR is a region-specific service and so is Privatelink, I am bound to have ECR in every region where I intend to run my ECS cluster. Had there been support for cross-region replication for ECR repositories, it would have been a trivial task to achieve this.

vaidik commented 4 years ago

Much needed for our use-case. We use separate accounts for production and testing. Also the VPCs are in different regions. We build docker images in one account/region (testing) but would like to be able to get those images in the production account/region since we don't want to rebuild images (artefact promotion). It would be best to be able to replicate selected images instead of all images, since we have a lot of extra images in our test/stage account.

Harbor seems to have a feature for this. It would be nice to have something like that.

anshul0915zinnia commented 4 years ago

Very much needed for our use case as well.

michielvermeir commented 4 years ago

The single-account, single-region design of ECR is just a pain in the ass. I think most of us would really appreciate a singular registry endpoint, with some settings on which accounts/regions you would like replication for, and not have all this complexity unnecessarily exposed.

I thought org.ecr.amazonaws.com or ecr.amazonaws.com/org/ were nice suggestions. Coping with different registry endpoints involves retagging container images a lot, lots of shuffling bytes around.

PatrickXYS commented 4 years ago

Our use case:

We're running an EKS cluster and deploying the Kubeflow application. We need to create a Kubeflow Notebook Server with the provided AWS Kubeflow image (hosted on ECR). In Kubeflow, there's no functionality that dynamically detects the user's region and provides the corresponding ECR image.

We have to make users pull one single ECR image from a fixed region, which is a pain point on our side. Users in other regions may suffer from poor pull performance.

What we expect:

A single ECR image reference without a region specified, with the ECR team taking care of the traffic or of image duplication across regions.

eist76 commented 4 years ago

This should be prioritized as it is a much needed feature. AWS customers need to be able to easily replicate ECR across different regions without any workarounds (codebuild, lambda, ...)

virajpadte commented 4 years ago

Went through the series of comments here and would like to add that I am facing a scenario where I need image replication between EU-NORTH-1, US-EAST-1 and AP-SOUTHEAST-1. The reason is simple: we are trying to use ECR as a private repo solution across our organization. I am currently going down the path of following https://github.com/aws-samples/amazon-ecr-cross-region-replication, but if there is an added feature from the ECR team, that would be awesome!

jdjaro commented 4 years ago

+1 for this. We need multi region deployments for resiliency in the event of a regional outage, and having the images stuck in a repo in a single region is a major point of failure. There's no point in having a stack deployed in a backup region if there are no images available to run in it. Currently it appears to be a choice between this (as also mentioned by @virajpadte above), and using an ECR push event to trigger a Lambda that would copy the image to an ECR repo in another region. Both of these approaches seem like a lot of additional work for something that ECR should support by default.
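For the push-event approach, the part that decides what to copy could look roughly like this. It is a sketch, not a working replicator: the event field names follow my understanding of the EventBridge "ECR Image Action" event and should be checked against the ECR documentation, the function name is made up, and the copy itself (which needs a Docker client or the ECR layer APIs) is left out.

```python
from typing import Any, Dict, List, Tuple


def copy_plan(event: Dict[str, Any], dest_regions: List[str]) -> List[Tuple[str, str]]:
    """Map an ECR push event to (source_uri, destination_uri) pairs.

    Returns an empty list for anything other than a successful, tagged push.
    """
    detail = event.get("detail", {})
    if detail.get("action-type") != "PUSH" or detail.get("result") != "SUCCESS":
        return []
    if "image-tag" not in detail:
        return []  # untagged push: nothing to address by tag

    account, src_region = event["account"], event["region"]
    name = "{}:{}".format(detail["repository-name"], detail["image-tag"])
    host = "{}.dkr.ecr.{}.amazonaws.com"
    src = "{}/{}".format(host.format(account, src_region), name)
    return [
        (src, "{}/{}".format(host.format(account, region), name))
        for region in dest_regions
        if region != src_region  # never copy a region onto itself
    ]


# Illustrative event, trimmed to the fields used above.
sample_event = {
    "account": "123456789012",
    "region": "us-east-1",
    "detail": {
        "action-type": "PUSH",
        "result": "SUCCESS",
        "repository-name": "my/image",
        "image-tag": "v1",
    },
}
plan = copy_plan(sample_event, ["us-east-1", "us-west-2"])
```

Skipping the source region guards against a copy loop if the same rule is deployed in every region.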

sun-mir commented 4 years ago

Has anybody used https://github.com/uber/kraken to augment this functionality?

omieomye commented 4 years ago

A quick update to the ECR community, thanks for continuing to comment and influence this ask. We're actively working on it. High level, we're aiming to tackle a push into a primary region ECR, replicating into N other region ECRs. We're looking to support both single-account across multi-region, and multi-account across multi-region scenarios. As soon as possible, we'll move it to the Coming Soon stage.

Mehul1313 commented 4 years ago

Hello, what is the ETA of this feature? We are looking to copy the docker images to another region for DR. We are taking the approach of using the docker push command to push the image from us-east-1 to us-west-2. Has anyone tried this approach? My only concern is the latency.
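For reference, the docker push approach boils down to a pull, a retag and a push. A minimal sketch, assuming the docker CLI is installed and already logged in to both regional registries; the helper names and the image URIs are placeholders.

```python
import subprocess
from typing import List


def push_commands(src: str, dst: str) -> List[List[str]]:
    """Build the docker commands that copy an image from src to dst."""
    return [
        ["docker", "pull", src],
        ["docker", "tag", src, dst],
        ["docker", "push", dst],
    ]


def run(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


src = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my/image:tag"
dst = "123456789012.dkr.ecr.us-west-2.amazonaws.com/my/image:tag"
commands = push_commands(src, dst)

if __name__ == "__main__":
    run(commands)
```

The full image crosses the network twice when this runs outside both regions, so running it from a build host in the source region keeps the pull local.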

oslobodian commented 4 years ago

Hello, extremely needed for our use case.

TimoSchmechel commented 4 years ago

Also would much love this feature

sc-alscient commented 4 years ago

This https://aws.amazon.com/blogs/containers/advice-for-customers-dealing-with-docker-hub-rate-limits-and-a-coming-soon-announcement/ talks about geo-replication for public containers in the new public registry. You would assume that it could be done with private ones as well now?

AbdoNile commented 3 years ago

Hi, glad to see this with a "coming soon" label. Will this include cross account replication as well? My use case requires this.

mwarkentin commented 3 years ago

It seems like this should be announced soon:

joshuastern commented 3 years ago

https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-ecr-announces-cross-region-replication-of-images/

magJ commented 3 years ago

Support recently added, API reference documentation: https://docs.aws.amazon.com/AmazonECR/latest/APIReference/API_PutReplicationConfiguration.html https://docs.aws.amazon.com/AmazonECR/latest/APIReference/API_PutRegistryPolicy.html

User Guide documentation still seems to be unavailable.
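Until the User Guide is available, here is a sketch of calling PutReplicationConfiguration through boto3, based on my reading of the API reference linked above. The payload shape (rules, destinations, region, registryId) should be verified against that reference, the helper names are made up, and boto3 with suitable credentials is assumed.

```python
from typing import Any, Dict, List


def replication_config(registry_id: str, regions: List[str]) -> Dict[str, Any]:
    """Build a registry-level replication configuration with a single rule."""
    return {
        "rules": [
            {
                "destinations": [
                    {"region": region, "registryId": registry_id}
                    for region in regions
                ]
            }
        ]
    }


def apply_config(config: Dict[str, Any], home_region: str) -> None:
    """Send the configuration to ECR in the region that images are pushed to."""
    import boto3  # third-party, imported lazily so the builder works without it

    ecr = boto3.client("ecr", region_name=home_region)
    ecr.put_replication_configuration(replicationConfiguration=config)


# Placeholder account ID and regions.
config = replication_config("123456789012", ["us-west-2", "eu-central-1"])

if __name__ == "__main__":
    apply_config(config, "us-east-1")
```

Using a different registryId in a destination is how cross-account replication is expressed; the destination account then needs a registry policy (PutRegistryPolicy) that allows it.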

omieomye commented 3 years ago

Shipped. https://aws.amazon.com/blogs/containers/cross-region-replication-in-amazon-ecr-has-landed/. The blog calls out some improvements we're already beginning to tackle. Thank you for being part of the Amazon ECR community!

odg0318 commented 3 years ago

This seems to be available only per registry not repository, right?

heidemn commented 3 years ago

@odg0318 yes, but it seems there are plans for finer-grained control: https://aws.amazon.com/de/blogs/containers/cross-region-replication-in-amazon-ecr-has-landed/

What’s next? [...]

  • Replication status APIs to surface the progress of the replication process for an image.
  • The ability to add filters so that only a subset of repositories and images are replicated.
  • Notifications on replication events such as the completion of a copy.
  • Support for manifest lists.

christopher-wong commented 3 years ago

The docs here mention:

After this, every time you push an image to the private ECR repository (or call the replicate API explicitly) ECR automatically replicates the image.

Is this "replicate API" available?