aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[ECR] [request]: support custom domains, or alternate URIs for repositories #299

Open philippmoehler0440 opened 5 years ago

philippmoehler0440 commented 5 years ago

Tell us about your request Currently a repository URI looks like this: <account_id>.dkr.ecr.<region>.amazonaws.com/<repository>. The account ID and region are movable parts, which has negative effects in the scenarios described below. It would be helpful to be able to define an alternate URI for ECR repositories.

Which service(s) is this request for? ECR (and maybe other container services?)

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? Our team provides a docker image for ~12 other teams that acts as a build tool for frontend resources within their pipelines. We identified different disaster recovery scenarios where the current ECR URI is a disadvantage:

(1) Unavailability of ECR within the specified region

(2) Disaster recovery for the ECR account

An alternate repository URI could be a fixed interface for other consumers. Changes for account ID or region behind this part would not affect them anymore.

Are you currently working around this issue? One way we have seen to "solve" this is to run nginx as a reverse proxy in front of ECR, but this is an effort we don't want to take on.

Additional context This topic from January 2016 also describes some pain points with this.


FernandoMiguel commented 5 years ago

All my Docker images have an ARG for the account ID, so I can (and do) easily replace it to point to different accounts.
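A minimal sketch of that pattern (the base image name, account ID, and region are hypothetical):

# ARGs declared before FROM are usable in the FROM line itself
ARG ACCOUNT_ID=123456789012
ARG REGION=us-east-1

FROM ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/base-image:latest

# build against another account with:
#   docker build --build-arg ACCOUNT_ID=<other_account_id> .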

philippmoehler0440 commented 5 years ago

All my Docker images have an ARG for the account ID, so I can (and do) easily replace it to point to different accounts.

This is fine if you are the only consumer, but with around 12 other teams depending on the image, you would always have to share the new value with them, even if the replacement itself is easy.

jtoberon commented 4 years ago

This is a great idea, and it's something we've planned for as we made other networking changes such as VPC Endpoint support. We see a bunch of additional benefits. For example, your developers will be able to use a friendly URI like "repo.mycompany.com" instead of having to remember an AWS account number. Also, if you run your own registry today, and you want to switch to ECR so that you don't have to manage (upgrade, monitor, scale, etc.) it, then DNS might help with the transition.

We are interested in hearing more about how customers would like to manage SSL certificates and DNS. Would you use AWS Certificate Manager (ACM) for certs? Would you create a Route 53 hosted zone for the subdomain?
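For reference, the pieces being asked about here are standard resources; in Terraform they might look like this (hypothetical domain):

# a hosted zone for the friendly subdomain, plus a DNS-validated ACM cert for it
resource "aws_route53_zone" "repo" {
  name = "repo.mycompany.com"
}

resource "aws_acm_certificate" "repo" {
  domain_name       = "repo.mycompany.com"
  validation_method = "DNS"
}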

FernandoMiguel commented 4 years ago

@jtoberon ACM for sure. We already have hosted zones on route53

philippmoehler0440 commented 4 years ago

@jtoberon yes we would use ACM and different hosted zones.

okor commented 4 years ago

@jtoberon

if you run your own registry today, and you want to switch to ECR ...., then DNS might help with the transition.

^ This is exactly the scenario we are in. If ECR supported custom DNS the switch would be relatively painless. Without custom DNS, there are a number of pain points:

And all of that is on top of the wacky authentication requirements for ECR. Y'all are not helping folks with established (but standard) authentication workflows or existing registries. The pain level goes up with the scale of the established operation. But those big registry users also seem like they would be the juicier targets for y'all, right?

jtoberon commented 4 years ago

@okor Currently, we're tentatively planning to work on this after cross region replication. What established authentication workflows do you have in mind?

alok87 commented 4 years ago

When is the work going to start on this? Interested to contribute to make this live. ✋

fred-vogt commented 4 years ago

Have to dog pile on this one.

I too have wanted this to be officially supported for a while.

It is possible with an NGINX proxy.

Or API Gateway + Lambda.

NOTE: you can't use the standard Docker credential helper, however; it has a regex that expects the default repo URIs.
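For reference, the standard helper (amazon-ecr-credential-helper) is wired up in ~/.docker/config.json keyed by registry hostname, which is why a custom domain falls outside what it matches (hypothetical account/region):

{
  "credHelpers": {
    "123456789012.dkr.ecr.us-east-1.amazonaws.com": "ecr-login"
  }
}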

robbyt commented 3 years ago

It looks like this was already requested back in early 2016 here: https://forums.aws.amazon.com/thread.jspa?threadID=223934&start=25&tstart=0

But unfortunately, the team at Amazon has been very quiet about when we can expect a fix for this.

davidkarlsen commented 3 years ago

+1. We'd like to use ACM for the cert, but probably not Route 53 for DNS; we'd use Cloudflare instead (simply because that's what we already do for the domain).

terowz commented 3 years ago

+850

guerzon commented 3 years ago

+1

This would definitely be very useful and would save our repositories and documentation from getting cluttered with long ECR URLs that have an account number in them.

oleksandr-gubchenko commented 3 years ago

+1

michelangelomo commented 3 years ago

+1

LegoStormtroopr commented 3 years ago

+1

(Purposefully not leaving a reaction, as I want to get notified when this is updated.)

Rana-Salama commented 3 years ago

You can subscribe to the issue to receive notifications instead of commenting.

kj187 commented 3 years ago

+1

Ferparishuertas commented 3 years ago

+1

JordanStopford commented 3 years ago

+1

WhyNotHugo commented 3 years ago

Putting a CloudFront distribution in front of ECR should work fine, right?

BeyondEvil commented 3 years ago

Almost 2 years have passed. Is there any progress being made?

charlie-park commented 3 years ago

Progress in your life or mine?

On Tue, Jun 29, 2021 at 8:03 PM, Jim Brännlund wrote:

Almost 2 years have passed. Is there any progress being made?


WhyNotHugo commented 3 years ago

I'll ask again: what's the issue with using CloudFront with ECR as an origin?

RichiCoder1 commented 3 years ago

I'll ask again: what's the issue with using CloudFront with ECR as an origin?

Spinning up a non-trivial piece of infrastructure to use 5% of its functionality is not an answer.

WhyNotHugo commented 3 years ago

Spinning up a non-trivial piece of infrastructure to use 5% of its functionality is not an answer.

There's nothing to spin up; CloudFront is a hosted service. And one of its main features is exactly what's being asked for here.

If AWS added support for custom domains for ECR registries, I can't imagine it'd be much less work than configuring CloudFront anyway -- you'd still have to address things like provisioning ACM certificates and creating Route 53 records. There's not much more than 30-60 minutes of work here.

I'm not sure what you're expecting: it sounds like all the tools are right there, and what you actually need is someone to set them up for you.

RichiCoder1 commented 3 years ago

There's nothing to spin up; CloudFront is a hosted service. And one of its main features is exactly what's being asked for here.

If you want caching and geo-distribution, sure. But if all you want is a domain, you're spinning up a service with non-trivial deployment times and non-zero costs just to get a domain.

you'd still have to address things like provisioning ACM certificates and creating Route 53 records

Sure, but that's fine and probably something I'm already doing if I'm asking for custom domains. CloudFront is not a given.

It's also worth calling out the OP, which says "One way we have seen to "solve" this is to run nginx as a reverse proxy in front of ECR, but this is an effort we don't want to take on." So you're proposing one moving part in place of another (albeit much easier to manage) moving part.

joshm91 commented 2 years ago

Has anyone actually managed to get CloudFront working in front of ECR? It's mostly working for me, in that I can log in using an ECR login password and I can pull images. But when trying to push images, it seems to get halfway and then fail with unauthorized: authentication required, even though I can successfully pull an image straight after.

julienbonastre commented 2 years ago

Haven't tried with CF, no. We are using a pattern, as suggested above, with the private registry behind an NGINX proxy/forwarder running as a Fargate service fronted by an HTTPS ALB.

This allows us a custom FQDN for our ECR, which seems to be working great so far for Docker auth/pull/push ops etc.

It is actually pretty neat and tidy to orchestrate and deploy, and if one desires to really go all out, we could tie the Fargate task to some CloudWatch metrics/alarms and targets with an ASG to control load demand; for now we are happy to set the container replica count statically.

Anyway, apologies, I know I haven't really answered your CF question, but I wondered if you were keen on exploring an alternative.

naftulikay commented 2 years ago

@julienbonastre seeing as how you have done this in NGINX, are you able to tag, push, and pull using the custom domain name?

For example:

docker build -t ecr.mydomain.com/my-project/my-image:latest .
docker push ecr.mydomain.com/my-project/my-image:latest
docker pull ecr.mydomain.com/my-project/my-image:latest

I guess I'm just asking whether ECR will play nicely if the domain doesn't match the long ECR domain.

julienbonastre commented 2 years ago

Sorry, I should have provided some context.

Here is a trimmed excerpt of the nginx conf I am using in the nginx container task, which runs as an ECS Fargate service (scaled to 2 replicas):

   server {
        listen       443 ssl http2;
        listen       [::]:443 ssl http2;
        ssl_certificate /etc/ssl/certs/nginx-selfsigned.crt;
        ssl_certificate_key /etc/ssl/private/nginx-selfsigned.key;
        ssl_dhparam /etc/ssl/certs/dhparam.pem;
#        ssl_session_cache shared:SSL:1m;
#        ssl_session_timeout  10m;
        chunked_transfer_encoding on;
        client_max_body_size 0;
        server_name     _;

        ########################################################################
        # from https://cipherli.st/                                            #
        # and https://raymii.org/s/tutorials/Strong_SSL_Security_On_nginx.html #
        ########################################################################

        ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
        ssl_prefer_server_ciphers on;
        ssl_ciphers "EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH";
        ssl_ecdh_curve secp384r1;
        ssl_session_cache shared:SSL:10m;
        ssl_session_tickets off;
        ssl_stapling on;
        ssl_stapling_verify on;
        resolver 8.8.8.8 8.8.4.4 valid=300s;
        resolver_timeout 5s;
        # Disable preloading HSTS for now.  You can use the commented out header line that includes
        # the "preload" directive if you understand the implications.
        #add_header Strict-Transport-Security "max-age=63072000; includeSubdomains; preload";
        add_header Strict-Transport-Security "max-age=63072000; includeSubdomains";
        add_header X-Frame-Options DENY;
        add_header X-Content-Type-Options nosniff;

        ##################################
        # END https://cipherli.st/ BLOCK #
        ##################################

        location / {
                proxy_pass              https://<aws acct id>.dkr.ecr.ap-southeast-2.amazonaws.com;
                proxy_set_header        Host                "<aws acct id>.dkr.ecr.ap-southeast-2.amazonaws.com";
                proxy_set_header        X-Real-IP           $remote_addr;
                proxy_set_header        X-Forwarded-For     $proxy_add_x_forwarded_for;
                proxy_set_header        X-Forwarded-Proto   "https";
                proxy_read_timeout      900;
        }
    }

Yes, I am using a self-signed cert, generated on the nginx container itself during init, that is referred to in the nginx.conf:

ARG ECR_FQDN=ecr.mydomain.com
ARG BASE_NGINX_IMAGE=nginx:latest

FROM ${BASE_NGINX_IMAGE}

RUN mkdir -p /etc/ssl/private
RUN chmod 700 /etc/ssl/private

RUN openssl req -x509 -nodes -days 365                 \
    -newkey rsa:2048                                    \
    -keyout /etc/ssl/private/nginx-selfsigned.key       \
    -out /etc/ssl/certs/nginx-selfsigned.crt            \
    -subj "/C=AU/ST=NA/L=NA/O=MyOrganisationName/CN=${ECR_FQDN}"    

RUN openssl dhparam -out /etc/ssl/certs/dhparam.pem 2048

COPY ./nginx.conf /etc/nginx/nginx.conf

EXPOSE 80 443

Note I am referencing my desired target ECR_FQDN within the Dockerfile as a build arg and then generating the self-signed cert based off this for the SAN/subject.

However, I have actually realised it seems not to matter what FQDN is used: I recently tested accessing the nginx proxy with different ones and it all still worked fine.

So, in summary: I have an ALB listening on HTTPS with a real ACM SSL cert and FQDN assigned to it, and the listener's target group is set up for HTTPS forwarding to the registered IP targets of the ECS Fargate tasks (which of course are auto-managed by Fargate/ECS).

So nginx proxies the HTTPS calls arriving at the ALB on to the AWS ECR private registry, also over HTTPS.

I am no security expert, but this looks like a fully SSL-chained request through each hop from the client to the target ECR, and it works a treat for us for all of the above.

So far, anyway, I haven't encountered any issues.

Obviously there is now an inherent dependency on the availability/throughput of the ecs-fargate-nginx-proxy task; however, being a Fargate task, this can easily be scaled to multiple fixed replicas or tied to an ASG/CloudWatch event trigger to scale up/down on demand metrics as desired, to make sure the proxy can handle your workloads.

HTH

FernandoMiguel commented 2 years ago

Sorry, I should have provided some context. Here is a trimmed excerpt of the nginx conf I am using in the nginx container task [...full nginx conf and Dockerfile quoted above...]

Quite a bit off-topic, but if you only have the LB accessing your nginx, you can use a much nicer, more secure, smaller nginx config. Here's the config from Mozilla: https://ssl-config.mozilla.org/#server=nginx&version=1.17.7&config=modern&openssl=1.1.1d&hsts=false&ocsp=false&guideline=5.6. You may have to set TLS 1.2, though.
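Roughly, that generator pins the protocol and session settings to something like the following (a from-memory sketch; check the linked generator for current values):

# sketch of the Mozilla-generated TLS settings; verify against the generator above
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers off;
ssl_session_timeout 1d;
ssl_session_cache shared:SSL:10m;
ssl_session_tickets off;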

Alternatively, you can use caddy server as the reverse proxy, since it is far more modern than nginx and cloud native.

julienbonastre commented 2 years ago

Thanks @FernandoMiguel for the tip. Yes, I'll definitely look into this; I too didn't like these extra SSL configs, but I found them under some recommendation in the community. However, you are correct: that recommendation is >6 years old now, so definitely not cool.

I will certainly check out Caddy. As you say, this is a very, very simple use-case for a reverse proxy, so the lighter and more modern the better! 🚀 😎

blowfishpro commented 2 years ago

One thing I just thought of, how would custom domains in ECR mesh with pulling images in ECS or EKS? Would it just work or is there detection on the repository URL that allows the IAM roles to pull images?

julienbonastre commented 2 years ago

One thing I just thought of, how would custom domains in ECR mesh with pulling images in ECS or EKS? Would it just work or is there detection on the repository URL that allows the IAM roles to pull images?

@blowfishpro https://docs.aws.amazon.com/AmazonECR/latest/userguide/registry_auth.html

By design, the ECR auth token issued for the requesting principal carries all the grants/permissions that the principal's IAM policies define.

blowfishpro commented 2 years ago

https://docs.aws.amazon.com/AmazonECR/latest/userguide/registry_auth.html

By design, the ECR auth token issued for the requesting principal carries all the grants/permissions that the principal's IAM policies define.

Right, so naturally the tokens will still work. What I'm wondering is whether ECS/EKS look at the registry url to know whether they should even request a token from ECR when pulling an image.

joshm91 commented 2 years ago

I haven't had a chance to try this yet, but I'm almost certain it won't work for EKS as cleanly as it does with a regular ECR repo URL. The AWS docs specify that:

When referencing an image from Amazon ECR, you must use the full registry/repository:tag naming for the image. For example, aws_account_id.dkr.ecr.region.amazonaws.com/my-repository:latest

I think you'll need to add your custom domain ECR repo as a private repo to EKS and use the regular Kubernetes imagePullSecrets feature.
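A hedged sketch of that approach, assuming a hypothetical custom domain (note the ECR token expires after 12 hours, so the secret would need to be refreshed on a schedule):

# hypothetical: register the custom-domain registry as a Kubernetes pull secret
kubectl create secret docker-registry ecr-custom-domain \
  --docker-server=docker.mycompany.com \
  --docker-username=AWS \
  --docker-password="$(aws ecr get-login-password --region us-east-1)"

Pods would then reference ecr-custom-domain in their imagePullSecrets.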

naftulikay commented 2 years ago

I've set up the following:

When I attempt to login, it fails:

$ aws ecr get-login-password | docker login -u AWS --password-stdin docker.mycompany.com
Error response from daemon: login attempt to https://docker.mycompany.com/v2/ failed with status: 401 Unauthorized

My ACM certificate is valid, and it does appear that Route 53 is working as well. When I curl the registry:

$ curl -isSL https://docker.mycompany.com
HTTP/2 401
content-type: text/plain; charset=utf-8
content-length: 15
docker-distribution-api-version: registry/2.0
www-authenticate: Basic realm="https://$ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/",service="ecr.amazonaws.com"
date: Wed, 15 Sep 2021 19:42:46 GMT
x-cache: Error from cloudfront
via: 1.1 9e50af49c68f20e188890e7945ad09a2.cloudfront.net (CloudFront)
x-amz-cf-pop: LAX50-C3
x-amz-cf-id: 247mkimCM2ZnD5fptIlqTAINpC5FSpDAIjJLw_wvrhL1xXblGHOtCQ==

Not Authorized

My Cloudfront configuration in Terraform:

resource "aws_cloudfront_distribution" "default" {
  enabled = true
  retain_on_delete = true
  comment = "ECR Docker Registry front-end."

  aliases = ["docker.mycompany.com"]

  is_ipv6_enabled = true
  http_version = "http2"

  default_root_object = "index.html"

  price_class = "PriceClass_100"

  origin {
    origin_id = "ecr-us-east-1"
    domain_name = "$ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com"

    custom_origin_config {
      http_port = 80
      https_port = 443
      origin_protocol_policy = "https-only"
      origin_ssl_protocols = ["TLSv1.2"]
    }
  }

  default_cache_behavior {
    target_origin_id = "ecr-us-east-1"

    min_ttl = 0
    default_ttl = 0
    max_ttl = 86400

    compress = true
    allowed_methods = ["GET", "HEAD"]
    cached_methods = ["GET", "HEAD"]
    viewer_protocol_policy = "redirect-to-https"

    forwarded_values {
      query_string = true
      cookies {
        forward = "all"
      }
    }
  }

  restrictions {
    geo_restriction { restriction_type = "none" }
  }

  viewer_certificate {
    acm_certificate_arn = aws_acm_certificate.default.arn
    minimum_protocol_version = "TLSv1.2_2021"
    ssl_support_method = "sni-only"
  }

  tags = {
    client = "self"
  }
}

Has anyone else here been able to set up a proxy that works with a custom domain, or does Docker/ECR do things that absolutely depend on the Host header or the domain name exactly matching $ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com?

julienbonastre commented 2 years ago

@naftulikay Yes, as per https://github.com/aws/containers-roadmap/issues/299#issuecomment-906901973 however using nginx proxy not CF cdn.

And yes, in my experience it was crucial to set the Host header to the FQDN of the target private registry for it to work and auth your creds correctly.

This nginx Fargate proxy solution is working in production for us and we haven't faced any issues yet; however, as noted by @joshm91 above, there may be some further config required for EKS workloads. We are only using ECS/Fargate currently, so this isn't an issue.

naftulikay commented 2 years ago

@julienbonastre I'm going to try Lambda@Edge to change the Host header, because Host is one header that you're not allowed to set in the Cloudfront distribution configuration. I found the following solution on ServerFault for a NodeJS Lambda@Edge function:

'use strict';

// force a specific Host header to be sent to the origin

exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;
    request.headers.host[0].value = 'www.example.com';
    return callback(null, request);
};

naftulikay commented 2 years ago

I have it working with the above code. I have ACM for certificate management, Route 53 for DNS, Cloudfront as the edge for the private registry, and using Lambda@Edge to rewrite the Host header.

Cloudfront Terraform:

resource "aws_cloudfront_distribution" "default" {
  enabled = true
  retain_on_delete = true
  comment = "ECR Docker Registry front-end."

  aliases = ["docker.mycompany.com"]

  is_ipv6_enabled = true
  http_version = "http2"

  default_root_object = "index.html"

  price_class = "PriceClass_100"

  origin {
    origin_id = "ecr-us-east-1"
    domain_name = local.ecr_us_east_1

    custom_origin_config {
      http_port = 80
      https_port = 443
      origin_protocol_policy = "https-only"
      origin_ssl_protocols = ["TLSv1.2"]
    }
  }

  default_cache_behavior {
    target_origin_id = "ecr-us-east-1"

    min_ttl = 0
    default_ttl = 0
    max_ttl = 86400

    compress = true
    allowed_methods = ["GET", "HEAD"]
    cached_methods = ["GET", "HEAD"]
    viewer_protocol_policy = "redirect-to-https"

    forwarded_values {
      query_string = true
      headers = ["*"]
      cookies {
        forward = "all"
      }
    }

    # first thing to do on the way in is to rewrite the host header using our lambda function
    lambda_function_association {
      event_type = "origin-request"
      lambda_arn = aws_lambda_function.host_rewrite.qualified_arn
      include_body = false
    }
  }

  restrictions {
    geo_restriction { restriction_type = "none" }
  }

  viewer_certificate {
    acm_certificate_arn = aws_acm_certificate.default.arn
    minimum_protocol_version = "TLSv1.2_2021"
    ssl_support_method = "sni-only"
  }

  tags = {
    client = "self"
  }
}

Lambda Terraform:

# NOTE in order for cloudfront proxy to ECR to work, we need to rewrite the `Host` header dynamically. Normally, it
#      would be possible to do this in Cloudfront, but Cloudfront does not allow rewriting the `Host` header. Therefore,
#      we have a Lambda@Edge function which simply overwrites the `Host` header for us.
resource "aws_lambda_function" "host_rewrite" {
  function_name = "ecr-docker-host-rewrite"
  description = "Rewrites the Host header for incoming requests to ECR to allow custom domains."

  runtime = "nodejs14.x"
  filename = data.archive_file.lambda_code.output_path
  source_code_hash = filebase64sha256(data.archive_file.lambda_code.output_path)
  handler = "index.handler"
  timeout = 5
  memory_size = 128
  publish = true

  role = aws_iam_role.lambda_host_rewrite.arn

  tags = {
    client = "self"
  }

  depends_on = [data.archive_file.lambda_code]
}

The IAM role attaches the AWS-provided basic Lambda execution policy, and allows role assumption by both lambda.amazonaws.com and edgelambda.amazonaws.com in order for it to work.
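That role might look roughly like this in Terraform (a sketch; the data source and attachment names are assumed):

data "aws_iam_policy_document" "lambda_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    # Lambda@Edge requires trust for both service principals
    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com", "edgelambda.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "lambda_host_rewrite" {
  name               = "ecr-docker-host-rewrite"
  assume_role_policy = data.aws_iam_policy_document.lambda_assume.json
}

# AWS-managed basic execution policy (CloudWatch Logs permissions)
resource "aws_iam_role_policy_attachment" "lambda_basic_execution" {
  role       = aws_iam_role.lambda_host_rewrite.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}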

The Lambda function code in JavaScript:

#!/usr/bin/env node

// serverfault answer: https://serverfault.com/a/888776/70024
// event data structure: https://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html
// (many) limitations on Lambda@Edge: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/edge-functions-restrictions.html

const REGISTRY = "MY_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com";

/**
 * Callback function for a Cloudfront Lambda@Edge request event. Rewrites the `Host` header to match the specified
 * registry host-name.
 * @param event The Cloudfront Lambda event.
 * @param context Lambda event context.
 * @param callback Callback to fire upon complete.
 * @returns {*} Invocation result of callback.
 */
exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;

    // replace host header with registry url
    request.headers.host[0].value = REGISTRY;

    return callback(null, request);
}

This works; the important bits are:

  1. aws_cloudfront_distribution.default_cache_behavior.lambda_function_association is registered to our Lambda function.
  2. aws_cloudfront_distribution.default_cache_behavior.lambda_function_association.event_type is set to origin-request, so that it will act upon the request sent from an edge to an origin. If it is set to viewer-request, it cannot modify the request. See the docs for an understanding on the different values here.
  3. aws_cloudfront_distribution.default_cache_behavior.forwarded_values.headers is set to ["*"]. It might work if you just set this to Host, but I'm not sure.
  4. The Lambda@Edge function has a lot of limitations over regular Lambda.

Inserting console.log statements into my Lambda@Edge function does not result in CloudWatch logs being created, and I'm not sure why; when I run it as a test function with test data, logs are collected. I can't find documentation on what is provided when the event type is origin-request. If the origin's hostname is included in the event payload, a generic function could be written, without hard-coding, that would work for anyone who uses it.

It was a lot of work to get this together, but I can now log in to the repository and it works as expected.

joshm91 commented 2 years ago

@naftulikay I haven't looked at this for a while since it never quite worked how I wanted, but I definitely managed to get a docker login working and could get pushes/pulls working as well; it would occasionally fail halfway through pulling an image with a "401 unauthorized", though, so there was still some weirdness going on. I think this is the Terraform that I had "working". I think the key was to whitelist only the Authorization header (sent by the client); this way CloudFront continues to send the correct Host header to ECR.

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

resource "aws_cloudfront_cache_policy" "ecr" {
    name = "ecr"
    min_ttl = "1"
    parameters_in_cache_key_and_forwarded_to_origin {
        cookies_config {
            cookie_behavior = "none"
        }
        headers_config {
            header_behavior = "whitelist"
            headers {
                items = ["Authorization"]
            }
        }
        query_strings_config {
            query_string_behavior = "none"
        }

        enable_accept_encoding_brotli = true
        enable_accept_encoding_gzip = true
    }
}

resource "aws_cloudfront_origin_request_policy" "ecr" {
  name    = "ecr"
  cookies_config {
    cookie_behavior = "all"
  }
  headers_config {
    header_behavior = "whitelist"
    headers {
      items = ["Accept-Charset", "Accept", "Accept-Language", "Accept-Datetime"]
    }
  }
  query_strings_config {
    query_string_behavior = "all"
  }
}

resource "aws_cloudfront_distribution" "ecr" {
    origin {
        domain_name = "${data.aws_caller_identity.current.account_id}.dkr.ecr.${data.aws_region.current.name}.amazonaws.com"
        origin_id = "ECR"

        custom_origin_config {
            http_port = 80
            https_port = 443
            origin_protocol_policy = "https-only"
            origin_ssl_protocols = ["SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2"]
        }
    }

    enabled = true

    aliases = ["my.domain.io"]

    restrictions {
        geo_restriction {
            restriction_type = "none"
        }
    }

    default_cache_behavior {
        allowed_methods  = ["DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT"]
        cached_methods = ["GET", "HEAD"]
        target_origin_id = "ECR"
        origin_request_policy_id = aws_cloudfront_origin_request_policy.ecr.id
        cache_policy_id = aws_cloudfront_cache_policy.ecr.id
        viewer_protocol_policy = "redirect-to-https"
    }

    viewer_certificate {
        acm_certificate_arn = var.acm_arn
        ssl_support_method = "sni-only"
    }
}

naftulikay commented 2 years ago

@joshm91 Thanks so much for providing your Terraform. Here's the current status of my CloudFront proxy to ECR:

Trying to push images results in:

$ docker rmi docker.mycompany.com/org/repo:latest
$ docker build -t docker.mycompany.com/org/repo:latest ./
$ docker push docker.mycompany.com/org/repo:latest
The push refers to repository [docker.mycompany.com/org/repo]
fac15b2caa0c: Pushing [==================================================>]  7.168kB
f8bf5746ac5a: Pushing [==================================================>]  3.584kB
d11eedadbd34: Pushing [==================================================>]  4.096kB
797e583d8c50: Pushing [==================================================>]  3.072kB
bf9ce92e8516: Preparing 
d000633a5681: Waiting 
unauthorized: authentication required

It seems to be able to do most operations but always ends in unauthorized: authentication required.

Here is my configuration, with large parts adapted from yours.

Terraform Code:

resource "aws_cloudfront_distribution" "default" {
  enabled = true
  retain_on_delete = true
  comment = "ECR Docker Registry front-end."

  aliases = ["docker.mycompany.com"]

  is_ipv6_enabled = true
  http_version = "http2"

  default_root_object = "index.html"

  price_class = "PriceClass_100"

  origin {
    origin_id = "ecr-us-east-1"
    domain_name = local.ecr_us_east_1

    custom_origin_config {
      http_port = 80
      https_port = 443
      origin_protocol_policy = "https-only"
      origin_ssl_protocols = ["TLSv1.2"]
    }
  }

  default_cache_behavior {
    target_origin_id = "ecr-us-east-1"

    min_ttl = 0
    default_ttl = 0
    max_ttl = 60

    allowed_methods = ["DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT"]
    cached_methods = ["GET", "HEAD"]
    viewer_protocol_policy = "redirect-to-https"

    cache_policy_id = aws_cloudfront_cache_policy.default.id
    origin_request_policy_id = aws_cloudfront_origin_request_policy.default.id

    # first thing to do on the way in is to rewrite the host header using our lambda function
    lambda_function_association {
      event_type = "origin-request"
      lambda_arn = aws_lambda_function.host_rewrite.qualified_arn
      include_body = false
    }
  }

  restrictions {
    geo_restriction { restriction_type = "none" }
  }

  viewer_certificate {
    acm_certificate_arn = aws_acm_certificate.default.arn
    minimum_protocol_version = "TLSv1.2_2021"
    ssl_support_method = "sni-only"
  }

  tags = {
    client = "self"
  }
}

resource "aws_cloudfront_cache_policy" "default" {
  name = "ecr"
  min_ttl = 1

  parameters_in_cache_key_and_forwarded_to_origin {
    cookies_config { cookie_behavior = "all" }

    headers_config {
      header_behavior = "whitelist"
      headers { items = ["Authorization"] }
    }

    query_strings_config { query_string_behavior = "all" }

    enable_accept_encoding_brotli = true
    enable_accept_encoding_gzip = true
  }
}

resource "aws_cloudfront_origin_request_policy" "default" {
  name = "ecr"
  cookies_config { cookie_behavior = "all" }
  headers_config { header_behavior = "allViewer" }
  query_strings_config { query_string_behavior = "all" }
}

And here's my current Lambda@Edge function code:

Lambda JavaScript Code:

#!/usr/bin/env node

// serverfault answer: https://serverfault.com/a/888776/70024
// event data structure: https://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html
// (many) limitations on Lambda@Edge: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/edge-functions-restrictions.html

const REGISTRY = "MYACCOUNTID.dkr.ecr.us-east-1.amazonaws.com";

/**
 * Callback function for a Cloudfront Lambda@Edge request event. Rewrites the `Host` header to match the specified
 * registry host-name.
 * @param event The Cloudfront Lambda event.
 * @param context Lambda event context.
 * @param callback Callback to fire upon complete.
 * @returns {*} Invocation result of callback.
 */
exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;

    console.log(`event: ${JSON.stringify({ 'event': event, 'context': context }, null, 2)}`);

    // replace host header with registry url
    request.headers.host[0].value = REGISTRY;

    return callback(null, request);
}

Without the Lambda host rewrite, login fails, so I know that it's doing the right thing at least to get login working.

One of the big pains in my side is that I can't seem to get any logs out of my Lambda function to try to see what is included in the event and context. If I invoke the function from the console using test data, it does write to its CloudWatch logs group, and I have given the Lambda execution role the following permissions:

Lambda Logging Policy Document:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*",
      "Effect": "Allow"
    }
  ]
}

I might have to use an actual CloudWatch logging library because console.log simply isn't doing anything, even though I know my function is working.

I'd really like to get this working; if I can get some help from anyone here, I'll gladly publish my results in an open source GitHub repository, and potentially to the Terraform module registry, for others to use.

The last open item to address is getting docker push working. Is there a way to debug this using the Docker client to see what is being sent and where?

jmchuster commented 2 years ago

@julienbonastre if you already have an ALB set up, you should be able to edit its listener rule to have the default action redirect to the ecr address, without needing the additional nginx box
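For reference, that listener change might look roughly like this in Terraform (assumed resource names; note the follow-up below found that this approach breaks docker push):

resource "aws_lb_listener" "ecr" {
  load_balancer_arn = aws_lb.main.arn # assumed existing ALB
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = aws_acm_certificate.default.arn # assumed existing ACM cert

  # redirect the custom domain to the real registry hostname
  default_action {
    type = "redirect"

    redirect {
      host        = "123456789012.dkr.ecr.us-east-1.amazonaws.com" # hypothetical account/region
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}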

julienbonastre commented 2 years ago

@julienbonastre if you already have an ALB set up, you should be able to edit its listener rule to have the default action redirect to the ecr address, without needing the additional nginx box

Um, ok @jmchuster, yes, I can. I definitely recall trying this originally (and correcting the passed Host header to the target ECR FQDN), and for some reason it didn't seem to be happy.

However, I just attempted it again, and yes, it is working fine for auth/pull/push.

This is clearly a much better approach, with less infrastructure required! I'm confused now as to why this didn't work for me initially, or what pushed me down the direction of using nginx to do the Host header rewrite. :scratches head:

Anyway. Awesome! I will refactor now and make this even cleaner!

julienbonastre commented 2 years ago

Ok, I recant my prior statement, @jmchuster. Now I can see the issue. I am not sure of the why or how, but the listener method does not support docker push requests. It obviously works with my original nginx pattern, however.

I need to see why; there is obviously something really simple happening in how the request URI string is handled, as the error we receive is:

docker push containers.company.com/team_name/app_name/service:tag-we-want-to-use
The push refers to repository [containers.company.com/team_name/app_name/service]
e2eb06d8af82: Preparing
unsupported: Invalid parameter at 'layerDigest' failed to satisfy constraint: 'Member must satisfy regular expression pattern: [a-zA-Z0-9-_+.]+:[a-fA-F0-9]+'

So, just like that, I am reverting to my original design 👍🏻 🆗 ✅

boruttkal commented 2 years ago

@naftulikay I ran into the same problem. Did you manage to fix it?

naftulikay commented 2 years ago

@boruttkal @julienbonastre so everything works but push, it seems. Is there anything else I need to do to get push working? I think it's an internal Docker request payload thing and would not be easy to rewrite. Basically, I think what is happening is that I build my image:

docker build -t docker.naftuli.wtf/org/repo:latest ./

And then I try to push that, and the actual request payload fails because AWS probably sees a 404: there is no repository identified by the long name. I have tried tagging with multiple tags and pushing to the right place, but it wasn't working last time I checked.

Has anyone got the full lifecycle working, and what commands are you using to interact with the pretty URL as opposed to the ugly AWS direct URL?

julienbonastre commented 2 years ago

@naftulikay Yes, the pattern I described using initially is working perfectly for our org.

It supports docker push myecr.org.com/folder/service:tag without any issues.

As of yesterday, I also managed a pattern to support this pretty ECR FQDN within ECS task container definitions on ECS EC2/Fargate arrangements, using the existing Private Registry authentication mechanism and a Lambda which rotates the Docker auth token every 11 hours.

I will summarise and re-post later this morning how this has been achieved but it all definitely works and supports push/pull without concern 🤗👌🏼🚀
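For reference, ECS private registry authentication attaches a Secrets Manager secret to the container definition; a minimal hypothetical fragment of a task definition using it (assumed names and ARN):

{
  "containerDefinitions": [
    {
      "name": "app",
      "image": "myecr.org.com/folder/service:tag",
      "repositoryCredentials": {
        "credentialsParameter": "arn:aws:secretsmanager:us-east-1:123456789012:secret:ecr-proxy-docker-creds"
      }
    }
  ]
}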

What method have you used @naftulikay to proxy your ECR SSL requests?

naftulikay commented 2 years ago

@julienbonastre Excellent! Glad to hear you found a way to make it work. I'm on the CloudFront :point_right: Lambda side of things, so maybe things are different here.

Can you share the following?

  1. Commands you're using for docker build and tagging of your images; are you tagging with both the ECR URL and your custom URL?
  2. Commands you're using for docker login: are you using the ECR URL or your custom URL?
  3. Commands you're using for docker push: are you pushing to the ECR URL or your custom URL?
  4. Commands you're using for docker pull: are you pulling from the ECR URL or your custom URL?

If you could answer these with the actual commands, that would be very helpful so I could know what to expect as far as how to change my implementation to get things working.

Also, I know you described your NGINX configuration above, but can you give us a little more info as to how all of this is set up? Do you have one or more NGINX nodes running with public IPs, listening for TLS on 443, modifying the Host header, and passing to the upstream? Is that all there is to it?

My setup is described in detail above including all the Terraform and the Lambda host-rewrite JavaScript code.

The route from the internet to my ECR URL using the proxy:

As per above, I get strange unauthorized errors when I attempt to push:

$ docker build -t docker.mycompany.com/org/repo:latest ./
$ docker push docker.mycompany.com/org/repo:latest
The push refers to repository [docker.mycompany.com/org/repo]
fac15b2caa0c: Pushing [==================================================>]  7.168kB
f8bf5746ac5a: Pushing [==================================================>]  3.584kB
d11eedadbd34: Pushing [==================================================>]  4.096kB
797e583d8c50: Pushing [==================================================>]  3.072kB
bf9ce92e8516: Preparing 
d000633a5681: Waiting 
unauthorized: authentication required

I'm able to docker login and docker pull, but docker push is not working.

My theory as to why it's not working: the internal Docker HTTP payloads specify the image name using the custom URL docker.mycompany.com/org/repo:latest rather than MY_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/org/repo:latest, and when ECR receives an image with a URL that does not match MY_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com, it rejects the push request. That is why I wonder if you're doing something special for the docker push case.


EDIT: If I'm able to get all of this working with CloudFront, ACM, and Lambda, I will (:pray:) publish my code as a Terraform module to the Terraform module registry so that others can do this without any hassle.