aws / copilot-cli

The AWS Copilot CLI is a tool for developers to build, release, and operate production-ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.
https://aws.github.io/copilot-cli/

Overriding image build arguments risks deploying the wrong image between different cross-account / same-region environments #2540


aaroncarlucci commented 3 years ago

I'm writing this ticket to describe a use case we ran into a few months ago, which may already be known about or resolved.

Our organization has two different AWS accounts, one for dev and one for production. Each application had a dev environment in us-west-2 and a staging environment in us-east-1. We began setting up applications in the dev account and, when provisioning the production environment, took natural advantage of Copilot's cross-account support to create a prod environment managed by the dev account, linked using a different set of AWS access key credentials. The prod environment was also created in the us-east-1 region.

We build and deploy the Docker images using Copilot commands from a Bitbucket pipeline. The Dockerfiles use build arguments to configure legacy NuxtJS applications; the arguments are defined per environment and have a marked impact on application behavior. Refactoring the application configuration toward more of a runtime approach was not an option at the time.
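To illustrate the pattern, here is a minimal sketch (the Dockerfile contents and the API_BASE_URL arg name are hypothetical, not our actual files):

    # Hypothetical Dockerfile: the build arg is consumed at build time, so
    # images built with different arg values are genuinely different artifacts.
    FROM node:14-alpine
    WORKDIR /app
    ARG API_BASE_URL
    ENV API_BASE_URL=${API_BASE_URL}
    COPY . .
    RUN npm ci && npm run build   # Nuxt reads API_BASE_URL during the build
    EXPOSE 3000
    CMD ["npm", "run", "start"]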

The issue I experienced came up after we'd deployed to production. Some time later, after new changes had been integrated into the staging branch, run through the pipeline, and pushed to ECR, our production ECS service restarted and downloaded the "staging" version of the ECR image, because the two deployments shared the same commit hash. The production environment was running the staging configuration :/
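To make the sequence concrete, it was roughly this (paraphrased; the service name web is a placeholder):

    copilot svc deploy --name web --env prod      # builds with prod args, pushes :<commit-hash>
    copilot svc deploy --name web --env staging   # same commit, so the staging build overwrites :<commit-hash>
    # a later ECS task restart in prod re-pulls :<commit-hash> and gets the staging image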

We eventually fixed this by decoupling the production application from the dev account and managing it as a separate application, environment and service using the AWS production account credentials directly. This has worked for us since.

The use case I'm raising exists because, to my knowledge, cross-account applications defined in the same region also share ECR repository images under the same commit hash. Because Copilot also supports overriding build arguments between environments, the potential exists to overwrite service runtimes between environments, with unexpected and erroneous effects.

I wanted to ask whether this architectural issue has already been solved in recent releases, is on the team's radar, or whether there is any guidance on how to plan for such a situation most effectively. My limited experience with Copilot suggests that the ECR repositories for cross-account environments should perhaps not be shared between environments in the same region.

efekarakus commented 3 years ago

Hi @acarlton! Thank you for reporting the issue. I'll try to summarize it to make sure we're both on the same page.
It'd be great if you could validate whether my understanding is correct 🙏

  1. You created a pipeline with copilot pipeline init that deploys to multiple environments
  2. You are overriding the image.build field in the manifest:

    image:
      build:
        dockerfile: path/to/dockerfile
        args:
          key: testValue

    environments:
      prod:
        image:
          build:
            args:
              key: prodValue
  3. After deploying, the prod environment doesn't end up with the correct build args

Looking at the generated buildspec I believe we have a bug:

  1. Like you said, since we create a single ECR repo per region and promote the same tag across stages, you don't end up getting a different image for prod with the overridden build args.
  2. It looks like in the pipeline we never actually build an image with the environment overrides; we only build with the default values.
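Roughly, the effect is this (a simplified sketch, not the literal generated buildspec; variable names are illustrative):

    # One image is built with the default manifest args, and that single
    # commit-hash tag is promoted to every stage, so the per-environment
    # image.build.args overrides never make it into any image.
    phases:
      build:
        commands:
          - docker build --build-arg key=testValue -t $ECR_REPO:$COMMIT_HASH .
          - docker push $ECR_REPO:$COMMIT_HASH
    # the staging and prod deploy stages both roll out $ECR_REPO:$COMMIT_HASH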
aaroncarlucci commented 3 years ago

@efekarakus your summary is incorrect on statement #1. I am not running the pipeline via Copilot, but via Bitbucket Pipelines, using Copilot CLI commands to deploy the image. Aside from that, yes, I agree with your analysis: because Copilot uses a single ECR repository per region, prod / staging never end up with distinct images for their different combinations of build arguments.

I would assume that this is reproducible using a Copilot pipeline, but that's not my specific use case.

My guess is that the bug isn't addressable in the buildspec alone, but rather in the infrastructure architecture generally: different environments in the same region would need separate, independent ECR repositories.
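Hypothetically, something like one repository per environment (repository names are made up):

    aws ecr create-repository --repository-name myapp/web-staging --region us-east-1
    aws ecr create-repository --repository-name myapp/web-prod --region us-east-1
    # each environment's pipeline pushes to its own repository, so a staging
    # build can never overwrite the image that prod is running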

What do you think?

efekarakus commented 3 years ago

Got it! I assume in the Bitbucket pipeline you're running copilot deploy -e staging or copilot deploy -e prod as part of a script?
I wonder if the --tag flag can help in this situation, where you can specify a value such as copilot deploy -e staging --tag ${BITBUCKET_COMMIT}-staging or copilot deploy -e prod --tag ${BITBUCKET_COMMIT}-prod?
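For example, a hypothetical bitbucket-pipelines.yml fragment (branch names are placeholders):

    pipelines:
      branches:
        staging:
          - step:
              script:
                - copilot deploy -e staging --tag "${BITBUCKET_COMMIT}-staging"
        main:
          - step:
              script:
                - copilot deploy -e prod --tag "${BITBUCKET_COMMIT}-prod"

With per-environment tags, the two builds never collide in the shared ECR repository.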

aaroncarlucci commented 3 years ago

@efekarakus It probably could -- are those tags already assigned, in addition to the raw git hash, by the current Copilot release?

efekarakus commented 3 years ago

If you specify a --tag flag then Copilot will only push that tag to the ECR repository.

By default, like you wrote, if there is a clean git commit present then Copilot pushes to ECR with the commit hash. Otherwise, if there is no commit or the working tree is dirty, the image gets pushed as latest, but we use the image digest to refer to the image instead.

aaroncarlucci commented 3 years ago

Ah, yes, I see -- that does seem like a usable workaround; I hadn't thought of that, thanks. By now I've already backed my architecture out so that Copilot manages the dev and staging environments, in us-west-2 and us-east-1 respectively, in one AWS account, and the prod environment is managed as a separate application in a different AWS account.

Thanks for the suggestion; I can apply it elsewhere. Feel free to close this issue, but I'm curious whether there is a way to reduce the chance that others fall into this situation.

efekarakus commented 3 years ago

Awesome! Yes, we will keep it open as it could happen to anybody using a CD pipeline. Thanks for reporting the issue! 🙏