aws / copilot-cli

The AWS Copilot CLI is a tool for developers to build, release and operate production ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.
https://aws.github.io/copilot-cli/
Apache License 2.0

bug: Creating app runner services from ARM machines #2640

Closed jsagorin closed 2 years ago

jsagorin commented 3 years ago

Hi,

I'm following the basic first app tutorial at https://aws.github.io/copilot-cli/docs/getting-started/first-app-tutorial/ and everything goes smoothly, but the ECS service won't deploy. I receive this message:

Resource handler returned message: "Error occurred during operation 'ECS Deployment Circuit Breaker was triggered'." (RequestToken: XXXXXXXXXXX, HandlerErrorCode: GeneralServiceException)

Would anyone know where I can review the deployment logs? I checked both copilot svc logs and AWS CloudWatch.

Here's the full log output:

- Creating the infrastructure for the example-app-test environment.      [create complete]  [81.7s]
  - An IAM Role for AWS CloudFormation to manage resources               [create complete]  [24.6s]
  - An ECS cluster to group your services                                [create complete]  [6.4s]
  - Enable long ARN formats for the authenticated AWS principal          [create complete]  [0.0s]
  - An IAM Role to describe resources in your environment                [create complete]  [24.4s]
  - A security group to allow your containers to talk to each other      [create complete]  [5.7s]
  - An Internet Gateway to connect to the public internet                [create complete]  [13.9s]
  - Private subnet 1 for resources with no internet access               [create complete]  [16.8s]
  - Private subnet 2 for resources with no internet access               [create complete]  [16.8s]
  - Public subnet 1 for resources that can access the internet           [create complete]  [16.8s]
  - Public subnet 2 for resources that can access the internet           [create complete]  [16.8s]
  - A Virtual Private Cloud to control networking of your AWS resources  [create complete]  [13.9s]
✔ Created environment test in region ap-southeast-2 under application example-app.
Environment test is already on the latest version v1.4.1, skip upgrade.
✔ Proposing infrastructure changes for stack example-app-test-front-end
- Creating the infrastructure for stack example-app-test-front-end                   [rollback complete]  [866.2s]
  The following resource(s) failed to create: [Service]. Rollback requested by user.
  - Service discovery for your services to communicate within the VPC                [delete complete]    [0.0s]
  - Update your environment's shared resources                                       [update complete]    [114.2s]
    - A security group for your load balancer allowing HTTP and HTTPS traffic        [create complete]    [5.0s]
    - An Application Load Balancer to distribute public traffic to your services     [create complete]    [92.5s]
  - An IAM Role for the Fargate agent to make AWS API calls on your behalf           [delete complete]    [2.4s]
  - A CloudWatch log group to hold your service logs                                 [delete complete]    [2.4s]
  - An ECS service to run and maintain your tasks in the environment cluster         [delete complete]    [57.8s]
    Resource handler returned message: "Error occurred during operation 'ECS Deployment Circuit Breaker was triggered'." (RequestToken: XXXXXXXXXXX, HandlerErrorCode: GeneralServiceException)
    Deployments                                                                                            
               Revision  Rollout   Desired  Running  Failed  Pending                                               
      PRIMARY  4         [failed]  1        0        9       1                                                     
  - A target group to connect the load balancer to your service                      [delete complete]    [2.4s]
  - An ECS task definition to group your containers and run them on ECS              [delete complete]    [0.0s]
  - An IAM role to control permissions for the containers in your tasks              [delete complete]    [2.4s]

bvtujo commented 3 years ago

Hey @jsagorin, the circuit breaker is triggered by 9 failures in task launch. Usually that's due to failing health checks or a container which can't come up properly. Can you share your manifest and Dockerfile?

In addition, you can find stopped task information in the ECS console for your service. If you navigate to your copilot cluster, then Services, then the Stopped Tasks tab, you can see the stopped reason for tasks that didn't come up. Can you also share the most common failure reasons for this deployment?
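A minimal sketch of how the most common failure reasons could be tallied outside the console, assuming the stopped tasks were first listed with aws ecs list-tasks --desired-status STOPPED and their details saved as JSON from aws ecs describe-tasks. The helper name tally_stopped_reasons and the sample payload (ARNs and reason strings) are hypothetical illustrations, not output from this deployment; only the tasks and stoppedReason fields of the response are relied on.

```python
import json
from collections import Counter


def tally_stopped_reasons(describe_tasks_json: str) -> Counter:
    """Count how often each stoppedReason appears in describe-tasks JSON."""
    payload = json.loads(describe_tasks_json)
    reasons = Counter()
    for task in payload.get("tasks", []):
        # Tasks without a recorded reason are grouped under a placeholder.
        reasons[task.get("stoppedReason", "(no reason recorded)")] += 1
    return reasons


# Hypothetical sample shaped like a describe-tasks response (values are made up).
sample = json.dumps({
    "tasks": [
        {"taskArn": "arn:aws:ecs:ap-southeast-2:111111111111:task/example/a",
         "stoppedReason": "Essential container in task exited"},
        {"taskArn": "arn:aws:ecs:ap-southeast-2:111111111111:task/example/b",
         "stoppedReason": "Task failed ELB health checks"},
        {"taskArn": "arn:aws:ecs:ap-southeast-2:111111111111:task/example/c",
         "stoppedReason": "Essential container in task exited"},
    ]
})

# Print reasons from most to least frequent.
for reason, count in tally_stopped_reasons(sample).most_common():
    print(f"{count}x {reason}")
```

The most frequent reason is usually the one worth sharing first, since the circuit breaker only reports the number of failed launches and not why they failed.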

jsagorin commented 3 years ago

Hi @bvtujo, sure. I can't get to the ECS logs because the tasks aren't deployed.

Here's the Dockerfile:

FROM public.ecr.aws/nginx/nginx:1.19
EXPOSE 80
COPY index.html /usr/share/nginx/html

Manifest: the manifest file is generated as part of copilot init. I've tried entries for both Dockerfile and ./Dockerfile, but both fail.

# The manifest for the "front-end" service.
# Read the full specification for the "Load Balanced Web Service" type at:
#  https://aws.github.io/copilot-cli/docs/manifest/lb-web-service/

# Your service name will be used in naming your resources like log groups, ECS services, etc.
name: front-end
type: Load Balanced Web Service

# Distribute traffic to your service.
http:
  # Requests to this path will be forwarded to your service.
  # To match all requests you can use the "/" path.
  path: '/'
  # You can specify a custom health check path. The default is "/".
  # healthcheck: '/'

# Configuration for your containers and service.
image:
  location: ./Dockerfile
  # Port exposed through your container to route traffic to it.
  port: 80

cpu: 256       # Number of CPU units for the task.
memory: 512    # Amount of memory in MiB used by the task.
count: 1       # Number of tasks that should be running in your service.
exec: true     # Enable running commands in your container.

# Optional fields for more advanced use-cases.
#
#variables:                    # Pass environment variables as key value pairs.
#  LOG_LEVEL: info

#secrets:                      # Pass secrets from AWS Systems Manager (SSM) Parameter Store.
#  GITHUB_TOKEN: GITHUB_TOKEN  # The key is the name of the environment variable, the value is the name of the SSM parameter.

# You can override any of the values defined above by environment.
#environments:
#  test:
#    count: 2               # Number of tasks to run for the "test" environment.

kohidave commented 3 years ago

So sorry about this! Quick question: are you building on an M1 Mac by chance? If so, that's probably the reason. The image that's being built is of the ARM architecture, and Fargate is an x86 runtime.

We're working on fixing this issue here: https://github.com/aws/copilot-cli/issues/1949

In the meantime, there's a workaround: you can prefix copilot commands with DOCKER_DEFAULT_PLATFORM=linux/amd64.

For example:

DOCKER_DEFAULT_PLATFORM=linux/amd64 copilot deploy

Hope that helps, and sorry for the trouble 🙏
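As a rough sketch of the same workaround in script form, the helper below adds DOCKER_DEFAULT_PLATFORM=linux/amd64 to the environment only when the host reports an ARM architecture. The function name copilot_env and the set of machine names are assumptions for illustration; the sketch does not call copilot itself.

```python
import os
import platform

# Machine names commonly reported by ARM hosts (assumed list, not exhaustive).
ARM_MACHINES = {"arm64", "aarch64"}


def copilot_env(machine: str, base_env) -> dict:
    """Return a copy of base_env, adding DOCKER_DEFAULT_PLATFORM on ARM hosts."""
    env = dict(base_env)
    if machine.lower() in ARM_MACHINES:
        # setdefault keeps any platform the user has already chosen.
        env.setdefault("DOCKER_DEFAULT_PLATFORM", "linux/amd64")
    return env


env = copilot_env(platform.machine(), os.environ)
print("DOCKER_DEFAULT_PLATFORM =", env.get("DOCKER_DEFAULT_PLATFORM", "(unset)"))
# To deploy with this environment, one could run:
#   subprocess.run(["copilot", "deploy"], env=env, check=True)
```

On an x86 host the environment is returned unchanged, so the same wrapper could be shared across a team with mixed hardware.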

jsagorin commented 3 years ago

@kohidave thanks for the suggestion. Unfortunately still running an x86 mbp (2.6 GHz 6-Core Intel Core i7) :) Would you have any other suggestions for this issue? The ks

huanjani commented 3 years ago

Hi @jsagorin! Just checking in to see if you've had any luck deploying your service. I'm stumped on what might be causing the failure, especially because it's the demo app!

jsagorin commented 3 years ago

Hi @huanjani, no luck. I will try running it again from scratch and let you know.

mark-brooks-roostify commented 2 years ago

I am on an ARM64 Mac and running into the same issue. Has no further progress been made? Shouldn't Copilot CLI allow for multi-platform builds?

iamhopaul123 commented 2 years ago

Hello @mark-brooks-roostify. Just want to clarify: Copilot supports multi-platform builds, so using an ARM machine to build an x86 image is fine (fixed by https://github.com/aws/copilot-cli/pull/2636). The problem is that Fargate doesn't support ARM images right now, which means you can't use an ARM machine to build and deploy an ARM image. Did you build an x86 image and still have the issue?

mark-brooks-roostify commented 2 years ago

I'm seeing this in the tool output but the demo still fails deployment:

Note: Your architecture type is currently unsupported. Setting platform linux/amd64 instead.

mark-brooks-roostify commented 2 years ago

Sorry, it looks like it is actually AppRunner failing:

10-21-2021 02:13:32 PM [AppRunner] Health check on port '80' failed. Service is rolling back. Check your configured port number. For more information, read the application logs.
10-21-2021 02:07:40 PM [AppRunner] Performing health check on port '80'.
10-21-2021 02:07:30 PM [AppRunner] Provisioning instances and deploying image.
10-21-2021 02:07:20 PM [AppRunner] Successfully pulled image from ECR.
10-21-2021 02:04:58 PM [AppRunner] Service status is set to OPERATION_IN_PROGRESS.
10-21-2021 02:04:57 PM [AppRunner] Service creation started.
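The events above are listed newest first, which makes the timeline hard to follow. A small sketch, assuming the timestamp format shown in the paste (month-day-year with a 12-hour clock), that splits the text into events, sorts them chronologically, and reports how long the health check ran before failing. The names parse_events, STAMP, and LINE are illustrative only.

```python
import re
from datetime import datetime

# Event text copied from the comment above (newest first in the paste).
RAW = (
    "10-21-2021 02:13:32 PM [AppRunner] Health check on port '80' failed. "
    "Service is rolling back. Check your configured port number. "
    "For more information, read the application logs. "
    "10-21-2021 02:07:40 PM [AppRunner] Performing health check on port '80'. "
    "10-21-2021 02:07:30 PM [AppRunner] Provisioning instances and deploying image. "
    "10-21-2021 02:07:20 PM [AppRunner] Successfully pulled image from ECR. "
    "10-21-2021 02:04:58 PM [AppRunner] Service status is set to OPERATION_IN_PROGRESS. "
    "10-21-2021 02:04:57 PM [AppRunner] Service creation started."
)

STAMP = r"\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2} [AP]M"
LINE = re.compile("(" + STAMP + r") \[AppRunner\] (.*)", re.DOTALL)


def parse_events(raw: str) -> list:
    """Split run-together event text into (datetime, message) pairs, oldest first."""
    events = []
    # Split just before each timestamp so every chunk holds one event.
    for chunk in re.split("(?=" + STAMP + r" \[AppRunner\])", raw):
        match = LINE.match(chunk.strip())
        if match:
            when = datetime.strptime(match.group(1), "%m-%d-%Y %I:%M:%S %p")
            events.append((when, match.group(2).strip()))
    return sorted(events)


events = parse_events(RAW)
for when, message in events:
    print(f"{when:%H:%M:%S}  {message}")

# Time between the health check starting and the failure being reported.
elapsed = events[-1][0] - events[-2][0]
print("health check ran for", int(elapsed.total_seconds()), "seconds before failing")
```

Read in order, the timeline shows the image being pulled and instances provisioned successfully, with the failure occurring only at the health check step.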

I might create a new ticket.

iamhopaul123 commented 2 years ago

Note: Your architecture type is currently unsupported. Setting platform linux/amd64 instead.

This actually indicates you are trying to use an ARM machine to build an ARM image which is not supported right now. And if you set the platform in the manifest to linux/amd64, Copilot will try to build a linux/amd64 image instead of an ARM image.

iamhopaul123 commented 2 years ago

Sorry, it looks like it is actually AppRunner failing:
10-21-2021 02:13:32 PM [AppRunner] Health check on port '80' failed. Service is rolling back. Check your configured port number. For more information, read the application logs.
10-21-2021 02:07:40 PM [AppRunner] Performing health check on port '80'.
10-21-2021 02:07:30 PM [AppRunner] Provisioning instances and deploying image.
10-21-2021 02:07:20 PM [AppRunner] Successfully pulled image from ECR.
10-21-2021 02:04:58 PM [AppRunner] Service status is set to OPERATION_IN_PROGRESS.
10-21-2021 02:04:57 PM [AppRunner] Service creation started.

Yeah, if you are using a linux/amd64 image it should be fine! It seems like it is an issue with the local app itself. Feel free to cut a new issue!

mark-brooks-roostify commented 2 years ago

Hmm. So when I did:

DOCKER_DEFAULT_PLATFORM=linux/amd64 copilot init

It worked.

huanjani commented 2 years ago

@mark-brooks-roostify 👋🏼 Just wondering which version of Copilot you're using... This was fixed with v1.11.0.

mark-brooks-roostify commented 2 years ago

@.*** hello-app-runner-nodejs % copilot --version

copilot version: v1.11.0

@huanjani ^ ^

huanjani commented 2 years ago

Thanks, @mark-brooks-roostify! I think we've found the bug that was causing this issue for App Runner workloads on ARM architectures. Thanks for bringing this to our attention. A fix will be in the next release!

mark-brooks-roostify commented 2 years ago

That's great.

On Mon, Oct 25, 2021 at 12:40 PM Janice Huang @.***> wrote:

Thanks, @mark-brooks-roostify! I think we've found the bug that was causing this issue for App Runner workloads on ARM architectures. Thanks for bringing this to our attention. A fix will be in the next release!

efekarakus commented 2 years ago

Hi folks! Thanks for all the feedback. The fix is now out in v1.12.0: https://github.com/aws/copilot-cli/releases/tag/v1.12.0 🎉

jparksecurity commented 2 years ago

@jsagorin hey, I had the same issue. I was able to fix it by installing Docker Desktop, then deleting and re-initializing the app.

Just make sure you have the Docker CLI available when running copilot init.

salihgueler commented 2 years ago

When I run copilot init on an ARM machine, I still see the following message for a Request-Driven Web Service, and a variant of it for a Load Balanced Web Service, on version 1.20.0. Might there be a regression?

Note: Architecture type arm64 has been detected. At this time, arm64 architectures are not supported for App Runner workloads. We will set platform 'linux/x86_64' instead.

paragbhingre commented 2 years ago

Hello @salihgueler, the note that you are seeing is intended, as we build a linux/x86_64 image on ARM machines for App Runner workloads. May I ask what you mean by "Might there be a regression?"

salihgueler commented 2 years ago

Hey @paragbhingre, thanks for the answer. What I mean is that I see a similar message for Load Balanced Web Service as well. Even though it detects that I am on arm64, it still adds x86_64 to the yml file. Since this issue seemed to be resolved, I thought a regression had brought it back.

paragbhingre commented 2 years ago

Hello @salihgueler, as per the docs, we always specify the image as linux/x86_64 unless the user specifies that they want the image built for ARM using the platform flag. Please let us know if you have any more questions.

salihgueler commented 2 years ago

Yes, I can change that, and I already did. But the problem is that we do not have a way to set it before everything starts. For example, if I want to deploy right after init, I cannot do it right now without editing the manifest in the middle of the process, because manifest.yml is generated after I run copilot init, right? Why don't we pass the platform to init or, better, if we can detect the platform, why not put it in manifest.yml then?

paragbhingre commented 2 years ago

@salihgueler we understand that you want to set the platform at the time of copilot init itself so that you would not need to change the manifest. But copilot init has been designed in such a way that customers have to do as little as possible to get their sample app working.

Also, previously, before we supported ARM, we would default to x86 for ARM users. If we were to suddenly build ARM once it was supported, it would be unexpected/backwards-incompatible with our previous behavior. That was another reason why we decided specifically to ask customers about the platform. I hope this helps.
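For readers who want to script the manual step discussed above (editing the generated manifest between copilot init and the first deploy), here is a minimal sketch. It assumes platform is a top-level key in the manifest, as described earlier in the thread, and that the target value (linux/arm64 here) is supported for the workload type; the helper name set_platform and the sample manifest text are hypothetical. It edits text rather than parsing YAML, so it only handles an unindented platform line.

```python
import re


def set_platform(manifest_text: str, platform: str) -> str:
    """Set the top-level platform key, replacing it if present, else appending it."""
    line = "platform: " + platform
    # Match only an unindented, uncommented key at the start of a line.
    top_level_key = re.compile(r"^platform:.*$", re.MULTILINE)
    if top_level_key.search(manifest_text):
        return top_level_key.sub(lambda _match: line, manifest_text, count=1)
    return manifest_text.rstrip("\n") + "\n\n" + line + "\n"


# Hypothetical excerpt of a generated manifest (for example copilot/front-end/manifest.yml).
generated = "name: front-end\ntype: Load Balanced Web Service\n\ncpu: 256\nmemory: 512\n"
print(set_platform(generated, "linux/arm64"))
```

In practice the function would be applied to the contents of the generated manifest file and the result written back before running the deploy command.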