aws / copilot-cli

The AWS Copilot CLI is a tool for developers to build, release and operate production-ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.
https://aws.github.io/copilot-cli/
Apache License 2.0

copilot job deploy/run fails with error "InternalError: failed to create container model" on ECS #5032

Closed: acamb closed this issue 1 year ago

acamb commented 1 year ago

Hi, I deployed a new version of a job with copilot job deploy, which reported the following (a success?):

√ Proposing infrastructure changes for stack XXX 
- Updating the infrastructure for stack XXX  [update complete]   [87.9s]
  - An Addons CloudFormation Stack for your additional AWS resources            [update complete]   [71.8s]
  - An IAM role for a state machine to run ECS tasks in your cluster            [update complete]   [21.0s]
  - A state machine to invoke your job and handle retry and timeout logic       [update complete]   [4.1s]
  - An ECS task definition to group your containers and run them on ECS         [delete complete]   [3.2s]
  - An IAM role to control permissions for the containers in your tasks         [not started]
√ Deployed XXX.

After starting the job with copilot job run, the task transitioned from provisioning to deprovisioning, and I saw the following error in the 'Task Overview' section in ECS:

InternalError: failed to create container model: failed to normalize image reference ":fae9f246". Launch a new task to retry.
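The reference in this error is only a tag (":fae9f246") with nothing before the colon, which fits the later reply's suggestion that the image URL was not set correctly. The Python sketch below is a hypothetical illustration of why such a reference cannot be resolved: splitting it into repository and tag leaves an empty repository name. It is not the ECS agent's actual parser (which also handles digests and other cases), and the helper name `split_image_reference` and the ECR URI in the usage lines are made up.

```python
from typing import Tuple


def split_image_reference(ref: str) -> Tuple[str, str]:
    """Split "repository:tag" into (repository, tag).

    Hypothetical helper for illustration only. Digest references
    ("repo@sha256:...") are out of scope.
    """
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    # A colon before the last "/" belongs to a registry port
    # (e.g. "localhost:5000/app"), not to the tag.
    if last_colon > last_slash:
        repository, tag = ref[:last_colon], ref[last_colon + 1:]
    else:
        repository, tag = ref, "latest"
    if not repository:
        raise ValueError(f"invalid image reference {ref!r}: empty repository name")
    return repository, tag


# A well-formed ECR-style reference (account ID and region are made up).
print(split_image_reference("123456789012.dkr.ecr.eu-west-1.amazonaws.com/app/xxx:fae9f246"))
# ('123456789012.dkr.ecr.eu-west-1.amazonaws.com/app/xxx', 'fae9f246')

# The reference from the error message has no repository part.
try:
    split_image_reference(":fae9f246")
except ValueError as err:
    print(err)  # invalid image reference ':fae9f246': empty repository name
```

Under this reading, the task definition held a tag but not the repository URI that should precede it.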

A new run of copilot job deploy reports:

- No new infrastructure changes for stack XXX
X deploy job XXX to environment prod: deploy job: change set with name YYY for stack XXX has no changes

I'm unable to deploy the new version.

iamhopaul123 commented 1 year ago

Hello @acamb.

X deploy job XXX to environment prod: deploy job: change set with name YYY for stack XXX has no changes

This is because there were no changes to your job manifest and no local code changes (if you build from a local Dockerfile).

After starting the job with copilot job run, the task transitioned from provisioning to deprovisioning, and I saw the following error in the 'Task Overview' section in ECS:

Would you mind telling us why you wanted to use job run after job deploy? After job deploy, the state machine should spin up ECS tasks automatically on the schedule set in your manifest, unless you wanted to trigger a task manually.

For the error itself, it seems like the image URL was not correctly set. Which version of Copilot are you using? And would you mind sharing your job manifest? Thank you!

acamb commented 1 year ago

Hi @iamhopaul123

Would you mind telling us why you wanted to use job run after job deploy? After job deploy, the state machine should spin up ECS tasks automatically on the schedule set in your manifest, unless you wanted to trigger a task manually.

Sure, I wanted to run the job immediately, regardless of the schedule.

For the error itself, it seems like the image URL was not correctly set. Which version of Copilot are you using? And would you mind sharing your job manifest? Thank you!

I'm using the latest version (1.28.0), but the previous version of the job was deployed with an older Copilot version (maybe 1.24.0 or 1.26.0).

The manifest is the following:

# The manifest for the "XXX" job.
# Read the full specification for the "Scheduled Job" type at:
#  https://aws.github.io/copilot-cli/docs/manifest/scheduled-job/

# Your job name will be used in naming your resources like log groups, ECS Tasks, etc.
name: XXX
type: Scheduled Job

# Trigger for your task.
on:
  # The scheduled trigger for your job. You can specify a Unix cron schedule or keyword (@weekly) or a rate (@every 1h30m)
  # AWS Schedule Expressions are also accepted: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html
  schedule: "@weekly"
#retries: 3        # Optional. The number of times to retry the job before failing.
#timeout: 1h30m    # Optional. The timeout after which to stop the job if it's still running. You can use the units (h, m, s).

# Configuration for your container and task.
image:
  # Docker build arguments. For additional overrides: https://aws.github.io/copilot-cli/docs/manifest/scheduled-job/#image-build
  build: XXX\Dockerfile

cpu: 256       # Number of CPU units for the task.
memory: 512    # Amount of memory in MiB used by the task.

# Optional fields for more advanced use-cases.
#
#variables:                    # Pass environment variables as key value pairs.
#  LOG_LEVEL: info
variables:
  APP_YYY_REMOTE: YYY
#secrets:                      # Pass secrets from AWS Systems Manager (SSM) Parameter Store.
#  GITHUB_TOKEN: GITHUB_TOKEN  # The key is the name of the environment variable, the value is the name of the SSM parameter.
secrets:
  SPRING_DATASOURCE_PASSWORD:
    secretsmanager: 'MY_SCECRET:password::'
  APP_YYY_CFISC:
    secretsmanager: 'MY_SECRET:codicefiscale::'
  APP_YYY_PASSWORD:
    secretsmanager: 'MY_SECRET:password::'
  APP_YYY_PIN:
    secretsmanager: 'MY_SECRET:pin::'
# You can override any of the values defined above by environment.
#environments:
#  prod:
#    cpu: 2048               # Larger CPU value for prod environment.
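The `schedule` comment in the manifest lists three accepted forms: a Unix cron string, a keyword such as @weekly, or a rate such as @every 1h30m. As a rough illustration of the rate form only, the hypothetical Python helper `every_to_rate` below converts an @every duration into an AWS schedule rate expression, whose units are minute(s), hour(s) or day(s), singular when the value is 1. This is an assumption-based sketch, not Copilot's own conversion code; how Copilot translates @weekly or cron strings is not shown.

```python
import re

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_DURATION = re.compile(r"(?:\d+[hms])+")   # e.g. "1h30m", "45m", "2h"
_PART = re.compile(r"(\d+)([hms])")


def every_to_rate(schedule: str) -> str:
    """Convert "@every <duration>" into a rate(...) expression (hypothetical helper)."""
    prefix = "@every "
    if not schedule.startswith(prefix):
        raise ValueError(f"not an @every schedule: {schedule!r}")
    duration = schedule[len(prefix):].strip()
    if not _DURATION.fullmatch(duration):
        raise ValueError(f"invalid duration: {duration!r}")
    seconds = sum(int(n) * _UNIT_SECONDS[u] for n, u in _PART.findall(duration))
    # Rate expressions have no seconds unit, so require whole minutes.
    if seconds < 60 or seconds % 60:
        raise ValueError("rate expressions need a whole number of minutes (at least 1)")
    minutes = seconds // 60
    # Pick the largest unit that divides the interval exactly.
    if minutes % 1440 == 0:
        value, unit = minutes // 1440, "day"
    elif minutes % 60 == 0:
        value, unit = minutes // 60, "hour"
    else:
        value, unit = minutes, "minute"
    return f"rate({value} {unit}{'' if value == 1 else 's'})"


print(every_to_rate("@every 1h30m"))  # rate(90 minutes)
print(every_to_rate("@every 24h"))    # rate(1 day)
```

Whatever form the schedule takes, it only controls automatic runs; copilot job run, as used in this thread, starts a task immediately.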

Deleting the job with copilot job delete and redoing init/deploy (after manually deleting the ECR repository, as reported in issue #4963) does not reproduce the problem described above.

Thanks,

iamhopaul123 commented 1 year ago

Hello @acamb. I replied here; let's keep this discussion in one thread.