aws / copilot-cli

The AWS Copilot CLI is a tool for developers to build, release and operate production ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.
https://aws.github.io/copilot-cli/
Apache License 2.0

[Bug]: Secrets Manager Secrets occasionally become undefined in task? #5881

Closed acrinklaw closed 4 months ago

acrinklaw commented 4 months ago

Description:

When I deploy my application via Copilot, the ECS tasks that are spun up sometimes have the secrets imported from Secrets Manager as undefined. The deployment succeeds and the task JSON shows the secrets as defined, but when I access them in my application they are undefined. Strangely, this does not occur every time: sometimes they are defined with the correct values, and other times they are not. I am not sure whether this is a Copilot issue or a problem in some other part of the infra stack.

Expected result:

My secrets should contain the values they are set to in Secrets Manager.

Debugging:

I've browsed through the infrastructure Copilot creates to make sure the secrets are being passed in, and I've searched the GH issues to check whether this has been brought up before. I wasn't able to find a clear reason why this occurs, or anyone else who has had the same issue.

Varun359 commented 4 months ago

Hi @acrinklaw,

  1. Can you please share your manifest here?
  2. Please review all TaskDefinition revisions created during ECS service updates to ensure the secrets are correctly included in the container definitions.
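
The revision check in step 2 can be done by hand in the console, or scripted. Below is a minimal sketch, assuming the JSON printed by `aws ecs describe-task-definition --task-definition <family>:<revision>` has been saved and parsed. The function names and the trimmed-down interfaces are hypothetical helpers for illustration, not part of Copilot or the AWS CLI; only the `taskDefinition.containerDefinitions[].secrets` shape comes from the ECS output.

```typescript
// Only the fields this sketch reads from `aws ecs describe-task-definition` output.
interface SecretEntry { name: string; valueFrom: string; }
interface ContainerDef { name: string; secrets?: SecretEntry[]; }
interface TaskDefOutput {
  taskDefinition: { revision: number; containerDefinitions: ContainerDef[] };
}

// Returns "container/SECRET_NAME" for every secret declared in the revision.
function listDeclaredSecrets(output: TaskDefOutput): string[] {
  const found: string[] = [];
  for (const container of output.taskDefinition.containerDefinitions) {
    // A container without a `secrets` array simply declares none.
    for (const secret of container.secrets ?? []) {
      found.push(`${container.name}/${secret.name}`);
    }
  }
  return found;
}

// Returns the expected secret names that the given container does NOT declare.
function missingSecrets(
  output: TaskDefOutput,
  containerName: string,
  expected: string[]
): string[] {
  const declared = listDeclaredSecrets(output);
  return expected.filter(
    (secretName) => declared.indexOf(`${containerName}/${secretName}`) === -1
  );
}
```

Running `missingSecrets` against each revision with the same list of expected names shows whether a particular deployment dropped a secret. An empty result for every revision suggests the task definitions are fine and the problem sits further down, in the container or the application.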

acrinklaw commented 4 months ago

Hi @Varun359, here is my manifest with some information removed. The strange thing is that my secrets work initially, but some time later, when we try to use the Trello API information, they become undefined. This may be unrelated to Copilot, but I wanted to start here to verify and then trace how the values propagate.

I checked the TaskDefinition and can see them listed in the revision. These are being fetched from Secrets Manager, so the problem may come from there. I can look into using the AWS SDK to fetch these values instead.
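
For the SDK route, here is a rough sketch of reading one JSON key out of a secret at runtime. `readJsonKey`, `fetchSecretKey` and the `SecretStringFetcher` type are hypothetical names introduced for this example. The commented wiring uses `SecretsManagerClient` and `GetSecretValueCommand` from `@aws-sdk/client-secrets-manager` (AWS SDK for JavaScript v3), and it assumes the task role is allowed to call `secretsmanager:GetSecretValue` on the secret.

```typescript
// Hypothetical abstraction over the SDK call so the parsing logic stays testable.
type SecretStringFetcher = (secretId: string) => Promise<string | undefined>;

// Extracts one key from a JSON-formatted SecretString and fails loudly
// instead of silently returning undefined.
function readJsonKey(
  secretString: string | undefined,
  secretId: string,
  jsonKey: string
): string {
  if (secretString === undefined) {
    throw new Error(`Secret ${secretId} has no SecretString`);
  }
  const parsed: unknown = JSON.parse(secretString);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error(`Secret ${secretId} is not a JSON object`);
  }
  const value = (parsed as Record<string, unknown>)[jsonKey];
  if (typeof value !== "string" || value === "") {
    throw new Error(`Key "${jsonKey}" is missing or empty in secret ${secretId}`);
  }
  return value;
}

// Fetches the secret through the injected fetcher, then extracts the key.
function fetchSecretKey(
  fetchSecretString: SecretStringFetcher,
  secretId: string,
  jsonKey: string
): Promise<string> {
  return fetchSecretString(secretId).then((raw) =>
    readJsonKey(raw, secretId, jsonKey)
  );
}

// Possible wiring with AWS SDK v3 (assumption: the package is installed):
//
//   import { SecretsManagerClient, GetSecretValueCommand } from "@aws-sdk/client-secrets-manager";
//   const client = new SecretsManagerClient({});
//   const fetcher: SecretStringFetcher = (id) =>
//     client.send(new GetSecretValueCommand({ SecretId: id })).then((r) => r.SecretString);
```

Throwing on a missing or empty key makes an intermittent failure show up in the logs at the point of lookup rather than later as an undefined value inside the Trello call.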


# The manifest for the "frontend" service.
# Read the full specification for the "Load Balanced Web Service" type at:
#  https://aws.github.io/copilot-cli/docs/manifest/lb-web-service/

# Your service name will be used in naming your resources like log groups, ECS services, etc.
name: frontend
type: Load Balanced Web Service
healthcheck:
  command: ["CMD-SHELL", "curl -f http://localhost:3000/api/health_check || exit 1"]
  interval: 15s
  retries: 2
  timeout: 10s
  start_period: 3s
  grace_period: 180s

# Distribute traffic to your service.
http:
  path: '/'
  healthcheck: '/api/health_check'

# Configuration for your containers and service.
image:
  # Docker build arguments. For additional overrides: https://aws.github.io/copilot-cli/docs/manifest/lb-web-service/#image-build
  build: frontend/Dockerfile
  # Port exposed through your container to route traffic to it.
  port: 3000

cpu: 512       # Number of CPU units for the task.
memory: 1024    # Amount of memory in MiB used by the task.
platform: linux/x86_64  # See https://aws.github.io/copilot-cli/docs/manifest/lb-web-service/#platform
count: 1       # Number of tasks that should be running in your service.
exec: true     # Enable running commands in your container.
network:
  connect: false # Enable Service Connect for intra-environment traffic between services.

# storage:
  # readonly_fs: true       # Limit to read-only access to mounted root filesystems.

# Optional fields for more advanced use-cases.
#
#variables:                    # Pass environment variables as key value pairs.
#  LOG_LEVEL: info

sidecars:
  datadog:
    image: public.ecr.aws/datadog/agent:latest
    variables:
      ECS_FARGATE: true
      DD_SITE: datadoghq.com

logging:
  image: public.ecr.aws/aws-observability/aws-for-fluent-bit:stable
  destination:  
    Name: datadog
    Host: http-intake.logs.datadoghq.com
    TLS: on
    dd_service: *****-browser
    dd_source: uvicorn
    provider: ecs
  enableMetadata: true
  configFilePath: /fluent-bit/configs/parse-json.conf

environments:
  dev:
    deployment:
      rolling: "recreate"
    logging:
      destination:
        dd_tags: project:*****,env:dev
      secretOptions:
        apikey:
          secretsmanager: 'dev/******/datadog'
    sidecars:
      datadog:
        secrets:
          DD_API_KEY:
            secretsmanager: 'dev/******/datadog'
    variables:
      ENV: dev
      NEXTAUTH_URL: https://***********.com
      BACKEND_URL: http://backend.******.********.internal
      DD_APP_ID: *********
      DD_CLIENT_TOKEN: *******
      DD_SERVICE_NAME: *****
      DD_APM_ENABLED: true
      DD_TRACE_ENABLED: true
      DD_LOGS_INJECTION: true
    http:
      alias: # The "qa" environment imported a certificate.
        - name: '*********'
          hosted_zone: ***********
    secrets:
      GOOGLE_CLIENT_SECRET:
        secretsmanager: 'dev/*****/google-auth:secret::'
      GOOGLE_CLIENT_ID:
        secretsmanager: 'dev/*****/google-auth:id::'
      NEXTAUTH_SECRET:
        secretsmanager: 'dev/*****/nextjs:secret::'
      STUDY_CALENDAR_ID:
        secretsmanager: 'dev/*****/nextjs:study_calendar_id::'
      PROCEDURE_CALENDAR_ID:
        secretsmanager: 'dev/*****/nextjs:procedure_calendar_id::'
      EXPERIMENT_CALENDAR_ID:
        secretsmanager: 'dev/*****/nextjs:experiment_calendar_id::'
      NEXT_PUBLIC_TRELLO_TOKEN:
        secretsmanager: 'dev/*****/nextjs:trello_public_token::'
      NEXT_PUBLIC_TRELLO_BUGS_TAG_ID:
        secretsmanager: 'dev/*****/nextjs:trello_bugs_tag_id::'
      NEXT_PUBLIC_TRELLO_FEATURES_TAG_ID:
        secretsmanager: 'dev/*****/nextjs:trello_features_tag_id::'
      NEXT_PUBLIC_TRELLO_CLARIFICATION_TAG_ID:
        secretsmanager: 'dev/*****/nextjs:trello_clarification_tag_id::'
      NEXT_PUBLIC_TRELLO_FEEDBACK_LIST_ID:
        secretsmanager: 'dev/*****/nextjs:trello_feedback_list_id::'

acrinklaw commented 4 months ago

I will close this for now because it seems unlikely that the problem comes from the Copilot side of things, but please let me know if you have any suggestions on how to diagnose what is going wrong.
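
One way to narrow this down is to record, from inside the running container, which of the expected variables are actually set. The sketch below is only an illustration: `checkEnv` and `EnvReport` are hypothetical names, and it assumes the check is called from server-side code (for example the existing `/api/health_check` route) with `process.env` as its first argument. It reports names only, never values.

```typescript
// Environment as a plain map so the check does not depend on Node typings.
type Env = Record<string, string | undefined>;

interface EnvReport {
  present: string[];
  missing: string[];
}

// Splits the expected variable names into present and missing.
// An empty string is treated as missing.
function checkEnv(env: Env, expected: string[]): EnvReport {
  const report: EnvReport = { present: [], missing: [] };
  for (const key of expected) {
    const value = env[key];
    if (value === undefined || value === "") {
      report.missing.push(key);
    } else {
      report.present.push(key);
    }
  }
  return report;
}

// Example call (assumption: runs server-side in the container):
//
//   const report = checkEnv(process.env, ["NEXTAUTH_SECRET", "NEXT_PUBLIC_TRELLO_TOKEN"]);
//   console.log(JSON.stringify(report));
```

If everything is reported as present on the server while the browser still sees undefined, one possibility worth ruling out (not confirmed anywhere in this thread) is that Next.js inlines `NEXT_PUBLIC_` variables into the client bundle at build time, so values injected by ECS when the container starts would not reach client-side code.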