aws / copilot-cli

The AWS Copilot CLI is a tool for developers to build, release, and operate production-ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.
https://aws.github.io/copilot-cli/
Apache License 2.0

[Bug]: when building two services from the same Dockerfile, target seems to be ignored #5921

Open rsyring opened 2 weeks ago

rsyring commented 2 weeks ago

Maybe related to: https://github.com/aws/copilot-cli/issues/1943

Description:

I have a Flask-based Python application that can be run as a web service and also as a Celery worker, i.e. two services from the same codebase. Since the manifest's image section takes a target setting, I assumed I could set up my LB Web Service and the Backend Service (celery) to share the same Dockerfile:

FROM ubuntu:22.04 AS app

# ...snip details that don't matter #

ENTRYPOINT ["/var/venv/bin/granian", "--host", "0.0.0.0", "--port", "8000", "--interface", "wsgi", "wsgi:app"]
CMD []

FROM app AS celery

ENTRYPOINT ["/var/venv/bin/celery", "--app", "climate.celery.worker", "--workdir", "/app", "worker", "--hostname", "celery.%h", "--autoscale=15,2"]
CMD []
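For reference, building and checking the two stages locally (outside Copilot) would look roughly like this; the image tags are illustrative:

# Build only the "app" stage; this image should keep the granian entrypoint.
docker build --target app -t climate-web:local .

# Build only the "celery" stage; it layers the celery entrypoint on top of "app".
docker build --target celery -t climate-celery:local .

# Confirm what each image will actually run.
docker inspect --format '{{json .Config.Entrypoint}}' climate-web:local
docker inspect --format '{{json .Config.Entrypoint}}' climate-celery:local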

Manifests:

# celery
image:
  build: Dockerfile
  target: celery

# web
image:
  build: Dockerfile
  port: 8000
  target: app
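For comparison, the Copilot manifest also documents a map-style build with target nested under build; it is unclear whether that form behaves any differently here, but a sketch of the web manifest written that way:

# web (map-style build; sketch only)
image:
  build:
    dockerfile: Dockerfile
    context: .
    target: app
  port: 8000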

Initially, only the web manifest existed and I deployed that service successfully multiple times. I then added the celery service, and it also deployed successfully.

I went back and did some work on the web service's source code and deployed it again. It started failing its health checks. It turned out that the web service was now running the app with the celery entrypoint.

I was only able to resolve this by commenting out the celery build target in the Dockerfile so that the last target was app.

Details:

copilot: v1.34.1
OS: Ubuntu 24.04
Docker: 27.1.2
Manifests: LB Web Service & Backend Service
AWS Region: us-east-2
Deploy command: copilot svc deploy --name [celery|web]

Expectation & Result

Expectation: the web image would have the entrypoint of the app target in the Dockerfile.

Result: the web image had the entrypoint of the celery target (although I suspect it wasn't that target specifically, but simply the last target in the Dockerfile).

Debugging:

Evidence that the image is not being built with the correct entrypoint can be found in the image history of the web service's ECR repo:

Inspecting image with digest: sha256:6f70806a90477b1838af900850ccf43f1652523b7c2598fb76e1f3e8e2310cb0
Entrypoint: ['/var/venv/bin/granian', '--host', '0.0.0.0', '--port', '8000', '--interface', 'wsgi', 'wsgi:app']

Inspecting image with digest: sha256:1a461825360dc880a77ac78498c101d6fadcb588c275a7e622cd3569dd8e6861
Entrypoint: ['/var/venv/bin/celery', '--app', 'climate.celery.worker', '--workdir', '/app', 'worker', '--hostname', 'celery.%h', '--autoscale=15,2']

Inspecting image with digest: sha256:64846cbba40a3cd0c3b227296c74ba5afd23fdfa8279db4c5070cc1de0caee19
Entrypoint: ['/var/venv/bin/celery', '--app', 'climate.celery.worker', '--workdir', '/app', 'worker', '--hostname', 'celery.%h', '--autoscale=15,2']

Inspecting image with digest: sha256:b61f01770acb625d6d12f11ff135ecfc44d61d96a7a7e22871cd0d53a63b6275
Entrypoint: ['/var/venv/bin/celery', '--app', 'climate.celery.worker', '--workdir', '/app', 'worker', '--hostname', 'celery.%h', '--autoscale=15,2']

Inspecting image with digest: sha256:0eaf29dd2a651cf7a55f74f975ddb3e2c6c29613dd4444ca91fac6e65adbbb0c
Entrypoint: ['/var/venv/bin/granian', '--host', '0.0.0.0', '--port', '8000', '--interface', 'wsgi', 'wsgi:app']

Inspecting image with digest: sha256:5dbc6c9800fc8c320be49896effd7f820a54bbfe781c3f75bd182e0189fddb99
Entrypoint: ['/var/venv/bin/granian', '--host', '0.0.0.0', '--port', '8000', '--interface', 'wsgi', 'wsgi:app']

The entrypoint a given image ended up with depended on whether the celery build target in the Dockerfile was active or commented out.
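The inspection above can be reproduced with something along these lines (sketch; the repository URI is a placeholder):

REPO=<account-id>.dkr.ecr.us-east-2.amazonaws.com/<copilot-ecr-repo>
DIGEST=sha256:6f70806a90477b1838af900850ccf43f1652523b7c2598fb76e1f3e8e2310cb0

# Pull the image by digest and read the entrypoint baked into its config.
docker pull "$REPO@$DIGEST"
docker inspect --format '{{json .Config.Entrypoint}}' "$REPO@$DIGEST"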

I was also able to run the two different build targets without issue in docker compose:


services:
  app:
    image: some-app-web:latest
    build:
      context: .
      target: app
    ports:
      - "127.0.0.1:8000:8000"

  celery:
    image: some-app-celery:latest
    build:
      context: .
      target: celery
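For example, checking the compose-built images with commands along these lines (illustrative, not necessarily the exact commands used) shows each target keeping its own entrypoint:

docker compose build app celery
docker inspect --format '{{json .Config.Entrypoint}}' some-app-web:latest
docker inspect --format '{{json .Config.Entrypoint}}' some-app-celery:latest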

Workaround

Use the same build target, app, for both services (it also happens to be the last target in the Dockerfile).

Leave the web service's entrypoint set in the Dockerfile, and override the entrypoint for celery in its manifest.

This works but is not ideal, because now I need to specify the celery entrypoint in both the manifest and my docker compose file. I prefer to have it all in the Dockerfile so that there isn't room for "drift" between my local testing of the images and what actually runs in AWS.
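Concretely, the celery half of that workaround looks something like the sketch below (it assumes Copilot's top-level entrypoint field and keeps the flat target form from the manifests above):

# celery manifest (workaround sketch)
image:
  build: Dockerfile
  target: app   # same (last) target as the web service
entrypoint:
  - /var/venv/bin/celery
  - --app
  - climate.celery.worker
  - --workdir
  - /app
  - worker
  - --hostname
  - celery.%h
  - --autoscale=15,2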

Lou1415926 commented 2 weeks ago

Hi @rsyring! Copilot passes all the relevant arguments you specify under image to docker build. Would you be able to run docker history on your app's image to see whether it was built via docker build --target celery?

I also wonder whether BuildKit was enabled when the images were being built. I'm not exactly sure how this would be relevant to the issue you're seeing; I just have a gut feeling that it might 💭
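A quick way to check whether BuildKit is in play (illustrative):

docker buildx version              # succeeds when the buildx/BuildKit builder is installed
echo "${DOCKER_BUILDKIT:-unset}"   # setting this to 1 makes docker build use BuildKit

(The buildkit.dockerfile.v0 comments in the docker history output below also suggest the images were built with BuildKit.)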

Lou1415926 commented 2 weeks ago

and thanks a bunch for catching the malware bot for us! I've hidden their comment. ❤️

rsyring commented 2 weeks ago

@Lou1415926 hi there. Thanks for jumping in here. Here is the result of docker history, where the image ID is the latest image pushed to the web ECR repo:

 ❯ docker history 58b460adfabf
IMAGE          CREATED        CREATED BY                                      SIZE      COMMENT
58b460adfabf   14 hours ago   COPY deploy/climate-config.py /etc/climate-c…   3.9kB     buildkit.dockerfile.v0
<missing>      14 hours ago   CMD []                                          0B        buildkit.dockerfile.v0
<missing>      14 hours ago   ENTRYPOINT ["/var/venv/bin/granian" "--host"…   0B        buildkit.dockerfile.v0
<missing>      14 hours ago   COPY /app/vite/dist /app/vite/dist # buildkit   958kB     buildkit.dockerfile.v0
<missing>      14 hours ago   ENV PATH=/var/venv/bin/:/usr/local/sbin:/usr…   0B        buildkit.dockerfile.v0
<missing>      14 hours ago   COPY src/tasks_lib.py src/tasks_lib.py # bui…   680B      buildkit.dockerfile.v0
<missing>      14 hours ago   COPY src/wsgi.py src/wsgi.py # buildkit         400B      buildkit.dockerfile.v0
<missing>      14 hours ago   RUN /bin/sh -c $UV_INSTALL -e . # buildkit      62.5kB    buildkit.dockerfile.v0
<missing>      14 hours ago   COPY src/climate src/climate # buildkit         23.2MB    buildkit.dockerfile.v0
<missing>      15 hours ago   RUN /bin/sh -c $UV_INSTALL -r requirements/d…   219MB     buildkit.dockerfile.v0
<missing>      15 hours ago   COPY pyproject.toml hatch.toml . # buildkit     1.04kB    buildkit.dockerfile.v0
<missing>      15 hours ago   COPY requirements requirements # buildkit       370kB     buildkit.dockerfile.v0
<missing>      15 hours ago   RUN /bin/sh -c $UV_INSTALL /tmp/wheels/* # b…   2.36MB    buildkit.dockerfile.v0
<missing>      15 hours ago   COPY /tmp/wheels /tmp/wheels # buildkit         716kB     buildkit.dockerfile.v0
<missing>      23 hours ago   WORKDIR /app                                    0B        buildkit.dockerfile.v0
<missing>      23 hours ago   ENV UV_INSTALL=mise exec -- uv pip install -…   0B        buildkit.dockerfile.v0
<missing>      23 hours ago   RUN /bin/sh -c mise install python     && mi…   135MB     buildkit.dockerfile.v0
<missing>      23 hours ago   COPY mise.toml deploy/mise.local.toml . # bu…   331B      buildkit.dockerfile.v0
<missing>      23 hours ago   ENV MISE_TRUSTED_CONFIG_PATHS=/app              0B        buildkit.dockerfile.v0
<missing>      23 hours ago   WORKDIR /app                                    0B        buildkit.dockerfile.v0
<missing>      23 hours ago   RUN /bin/sh -c apt update -y     && apt inst…   114MB     buildkit.dockerfile.v0
bendilts commented 2 weeks ago

I also saw this same behavior. Right now I'm working around it by having multiple mostly-identical Dockerfiles.