Panfactum / stack

The Panfactum Stack
https://panfactum.com

[question]: wf_dockerfile_build push fails with unauthorized 401 #103

Closed wesbragagt closed 3 months ago

wesbragagt commented 3 months ago

What is your question?

When running my workloads in Argo to build and push a container image using the stack module wf_dockerfile_build, I keep getting a 401 Unauthorized error when the workflow tries to push to the container registry. Where else could I look to resolve the issue?

Steps I've attempted without success:

Logs:

build-implentio-api-xt6zs-build-images-489452731: ------
build-implentio-api-xt6zs-build-images-489452731:  > exporting to image:
build-implentio-api-xt6zs-build-images-489452731: ------
build-implentio-api-xt6zs-build-images-489452731: error: failed to solve: failed to push 730335560480.dkr.ecr.us-west-2.amazonaws.com/implentio-api:2a4361de4e5a813f44ad1c882078fedbde196714-amd64: unexpected status from HEAD request to https://730335560480.dkr.ecr.us-west-2.amazonaws.com/v2/implentio-api/blobs/sha256:d9aac50bc34e2a0199701ebddca85c36acd90c4d1ad915ca0849364c41547d70: 401 Unauthorized
build-implentio-api-xt6zs-build-images-489452731: {"argo":true,"error":null,"level":"info","msg":"sub-process exited","time":"2024-08-10T02:55:57.785Z"}
build-implentio-api-xt6zs-build-images-489452731: {"argo":true,"level":"info","msg":"not saving outputs - not main container","time":"2024-08-10T02:55:57.785Z"}
build-implentio-api-xt6zs-build-images-489452731: Error: exit status 1

What primary components of the stack does this relate to?

terraform, reference

wesbragagt commented 3 months ago

I saw this issue in the buildkit repo https://github.com/moby/buildkit/issues/3947

wesbragagt commented 3 months ago

Additionally, from the buildkit server pod from a different build that I attempted:

msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = failed to push 730335560480.dkr.ecr.us-west-2.amazonaws.com/reconciliation-engine:ac488ec3015f6e6a73dc8e4a3dc547c760d4078e-amd64: unexpected status from HEAD request to https://730335560480.dkr.ecr.us-west-2.amazonaws.com/v2/reconciliation-engine/blobs/sha256:83989655d54c90de3d819ce6788e7864338ea3bbaf3d825dbbbd8f0059745a5d: 401 Unauthorized

wesbragagt commented 3 months ago

@fullykubed could the service account associated with the IAM role be bound to a role that no longer exists after a destroy and re-apply of the same stack? That's the only thing I can think of that could be causing this, since I didn't have this issue when I first applied the CICD stack. I'm going to try commenting out the login logic to see if I get a different error, which would at least confirm that buildctl is indeed using the /.docker/config.json file, and I'll verify the STS caller identity.

After further investigation, I've confirmed that the STS caller identity is, as expected, the IAM role that was created. I've also confirmed that the role and service account get deleted when the workflow is destroyed. I also tried a brand new ECR repo created with the Panfactum stack module aws_ecr_repos and still got a 401. I'm now extending wf_dockerfile_build in order to debug scripts/build.sh.
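
When the caller identity looks right but the push still returns 401, one more thing that can be checked is whether the generated config file actually contains an `auths` entry keyed by the registry host being pushed to. The sketch below is only an illustration: `has_registry_auth` is a hypothetical helper (not part of the Panfactum stack or buildctl), the registry name is a placeholder, and the check is a crude text match rather than real JSON parsing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if FILE exists and mentions "REGISTRY" as a quoted key.
# This is a plain-text check, not a JSON parser.
has_registry_auth() {
  local file="$1" registry="$2"
  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    return 1
  fi
  if ! grep -qF "\"$registry\"" "$file"; then
    echo "no entry for $registry in $file" >&2
    return 1
  fi
  echo "found entry for $registry in $file"
}

# Demo with a throwaway file and a placeholder registry name.
demo_dir="$(mktemp -d)"
printf '{"auths":{"registry.example.com":{"auth":"placeholder"}}}\n' > "$demo_dir/config.json"
has_registry_auth "$demo_dir/config.json" registry.example.com
has_registry_auth "$demo_dir/config.json" other.example.com || echo "lookup failed as expected"
rm -rf "$demo_dir"
```

Inside the build container this would be called as `has_registry_auth /.docker/config.json "$IMAGE_REGISTRY"`, using the variables from the build script.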

wesbragagt commented 3 months ago

Found this issue https://github.com/moby/buildkit/issues/2136, which led me to tweak the script as shown below, but no success 😞. I'm still getting:

build-implentio-api-2gsqp-build-images-3289723871: #22 exporting config sha256:199e2eb85164379735b791d2b85f51b3455d619c089252c80c04090d073e16d8 done
build-implentio-api-2gsqp-build-images-3289723871: #22 pushing layers 0.0s done
build-implentio-api-2gsqp-build-images-3289723871: #22 ERROR: failed to push 730335560480.dkr.ecr.us-west-2.amazonaws.com/api:2a4361de4e5a813f44ad1c882078fedbde196714-arm64: unexpected status from HEAD request to https://730335560480.dkr.ecr.us-west-2.amazonaws.com/v2/api/blobs/sha256:f4691ee707061ff9d48b30141bf04f1b16129de954e7c2c12b382f696d437224: 401 Unauthorized
build-implentio-api-2gsqp-build-images-3289723871: ------
build-implentio-api-2gsqp-build-images-3289723871:  > exporting to image:
build-implentio-api-2gsqp-build-images-3289723871: ------
build-implentio-api-2gsqp-build-images-3289723871: error: failed to solve: failed to push 730335560480.dkr.ecr.us-west-2.amazonaws.com/api:2a4361de4e5a813f44ad1c882078fedbde196714-arm64: unexpected status from HEAD request to https://730335560480.dkr.ecr.us-west-2.amazonaws.com/v2/api/blobs/sha256:f4691ee707061ff9d48b30141bf04f1b16129de954e7c2c12b382f696d437224: 401 Unauthorized

# /scripts/build.sh
#!/usr/bin/env bash

set -eo pipefail

###########################################################
## Step 1: CD to the codebase
###########################################################
cd /code/repo || exit

###########################################################
## Step 2: Set the image tag as the commit sha
###########################################################
TAG="$(git rev-parse "$GIT_REF")-$ARCH"

###########################################################
## Step 3: Get BuildKit address
###########################################################
BUILDKIT_HOST=$(pf-buildkit-get-address --arch="$ARCH")
export BUILDKIT_HOST

###########################################################
## Step 4: Get the ECR credentials
###########################################################

# DEBUG AWS permissions
aws sts get-caller-identity
printenv

echo "Generating docker config"
ECR_PASSWORD=$(aws ecr get-login-password --region "$IMAGE_REGION")
AUTH_TOKEN=$(echo "AWS:$ECR_PASSWORD" | base64 --wrap=0)
DOCKER_HUB_USER_BASE64=$(echo "$DOCKER_HUB_USER" | base64 --wrap=0)
DOCKER_HUB_PASSWORD_BASE64=$(echo "$DOCKER_HUB_PASSWORD" | base64 --wrap=0)
cat >"/.docker/config.json" <<EOF
{
    "auths": {
        "$IMAGE_REGISTRY": {
          "auth": "$AUTH_TOKEN"
        },
        "$DOCKER_HUB_REGISTRY": {
          "auth": "$DOCKER_HUB_USER_BASE64:$DOCKER_HUB_PASSWORD_BASE64"
        }
    }
}
EOF

###########################################################
## Step 5: Record the build
###########################################################
pf-buildkit-record-build --arch="$ARCH"

###########################################################
## Step 6: Build the image
###########################################################
# shellcheck disable=SC2086
buildctl \
  build \
  --frontend=dockerfile.v0 \
  --output "type=image,name=$IMAGE_REGISTRY/$IMAGE_REPO:$TAG,push=$PUSH_IMAGE" \
  --local context="$BUILD_CONTEXT" \
  --local dockerfile="$(dirname "$DOCKERFILE_PATH")" \
  --opt filename="./$(basename "$DOCKERFILE_PATH")" \
  $SECRET_ARGS \
  $BUILD_ARGS \
  --export-cache "type=s3,region=$BUILDKIT_BUCKET_REGION,bucket=$BUILDKIT_BUCKET_NAME,name=$IMAGE_REGISTRY/$IMAGE_REPO" \
  --import-cache "type=s3,region=$BUILDKIT_BUCKET_REGION,bucket=$BUILDKIT_BUCKET_NAME,name=$IMAGE_REGISTRY/$IMAGE_REPO" || tail -f /dev/null # allow shell into container for further debugging
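
For reference when inspecting a generated config.json: the `auth` field of a Docker config entry is conventionally the base64 encoding of the single string `username:password`. The sketch below shows that encoding with a hypothetical helper, `docker_auth_token`, and placeholder credentials. It uses `printf` rather than `echo` so that no trailing newline ends up inside the encoded value. This is not the fix reported in this thread, just a way to sanity-check the file's contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the value for an "auth" field from a username and password.
# printf (unlike echo) does not append a newline to the string being encoded.
docker_auth_token() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Round trip with placeholder credentials: decoding gives back "AWS:example-password".
token="$(docker_auth_token AWS example-password)"
printf '%s' "$token" | base64 -d
echo
```

Decoding the `auth` value from a real config file the same way (`base64 -d`) shows exactly which username and password a client would send.
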

wesbragagt commented 3 months ago

My issue was solved by setting DOCKER_CONFIG=/.docker instead of DOCKER_CONFIG=/.docker/config.json. The variable has to point to the directory that contains config.json, not to the file itself. Small oversight on my end. I hope it helps someone else.
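
To illustrate why the file path fails: Docker-compatible clients treat DOCKER_CONFIG as a directory and look for config.json inside it, falling back to ~/.docker when the variable is unset. The sketch below mimics that lookup with a hypothetical helper, `docker_config_file`; it is not code from buildctl or the Panfactum module.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the path a Docker-compatible client would read,
# given a config directory (defaults to $DOCKER_CONFIG, then ~/.docker).
docker_config_file() {
  local dir="${1:-${DOCKER_CONFIG:-$HOME/.docker}}"
  printf '%s/config.json\n' "$dir"
}

docker_config_file /.docker/config.json   # prints /.docker/config.json/config.json
docker_config_file /.docker               # prints /.docker/config.json
```

With the variable set to the file path, the client looks for /.docker/config.json/config.json, finds nothing, and pushes without credentials, which is consistent with the 401 in the logs above.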

fullykubed commented 3 months ago

@wesbragagt A little confused on what ended up being the problem and solution here.

Primarily just making sure there isn't an issue with the module. Are you saying here that the module did not work out of the box and that you had to override something? Or are you saying that you overrode a default value and that caused the issue?

wesbragagt commented 3 months ago

@fullykubed the wf_dockerfile_build module did not work out of the box for me, so I had to copy and extend it to add the DOCKER_CONFIG env variable pointing to the /.docker folder.

fullykubed commented 3 months ago

@wesbragagt Gotcha. I believe the version in the latest release works without the need for any modification.

Can you check?

wesbragagt commented 3 months ago

@fullykubed I haven't gotten to the push step yet, but I opened a bug report for another problem that I had to solve by extending the module: https://github.com/Panfactum/stack/issues/106.