buildpacks-community / kpack

Kubernetes Native Container Build Service
Apache License 2.0

Builder fails on ECR when using dockerconfigjson file #1225

Open wdonne opened 1 year ago

wdonne commented 1 year ago

Hello,

With ECR you can use AWS as the username and an authorization token as the password. You can put this in a dockerconfigjson file like this:

{
  ".dockerconfigjson": {
    "auths": {
      "https://<account>.dkr.ecr.<region>.amazonaws.com/v2/<repository>": {
        "username": "AWS",
        "password": "XXXXX ECR Authorization Token XXXXX"
      }
    }
  }
}
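
For reference on what goes in the password field: the `authorizationToken` returned by the ECR `GetAuthorizationToken` API is a base64 encoding of `AWS:<password>`, whereas `aws ecr get-login-password` prints the password directly. If the raw API value is what you have, a minimal Python sketch for splitting it could look like this. The function name `split_ecr_token` and the example token are made up for illustration.

```python
import base64
from typing import Tuple


def split_ecr_token(authorization_token: str) -> Tuple[str, str]:
    """Decode a base64 ECR authorization token into (username, password)."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    # The decoded form is "AWS:<password>"; split on the first colon only.
    username, _, password = decoded.partition(":")
    if username != "AWS" or not password:
        raise ValueError("unexpected token format, expected 'AWS:<password>'")
    return username, password


# Placeholder token built locally for illustration, not a real credential.
example_token = base64.b64encode(b"AWS:example-password").decode("ascii")
print(split_ecr_token(example_token))
```

The second element of the returned tuple is the value that would go in `password` next to `"username": "AWS"`.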

If you put that in a Kubernetes secret of type kubernetes.io/dockerconfigjson and attach it to the kpack service account as both a secret and an image pull secret, then the Builder that uses that service account will produce the following error:

status:
  conditions:
    - lastTransitionTime: '2023-05-26T11:53:54Z'
      message: >-
        Post
        "https://<account>.dkr.ecr.<region>.amazonaws.com/v2/<repository>/blobs/uploads/":
        EOF
      status: 'False'
      type: Ready

The logs in the kpack controller show this:

{
  "level":"error",
  "ts":"2023-05-26T11:54:00.413273068Z",
  "logger":"controller",
  "caller":"controller/controller.go:566",
  "msg":"Reconcile error",
  "commit":"79126fe-dirty",
  "knative.dev/kind":"builders.kpack.io",
  "knative.dev/traceid":"f787f9bb-f774-4dd3-a65e-8e00b519d2f3",
  "knative.dev/key":"play/weblogic-ai-builder",
  "duration":5.782489848,
  "error":"Post \"https://<account>.dkr.ecr.<region>.amazonaws.com/v2/<repository>/blobs/uploads/\": EOF",
  "stacktrace":"knative.dev/pkg/controller.(*Impl).handleErr\n\tknative.dev/pkg@v0.0.0-20221005141429-8cacac2ea6d7/controller/controller.go:566\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/pkg@v0.0.0-20221005141429-8cacac2ea6d7/controller/controller.go:543\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/pkg@v0.0.0-20221005141429-8cacac2ea6d7/controller/controller.go:491"
}

The AWS policy in the role I generated the authorization token from was the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "ecr:BatchGetImage",
                "ecr:DescribeRepositories",
                "ecr:ListTagsForResource",
                "ecr:PutImage",
                "ecr:UploadLayerPart",
                "ecr:CompleteLayerUpload",
                "ecr:InitiateLayerUpload",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer"
            ],
            "Resource": "arn:aws:ecr:<region>:<account>:repository/<repository>"
        },
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken"
            ],
            "Resource": "*"
        }
    ]
}

wdonne commented 1 year ago

Note that for the cluster stack and the cluster store this works fine.

I have also tried an ECR policy with ecr:* as the action, meaning it can do anything with the repository, but that doesn't change anything.
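
To rule the policy out quickly, a small Python sketch can compare a policy document against the actions AWS documents as required for pushing an image. This is only a rough check under stated assumptions: it treats `Statement` as a list, looks only at `Allow` statements, and ignores `Resource`, `Condition`, and any `Deny` rules. The name `missing_push_actions` is hypothetical.

```python
import fnmatch

# Actions the AWS documentation lists as required for pushing to ECR.
PUSH_ACTIONS = {
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:InitiateLayerUpload",
    "ecr:UploadLayerPart",
    "ecr:CompleteLayerUpload",
    "ecr:PutImage",
}


def missing_push_actions(policy: dict) -> set:
    """Return the push actions that no Allow statement in the policy covers."""
    patterns = []
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # IAM action names are case-insensitive, so compare in lower case.
        patterns.extend(action.lower() for action in actions)
    return {
        action
        for action in PUSH_ACTIONS
        if not any(fnmatch.fnmatchcase(action.lower(), p) for p in patterns)
    }


# The policy from the first comment, with placeholders left as-is.
issue_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:BatchGetImage", "ecr:DescribeRepositories",
                "ecr:ListTagsForResource", "ecr:PutImage",
                "ecr:UploadLayerPart", "ecr:CompleteLayerUpload",
                "ecr:InitiateLayerUpload", "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
            ],
            "Resource": "arn:aws:ecr:<region>:<account>:repository/<repository>",
        },
        {"Effect": "Allow", "Action": ["ecr:GetAuthorizationToken"], "Resource": "*"},
    ],
}

print(sorted(missing_push_actions(issue_policy)))
```

An empty result for both the original policy and the `ecr:*` variant would be consistent with the policy not being the cause here.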

wdonne commented 1 year ago

I forgot to mention that this is with version 0.10.1. I noticed that this release file uses the 0.10.1-rc.3 version of the images for the Deployment resources.

chenbh commented 1 year ago

Huh, these all look correct to me. For sanity's sake, can you check the following:

  1. Are the ClusterStack and ClusterStore pointing to private ECR images or public images? If they're public images, then unfortunately that doesn't tell us much about the ECR creds.
  2. Does the ECR repository the Builder points to exist? ECR has an annoying policy of requiring repos to be created explicitly instead of creating them dynamically on push like Docker Hub or GCR.
  3. Was the Builder created after the service account and secret? I'm not 100% sure, but I think the controller doesn't re-reconcile Builders on service account changes.

Can you also try using just the hostname in the dockerconfig? Something like:

      "auths": {
        "<account>.dkr.ecr.<region>.amazonaws.com": {
          "username": "AWS",
          "password": "XXXXX ECR Authorization Token XXXXX"
        }
      }
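
If the config is generated by a script, the hostname-only key can be derived from whatever URL you already have. The following Python sketch shows one way to do that. The helper names `registry_host` and `docker_config`, the account ID, the region, and the repository name are all placeholders, not values from this issue.

```python
import json
from urllib.parse import urlparse


def registry_host(registry: str) -> str:
    """Reduce a registry URL or repository path to its bare hostname."""
    # urlparse only fills in the host when a scheme is present.
    if "://" not in registry:
        registry = "https://" + registry
    return urlparse(registry).hostname


def docker_config(registry: str, password: str, username: str = "AWS") -> dict:
    """Build a Docker config whose auths entry is keyed by hostname only."""
    return {
        "auths": {
            registry_host(registry): {"username": username, "password": password}
        }
    }


# Placeholder account, region, repository, and password for illustration.
config = docker_config(
    "https://123456789012.dkr.ecr.eu-west-1.amazonaws.com/v2/my-repo",
    "example-password",
)
print(json.dumps(config, indent=2))
```

The scheme and the `/v2/<repository>` path are dropped, so the resulting key has the same shape as the snippet above.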

semmet95 commented 1 year ago

Hey @wdonne, we are facing similar issues with ECR put permissions too. Did you find any workaround?

wdonne commented 1 year ago

Hi @semmet95,

I haven't pursued this further yet, but the only possible thing I see is using the domain name instead of the URL in the dockerconfig.

semmet95 commented 1 year ago

@wdonne For me, your approach worked when I created the secret from the .docker/config.json file after logging in to ECR with an IAM role that has the proper policies.

kubectl create secret generic regcred --from-file=.dockerconfigjson=/Users/amisingh/temp/.docker/config.json --type=kubernetes.io/dockerconfigjson
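
For anyone who wants to generate the same kind of secret without a local Docker config file, here is a rough Python sketch of the manifest that command produces: a Secret of type `kubernetes.io/dockerconfigjson` whose `.dockerconfigjson` data key holds the base64-encoded config. The function name `dockerconfig_secret`, the namespace, and the registry values are assumptions for illustration. Only the secret name `regcred` comes from the command above.

```python
import base64
import json


def dockerconfig_secret(name: str, docker_config: dict, namespace: str = "default") -> dict:
    """Build a kubernetes.io/dockerconfigjson Secret manifest as a dict."""
    raw = json.dumps(docker_config).encode("utf-8")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        # Values under "data" must be base64-encoded.
        "data": {".dockerconfigjson": base64.b64encode(raw).decode("ascii")},
    }


# Placeholder registry host and password for illustration.
example_config = {
    "auths": {
        "123456789012.dkr.ecr.eu-west-1.amazonaws.com": {
            "username": "AWS",
            "password": "example-password",
        }
    }
}

secret = dockerconfig_secret("regcred", example_config)
print(json.dumps(secret, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.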