GoogleContainerTools / kaniko

Build Container Images In Kubernetes

Pulling from public gcr.io repositories fails when using Kaniko 1.8.0 #1984

Open jpallari opened 2 years ago

jpallari commented 2 years ago

Actual behavior

After the 1.8.0 release, builds for some of my container images started to fail with an authentication error during the base image pull:

$ /kaniko/executor --context ${BUILD_CONTEXT} --dockerfile ${DOCKERFILE_PATH} --destination ${CI_APPLICATION_REPOSITORY}:$CI_COMMIT_SHA
INFO[0000] Retrieving image manifest gcr.io/go-containerregistry/crane:debug 
INFO[0000] Retrieving image gcr.io/go-containerregistry/crane:debug from registry gcr.io 
error building image: GET https://gcr.io/v2/token?scope=repository%3Ago-containerregistry%2Fcrane%3Apull&service=gcr.io: UNAUTHORIZED: failed authentication

The Dockerfile uses crane as the base image, which is hosted on gcr.io. Normally the image can be pulled without any credentials, but the pull fails with Kaniko 1.8.0. I wasn't able to reproduce the issue with Kaniko 1.7.0, so I've reverted to that version for now.

Expected behavior

I expected Kaniko to be able to pull the crane image and continue with the image build process.

To Reproduce

Steps to reproduce the behavior:

  1. Create a Dockerfile that uses gcr.io/go-containerregistry/crane:debug as the base image.
  2. Use Kaniko 1.8.0 to build an image from the Dockerfile (a minimal sketch follows below).
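
A minimal sketch of the reproduction, assuming a shell inside the gcr.io/kaniko-project/executor:v1.8.0-debug image; the paths and --no-push are only illustrative, pushing with --destination fails at the same point:

# Write a one-line Dockerfile that uses the public crane image as its base.
mkdir -p /workspace
cat > /workspace/Dockerfile <<'EOF'
FROM gcr.io/go-containerregistry/crane:debug
EOF

# Build with Kaniko 1.8.0; the failure occurs while resolving the base image.
/kaniko/executor \
  --context=/workspace \
  --dockerfile=/workspace/Dockerfile \
  --no-push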

Additional Information

martinezleoml commented 2 years ago

Facing the same issue here while building an image with GitLab CI & Google Container Registry.

Falling back to gcr.io/kaniko-project/executor:v1.6.0-debug solved the issue (note that downgrading to v1.7.0 didn't).

mdeknowis commented 2 years ago

We experienced a similar issue when pushing.

We push from a Tekton pipeline within OpenShift into the project's internal image registry, using a service account with the correct permissions.

After upgrading to 1.8.0, we get this error:

error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for "image-registry.openshift-image-registry.svc:5000/myproject/mydocker:0.1.0": POST https://image-registry.openshift-image-registry.svc:5000/v2/myproject/mydocker/blobs/uploads/: UNAUTHORIZED: authentication required; [map[Action:pull Class: Name:myproject/mydocker Type:repository] map[Action:push Class: Name:myproject/mydocker Type:repository]]

After reverting to 1.7.0, everything works again.

Tamir5ht commented 2 years ago

After upgrading to v1.9.0-debug, I get the same issue when building images with GitLab CI and Google Container Registry, same as @martinezleoml.

Going back to v1.6.0-debug solved the issue for me again.

markusheiden commented 1 year ago

This issue is now more than a year old. Are there any ideas about what causes the problem?

For me, switching to the same (public) image hosted on Docker Hub worked around the issue.

xenomachina commented 1 month ago

We're also running into this issue.

By "running in GitLab CI" I mean in GitLab's shared CI runner. I believe that's in GCP, so we're starting to wonder if this is a Google+Google+Google issue — like something is taking a shortcut when it sees that it's in GCP talking to gcr.io and ends up tripping over a bug in that special-case code.

The failures look like:

$ /kaniko/executor $EXTRA_ARGS --context=$KANIKO_CONTEXT --dockerfile=$DOCKERFILE --no-push
INFO[0000] Resolved base name golang:1.21 to build-env  
INFO[0000] Retrieving image manifest golang:1.21        
INFO[0000] Retrieving image golang:1.21 from registry index.docker.io 
INFO[0000] Retrieving image manifest gcr.io/distroless/static-debian12 
INFO[0000] Retrieving image gcr.io/distroless/static-debian12 from registry gcr.io 
error building image: unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fstatic-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed

The "after 0 attempts" part seems sketchy. Is that an off-by-one error, or did it really fail without even trying?

rvadim commented 1 month ago

This could be fixed by adding the scope https://www.googleapis.com/auth/devstorage.read_only to the VM instance on which the GitLab runner is running (see https://cloud.google.com/sdk/gcloud/reference/beta/compute/instances/set-scopes). It seems like additional security on the GCP side.

EDIT: Kaniko may be using the GCP SDK, which tries to log in through the GCP VM instance metadata and fails because the scope for the GCR API is not granted.
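
For reference, a sketch of that gcloud call; the instance name and zone are placeholders for whichever VM hosts the runner, and the instance may need to be stopped before its scopes can be changed:

# Grant the runner VM read access to the storage backend used by gcr.io.
gcloud beta compute instances set-scopes my-gitlab-runner-vm \
  --zone=us-central1-a \
  --scopes=https://www.googleapis.com/auth/devstorage.read_only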

xenomachina commented 1 month ago

This could be fixed by adding the scope https://www.googleapis.com/auth/devstorage.read_only to the VM instance on which the GitLab runner is running.

This is the shared gitlab.com runner, so I don't think we have that level of control over the VM.

mxmCherry commented 1 month ago

Confirming this is an ongoing issue with GitLab CI / Kaniko / gcr.io distroless images.

I replicated this in https://gitlab.com/mxmCherry/kaniko-gcr-io-debug, getting the same error as reported by the issue author:

INFO[0000] Retrieving image manifest gcr.io/distroless/base-debian12:nonroot 
INFO[0000] Retrieving image gcr.io/distroless/base-debian12:nonroot from registry gcr.io 
error building image: unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fbase-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed

I basically copied the official GitLab recipe for image publishing and just added verbosity=trace plus an after_script that wgets the failing URL (which works fine).
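
For anyone who wants to repeat that check, the after_script boils down to roughly this anonymous request against the token endpoint from the error message; even unauthenticated, it returns a pull token for this public repository:

# Unauthenticated GET of the token URL kaniko fails on; prints a JSON body containing a token.
wget -qO- 'https://gcr.io/v2/token?scope=repository%3Adistroless%2Fbase-debian12%3Apull&service=gcr.io'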

That's done for a Dockerfile with FROM gcr.io/distroless/base-debian12:nonroot.

You can see the full GitLab CI pipelines / job logs here: https://gitlab.com/mxmCherry/kaniko-gcr-io-debug/-/pipelines


I had previous successful builds for another (work) project, and the main difference I noticed between the successful job log and the failing one is:

Last successful build (Aug 12, 2024):

Running with gitlab-runner 17.0.0~pre.88.g761ae5dd (761ae5dd)
  on green-1.saas-linux-small-amd64.runners-manager.gitlab.com/default <REDACTED>, system ID: <REDACTED>
...
$ /kaniko/executor <FLAGS_REDACTED>
ERRO[0000] Error while retrieving image from cache: gcr.io/distroless/base-debian12:nonroot unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fbase-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed 
Cleaning up project directory and file based variables
Job succeeded

First failed build (Sep 20, 2024; it could have started failing earlier, this is just the date we next had to touch this project):

Running with gitlab-runner 17.4.0~pre.110.g27400594 (27400594)
  on blue-6.saas-linux-small-amd64.runners-manager.gitlab.com/default <REDACTED>, system ID: <REDACTED>
...
$ /kaniko/executor <FLAGS_REDACTED>
ERRO[0000] Error while retrieving image from cache: gcr.io/distroless/base-debian12:nonroot unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fbase-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed 
error building image: unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fbase-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1

Note that the successful build ran on an older "green" runner instance, while the failing one ran on a newer "blue" instance.

I also tried downgrading the kaniko image for this (work) project by decrementing the minor version from 1.23.2 down to 1.19.0 (I got too bored to decrement further). All of these versions failed with the same or a very similar message.

This makes me think it could be a GitLab upgrade issue, though I cannot prove it properly. It could also be an issue with Google transitioning GCR to Artifact Registry, but I cannot tie it to their announcements either: from what I've read, gcr.io should still work just fine :shrug:


Btw, we also build other (public) images using the docker-in-docker approach, and that still seems to work; the last successful build was on Sep 15: https://gitlab.com/bsm/docker/ffmpeg/-/pipelines/1454267223

That image is published to the docker.io registry, which Kaniko pulls from without any problems. So Kaniko (or the underlying https://github.com/google/go-containerregistry) must pull from gcr.io somewhat differently than docker-in-docker or plain old wget, both of which still succeed.

alm-pro commented 2 weeks ago

Having the same problem: it does not occur when using DinD to build the image, but switching to kaniko causes this failure. The same failure occurs with gitlab.com shared runners and with a private Kubernetes runner hosted in GKE.

jgsuess commented 2 weeks ago

I am also having this issue. I have tried a few different ways of authenticating to Artifact Registry, hoping that would resolve it, but with no success.

mzihlmann commented 1 week ago

We did some analysis in the other thread (https://github.com/GoogleContainerTools/kaniko/issues/3328#issuecomment-2415674818) and found that it is specifically this commit that started to break things: https://github.com/GoogleContainerTools/kaniko/commit/633f555c5c13fc1ef08f819cf60a93d19cd44081. It is the fix for https://github.com/GoogleContainerTools/kaniko/pull/1856, "Fix implicit GCR auth".

It tries to fix implicit auth for GCR, and it does so all too well; that's the problem. When running on gitlab.com's gitlab-runner, kaniko asks the metadata server for an OAuth token and succeeds. However, that token does not have permission to pull from GCR, hence the later failure.

The token we receive has only limited permissions, specifically these scopes:

"scope": "https://www.googleapis.com/auth/monitoring.write https://www.googleapis.com/auth/logging.write"

GitLab has been informed; they don't consider it a security risk, but they are considering removing those permissions anyway.

The best workaround is to disable the implicit GCR authentication (credit goes to @jameshartig):

  variables:
    GOOGLE_APPLICATION_CREDENTIALS: /dev/null
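
Outside of GitLab's variables block, the same workaround can presumably be applied by exporting the variable in the job shell right before invoking the executor, reusing the $KANIKO_CONTEXT and $DOCKERFILE variables from the job shown earlier:

# Workaround from this thread: point GOOGLE_APPLICATION_CREDENTIALS at /dev/null
# so kaniko's implicit GCP credential lookup finds nothing usable.
export GOOGLE_APPLICATION_CREDENTIALS=/dev/null
/kaniko/executor --context=$KANIKO_CONTEXT --dockerfile=$DOCKERFILE --no-push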

There are multiple problems coming together in my opinion: