jpallari opened this issue 2 years ago (status: Open)
Facing the same issue here while building an image with GitLab CI & Google Container Registry.
Falling back to gcr.io/kaniko-project/executor:v1.6.0-debug
solved the issue (note that downgrading to v1.7.0 didn't solve the issue)
We experienced a similar issue when pushing.
We push from a Tekton pipeline within OpenShift into the project's internal Docker image registry, using a service account with the correct permissions.
After upgrading to 1.8.0, we get this error:
error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for \"image-registry.openshift-image-registry.svc:5000/myproject/mydocker:0.1.0\": POST https://image-registry.openshift-image-registry.svc:5000/v2/myproject/mydocker/blobs/uploads/: UNAUTHORIZED: authentication required; [map[Action:pull Class: Name:myproject/mydocker Type:repository] map[Action:push Class: Name:myproject/mydocker Type:repository]]
After reverting to 1.7.0, everything works again.
After upgrading to v1.9.0-debug, I get the same issue when building images with GitLab CI and Google Container Registry, same as @martinezleoml.
Going back to v1.6.0-debug solved the issue for me again.
Meanwhile, this issue is more than a year old. Are there any ideas about what causes it?
For me, switching to the same (public) image hosted on Docker Hub worked around the issue.
We're also running into this issue.
We can pull images from gcr.io with jib running in GitLab CI.
We can pull images from docker hub with Kaniko running in GitLab CI.
We can pull images from gcr.io with Kaniko running locally.
We cannot pull images from gcr.io with Kaniko running in GitLab CI.
By "running in GitLab CI" I mean in GitLab's shared CI runner. I believe that's in GCP, so we're starting to wonder if this is a Google+Google+Google issue — like something is taking a shortcut when it sees that it's in GCP talking to gcr.io and ends up tripping over a bug in that special-case code.
The failures look like:
$ /kaniko/executor $EXTRA_ARGS --context=$KANIKO_CONTEXT --dockerfile=$DOCKERFILE --no-push
INFO[0000] Resolved base name golang:1.21 to build-env
INFO[0000] Retrieving image manifest golang:1.21
INFO[0000] Retrieving image golang:1.21 from registry index.docker.io
INFO[0000] Retrieving image manifest gcr.io/distroless/static-debian12
INFO[0000] Retrieving image gcr.io/distroless/static-debian12 from registry gcr.io
error building image: unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fstatic-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed
The "after 0 attempts" part seems sketchy. Is that an off-by-one error, or did it really fail without even trying?
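The "after 0 attempts" wording could simply mean the counter tracks *retries* rather than total tries. A hypothetical sketch (not kaniko's actual code) showing how a retry helper configured with zero retries still performs one request yet reports "after 0 attempts":

```python
# Hypothetical retry helper: `retries` counts extra retries, not total tries.
# With retries=0 the operation still runs once, but the error message reads
# "after 0 attempts", matching the log above. This models the message only;
# it is not kaniko's real implementation.
def with_retries(operation, retries):
    last_error = None
    for _attempt in range(retries + 1):  # one initial try plus `retries` retries
        try:
            return operation()
        except Exception as err:
            last_error = err
    raise RuntimeError(
        f"unable to complete operation after {retries} attempts, last error: {last_error}"
    )

def failing_pull():
    # Stand-in for the anonymous token request that gcr.io rejects.
    raise ValueError("UNAUTHORIZED: authentication failed")

try:
    with_retries(failing_pull, retries=0)
except RuntimeError as err:
    message = str(err)

print(message)  # "... after 0 attempts ..." even though one try was made
```

If this guess is right, it is a labelling quirk rather than an off-by-one: the request was made once and genuinely failed.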
This could be fixed by adding the scope https://www.googleapis.com/auth/devstorage.read_only
to the VM instance where the GitLab runner is working.
https://cloud.google.com/sdk/gcloud/reference/beta/compute/instances/set-scopes
This seems to be additional security on the GCP side.
EDIT: Maybe Kaniko uses the GCP SDK, tries to log in through the GCP VM instance metadata, and fails because the scope for the GCR API is not granted.
Could be fixed by adding the scope https://www.googleapis.com/auth/devstorage.read_only to the VM instance where the GitLab runner is working.
This is the shared gitlab.com runner, so I don't think we have this level of control over the VM.
Confirming this is an ongoing issue with Gitlab CI / Kaniko / gcr.io distroless images.
I replicated this in https://gitlab.com/mxmCherry/kaniko-gcr-io-debug , getting the same error as reported by the issue author:
INFO[0000] Retrieving image manifest gcr.io/distroless/base-debian12:nonroot
INFO[0000] Retrieving image gcr.io/distroless/base-debian12:nonroot from registry gcr.io
error building image: unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fbase-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed
I basically copied the official GitLab recipe for image publishing, only adding verbosity=trace plus an after_script that wgets the erroring URL (the wget works fine).
That's done for a Dockerfile with FROM gcr.io/distroless/base-debian12:nonroot.
You can see the full Gitlab CI pipelines / job logs here: https://gitlab.com/mxmCherry/kaniko-gcr-io-debug/-/pipelines
I had successful previous builds for another (work) project, and the only/main difference I noticed between the successful and failing job logs is:
Last successful build (Aug 12, 2024):
Running with gitlab-runner 17.0.0~pre.88.g761ae5dd (761ae5dd)
on green-1.saas-linux-small-amd64.runners-manager.gitlab.com/default <REDACTED>, system ID: <REDACTED>
...
$ /kaniko/executor <FLAGS_REDACTED>
ERRO[0000] Error while retrieving image from cache: gcr.io/distroless/base-debian12:nonroot unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fbase-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed
Cleaning up project directory and file based variables
Job succeeded
First failed build (Sep 20, 2024 -- it could have started failing earlier; this is just the date WE had to touch this project):
Running with gitlab-runner 17.4.0~pre.110.g27400594 (27400594)
on blue-6.saas-linux-small-amd64.runners-manager.gitlab.com/default <REDACTED>, system ID: <REDACTED>
...
$ /kaniko/executor <FLAGS_REDACTED>
ERRO[0000] Error while retrieving image from cache: gcr.io/distroless/base-debian12:nonroot unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fbase-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed
error building image: unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fbase-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1
Note that the successful build ran on the older "green" instance, while the failing one ran on the newer "blue" instance.
I also tried downgrading the kaniko image for this (work) project by decrementing the minor version from 1.23.2 down to 1.19.0 (I got too bored to decrement any further). All of these versions failed with the same or a very similar message.
This makes me think it could be some GitLab upgrade issue, though I cannot prove it properly. It could also be some issue with Google transitioning GCR -> Artifact Registry, but I cannot tie it to their announcements either: from what I've read/googled, gcr.io should still work just fine :shrug:
Btw, we also build other (public) images using the docker-in-docker approach, and that still seems to work; the last successful build was on Sep 15: https://gitlab.com/bsm/docker/ffmpeg/-/pipelines/1454267223
That image is published to the docker.io registry, which Kaniko pulls without any problems. So Kaniko (or the underlying https://github.com/google/go-containerregistry) pulls from gcr.io somewhat differently than docker-in-docker or plain old wget, which still succeed.
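The difference described above can be sketched as per-registry credential resolution. This is a simplified, hypothetical model (loosely inspired by go-containerregistry's keychain concept, not its real API): docker-in-docker and wget pull public images anonymously, while a GCR-aware keychain hands back whatever token the GCE metadata server offers, even when that token cannot pull images.

```python
# Hypothetical model of per-registry credential resolution (not real
# kaniko/go-containerregistry code). A GCR-aware keychain prefers the
# metadata-server token for *.gcr.io; everything else stays anonymous.
def resolve_credentials(registry, metadata_token=None):
    if registry.endswith("gcr.io") and metadata_token is not None:
        return metadata_token  # implicit GCR auth path introduced by the fix
    return "anonymous"         # what DinD and plain wget effectively do

# A token scoped only for logging/monitoring, as observed on gitlab.com runners.
limited_token = "token(scopes=logging.write,monitoring.write)"

print(resolve_credentials("index.docker.io", limited_token))  # anonymous: works
print(resolve_credentials("gcr.io", limited_token))           # limited token: 401 on pull
```

Under this model, the anonymous path succeeds for public images while the "smarter" authenticated path fails, which matches the observations in this thread.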
Having the same problem: it is not a problem when using DinD to build the image, but switching to kaniko causes this failure. The same failure occurs with gitlab.com shared runners and with a private Kubernetes runner hosted in GKE.
I am also having this issue. I have tried a few different ways to authenticate to Artifact Registry, hoping that would resolve it, but no success.
We did some analysis in the other thread https://github.com/GoogleContainerTools/kaniko/issues/3328#issuecomment-2415674818 and found that it is specifically this commit that started to break things: https://github.com/GoogleContainerTools/kaniko/commit/633f555c5c13fc1ef08f819cf60a93d19cd44081. It is the fix for https://github.com/GoogleContainerTools/kaniko/pull/1856 ("Fix implicit GCR auth").
It tries to fix implicit auth for GCR, and it does so successfully; that is exactly the problem. When running on gitlab.com's gitlab-runner, kaniko tries to get an OAuth token from the metadata server and succeeds! However, that token does not have permission to pull from GCR, hence the later failure.
The token that we receive only has limited permissions, specifically these:
"scope": "https://www.googleapis.com/auth/monitoring.write https://www.googleapis.com/auth/logging.write"
GitLab has been informed and does not consider it a security risk; however, they are considering removing those permissions anyway.
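The scope mismatch can be checked mechanically. A small sketch comparing the scopes quoted above against the devstorage scope that GCR pulls require (the scope strings are from this thread; the check itself is simplified):

```python
# Why the metadata-server token fails: the runner's service account grants
# only logging/monitoring scopes, while pulling from GCR requires a
# devstorage scope. Scope strings are taken from the comment above.
GRANTED = ("https://www.googleapis.com/auth/monitoring.write "
           "https://www.googleapis.com/auth/logging.write")
REQUIRED = "https://www.googleapis.com/auth/devstorage.read_only"

def can_pull_from_gcr(granted_scopes):
    scopes = set(granted_scopes.split())
    # read_only suffices; read_write/full_control would too. Simplified check.
    return any(
        s.startswith("https://www.googleapis.com/auth/devstorage.") for s in scopes
    )

print(can_pull_from_gcr(GRANTED))  # False -> UNAUTHORIZED on the token exchange
```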
The best workaround is to disable GCR authentication (credit to @jameshartig):

variables:
  GOOGLE_APPLICATION_CREDENTIALS: /dev/null
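A plausible explanation for why pointing GOOGLE_APPLICATION_CREDENTIALS at /dev/null works: the credential helper tries that file first, /dev/null is empty and not valid JSON, so loading fails and the broken metadata-server token is never used; the pull then falls back to anonymous access, which is enough for public images. A hypothetical sketch of that fallback logic (this models the observed behaviour, not kaniko's actual code):

```python
import json

# Hypothetical credential resolution, mimicking the workaround's effect:
# an explicit (but unloadable) key file path short-circuits implicit
# metadata-server auth, leaving the pull anonymous.
def load_google_credentials(env):
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path is not None:
        try:
            with open(path) as f:
                return json.load(f)  # a real service-account key file
        except (OSError, json.JSONDecodeError):
            return None              # /dev/null lands here -> anonymous pull
    return "metadata-server-token"   # implicit auth path that breaks GCR pulls

print(load_google_credentials({"GOOGLE_APPLICATION_CREDENTIALS": "/dev/null"}))  # None
print(load_google_credentials({})) 
```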
There are multiple problems coming together in my opinion:
Actual behavior: After the 1.8.0 release, builds for some of my container images started to fail with an authentication error during the base image pull:
The Dockerfile uses crane as the base image, which is hosted on gcr.io. It's normally possible to pull the image without any credentials, but it fails when using Kaniko 1.8.0. I wasn't able to reproduce the issue in Kaniko 1.7.0, so I've decided to revert to that for now.
Expected behavior I expected Kaniko to be able to pull the crane image and continue with the image build process.
To Reproduce
Steps to reproduce the behavior:
1. Use gcr.io/go-containerregistry/crane:debug as the base image.
Additional Information
registry.gitlab.com/lepovirta/dis/kaniko@sha256:558f105d3d4fe2cfbb94a851a203d5d4e87105fdfb662e98934a0bcf5f16b892
Failed build in GitLab CI
Triage Notes for the Maintainers
--cache flag