GoogleContainerTools / kaniko

Build Container Images In Kubernetes
Apache License 2.0

Regression: Kaniko 1.7 unstable authentication against GCP Artifact Registry #1893

Open deedubs opened 2 years ago

deedubs commented 2 years ago

Actual behavior

While building several containers against GCP Artifact Registry via skaffold, we are getting intermittent authentication failures.

INFO[0000] Retrieving image gcr.io/kaniko-project/executor:v1.5.1@sha256:c6166717f7fe0b7da44908c986137ecfeab21f31ec3992f6e128fff8a94be8a5 from registry gcr.io 
E0124 14:27:12.856809       1 metadata.go:166] while reading 'google-dockercfg-url' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg-url
INFO[0000] Built cross stage deps: map[]                
INFO[0000] Retrieving image manifest gcr.io/kaniko-project/executor:v1.5.1@sha256:c6166717f7fe0b7da44908c986137ecfeab21f31ec3992f6e128fff8a94be8a5 
INFO[0000] Returning cached image manifest              
INFO[0000] Executing 0 build triggers                   
INFO[0000] Skipping unpacking as no commands require it. 
INFO[0000] Taking snapshot of full filesystem...        
INFO[0000] Pushing image to us-east4-docker.pkg.dev/******/platform/containers/tools/kaniko:abaee2d 
INFO[0001] Pushed image to 1 destinations               
Building [bases/alpine]...
E0124 14:27:20.443958       1 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see aws.Config.CredentialsChainVerboseErrors
error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for "us-east4-docker.pkg.dev/******/platform/containers/bases/alpine:abaee2d": creating push check transport for us-east4-docker.pkg.dev failed: GET https://us-east4-docker.pkg.dev/v2/token?scope=repository%3A******%2Fplatform%2Fcontainers%2Fbases%2Falpine%3Apush%2Cpull&service=us-east4-docker.pkg.dev: UNAUTHORIZED: authentication failed

Prior to invoking skaffold we issue:

docker-credential-gcr configure-docker --registries=us-east4-docker.pkg.dev

Expected behavior

We expect pushes to continue to work throughout the whole build.

Additional Information

imjasonh commented 2 years ago

There have been some bugs with v1.7.0 related to auth, specifically against GCR, that caused us to roll back :latest to point at v1.6.0.

I believe these issues are fixed at head. Until v1.8.0 is out (#1871), could you try your build with the latest commit-tagged image, built from a7425d1fd0442b58dc24698285102176365a28d9, and let me know if that works for you?

gcr.io/kaniko-project/executor:a7425d1fd0442b58dc24698285102176365a28d9@sha256:939b0a1a0aaad97a06db665291ac2270a9abe538af4198000046f743d1e61cd4

If it does, then when v1.8.0 is released you should get the fix (and until then you can use the commit-tagged image)

If not, please let me know so we can find and fix the issue.

deedubs commented 2 years ago

Confirmed that our pipelines can build against artifact registry using a7425d1

We'll continue to use the commit tagged image, thanks so much for the quick response!

Off topic: while bisecting my way from 1.6 to 1.7, I noticed the GCR helpers. Is it even necessary to call docker-credential-gcr manually as a pre-step?

imjasonh commented 2 years ago

I don't think it should be necessary* -- in v1.6.0 and v1.7.0 it was initialized in the Dockerfile (setting up /kaniko/.config/gcloud/docker_credential_gcr_config.json which the helper uses), and at head the cred helpers aren't technically needed in the image since the same logic is embedded in kaniko itself -- it looks for creds available in the environment and will use those even if the cred helpers aren't available or initialized.

So in all cases it should be okay to omit any cred helper initialization pre-step, as far as I know.

*if you test and find out that it is necessary, please let me know!

deedubs commented 2 years ago

When I dropped the call, and installation of docker-credential-gcr, I get

time="2022-01-25T20:57:16Z" level=error msg="No matching credentials were found for \"us-east4-docker.pkg.dev\""
time="2022-01-25T20:57:16Z" level=error msg="No matching credentials were found for \"us-east4-docker.pkg.dev\""
time="2022-01-25T20:57:16Z" level=error msg="No matching credentials were found for \"us-east4-docker.pkg.dev\""
time="2022-01-25T20:57:16Z" level=fatal msg="deleting pod: context canceled" subtask=tools/skaffold task=Buil

Note this is being invoked via tekton

steps:
    - name: skaffold-build
      image: gcr.io/k8s-skaffold/skaffold:v1.35.1@sha256:edd5fefb172bb60396fed6b83868cfec38be8083e81b3c1aa8d3ec5cac66c09f
      workingDir: $(workspaces.source.path)
      script: |
        skaffold build \
          --default-repo=us-east4-docker.pkg.dev/$(params.DEFAULT_REPO) \
          --output="{{range \$index, \$artifact := .Builds}}{{if \$index}},{{end}}{{\$artifact.Tag}}{{end}}" \
          --file-output=/tekton/results/IMAGES

imjasonh commented 2 years ago

Well that's a little surprising to me. 🤔

It works with a step to initialize the cred helper? What's that look like?

And this is with the kaniko executor @main? Or v1.6.0 or 1.7.0?

deedubs commented 2 years ago

When we run

docker-credential-gcr configure-docker --registries=us-east4-docker.pkg.dev

followed by the skaffold build invocation, it works.

This is with the commit tagged version

nmousouros commented 2 years ago

At first I thought Docker Hub had the issue, but apparently the authentication failures came from the :latest tag. I didn't realize that you had rolled :latest back to 1.6. Now, with the new 1.8 release, we still get the authentication error.

imjasonh commented 2 years ago

> I thought dockerhub had the issue but apparently, I had authentication issues with :latest tag, I didn't realize that you rollback to 1.6 so I thought dockerhub had the issue but now with the new release of 1.8 we still get the authentication error.

The original issue seemed to be reporting issues authenticating with GCR/AR, not Dockerhub. Are you saying you also have issues with Dockerhub now?

In any case, especially where auth is involved, it's useful to tell whether you can successfully authorize a push to your registry using docker push or another similar tool. If that works and Kaniko doesn't, it's a bug in Kaniko.

nmousouros commented 2 years ago

Yes, we do use Docker Hub; sorry for not saying this clearly.

I cannot say with confidence that this is a bug in kaniko. It happened randomly around the time 1.7 was released and then fixed itself, which I now think was due to :latest being pointed at 1.6 again. We thought it was something with Docker Hub. We hit it again when 1.8 was released yesterday: most of our pushes are failing, but not all, and yes, we can push with docker push. I know I am being vague; if there is more information I could give to help, please let me know.

BarthV commented 2 years ago

I've hit the same issue here.

using executor:v1.8.0-debug :

error checking push permissions --
make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for
   "europe-west1-docker.pkg.dev/foo/bar/mayapp:f64ca23c": creating push check transport for europe-west1-docker.pkg.dev failed:
   GET https://europe-west1-docker.pkg.dev/v2/token?scope=repository%3Afoo%2Fbar%2Fmyapp%3Apush%2Cpull&service=europe-west1-docker.pkg.dev:
   UNAUTHORIZED: authentication failed

using executor:v1.7.0-debug :

WARN[0000] Skip running docker-credential-gcr as user provided docker configuration exists at /kaniko/.docker/config.json
E0310 17:12:37.509233      18 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see aws.Config.CredentialsChainVerboseErrors

error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for
    "europe-west1-docker.pkg.dev/foo/bar/myapp:39038142": creating push check transport for europe-west1-docker.pkg.dev failed:
    GET https://europe-west1-docker.pkg.dev/v2/token?scope=repository%3Afoo%2Fbar%2Fmyapp%3Apush%2Cpull&service=europe-west1-docker.pkg.dev:
    UNAUTHORIZED: authentication failed

using executor:v1.6.0-debug :

WARN[0000] Skip running docker-credential-gcr as user provided docker configuration exists at /kaniko/.docker/config.json
E0310 17:14:13.558756      18 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see aws.Config.CredentialsChainVerboseErrors
INFO[0000] Using dockerignore file: /builds/foo/bar/myapp/.dockerignore 
[...]
INFO[0013] Pushing image to europe-west1-docker.pkg.dev/foo/bar/myapp:1db4ec45 
INFO[0014] Pushing image to europe-west1-docker.pkg.dev/foo/bar/myapp:latest 
INFO[0014] Pushed image to 2 destinations

Is this somehow related ?
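
When comparing the failing requests in logs like the ones above, it can help to decode the token URL from the error message and check which repository and actions were actually requested. The following is a minimal sketch, not part of kaniko: the function name `describe_token_request` is made up for illustration, and the URL is copied from the log with the trailing `:` (the error separator) removed. It relies only on the `repository:<name>:<actions>` scope format used by registry token authentication.

```python
from urllib.parse import urlsplit, parse_qs


def describe_token_request(url: str) -> dict:
    """Split a registry token URL into host, service, repository and actions.

    Hypothetical helper for reading error logs; not a kaniko API.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)  # parse_qs percent-decodes the values
    # The scope has the form "<type>:<name>:<action>[,<action>...]".
    resource_type, _, rest = query["scope"][0].partition(":")
    name, _, actions = rest.rpartition(":")
    return {
        "host": parts.netloc,
        "service": query["service"][0],
        "type": resource_type,
        "repository": name,
        "actions": actions.split(","),
    }


# URL taken from the v1.7.0 error above, without the trailing ":".
url = (
    "https://europe-west1-docker.pkg.dev/v2/token"
    "?scope=repository%3Afoo%2Fbar%2Fmyapp%3Apush%2Cpull"
    "&service=europe-west1-docker.pkg.dev"
)
info = describe_token_request(url)
print(info["repository"], info["actions"])
```

Comparing the decoded repository name with the image reference being pushed is a quick way to rule out a tag or path typo before digging into credentials.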

imjasonh commented 2 years ago

@BarthV could you do me a favor, and try this build without your ~/.docker/config.json file mounted in? There were changes since v1.7.0 to compile the cred helper logic into the Kaniko binary, which should pick up your token.json creds when pushing to *-docker.pkg.dev, but they're only checked after the Docker config JSON.

If removing that causes your push to work again, that would be a great signal that the cred helper fallback is working, and would give us an option for others facing similar auth issues.
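
The lookup order described here (Docker config JSON first, compiled-in Google credential logic second) can be pictured as a first-match-wins chain. The sketch below is a toy model under that assumption only; it is not kaniko's code, and every name in it (`first_credential`, `docker_config`, `google_environment`, `registry.example.com`) is hypothetical.

```python
from typing import Callable, Optional, Sequence

# A credential source maps a registry host to a credential label, or None.
Source = Callable[[str], Optional[str]]


def first_credential(registry: str, sources: Sequence[Source]) -> Optional[str]:
    """Return the credential from the first source that knows the registry."""
    for source in sources:
        cred = source(registry)
        if cred is not None:
            return cred
    return None


def docker_config(registry: str) -> Optional[str]:
    # Stand-in for a mounted config.json that only lists an unrelated registry.
    entries = {"registry.example.com": "from-docker-config"}
    return entries.get(registry)


def google_environment(registry: str) -> Optional[str]:
    # Stand-in for the fallback that applies to *-docker.pkg.dev hosts.
    return "from-google-env" if registry.endswith("-docker.pkg.dev") else None


chain = [docker_config, google_environment]
print(first_credential("europe-west1-docker.pkg.dev", chain))
print(first_credential("registry.example.com", chain))
```

In this model, an entry in the Docker config for the target registry always wins, whether or not it is valid, which is why removing the file is a useful test of the fallback.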

BarthV commented 2 years ago

@imjasonh

GOOGLE_APPLICATION_CREDENTIALS ENV set with token.json file path, no ~/.docker/config.json at all:

| Version | Working? |
| --- | --- |
| v1.6.0 | :heavy_check_mark: |
| v1.7.0 | :x: |
| v1.8.0 | :heavy_check_mark: |

GOOGLE_APPLICATION_CREDENTIALS ENV set with token.json file path, ~/.docker/config.json loaded with unused third-party external credentials (non-gcr, non-credHelper):

| Version | Working? |
| --- | --- |
| v1.6.0 | :x: |
| v1.7.0 | :x: |
| v1.8.0 | :heavy_check_mark: |

GOOGLE_APPLICATION_CREDENTIALS ENV set with token.json file path, ~/.docker/config.json loaded with gcr credHelpers for target registry:

| Version | Working? |
| --- | --- |
| v1.6.0 | :heavy_check_mark: |
| v1.7.0 | :x: |
| v1.8.0 | :x: |

So far, removing the config.json file looks good on v1.8.0. It even works when using a file with unused credentials :+1:
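
To tell which of these three cases a given build falls into, one option is to check whether the mounted config.json mentions the target registry at all. The following is a rough sketch with a hypothetical helper, `credential_source`. It uses the standard `credHelpers`, `credsStore` and `auths` keys of the Docker config file, matches registry hosts by exact string only, and assumes that a per-registry helper takes priority over a global store and over static auths.

```python
import json


def credential_source(config_text: str, registry: str) -> str:
    """Report which section of a Docker config.json would cover `registry`.

    Hypothetical diagnostic; exact host matching only, no URL normalization.
    """
    config = json.loads(config_text)
    if registry in config.get("credHelpers", {}):
        return "credHelpers"
    if config.get("credsStore"):
        return "credsStore"
    if registry in config.get("auths", {}):
        return "auths"
    return "none"


target = "europe-west1-docker.pkg.dev"

# Case 2: only unrelated third-party credentials (registry name is made up).
third_party = json.dumps(
    {"auths": {"registry.example.com": {"auth": "placeholder"}}}
)
# Case 3: a gcr credential helper configured for the target registry.
with_helper = json.dumps({"credHelpers": {target: "gcr"}})

print(credential_source(third_party, target))
print(credential_source(with_helper, target))
```

A result of `none` corresponds to the first two cases above, where v1.8.0 worked; `credHelpers` corresponds to the third case, where it did not.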

ionosphere80 commented 2 years ago

I can't get authentication to work with GAR using version 1.8 and any of the methods in the previous post.

beehivewarrior commented 1 year ago

I also cannot get authentication to work with GAR using any of the above.

deedubs commented 1 year ago

@beehivewarrior are you using a GKE cluster? You need to ensure your cluster has the oauth scope

beehivewarrior commented 1 year ago

@deedubs Ah, that makes sense. Thanks!

jessequinn commented 1 year ago

1.12.1 is doing the same now. I had no issues with GCP GSA and private Artifact Registry yesterday. This is the first time I am seeing these errors.