GoogleContainerTools / kaniko

Build Container Images In Kubernetes

SEGV when pushing layer from Google Cloud Build -> Artifact Registry #1604

Open devjgm opened 3 years ago

devjgm commented 3 years ago

Actual behavior

...
Step #0: INFO[0148] Pushing layer us-central1-docker.pkg.dev/jgm-cloud-cxx/google-cloud-cpp-cloudbuild-docker/fedora-image/cache:c9a94ae6a3449e0c2d330d44632921f139be607646f008d7542890696f91e26f to cache now
Step #0: fatal error: unexpected signal during runtime execution
Step #0: [signal SIGSEGV: segmentation violation code=0x1 addr=0xe5 pc=0x7fd759f3dcb4]
Step #0:
Step #0: runtime stack:
Step #0: runtime.throw(0x7e0c4a, 0x2a)
Step #0:        /usr/local/go/src/runtime/panic.go:1116 +0x72
Step #0: runtime.sigpanic()
Step #0:        /usr/local/go/src/runtime/signal_unix.go:726 +0x4ac
Step #0:
Step #0: goroutine 1 [syscall]:
Step #0: runtime.cgocall(0x6a88c0, 0xc000085a30, 0xc000010038)
Step #0:        /usr/local/go/src/runtime/cgocall.go:133 +0x5b fp=0xc000085a00 sp=0xc0000859c8 pc=0x40563b
Step #0: os/user._Cfunc_mygetpwuid_r(0x0, 0xc0000b2ed0, 0xf2f070, 0x400, 0xc000010038, 0x7fd700000000)
Step #0:        _cgo_gotypes.go:175 +0x4d fp=0xc000085a30 sp=0xc000085a00 pc=0x68af8d
Step #0: os/user.lookupUnixUid.func1.1(0x0, 0xc0000b2ed0, 0xc000088dd0, 0xc000010038, 0xc000085ad0)
Step #0:        /usr/local/go/src/os/user/cgo_lookup_unix.go:103 +0xd0 fp=0xc000085a80 sp=0xc000085a30 pc=0x68bd30
Step #0: os/user.lookupUnixUid.func1(0x7987e0)
Step #0:        /usr/local/go/src/os/user/cgo_lookup_unix.go:103 +0x45 fp=0xc000085ab8 sp=0xc000085a80 pc=0x68bda5
Step #0: os/user.retryWithBuffer(0xc000088dd0, 0xc000085b90, 0x7fd75c9a1c00, 0x20300000000000)
Step #0:        /usr/local/go/src/os/user/cgo_lookup_unix.go:247 +0x3e fp=0xc000085b10 sp=0xc000085ab8 pc=0x68babe
Step #0: os/user.lookupUnixUid(0x0, 0x0, 0x0, 0x0)
Step #0:        /usr/local/go/src/os/user/cgo_lookup_unix.go:96 +0x132 fp=0xc000085bd8 sp=0xc000085b10 pc=0x68b3d2
Step #0: os/user.current(0xc000085c58, 0x4d40bc, 0xc0000b4ee0)
Step #0:        /usr/local/go/src/os/user/cgo_lookup_unix.go:49 +0x49 fp=0xc000085c18 sp=0xc000085bd8 pc=0x68b269
Step #0: os/user.Current.func1()
Step #0:        /usr/local/go/src/os/user/lookup.go:15 +0x25 fp=0xc000085c40 sp=0xc000085c18 pc=0x68bbe5
Step #0: sync.(*Once).doSlow(0xa09c40, 0x7ecf20)
Step #0:        /usr/local/go/src/sync/once.go:66 +0xec fp=0xc000085c90 sp=0xc000085c40 pc=0x474c8c
Step #0: sync.(*Once).Do(...)
Step #0:        /usr/local/go/src/sync/once.go:57
Step #0: os/user.Current(0x47705b, 0xa41360, 0xc0000b25d0)
Step #0:        /usr/local/go/src/os/user/lookup.go:15 +0x105 fp=0xc000085cc0 sp=0xc000085c90 pc=0x68ade5
Step #0: github.com/GoogleCloudPlatform/docker-credential-gcr/util.unixHomeDir(0x4d4700, 0xc0000b4ee0)
Step #0:        /go/src/github.com/GoogleCloudPlatform/docker-credential-gcr/util/util.go:42 +0x25 fp=0xc000085cf0 sp=0xc000085cc0 pc=0x68e585
Step #0: github.com/GoogleCloudPlatform/docker-credential-gcr/util.SdkConfigPath(0x0, 0x0, 0x0, 0x0)
Step #0:        /go/src/github.com/GoogleCloudPlatform/docker-credential-gcr/util/util.go:34 +0x26 fp=0xc000085d58 sp=0xc000085cf0 pc=0x68e466
Step #0: github.com/GoogleCloudPlatform/docker-credential-gcr/store.dockerCredentialPath(0x7fd7834cb108, 0xc000088dc0, 0x7fd7834d4328, 0x18)
Step #0:        /go/src/github.com/GoogleCloudPlatform/docker-credential-gcr/store/store.go:215 +0x6d fp=0xc000085dc8 sp=0xc000085d58 pc=0x69186d
Step #0: github.com/GoogleCloudPlatform/docker-credential-gcr/store.DefaultGCRCredStore(...)
Step #0:        /go/src/github.com/GoogleCloudPlatform/docker-credential-gcr/store/store.go:84
Step #0: github.com/GoogleCloudPlatform/docker-credential-gcr/cli.(*helperCmd).Execute(0xc00000e360, 0x8392e0, 0xc000016018, 0xc00009e3c0, 0x0, 0x0, 0x0, 0x0)
Step #0:        /go/src/github.com/GoogleCloudPlatform/docker-credential-gcr/cli/dockerHelper.go:35 +0x35 fp=0xc000085e78 sp=0xc000085dc8 pc=0x6a6915
Step #0: github.com/GoogleCloudPlatform/docker-credential-gcr/vendor/github.com/google/subcommands.(*Commander).Execute(0xc000012100, 0x8392e0, 0xc000016018, 0x0, 0x0, 0x0, 0x39)
Step #0:        /go/src/github.com/GoogleCloudPlatform/docker-credential-gcr/vendor/github.com/google/subcommands/subcommands.go:209 +0x30d fp=0xc000085f20 sp=0xc000085e78 pc=0x69252d
Step #0: github.com/GoogleCloudPlatform/docker-credential-gcr/vendor/github.com/google/subcommands.Execute(...)
Step #0:        /go/src/github.com/GoogleCloudPlatform/docker-credential-gcr/vendor/github.com/google/subcommands/subcommands.go:492
Step #0: main.main()
Step #0:        /go/src/github.com/GoogleCloudPlatform/docker-credential-gcr/main.go:54 +0x63f fp=0xc000085f88 sp=0xc000085f20 pc=0x6a85bf
Step #0: runtime.main()
Step #0:        /usr/local/go/src/runtime/proc.go:204 +0x209 fp=0xc000085fe0 sp=0xc000085f88 pc=0x439c89
Step #0: runtime.goexit()
Step #0:        /usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x46b841
Step #0: Collecting git+git://github.com/googleapis/python-storage@8cf6c62a96ba3fff7e5028d931231e28e5029f1c
Step #0:   Cloning git://github.com/googleapis/python-storage (to revision 8cf6c62a96ba3fff7e5028d931231e28e5029f1c) to /tmp/pip-req-build-2spex1i1
Step #0:   Running command git clone -q git://github.com/googleapis/python-storage /tmp/pip-req-build-2spex1i1
CANCELLED
ERROR: context canceled

The above error happens after Kaniko builds a Docker layer and tries to push it from Cloud Build to Artifact Registry.

cloudbuild.yaml

options:
  machineType: 'N1_HIGHCPU_32'
  diskSizeGb: '512'

substitutions:
  _DISTRO: "unknown"
  _BUILD_NAME: "unknown"

timeout: 3600s

steps:
- name: 'gcr.io/kaniko-project/executor:latest'
  args: [
    '--context=dir:///workspace/ci',
    '--dockerfile=ci/cloudbuild/Dockerfile.${_DISTRO}',
    '--cache=true',
    '--destination=us-central1-docker.pkg.dev/$PROJECT_ID/google-cloud-cpp-cloudbuild-docker/${_DISTRO}-image:tag1',
  ]

- name: 'us-central1-docker.pkg.dev/$PROJECT_ID/google-cloud-cpp-cloudbuild-docker/${_DISTRO}-image:tag1'
  entrypoint: 'ci/cloudbuild/build.sh'
  args: [ '${_BUILD_NAME}' ]

The Dockerfile I'm building is here: https://github.com/googleapis/google-cloud-cpp/compare/master...devjgm:cloud-build?expand=1#diff-c1691ed788ae6246565bad5ac37a26da8a3ee735f4c2e8f07b5b205ad47b4f26

The artifact registry repo exists:

$ gcloud artifacts repositories list
Listing items under project jgm-cloud-cxx, across all locations.

                                                               ARTIFACT_REGISTRY
REPOSITORY                          FORMAT  DESCRIPTION        LOCATION     LABELS  ENCRYPTION          CREATE_TIME          UPDATE_TIME
google-cloud-cpp-cloudbuild-docker  DOCKER  Docker repository  us-central1          Google-managed key  2021-03-20T10:28:01  2021-03-20T10:28:01

Expected behavior

I believe I correctly followed the instructions at https://cloud.google.com/build/docs/kaniko-cache and I expected Kaniko to successfully upload the layers and final image to artifact registry, but instead it crashes.

To Reproduce

If this is not a known bug that I'm hitting, I can try to distill the repro steps to something smaller.
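A smaller repro might not need Kaniko or a Dockerfile at all. The crashing frames in the stack trace above end in `os/user.Current`, so a minimal sketch is a Go program that makes only that call. This is an assumption about where the fault lies, not a confirmed diagnosis, and `describeCurrentUser` is a hypothetical helper name used only for illustration. The idea would be to build this program the same way the credential helper is built and run it inside the same image.

```go
package main

import (
	"fmt"
	"os/user"
)

// describeCurrentUser performs the same standard-library call that appears
// at the top of the Go frames in the stack trace (os/user.Current).
// Hypothetical helper, not part of kaniko or docker-credential-gcr.
func describeCurrentUser() string {
	u, err := user.Current()
	if err != nil {
		// An ordinary lookup failure is returned as an error, not a crash.
		return fmt.Sprintf("lookup failed: %v", err)
	}
	return fmt.Sprintf("uid=%s name=%s home=%s", u.Uid, u.Username, u.HomeDir)
}

func main() {
	// If the user lookup is the culprit, the SIGSEGV would be expected
	// here, before anything is printed.
	fmt.Println(describeCurrentUser())
}
```

If this program crashes in the `:v1.6.0` image but prints a line in the `-debug` image, that would narrow the problem to the user lookup rather than to the push itself.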

devjgm commented 3 years ago

Note: after switching to gcr.io/kaniko-project/executor:edge instead of :latest and pushing my images to gcr.io instead of us-central1-docker.pkg.dev, it works now.

devjgm commented 3 years ago

This problem is still happening. I get the above stack trace when I run with gcr.io/kaniko-project/executor:v1.6.0

BUT, it works when I use gcr.io/kaniko-project/executor:v1.6.0-debug

devjgm commented 3 years ago

Naive guess at the problem

My guess is that the problem is with the https://github.com/GoogleCloudPlatform/docker-credential-gcr/ credential helper. The stack trace above points at it: the crash happens in os/user.Current, called from docker-credential-gcr's util.unixHomeDir.

And the v1.6.0 and v1.6.0-debug images are using different versions of the docker-credential-gcr helper.

BROKEN https://github.com/GoogleContainerTools/kaniko/blob/7b6495426d9a4713b997a1afcf197d87eecb33a3/deploy/Dockerfile#L29-L36

WORKS (the -debug version) https://github.com/GoogleContainerTools/kaniko/blob/7b6495426d9a4713b997a1afcf197d87eecb33a3/deploy/Dockerfile_debug#L32-L34

jonjohnsonjr commented 3 years ago

tl;dr if kaniko wants to build this static binary, they need to pass more flags: https://github.com/golang/go/issues/24787#issuecomment-387611691

brightsider commented 3 years ago

I have the same issue, and switching to v1.6.0-debug fixes it for me.