Closed by jamespollard8 4 years ago
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/174833518
The labels on this github issue will be updated when the story is started.
Copying over some comments we'd written up on the kpack issue:
@davewalter said: We tried deploying on Kind with K8s v1.16.9 on GCP and the same version of cf-for-k8s. When we pushed our test node app to the platform, we were initially surprised to see that the digest for the cf-default-builder image was identical to the one created with K8s v1.18.6, but on reflection, this seems to make sense, given that the builder/stack and store definitions are all identical.
We next tried removing all of the buildpacks, with the exception of the Node JS buildpack that is required for our sample app. For reference, here are the store/stack/builder definitions we deployed:
#@ load("@ytt:data", "data")
---
apiVersion: experimental.kpack.pivotal.io/v1alpha1
kind: Store
metadata:
  name: cf-buildpack-store
spec:
  sources:
  - image: gcr.io/paketo-buildpacks/nodejs@sha256:7110ff41a35ec4d8a0fbb63e7b292c2edc7ef0e072e542cd0a58e5d179ce2605
---
apiVersion: experimental.kpack.pivotal.io/v1alpha1
kind: Stack
metadata:
  name: cflinuxfs3-stack
spec:
  id: "org.cloudfoundry.stacks.cflinuxfs3"
  buildImage:
    image: "gcr.io/paketo-buildpacks/build@sha256:84f7b60192e69036cb363b2fc7d9834cff69dcbcf7aaf8c058d986fdee6941c3"
  runImage:
    image: "gcr.io/paketo-buildpacks/run@sha256:84f7b60192e69036cb363b2fc7d9834cff69dcbcf7aaf8c058d986fdee6941c3"
---
apiVersion: experimental.kpack.pivotal.io/v1alpha1
kind: CustomBuilder
metadata:
  name: cf-default-builder
  namespace: #@ data.values.staging_namespace
spec:
  tag: #@ "{}/cf-default-builder".format(data.values.app_registry.repository_prefix)
  serviceAccount: cc-kpack-registry-service-account
  stack: cflinuxfs3-stack
  store: cf-buildpack-store
  order:
  - group:
    - id: paketo-buildpacks/nodejs
This did result in a new builder image, which had much smaller annotations:
[
{
...
"Config": {
...
"Labels": {
"io.buildpacks.builder.metadata": "{\"description\":\"Custom Builder built with kpack\",\"stack\":{\"runImage\":{\"image\":\"gcr.io/paketo-buildpacks/run@sha256:84f7b60192e69036cb363b2fc7d9834cff69dcbcf7aaf8c058d986fdee6941c3\",\"mirrors\":null}},\"lifecycle\":{\"version\":\"0.8.1\",\"api\":{\"buildpack\":\"0.2\",\"platform\":\"0.3\"}},\"createdBy\":{\"name\":\"kpack CustomBuilder\",\"version\":\"v0.0.10 (git sha: 68925eaca94becfeef006c413ebac4fde559e66c)\"},\"buildpacks\":[{\"id\":\"paketo-buildpacks/node-engine\",\"version\":\"0.0.260\",\"homepage\":\"https://github.com/paketo-buildpacks/node-engine\"},{\"id\":\"paketo-buildpacks/yarn-install\",\"version\":\"0.1.86\",\"homepage\":\"https://github.com/paketo-buildpacks/yarn-install\"},{\"id\":\"paketo-buildpacks/npm\",\"version\":\"0.1.79\",\"homepage\":\"https://github.com/paketo-buildpacks/npm\"},{\"id\":\"paketo-buildpacks/nodejs\",\"version\":\"0.0.5\",\"homepage\":\"https://github.com/paketo-buildpacks/nodejs\"}]}",
"io.buildpacks.buildpack.layers": "{\"paketo-buildpacks/node-engine\":{\"0.0.260\":{\"api\":\"0.2\",\"layerDiffID\":\"sha256:7d57d604f4efbf533639810fc3d590b2c334382097f4cdd4c3d5bcfaf8b1bd15\",\"stacks\":[{\"id\":\"io.buildpacks.stacks.bionic\"},{\"id\":\"org.cloudfoundry.stacks.cflinuxfs3\"}],\"homepage\":\"https://github.com/paketo-buildpacks/node-engine\"}},\"paketo-buildpacks/nodejs\":{\"0.0.5\":{\"api\":\"0.2\",\"layerDiffID\":\"sha256:71dd90c6ed436af5fd5027789c76ef7bc19d71ea2b46ec602dbdbe0c6e7ee9af\",\"order\":[{\"group\":[{\"id\":\"paketo-buildpacks/node-engine\",\"version\":\"0.0.260\"},{\"id\":\"paketo-buildpacks/yarn-install\",\"version\":\"0.1.86\"}]},{\"group\":[{\"id\":\"paketo-buildpacks/node-engine\",\"version\":\"0.0.260\"},{\"id\":\"paketo-buildpacks/npm\",\"version\":\"0.1.79\"}]}],\"homepage\":\"https://github.com/paketo-buildpacks/nodejs\"}},\"paketo-buildpacks/npm\":{\"0.1.79\":{\"api\":\"0.2\",\"layerDiffID\":\"sha256:687909d9abefc647892cbe213a425cf37097f5cd96eb79987f5fab4eb8b6af18\",\"stacks\":[{\"id\":\"org.cloudfoundry.stacks.cflinuxfs3\"},{\"id\":\"io.buildpacks.stacks.bionic\"}],\"homepage\":\"https://github.com/paketo-buildpacks/npm\"}},\"paketo-buildpacks/yarn-install\":{\"0.1.86\":{\"api\":\"0.2\",\"layerDiffID\":\"sha256:25eefd03f95bf374fa44cadd0fc7a9c7ef943ba1a7faa0fc90379786e0c07b21\",\"stacks\":[{\"id\":\"org.cloudfoundry.stacks.cflinuxfs3\"},{\"id\":\"io.buildpacks.stacks.bionic\"}],\"homepage\":\"https://github.com/paketo-buildpacks/yarn-install\"}}}",
"io.buildpacks.buildpack.order": "[{\"group\":[{\"id\":\"paketo-buildpacks/nodejs\",\"version\":\"0.0.5\"}]}]",
"io.buildpacks.stack.id": "org.cloudfoundry.stacks.cflinuxfs3"
}
},
"Architecture": "amd64",
"Os": "linux",
"Size": 1085920101,
"VirtualSize": 1085920101,
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/ed68aeaecb6b741619bb19701aaa44de0ffec69b0c9ae506925df5885fcf89db/diff:/var/lib/docker/overlay2/c0d8b46fab8cc1bd50fafb8c488bff78dfded88e7fdac52fbf3d03a6f61dc5bd/diff:/var/lib/docker/overlay2/c055453ec8328a06aea7e48fa15532749c111b36b9de976263bf51ca351451d2/diff:/var/lib/docker/overlay2/edd6efe0267833bbce1f84157c7ec0cac1cdac63e51a6d825793c84d6bd3d328/diff:/var/lib/docker/overlay2/14b725c618f9984cdbd685ca3b475ee39cad8598a5dd246411bb1db5d97ddad3/diff:/var/lib/docker/overlay2/d1999b90d1a3b57d9c63e0167e1602ed090ec4e05d9e3642fbc8961a67be3a35/diff:/var/lib/docker/overlay2/4616983ec5476539812eab442d439f404364f9c8e70c7278f9958542159ea939/diff:/var/lib/docker/overlay2/1cc6f315cab22e27897441b185a7d637764766565c7f719c702a9c780b3cc18d/diff",
"MergedDir": "/var/lib/docker/overlay2/36d812a5c093893dfc5308adb9433af8d977568e342c5f4b509b7a6a77008f39/merged",
"UpperDir": "/var/lib/docker/overlay2/36d812a5c093893dfc5308adb9433af8d977568e342c5f4b509b7a6a77008f39/diff",
"WorkDir": "/var/lib/docker/overlay2/36d812a5c093893dfc5308adb9433af8d977568e342c5f4b509b7a6a77008f39/work"
},
"Name": "overlay2"
},
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:f0dcba2cbeedc3702b4f390963e697b06360502302ec52e0d63bcfa7235220d1",
"sha256:7755b972f0b4f49de73ef5114fb3ba9c69d80f217e80da99f56f0d0a5dcb3d70",
"sha256:c33cb7212a62bb159674eae50b7ace60bb7c73c70ea6f8597c72fff10189d78f",
"sha256:7d57d604f4efbf533639810fc3d590b2c334382097f4cdd4c3d5bcfaf8b1bd15",
"sha256:25eefd03f95bf374fa44cadd0fc7a9c7ef943ba1a7faa0fc90379786e0c07b21",
"sha256:687909d9abefc647892cbe213a425cf37097f5cd96eb79987f5fab4eb8b6af18",
"sha256:71dd90c6ed436af5fd5027789c76ef7bc19d71ea2b46ec602dbdbe0c6e7ee9af",
"sha256:3ae93fd59e3e53f9d0cbea42625e12139a7fb8a59915dc23b0219a01a44e018f",
"sha256:a7ce12a420636d1b1726e67e67055a3ead31453018be5c32019420cb9e3786ba"
]
},
"Metadata": {
"LastTagTime": "0001-01-01T00:00:00Z"
}
}
]
We're not sure how this information helps, given that we are still unsure as to exactly which part of the image containerd is complaining about.
We were able to isolate the problem to Kubernetes v1.18.x by pushing the cf-default-builder image to our public dockerhub repository and creating a simple deployment:
kubectl create deployment test --image relintdockerhubpushbot/cf-default-builder-test
When we inspect the resulting pod, we see the following events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 13s default-scheduler Successfully assigned default/test-5dd6b896f8-6g459 to kind-control-plane
Normal Pulling 12s kubelet, kind-control-plane Pulling image "relintdockerhubpushbot/cf-default-builder-test"
Warning Failed 11s kubelet, kind-control-plane Failed to pull image "relintdockerhubpushbot/cf-default-builder-test": rpc error: code = InvalidArgument desc = failed to pull and unpack image "docker.io/relintdockerhubpushbot/cf-default-builder-test:latest": failed to prepare extraction snapshot "extract-345901367-cGhX sha256:ac0538e4e603b4d6027dabc72660256eafd203fac1be9e1ca15c7f3f2ce837d5": info.Labels: label key and value greater than maximum size (4096 bytes), key: containerd: invalid argument
Warning Failed 11s kubelet, kind-control-plane Error: ErrImagePull
Normal BackOff 11s kubelet, kind-control-plane Back-off pulling image "relintdockerhubpushbot/cf-default-builder-test"
Warning Failed 11s kubelet, kind-control-plane Error: ImagePullBackOff
We confirmed that downgrading our Kind cluster to Kubernetes version v1.17.5 allowed the test deployment pod to successfully pull the same image:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m42s default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Normal Scheduled 2m37s default-scheduler Successfully assigned default/test-67f7dd9596-fxbhm to kind-control-plane
Warning Failed 2m9s kubelet, kind-control-plane Error: failed to generate container "7d899e03e56ea344029e64e6f265ae48e0f880ed2f3a4d5dbfd59d7dd09cf9df" spec: no command specified
Warning Failed 2m8s kubelet, kind-control-plane Error: failed to generate container "6260f8d08c1b56ba0987dbe812b877a1d3f864ee325de4a3a249bea159ce8ddb" spec: no command specified
Warning Failed 102s kubelet, kind-control-plane Error: failed to generate container "18619d20a98400c35c8f44b6aab0c1da02b2eb9b9f8f114812867a019e771e0d" spec: no command specified
Warning Failed 87s kubelet, kind-control-plane Error: failed to generate container "9704bd6d0e6afb70d6c8acfb0c9fe01a51df712d279ce7c081a50bb65ac0485a" spec: no command specified
Warning Failed 72s kubelet, kind-control-plane Error: failed to generate container "72a321264c7fb3e5025bf20ed80a559c4a4c84dca4544f41ee37990c2b99e843" spec: no command specified
Warning Failed 56s kubelet, kind-control-plane Error: failed to generate container "ca8c53eadd200ac9d31809ff0295608a550cbb1dc01b772ec112dc28f39c4185" spec: no command specified
Warning Failed 44s kubelet, kind-control-plane Error: failed to generate container "2519e695108f2a8e8744083375c417c38952e8ff8346fb23af0c91be1564ebbf" spec: no command specified
Normal Pulled 32s (x8 over 2m9s) kubelet, kind-control-plane Successfully pulled image "relintdockerhubpushbot/cf-default-builder-test"
Warning Failed 32s kubelet, kind-control-plane Error: failed to generate container "07ec32b8539949e2837f6f16dc614fba33491fed70e9c944922c3ab558836805" spec: no command specified
Normal Pulling 19s (x9 over 2m36s) kubelet, kind-control-plane Pulling image "relintdockerhubpushbot/cf-default-builder-test"
Given that this is with the same version of Kind (and therefore, we assume, containerd), our current working hypothesis is that a previously missing error check was added in Kubernetes v1.18, and that the error was always being generated but went unchecked before.
We will leave the public image available for y'all to test with. It was created with this set of store/stack/builder definitions.
We bisected the versions of K8s and confirmed that this error shows up for us if we are running KinD with a K8s version of v1.18.4 or above (specified using the --image flag). Running v1.18.2, or any earlier minor version of K8s, does not exhibit this problem, even with the full store/stack/builder definition currently in cf-for-k8s.
One last test we ran before "end of day" was to reduce the size of the annotations on the cf-default-builder image by removing the ruby buildpack from the store/builder lists, leaving the other seven buildpacks, which allowed us to successfully push an app. When we pulled and inspected the builder image, we saw that the two keys mentioned in the original issue description were shorter, but were still much longer than the 4096-byte limit mentioned in the error message.
We're not sure what that means, but it leads me to speculate that it is not the annotations we can see that are causing the problem, and that K8s/Kind/Containerd is manipulating them in some way that is pushing us over the limit.
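For anyone who wants to repeat the size check on the visible labels, here is a minimal Python sketch. It assumes the input is the JSON array printed by `docker inspect` for the builder image (the same shape as the output pasted above); the `label_sizes` helper and the `SAMPLE` data are made up for illustration and are not part of cf-for-k8s or kpack.

```python
import json
from typing import Dict

LIMIT_BYTES = 4096  # limit quoted in the containerd error message


def label_sizes(inspect_output: str) -> Dict[str, int]:
    """Map each image label key to the byte length of key plus value."""
    images = json.loads(inspect_output)  # inspect output is a JSON array
    labels = (images[0].get("Config") or {}).get("Labels") or {}
    return {
        key: len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in labels.items()
    }


# Tiny made-up sample standing in for real inspect output.
SAMPLE = json.dumps([{"Config": {"Labels": {
    "io.buildpacks.stack.id": "org.cloudfoundry.stacks.cflinuxfs3",
    "example.big": "x" * 5000,
}}}])

if __name__ == "__main__":
    for key, size in sorted(label_sizes(SAMPLE).items()):
        flag = "over" if size > LIMIT_BYTES else "under"
        print(f"{key}: {size} bytes ({flag} {LIMIT_BYTES})")
```

To run it against a real image, replace `SAMPLE` with the contents of a saved inspect dump. As noted above, the visible labels were already over 4096 bytes on an image that pulled successfully, so this check on its own does not predict the failure.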
We have since increased the size of the cf-default-builder image by adding the procfile buildpack to each language-specific group in the builder and are now seeing this issue on the latest patch releases of KinD v1.16 and v1.17, as well as v1.19, which was recently released.
Source of the error message:
Recent fix to containerd/cri: https://github.com/containerd/cri/pull/1572 (merged Sept 15, 2020)
From this PR, the math behind the breakage:
In containerd, there is a size limit for label size (4096 chars). If an image has many layers (> (4096-39)/72 > 56), containerd.io/snapshot/cri.image-layers will hit the limit of label size and the unpack will fail because the annotation will be passed to the snapshotter as a label.
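The arithmetic in that quote can be checked with a short Python sketch. It assumes, as the quoted numbers suggest, that 39 is the length of the label key, that each layer contributes a `sha256:` digest of 71 characters plus one separating comma (72 per layer), and that the limit applies to key plus value. The function names are hypothetical and this is not containerd's actual code.

```python
# Label key named in the quoted PR text.
LABEL_KEY = "containerd.io/snapshot/cri.image-layers"
MAX_LABEL_BYTES = 4096  # limit from the error message

# Assumption: each layer is "sha256:" plus 64 hex characters,
# and layers are joined with a single comma.
DIGEST_LEN = len("sha256:") + 64


def label_size(num_layers: int) -> int:
    """Bytes used by key plus value for an image with num_layers layers."""
    if num_layers <= 0:
        return len(LABEL_KEY)
    value_len = num_layers * DIGEST_LEN + (num_layers - 1)  # digests + commas
    return len(LABEL_KEY) + value_len


def max_layers() -> int:
    """Largest layer count whose label still fits within the limit."""
    n = 0
    while label_size(n + 1) <= MAX_LABEL_BYTES:
        n += 1
    return n


if __name__ == "__main__":
    print(len(LABEL_KEY))   # 39
    print(max_layers())     # 56
    print(label_size(57))   # 4142, over the 4096-byte limit
```

Under these assumptions an image with up to 56 layers fits and a 57th layer tips the label over the limit, which matches the "> 56" in the PR text. The "key: containerd" at the end of the error message also appears consistent with this label key being reported in shortened form.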
Related: https://github.com/containerd/stargz-snapshotter/pull/148/files
I've asked on the kube/kind slack channel what we need to do to get the fix from PR 1572:
https://kubernetes.slack.com/archives/CEKK1KTN2/p1600364196431100
https://github.com/containerd/stargz-snapshotter/issues/144#issuecomment-694601447
This comment reports fixes in containerd/containerd and containerd/cri, along with some TOML config, but I assume we're dependent on a new version of kind pulling in these changes.
https://github.com/kubernetes-sigs/kind/releases/tag/v0.9.0#breaking-changes I think the workaround added to the end there should be sufficient for now, but if not we can expedite a fix release.
Great - thanks again @BenTheElder!
duplicate of https://github.com/pivotal/kpack/issues/473
Describe the bug
We've been seeing consistent failures in our CI when running smoke tests on KinD. After debugging, we found this image pull failure:
Failed to pull image "gcr.io/.../cf-default-builder@sha256:977feb...": rpc error: code = InvalidArgument desc = failed to pull and unpack image "gcr.io/cf-relint-greengrass/cf-workloads/cf-default-builder@sha256:977febd...": failed to prepare extraction snapshot "extract-712270633-CRQK sha256:f0dcba2...": info.Labels: label key and value greater than maximum size (4096 bytes), key: containerd: invalid argument
From the smoke test output, this manifests as hanging after these lines:
To Reproduce
Steps to reproduce the behavior:
cf push
Expected behavior
Smoke tests succeed on KinD.
Additional context
As a work-around, we've added an overlay to tear out all non-node buildpacks from the cf-default-builder. 59fa844