cloudfoundry / cf-for-k8s

The open source deployment manifest for Cloud Foundry on Kubernetes
Apache License 2.0

fail: reconcile builder/cf-default-builder (kpack.io/v1alpha1) namespace: cf-workloads-staging #583

Closed: aad closed this issue 3 years ago

aad commented 3 years ago

Describe the bug

We are deploying cf-for-k8s on EKS in an air-gapped environment (a private subnet). We have relocated the images to ECR, and the only error we encountered during kapp deploy -a cf -f cf-for-k8s-rendered.yml is shown below, even though all pods are in Ready status.

I would appreciate it if someone could suggest where to look.

10:29:47PM: fail: reconcile builder/cf-default-builder (kpack.io/v1alpha1) namespace: cf-workloads-staging
10:29:47PM:  ^ Encountered failure condition Ready == False:  (message: Get "https://index.docker.io/v2/": dial tcp: lookup index.docker.io on 10.100.0.10:53: no such host)

kapp: Error: waiting on reconcile builder/cf-default-builder (kpack.io/v1alpha1) namespace: cf-workloads-staging:
  Finished unsuccessfully (Encountered failure condition Ready == False:  (message: Get "https://index.docker.io/v2/": dial tcp: lookup index.docker.io on 10.100.0.10:53: no such host))

Here is the corresponding status from kapp inspect -a cf --status:

Namespace  cf-workloads-staging
Name       cf-default-builder
Kind       Builder
Status     conditions:
           - lastTransitionTime: "2020-11-25T14:29:40Z"
             message: 'Get "https://index.docker.io/v2/": dial tcp: lookup index.docker.io on
               10.100.0.10:53: no such host'
             status: "False"
             type: Ready
           observedGeneration: 1
           stack: {}
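
Both failures reported in this thread name the registry that kpack actually contacted, and in both cases it is index.docker.io rather than the private registry. A small triage sketch follows; the function names registry_host_from_message and failure_kind are hypothetical and not part of kapp, kpack, or cf-for-k8s. It pulls the host out of a condition message and roughly separates a DNS failure (Docker Hub unreachable, as in an air-gapped cluster) from an authentication failure (Docker Hub reached, credentials rejected).

```shell
#!/usr/bin/env bash
# Hypothetical triage helpers for a kpack Builder "Ready == False" message.

# Print the registry host from the first "https://HOST/v2/" URL in the message.
registry_host_from_message() {
  printf '%s\n' "$1" \
    | grep -oE 'https://[^/" ]+/v2/' \
    | head -n 1 \
    | sed -E 's#^https://([^/]+)/v2/$#\1#'
}

# Roughly classify the failure: DNS lookup vs. registry authentication.
failure_kind() {
  case "$1" in
    *"no such host"*) echo "dns" ;;           # registry unreachable (e.g. air-gapped)
    *" 401"*|*UNAUTHORIZED*) echo "auth" ;;   # registry reached, credentials rejected
    *) echo "unknown" ;;
  esac
}
```

The message could be read straight from the Builder, for example with registry_host_from_message "$(kubectl get builder cf-default-builder -n cf-workloads-staging -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}')". In this thread, a result of index.docker.io turned out to point at the app_registry settings rather than at image relocation.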

To Reproduce

Steps to reproduce the behavior:

  1. Relocate all the images and generate the manifest
  2. Use an air-gapped environment
  3. Deploy the rendered manifest
  4. See the error

Expected behavior

I can relocate all the required images to the private registry, so that nothing needs to be fetched from docker.io during deployment.

Additional context

cf-for-k8s SHA

1.0.0 / 73745a3a9891b0d1ceec646c184b09650c626bdb

Cluster information

EKS (in private subnet)

CLI versions

Output of the following commands:

  1. ytt --version: 0.30.0
  2. kapp --version: 0.34.0
  3. kubectl version: v1.19.3

cf-gitbot commented 3 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/175894321

The labels on this GitHub issue will be updated when the story is started.

jamespollard8 commented 3 years ago

Hey @aad, thanks for the report and sorry to hear that this isn't working smoothly for you.

Assuming you're following the instructions at https://github.com/cloudfoundry/cf-for-k8s/blob/develop/docs/platform_operators/system-registry-management.md:

  1. When you inspect your kbld lock file cf-for-k8s-images.tmp, do you see any newImage entries that include docker.io?
  2. When you inspect your final rendered cf-for-k8s-rendered.yml, do you see any references to index.docker.io/v2? If so, where?
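
The two checks above can be scripted. This is a minimal sketch, assuming the kbld lock file records each mapping as an image: line (the original reference) followed by a newImage: line (the relocated reference); check_no_dockerhub_refs is a hypothetical helper name. Only newImage is checked in the lock file, because the image: lines are expected to keep their original docker.io references.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_no_dockerhub_refs LOCK_FILE RENDERED_FILE
# Prints offending lines and returns 1 if either file still points at Docker Hub.
check_no_dockerhub_refs() {
  local lock_file="$1" rendered_file="$2" status=0

  # 1. Any relocated image (newImage) that still points at Docker Hub?
  if grep -nE 'newImage:.*docker\.io' "$lock_file"; then
    echo "lock file: some newImage entries still reference docker.io" >&2
    status=1
  fi

  # 2. Any Docker Hub reference left anywhere in the rendered manifest?
  if grep -n 'docker\.io' "$rendered_file"; then
    echo "rendered manifest: references to docker.io remain" >&2
    status=1
  fi

  return "$status"
}
```

For example, check_no_dockerhub_refs cf-for-k8s-images.tmp cf-for-k8s-rendered.yml. A clean result does not rule out the problem discussed later in this thread: a tag with no registry host, such as jimconner/cf-default-builder, contains no docker.io text yet still resolves to Docker Hub.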

Looking forward to hearing back from you, James and @acosta11

jimconner commented 3 years ago

I'm encountering the same issue deploying with kind on my local machine, where I'm trying to use a local registry to keep things fast on my limited Internet bandwidth. I ran through the kbld process and it mirrored all of the container images to my local registry. However, during deployment the cf-default-builder image for cf-workloads-staging is uploaded to my Docker Hub account instead of the local registry.

If you change the app_registry setting in cf-values to use a different registry, it still attempts to use the username/password against Docker Hub, seemingly ignoring the hostname.

Example failure while attempting to deploy (cf-default-builder not yet present on Docker Hub):

5:29:58PM: fail: reconcile builder/cf-default-builder (kpack.io/v1alpha1) namespace: cf-workloads-staging
5:29:58PM:  ^ Encountered failure condition Ready == False:  (message: HEAD https://index.docker.io/v2/jimconner/cf-default-builder/blobs/sha256:f22ccc0b8772d8e1bcb40f137b373686bc27427a70c0e41dd22b38016e09e7e0: unsupported status code 401)

kapp: Error: waiting on reconcile builder/cf-default-builder (kpack.io/v1alpha1) namespace: cf-workloads-staging:
  Finished unsuccessfully (Encountered failure condition Ready == False:  (message: HEAD https://index.docker.io/v2/jimconner/cf-default-builder/blobs/sha256:f22ccc0b8772d8e1bcb40f137b373686bc27427a70c0e41dd22b38016e09e7e0: unsupported status code 401))

If cf-default-builder is already present on Docker Hub during the deploy (so the image doesn't need to be pushed), the failures show up instead when you try to push an app:

Waiting for API to complete processing files...

Staging app and tracing logs...
   Loading secret for "https://my.private.registry:6088/v2/" from secret "cc-kpack-registry-auth-secret-ver-1" at location "/var/build-secrets/cc-kpack-registry-auth-secret-ver-1"
   Error verifying write access to "jimconner/e8c5cfa8-cba0-474b-9596-9913952e9630": POST https://index.docker.io/v2/jimconner/e8c5cfa8-cba0-474b-9596-9913952e9630/blobs/uploads/: UNAUTHORIZED: authentication required; [map[Action:pull Class: Name:jimconner/e8c5cfa8-cba0-474b-9596-9913952e9630 Type:repository] map[Action:push Class: Name:jimconner/e8c5cfa8-cba0-474b-9596-9913952e9630 Type:repository]]
StagerError - Stager error: Kpack build failed during container execution: Step failure reason: 'Error', message: ''.
FAILED

Interestingly, it says that it is loading the secret for my private registry, but then fails with an authentication error against Docker Hub.
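
This behaviour is consistent with the usual Docker image-reference rule: the first path component of a reference is treated as a registry host only if it contains a dot or a colon, or is localhost; otherwise the reference is assumed to live on Docker Hub (shown as index.docker.io in the errors above). A tag such as jimconner/cf-default-builder therefore goes to Docker Hub, while the loaded secret only holds credentials for my.private.registry:6088. A simplified sketch of that rule follows; registry_for_ref is a hypothetical name, and edge cases of the full reference grammar are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified version of Docker's reference normalization:
# which registry host does an image reference resolve to?
registry_for_ref() {
  local ref="$1" first
  case "$ref" in
    */*) first="${ref%%/*}" ;;                   # first path component
    *) echo "index.docker.io"; return ;;         # e.g. "nginx": Docker Hub image
  esac
  case "$first" in
    *.*|*:*|localhost) echo "$first" ;;          # looks like a host (or host:port)
    *) echo "index.docker.io" ;;                 # e.g. "jimconner": Docker Hub user
  esac
}
```

So registry_for_ref jimconner/cf-default-builder yields index.docker.io, while a prefix that starts with my.private.registry:6088/ keeps the private host.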

jamespollard8 commented 3 years ago

Quoting @jimconner: "If you change the app_registry setting in cf-values to use a different registry, it will attempt to use the user/pass to access dockerhub, seemingly ignoring the hostname."

Oh, interesting. @aad and @jimconner, I wonder whether this is simply related to the formatting of the app_registry section of your values file. We haven't tested many container registries, but we do have examples for Azure and GCR that you may want to mirror: https://github.com/cloudfoundry/cf-for-k8s/blob/690c97b46905969420677c8da4e55716ef1decef/sample-cf-install-values.yml#L93-L106

For faster feedback on your app_registry credentials, a simpler test is to run this script: https://github.com/cloudfoundry/cf-for-k8s/blob/develop/hack/validate-registry-access.sh

Looking forward to hearing if this helps either/both of you! James

jimconner commented 3 years ago

@jamespollard8 Using that repository prefix notation fixed it for me. Thank you very much for the help.

aad commented 3 years ago

@jamespollard8, my case might be a little different from Jim's: I don't have any Internet access to docker.io during the deploy process. I did follow the guide, and I cannot find any reference to docker.io in the newImage entries or in the final rendered manifest.


ls -1 cf*.yml
cf-for-k8s-rendered-dev11.yml
cf-images-dev11.yml
cf-relocated-images-dev11.yml

grep -i docker.io cf-relocated-images-dev11.yml
- image: index.docker.io/bitnami/postgresql@sha256:0f76a419cfd9996036e3a53672f50cf69ed7699f1241cbf8e20af17bbbdf0683
- image: index.docker.io/cfidentity/uaa@sha256:1854fc55c098781802e919681f28cfcb342e676fb5858800e902a206b0bca4b3
- image: index.docker.io/minio/minio@sha256:35d654988af3b5761162d5ce06243c1517ec34beeae271c8b5983b5082968858
- image: index.docker.io/paketobuildpacks/build@sha256:ef09483901fec54c83c41a67e35e80d79450b1fdc0da7375b17bd93fd9a4a96c
- image: index.docker.io/paketobuildpacks/run@sha256:a007dd49172dd89c790a095ec6b54291dcb7bed942dd0a8ffd0a8d0b77cb68b5
- image: index.docker.io/relintdockerhubpushbot/cf-for-k8s-eirini-eirini-controller@sha256:52a7892033893d3d09b2582ae3d562433b30372dcb3678a37e0efd73d174d342
- image: index.docker.io/relintdockerhubpushbot/cf-for-k8s-eirini-event-reporter@sha256:b06e19ac5232a1077474933e412143f1e4449df94b5899d1e6b29fa823c92b3f
- image: index.docker.io/relintdockerhubpushbot/cf-for-k8s-eirini-instance-index-env-injector@sha256:ad9adc876d588fbc6b0592a1f8a9b24efd41e8dcd06ae2244192a7651499def8
- image: index.docker.io/relintdockerhubpushbot/cf-for-k8s-eirini-opi@sha256:d20c957c36644e8f9878bfee1633f319b2e94c82b66b16f9565b5796ed20817b
- image: index.docker.io/relintdockerhubpushbot/cf-for-k8s-eirini-task-reporter@sha256:b6173cbf13ecff7f789acc9fb38667caacbdef8a7ee5e0667e7c1470564fa5da
- image: index.docker.io/cloudfoundry/capi-nginx@sha256:25997cca011ed0761955754a68931f7b1694e487bffba73c6ac8dbcd084f5dee
- image: index.docker.io/cloudfoundry/cf-api-controllers@sha256:9dd23d6669bc1b58147058116554b3d9158a2b9d236fa822bb8e39ed15c7b12c
- image: index.docker.io/cloudfoundry/cf-api-package-registry-buddy@sha256:bb13c6ffe7cab019f3ee35fc629e164fb7d6c6b7d36bd74a0b2ae66090207561
- image: index.docker.io/cloudfoundry/cloud-controller-ng@sha256:5a5672f986ce629bc2aa064532ea23dae21b33490fdbc1632b7361885b6eee7d
- image: index.docker.io/logcache/cf-k8s-logging@sha256:fed765745c41c656f409ff5e264a4cd5f13f89f33cfe468c7f534d40792f647a
- image: index.docker.io/logcache/log-cache-cf-auth-proxy@sha256:c37c1065f026aba7b581f357a9d4268c8724a93630184b76307343195370d0cb
- image: index.docker.io/logcache/log-cache-gateway@sha256:4d85c35ff30b5d1ba22d5fdc2e2d1ccd1782644c5a58ea95a42a6b0da71bb7d2
- image: index.docker.io/logcache/log-cache@sha256:e28a91c324f932a0e867a820210a3645e8ec87026e6195e8d23e4ab8fb7d2bbe
- image: index.docker.io/logcache/syslog-server@sha256:fdd2fb92bed0c2dec4d1ea96c8253304fed8abc2fdf4eb0174bc6f7a50facf35
- image: index.docker.io/oratos/metric-proxy@sha256:aa06c55a4d4af904e6faa44cf776000534d9dc738554a1eff4a85279a504100e
- image: index.docker.io/oratos/statsd_exporter@sha256:10a64dc4ad0a3e3fe88372f0481dea5c02595c38d168617836a99a649d3ac407

grep -i docker.io cf-for-k8s-rendered-dev11.yml
<no output>

aad commented 3 years ago

@jamespollard8, I can confirm that the repository prefix notation fixed my issue as well. I had thought the kpack controller wanted to pull from the registry rather than push to it. It would be useful to clarify this in the docs to make things easier for new users. Here is what I used for ECR:

# Assumes ${region} and ${cf_value_file} are already set.
ecr_hostname="$(aws sts get-caller-identity --query 'Account' --output text).dkr.ecr.${region}.amazonaws.com"
ecr_token="$(aws ecr get-login-password --region "${region}")"
cat << EOF >> "${cf_value_file}"
app_registry:
  hostname: https://${ecr_hostname}
  repository_prefix: "${ecr_hostname}/my_env" # ensure ${ecr_hostname}/my_env/cf-default-builder exists
  username: AWS
  password: "${ecr_token}"
EOF

jamespollard8 commented 3 years ago

Oh nice - glad to hear that from both of you!

Thanks @aad, that looks great. I added the ECR example here: d435680
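
Since the fix in both cases was making repository_prefix include the registry host, a quick pre-deploy check can compare the two values. This is a hypothetical sketch (prefix_matches_hostname is not part of cf-for-k8s, and the account ID in the usage note is a placeholder). It does not fit Docker Hub, where a bare username is a valid prefix.

```shell
#!/usr/bin/env bash
# Hypothetical check: does repository_prefix start with the host from hostname?
# The scheme and any path (e.g. "/v2/") are stripped from hostname first.
prefix_matches_hostname() {
  local hostname="$1" prefix="$2" host
  host="${hostname#https://}"
  host="${host#http://}"
  host="${host%%/*}"
  case "$prefix" in
    "$host"/*) return 0 ;;
    *) echo "repository_prefix '$prefix' does not start with '$host/'" >&2
       return 1 ;;
  esac
}
```

For example, prefix_matches_hostname "https://123456789012.dkr.ecr.us-east-1.amazonaws.com" "123456789012.dkr.ecr.us-east-1.amazonaws.com/my_env" succeeds, while a prefix of plain jimconner against https://my.private.registry:6088 fails.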